Amazon SageMaker Pipelines is a fully managed service that lets customers define and orchestrate their model-building steps as workflows. Today, we are happy to introduce a new step type that lets machine learning engineers run data processing applications on Amazon EMR clusters, using open-source frameworks such as Apache Spark, Presto, and Hive.