StreamSets Optimizes Cloud Costs Through Support for Amazon EMR Serverless

New capability allows organizations to process large amounts of data in a cost-effective manner

StreamSets, a Software AG company, today announced support for Amazon EMR Serverless, the latest Amazon Web Services (AWS) deployment option that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, or scaling clusters or servers. The integration is available now and lets StreamSets users process and analyze large amounts of data in a cost-effective, scalable, and serverless manner.

Amazon EMR Serverless simplifies the deployment and management of large data processing workloads by automatically provisioning and scaling infrastructure based on the workload demand.

With Amazon EMR Serverless, users submit Apache Spark jobs to a managed application whose compute resources automatically scale up and down based on each job’s requirements. This model lets users pay only for the resources they use, without having to manage any servers or clusters themselves. Added benefits of Amazon EMR Serverless include automatic software updates, high availability, and built-in security features. Users can also develop and run Spark applications through Amazon EMR Studio, a web-based integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark.

“Implementing this capability is the next essential step in our mission to provide organizations with the tools to effectively modernize data integration and operations, offering a more seamless and efficient method for data engineers to do their jobs,” said Dima Spivak, COO of Products at StreamSets.

Users who pair StreamSets’ data integration platform with Amazon EMR Serverless gain the following benefits:

  • Scalability: Amazon EMR Serverless automatically scales compute resources based on workload demand, so data engineers can process large amounts of data without worrying about over-provisioning or under-provisioning.
  • Cost-efficiency: With Amazon EMR Serverless, data engineers pay only for the compute resources they use, without having to manage any servers or clusters themselves. StreamSets helps optimize data integration pipelines and reduce the costs of managing infrastructure.
  • Compatibility: Amazon EMR Serverless supports Apache Spark, a popular open-source big data processing framework. StreamSets provides seamless integration with Spark-based data processing pipelines, allowing data engineers to easily move data between different data sources and destinations.
  • Ease of use: Amazon EMR Serverless streamlines the deployment and management of big data processing workloads by automatically provisioning and scaling infrastructure based on the workload demand. StreamSets provides a user-friendly interface for data engineers to create, test, and manage data integration pipelines without fear of affecting the underlying infrastructure.

To learn more about StreamSets support for Amazon EMR Serverless, read the recent blog post about the new capability or visit www.streamsets.com.

About StreamSets
StreamSets, a Software AG company, eliminates data integration friction in complex hybrid and multi-cloud environments to keep pace with need-it-now business data demands. Our platform lets data teams unlock data, without ceding control, to enable a data-driven enterprise. Resilient and repeatable pipelines deliver analytics-ready data that improve real-time decision-making and reduce the costs and risks associated with data flow across an organization. That is why some of the largest companies in the world trust StreamSets to power millions of data pipelines for modern analytics, smart applications, and hybrid integration.