StreamSets Expands Databricks Partnership with New Connector for Databricks Delta Lake Integration

Databricks Partner Integration Gallery Now Features StreamSets Cloud Integration Connector to Enable Users to Easily Ingest, Integrate and Monitor Data in Delta Lake

StreamSets®, provider of the industry’s first DataOps platform, today announced that it is expanding its partnership with Databricks by participating in Databricks’ newly launched Data Ingestion Network. As part of the expanded partnership, StreamSets is adding a new connector for Databricks Delta Lake to its platform. With it, users can configure their pipelines to write data from any source, in batch or streaming mode, directly into Delta Lake. As a result, data teams can deliver more data in a shorter time frame, driving BI, analytics and, ultimately, digital transformation.

Today, companies require systems that support diverse data applications such as real-time monitoring, machine learning and data science, and that can process unstructured data such as text, images, video and audio. A decade ago, data lakes replaced data warehouses as the best repositories for this raw data; however, they neither support transactions nor enforce data quality. In addition, they lack consistency, making it almost impossible to mix batch and streaming jobs or appends and reads.

Data lakehouses combine the best of data warehouses and data lakes and remedy the limitations of both, but friction in ingesting fresh data remains. With this partnership, Databricks users can now capitalize on the new data lakehouse paradigm without that friction. They can connect to the StreamSets platform and use out-of-the-box connectors to load batch, change data capture (CDC) or streaming data from any source (such as relational data, on-premises data lakes and warehouses, and cloud applications) into Databricks Delta Lake. With StreamSets, data engineers can build and operate data pipelines for modern and legacy data sources, migrate those sources to a data lakehouse platform and continuously refresh it with relevant data.

Specifically, the new StreamSets connector for Databricks Delta Lake offers several key benefits that give greater operational control over the full life cycle of data:

  • Faster migration to the cloud with fewer data engineering resources
  • Drag-and-drop interface to simplify data movement from multiple disparate sources
  • Improved management of operations and performance for cloud data lakes with Delta Lake
  • Change-data-capture capability from several data sources into Delta Lake
  • Built-in Kubernetes containerization and native cloud scaling

Combined with Delta Lake, the connector also makes it possible to unify batch and streaming data while maintaining ACID compliance, supporting timely transactional operations.
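
For readers unfamiliar with change data capture, the idea of loading a batch snapshot and then keeping it current with a stream of changes can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the StreamSets connector or the Delta Lake API: the event shape (operation, primary key, row), the function name apply_changes and the sample rows are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]
# Hypothetical change event: (operation, primary key, changed columns).
ChangeEvent = Tuple[str, int, Row]


def apply_changes(table: Dict[int, Row], events: Iterable[ChangeEvent]) -> Dict[int, Row]:
    """Apply CDC events, in order, to a table keyed by primary key."""
    for op, key, row in events:
        if op in ("insert", "update"):
            # Upsert: overlay the changed columns on any existing row.
            table[key] = {**table.get(key, {}), **row}
        elif op == "delete":
            # Deleting a key that is already gone is treated as a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


# An initial batch load, followed by streamed changes to the same table.
target = {1: {"name": "Ada", "plan": "free"}, 2: {"name": "Bo", "plan": "pro"}}
stream = [
    ("update", 1, {"plan": "pro"}),
    ("insert", 3, {"name": "Cy", "plan": "free"}),
    ("delete", 2, {}),
]
apply_changes(target, stream)
print(sorted(target))  # keys remaining after the changes: [1, 3]
```

In a real lakehouse the merge would be executed transactionally by the storage layer; the in-memory dictionary here only stands in for the target table.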

“Along with Apache Spark, the use of Databricks’ Delta Lake is rapidly expanding in the market,” said Pankaj Dugar, vice president of business development at Databricks. “With StreamSets’ extended support for Delta Lake, small and midsize companies now have an easy way to ingest data from their cloud-based service into Databricks’ Delta Lake so they can maximize their analytics efforts with fresh data in their data lakehouse.”

“This connector is another step forward in our alliance with Databricks to deliver more data, faster, to drive analytics — which is critical to the survival and success of today’s organizations,” said Jobi George, general manager of Cloud Business at StreamSets. “We’re excited to continue our work with Databricks to drive innovation in the industry.”

About DataOps
Analytics has modernized in our always-on, always-changing world. How you deliver data to drive analytics has to modernize, too. DataOps is a set of practices and technologies that operationalizes data management and integration to ensure resilience and agility despite ceaseless change. It combines the DevOps principles of continuous delivery with the ability to tame data drift (unexpected and undocumented changes to data). By embedding these principles, DataOps makes it possible to deliver the continuous data needed to drive modern analytics and digital transformation.
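
As a rough illustration of what detecting data drift can mean in practice, the sketch below compares an incoming record against the fields a pipeline expects and reports missing fields, new fields and changed types. It is not how the StreamSets platform implements drift handling; the function name detect_drift, the schema and the sample record are assumptions made for this example.

```python
from typing import Dict, List


def detect_drift(expected: Dict[str, type], record: Dict[str, object]) -> List[str]:
    """Return human-readable differences between a record and the expected schema."""
    findings: List[str] = []
    for field, expected_type in expected.items():
        if field not in record:
            findings.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            findings.append(
                f"type changed: {field} is {actual}, expected {expected_type.__name__}"
            )
    for field in record:
        if field not in expected:
            findings.append(f"new field: {field}")
    return findings


# Hypothetical schema and a record whose upstream source changed without notice.
expected_schema = {"id": int, "email": str, "amount": float}
drifted_record = {"id": "42", "email": "a@example.com", "currency": "USD"}

for finding in detect_drift(expected_schema, drifted_record):
    print(finding)
```

A pipeline could use findings like these to route unexpected records aside or raise an alert rather than fail outright.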

About StreamSets
StreamSets built the industry’s first multi-cloud DataOps platform for modern data integration, helping enterprises to continuously flow big, streaming and traditional data to their data science and data analytics applications. The platform uniquely handles data drift, those frequent and unexpected changes to upstream data that break pipelines and damage data integrity. The StreamSets DataOps Platform allows for execution of any-to-any pipelines, ETL processing and machine learning with a cloud-native operations portal for the continuous automation and monitoring of complex multi-pipeline topologies.

Founded in 2014, StreamSets is backed by top-tier Silicon Valley venture capital firms, including Battery Ventures, New Enterprise Associates (NEA), and Accel Partners. For more information, visit www.streamsets.com.