StreamSets announced it has integrated Microsoft SQL Server 2019 Big Data Clusters with its DataOps platform.
Microsoft SQL Server 2019 Big Data Clusters combines an instance of the Microsoft SQL Server 2019 relational database with an instance of the Apache Spark in-memory computing framework and the Hadoop Distributed File System (HDFS) to run Big Data applications on a Kubernetes cluster, either in an on-premises data center or in a public cloud. Microsoft SQL Server 2019 Big Data Clusters provides a single interface through which organizations can take advantage of a virtual data layer, also known as a data hub, to access both structured and unstructured data where it resides without having to move data between databases.
Jobi George, general manager for StreamSets Cloud, says the alliance between StreamSets and Microsoft will enable IT teams to design and operationalize data pipelines for Big Data workloads using visual tools, without having to write code.
Rather than relying on legacy extract, transform and load (ETL) tools, IT teams can use the StreamSets DataOps platform to ingest and process data at scale from a wide variety of data sources more easily, says George.
StreamSets has designed its approach to DataOps to foster agility using many of the same principles advanced by DevOps practitioners, adds George. Rather than waiting weeks for a database administrator to construct a schema before a set of data pipelines can be exposed, DataOps enables data pipelines to be created much faster, notes George. That approach will also enable the IT teams responsible for managing data to keep pace with increased demand for access to data from developers who have embraced DevOps to accelerate the rate at which applications are being developed, adds George.
The DataOps platform from StreamSets also provides the monitoring tools IT organizations require to instrument the entire DataOps process, says George.
It’s not yet clear to what degree organizations will formally embrace DataOps processes. It is clear that organizations, especially as they embrace artificial intelligence (AI) applications, will need to manage massive amounts of data more efficiently. However, the degree to which they will simply adopt a new platform that comes with embedded tools for managing that data, versus making a conscious decision to embrace a set of well-defined DataOps practices, is going to vary widely.
Whatever the path selected, existing platforms for managing data are not up to the task at hand. More organizations are embracing data hubs and data virtualization to derive as much value from their data as possible. As data comes to be managed as a business asset, the need to analyze structured and unstructured data simultaneously quickly becomes apparent. StreamSets, which supports multiple Big Data platforms, is betting organizations will want to apply a single set of processes for managing data across all the Big Data platforms they may have on-premises or in the cloud.
None of this means the need for storage administrators or database administrators is going away any time soon. However, the tasks those individuals perform will be considerably different in this next era of Big Data.