With Openflow, Snowflake fundamentally simplifies data integration: extraction and loading happen directly as part of the Snowflake platform — no external ETL tools required. This significantly reduces integration effort and streamlines the entire pipeline management process.
Openflow is Snowflake's new solution for enterprises: an open, extensible, and secure data integration platform for real-time and bidirectional data movement within Snowflake. It closes a long-standing gap by enabling the extraction and ingestion of data directly into Snowflake, tasks that previously relied heavily on third-party tools such as Fivetran, Airbyte, or other ETL tools. That reliance added complexity to the tech and knowledge stack and increased integration effort through the need for multiple tools or custom scripts. By staying within Snowflake, Openflow minimizes context switching and simplifies end-to-end data pipeline management. Built on Apache NiFi, Openflow inherits NiFi's strengths and enhances them with Snowflake's security, governance, and ease of use. In this article, we explain everything you need to know about Openflow.
The Foundation: Apache NiFi
Snowflake Openflow is based on Apache NiFi 2.0, an open-source data integration tool designed to automate the flow of data between systems. NiFi offers a reliable, managed, and user-friendly framework for moving and transforming data and follows the flow-based programming paradigm. One of its standout features is the intuitive web-based interface, which lets users design data workflows by dragging and dropping different kinds of processors. NiFi is proven at massive scale, built for any type of data, gives you the freedom to integrate any system as source or destination, and provides observability and security at every step of every data pipeline.
Apache NiFi serves as both an ingestion and a transformation tool, designed to handle data integration tasks across diverse systems by automating data flows. In that respect it is closest to a traditional ETL tool, although it also shares many similarities with streaming tools.
Ingestion: Thanks to its comprehensive connectors, NiFi is equipped to ingest data from a wide range of sources, including databases, files, web APIs, IoT devices, message queues, and more. It features numerous processors specifically designed for fetching and receiving data from external systems, making it a powerful tool for data collection.
Transformation: NiFi provides robust capabilities for data transformation. With its extensive library of processors, it can convert, reformat, enrich, filter, aggregate, or split data, and much more. Users can configure and design each workflow to transform data according to specific business requirements.
These dual capabilities make NiFi highly versatile in data integration scenarios, allowing it to efficiently ingest, process, transform, and route data within complex architectures and across the whole data pipeline, covering traditional extract, transform, and load (ETL) tasks.
Snowflake supercharges these capabilities with security, governance, and ease of use. Openflow is designed as a cloud-native, fully managed service that runs either in Snowflake's container infrastructure or in your own VPC.
Snowflake Openflow
Snowflake describes Openflow as "a fully managed, global data integration service that connects any data source to any data destination with hundreds of processors and turnkey connectors, supporting structured and unstructured text, images, audio, video, and sensor data", built on Apache NiFi 2.0.
Key Characteristics of Openflow
Open & extensible: Build data pipelines from any source to any target, including non-Snowflake targets. This gives you the flexibility and freedom to integrate with any system as source or destination: from anywhere to anywhere.
Comprehensive Data Integration: Openflow lets you seamlessly ingest and process all types of data from any source into Snowflake at multi-GB/s rates, whether the data is structured, unstructured, or multimodal and arrives as streams or batches, including Kafka streams landing in Polaris and Iceberg tables.
Hybrid data estates: Openflow can be deployed as a fully managed service in Snowpark Container Services (SPCS) or in your own Virtual Private Cloud (VPC). Even on-prem deployments are on the roadmap.
High Level Architecture
From a high-level view, Openflow consists of a Control Plane and a Data Plane. The Control Plane orchestrates the operations, while the Data Plane executes the actual data processing tasks; both planes interact seamlessly to ensure that data is processed according to the defined rules and flow designs.
Because Openflow supports hybrid deployment, the Data Plane can be deployed into Snowpark Container Services (SPCS) or into your own Virtual Private Cloud (VPC); on-prem deployments are on the roadmap as well. Running outside Snowflake makes sense where data must remain within a specific network due to compliance requirements or latency concerns.
The Openflow Control Plane is responsible for managing and overseeing flow configurations and the overall orchestration of data processes. It includes components for managing Data Planes and Runtimes, the connector catalog, and all observability services, such as tracking the performance, health, and statistics of the entire data flow.
The Data Plane handles the actual flow and processing of data: how data is collected, transformed, and transmitted across systems. Processors act as the individual tasks for transforming, filtering, enriching, and routing data, while Connectors enable communication with external systems such as databases, messaging systems, file storage, and more.
Openflow ships with almost 300 processors across categories such as data extraction, data transformation, data loading, routing and mediation, monitoring and reporting, and utility functions.
Since Runtimes are where the actual work is executed, multiple Runtimes will likely be necessary, depending on your setup, team, and project structure.
Supported Connectors
Out of the box, a wide range of connectors is supported; they can be discovered and installed from the Control Plane, whether your data lives in databases, SaaS applications, streaming solutions, or other multimodal data sources. Because Openflow is open and extensible, it is even possible to write and integrate your own connectors.
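As a sketch of what such an extension could look like: Openflow is built on Apache NiFi 2.0, which offers a Python API for custom processors (the nifiapi module that ships with NiFi). Whether and how custom processors can be deployed into a specific Openflow runtime depends on that deployment; the processor below, with its hypothetical name and logic, is therefore only a minimal illustration of the NiFi 2.x mechanism, not an official Openflow connector template.

```python
# Minimal sketch of a custom processor using Apache NiFi 2.x's Python API.
# The nifiapi module is provided by NiFi at runtime; processor name, regex
# and logic are hypothetical, and deploying custom processors into an
# Openflow runtime depends on that deployment's configuration.
import re

from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult


class MaskEmailAddresses(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.1.0'
        description = 'Replaces e-mail addresses in the flow file content with a placeholder.'
        tags = ['example', 'masking']

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Read the incoming flow file, mask anything that looks like an e-mail
        # address, and route the rewritten content to the "success" relationship.
        text = flowfile.getContentsAsBytes().decode('utf-8')
        masked = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '***@***', text)
        return FlowFileTransformResult(relationship='success', contents=masked)
```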
Integrating Openflow Into Snowflake Platform Architecture
By design, Openflow is not only capable of managing the ingestion of data from external sources, but also supports complete data pipelines. While it is best suited for data ingestion, it can also trigger transformation workloads directly on Snowflake, for example via Cortex or SQL processors, independent of the data estate to which it is deployed (SPCS, VPC, on-prem). Furthermore, it is suitable not only for moving data into Snowflake (ingress) but also out of Snowflake (egress), supplying other systems with data. You have the freedom to integrate with any system as source or destination.
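To make the idea of pushing work down to Snowflake more concrete, the following sketch shows the kind of Cortex call such a SQL processor could issue. It uses the Snowflake Python connector purely for illustration; the model, table, and column names as well as the connection parameters are placeholders.

```python
# Illustrative only: the kind of Cortex call an Openflow SQL processor could
# push down to Snowflake. Connection parameters, table, column and model
# names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="my_wh", database="my_db", schema="raw",
)
try:
    cur = conn.cursor()
    # Summarize recently ingested support tickets with a Cortex LLM function.
    cur.execute("""
        SELECT ticket_id,
               SNOWFLAKE.CORTEX.COMPLETE(
                   'mistral-large',
                   'Summarize this support ticket in one sentence: ' || ticket_text
               ) AS summary
        FROM support_tickets
        WHERE load_ts > DATEADD('hour', -1, CURRENT_TIMESTAMP())
    """)
    for ticket_id, summary in cur:
        print(ticket_id, summary)
finally:
    conn.close()
```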
Openflow serves as an enabler for a broad set of use cases, such as:
collecting structured and unstructured data into a centralized system,
facilitating collaboration between AI agents through near real-time, bidirectional data flows,
replicating changes from OLTP systems via change data capture (CDC), e.g. for traditional reporting,
ingesting real-time events from streaming services for near real-time analytics,
and much more.
In our reference architecture using only Snowflake services, Openflow fits perfectly as the extraction and ingestion component as well as a data service that provides data to other systems, thanks to its wide range of connectors. It excels at extracting data from source systems while offering the flexibility to filter, aggregate, and pre-process all kinds of data at high scale.
b.telligent Reference Architecture With Snowflake Services
Loading Patterns
To efficiently load data into Snowflake using Openflow, Snowflake outlines several loading patterns:
Snowpipe Auto-Ingest: Files are loaded into your cloud storage, such as Amazon S3. Snowpipe then receives notifications from a queueing service and loads the data into tables (a minimal sketch of this pattern follows the list).
Snowpipe REST API: Files are uploaded to an internal Snowflake stage, and the Snowpipe REST API is called to load the data into tables.
COPY INTO Command: Files are transferred from either an external or internal stage directly into Snowflake tables.
INSERT Statement: Using the PutDatabaseRecord processor from Openflow, data is directly loaded into tables. However, this method is not intended for larger datasets.
Snowpipe Streaming: Data is loaded into Snowflake tables using the Snowpipe Streaming API.
Snowflake Processor PutIcebergTable: Writes FlowFiles directly into Iceberg tables, using a configurable catalog for managing namespaces and tables.
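The following is a minimal sketch of the Snowpipe auto-ingest pattern mentioned above: a pipe that loads JSON files from cloud storage as soon as the storage service emits an event notification. It uses the Snowflake Python connector for illustration; the bucket URL, the storage integration, all object names, and the connection parameters are placeholders, and the target table is assumed to already exist.

```python
# Sketch of the Snowpipe auto-ingest pattern. All names are placeholders;
# the storage integration and the target table are assumed to exist.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="my_wh", database="my_db", schema="raw",
)
cur = conn.cursor()

# External stage pointing at the bucket that Openflow (or any producer) writes to.
cur.execute("""
    CREATE STAGE IF NOT EXISTS landing_stage
      URL = 's3://my-bucket/openflow/landing/'
      STORAGE_INTEGRATION = my_s3_integration
""")

# AUTO_INGEST = TRUE makes Snowpipe load new files whenever the bucket sends a
# notification. The target table is assumed to have a single VARIANT column.
cur.execute("""
    CREATE PIPE IF NOT EXISTS landing_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO events
      FROM @landing_stage
      FILE_FORMAT = (TYPE = 'JSON')
""")

conn.close()
```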
The Look and Feel of Openflow
With Openflow's drag-and-drop interface, building a fully functional data flow that delivers business value is remarkably quick and simple.
The accompanying screenshot demonstrates a fully operational data pipeline in three simple steps:
First, the InvokeHTTP processor calls a REST API, generating a flow file.
This file is ingested into a Snowflake internal named stage using the PutSnowflakeInternalStageFile processor.
Lastly, a COPY INTO command moves the data into a Snowflake table for downstream processing, as sketched below.
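For illustration, the statement behind that third step could look like the following, issued here via the Snowflake Python connector. The stage, path, and table names are hypothetical, and the target table is assumed to have a single VARIANT column.

```python
# Hypothetical version of the COPY INTO statement the third step would run:
# it moves the staged JSON response into a table. Connection parameters and
# object names are placeholders; the table has a single VARIANT column.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="my_wh", database="my_db", schema="raw",
)
try:
    conn.cursor().execute("""
        COPY INTO api_responses
        FROM @openflow_int_stage/api/
        FILE_FORMAT = (TYPE = 'JSON')
    """)
finally:
    conn.close()
```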
Conclusion
Openflow transforms data movement by unifying all data integration on a single platform. It is extensible and interoperable, connecting to any data source and destination and opening up almost limitless data movement possibilities. It is a promising alternative to established extraction and ingestion tools for loading data into Snowflake, yet it can also work as a traditional ETL tool. Additionally, it serves as a data service that delivers Snowflake's data to various downstream systems.
However, it is more than just NiFi as a service. It is cloud-native with elastic scaling, provides a connector ecosystem with native as well as Snowflake-optimized connectors, enables AI use cases by handling multimodal data, offers the ability to trigger Snowflake's Cortex AI services, and much more.
If you are facing data integration challenges and want to generate real added value for your company, we will be happy to support you. Contact us for an initial meeting to discuss your use case without obligation.