Data products shared and utilized by many individuals or domains are considered particularly valuable and useful. But how are data made available, shared and jointly utilized? This is one of the questions answered in this blog post under the keyword "data sharing", with a focus on the technical aspects of a data-mesh architecture involving native AWS services.

What is meant by the term "data sharing"?

So far, no generally accepted definition of data sharing has been established in the data community, but the definitions currently in circulation overlap in aspects such as the recipients of shared data and the object and purpose of sharing. Software vendors in particular publish proposals for defining the term, which gives these definitions a technological slant.

If essential aspects are taken into account, data sharing can be described as a process in which data are made available to a defined group of recipients for further processing and analysis. The group of recipients can consist of an internal organization (teams, departments) or external parties across enterprise boundaries. Machines can also be included in the group of recipients. The main goals of data sharing are:

 

  • Promotion and acceleration of collaboration between different data consumers
  • Reduction of process overhead, such as repeated collection and processing of the same data records
  • Designation of data producers' responsibilities vis-à-vis data consumers
  • Promotion of knowledge acquisition and, if necessary, refinement of own data through inclusion of additional data points

Data sharing with a view to data meshes

Data sharing refers to making data available to different teams, departments and even external organizations. A data mesh, on the other hand, is a socio-technical concept which aims to overcome the challenges of scaling organizations in data-intensive, complex environments. An examination of the relationship between the objectives of a data mesh (scaling of data-intensive, complex organizations) and of data sharing (provision of data) suggests that implementing data sharing goes hand in hand with building a data mesh.

Depending on an organization's objectives, however, data sharing can also be implemented without introduction of a data-mesh architecture. Companies can use centralized data platforms, data lakes, or data warehouses to enable data sharing. In this case, data sharing is based on a more traditional, centralized approach to data management, in which a dedicated data team is responsible for storage, quality, and accessibility of data.

How AWS enables data sharing and where its sharing services fit in a data mesh

Over the years, Amazon Web Services (AWS) has introduced a number of data-sharing services allowing data to be stored, processed and, in particular, shared. To date, some of the best-known AWS services which either directly allow or at least support data sharing include AWS Data Exchange, Redshift Data Sharing, and AWS Lake Formation. A new addition is Amazon DataZone.

There are various ways to use the data sharing services mentioned above – for example, in a data mesh. Shown below is a variant of implementing a data-mesh architecture with native AWS services. In this architecture, data producers make their data available to data consumers via a central governance platform.


Adapted from: https://aws.amazon.com/blogs/architecture/lets-architect-architecting-a-data-mesh/

The Redshift Data Sharing and Amazon DataZone services mentioned above appear in the AWS data-mesh architecture shown here. Although AWS Lake Formation is not explicitly shown in this architecture, it is used as an integrated service via Amazon DataZone. The individual services and their roles in the data mesh are described below:

Redshift Data Sharing:

Redshift Data Sharing is a feature of the Amazon Redshift data warehouse service. It allows organizations to share live data across Redshift clusters without copying or moving the data.

Data products can be shared in a data mesh using Redshift clusters. To do so, a relationship must be established between the producer cluster and the consumer cluster: the producer cluster supplies the data, while the consumer cluster accesses the shared data. Once access has been granted, a database referring to the shared data in the producer cluster can be created in the consumer cluster, and the shared objects can then be queried and processed further.
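The producer/consumer relationship can be sketched as the SQL statements each side would run. The following Python helper only assembles those statements as strings; it does not connect to a cluster. The share, schema, table and database names as well as the namespace IDs are made-up placeholders, and the statements follow the Redshift datashare syntax as we understand it, so treat this as an illustration rather than a ready-made setup script.

```python
from typing import List


def producer_statements(share: str, schema: str, tables: List[str],
                        consumer_namespace: str) -> List[str]:
    """SQL a producer cluster would run to expose tables via a datashare."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Each table of the data product is added to the share individually.
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};" for t in tables]
    # The consumer is identified by its namespace ID (placeholder value here).
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';"
    )
    return stmts


def consumer_statements(share: str, producer_namespace: str,
                        local_db: str) -> List[str]:
    """SQL a consumer cluster would run to make the shared data queryable."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]


if __name__ == "__main__":
    # Hypothetical names, chosen for illustration only.
    for stmt in producer_statements("sales_share", "public", ["orders"],
                                    "consumer-namespace-id"):
        print(stmt)
    for stmt in consumer_statements("sales_share", "producer-namespace-id",
                                    "sales_db"):
        print(stmt)
```

On the consumer side, the shared table in this sketch would then be addressed as sales_db.public.orders; no data is copied in the process.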

AWS Lake Formation:

AWS Lake Formation simplifies the tasks involved in setting up and managing data lakes. These include configuring security and access controls, cleansing and normalizing data, and cataloguing data.

In a data-mesh architecture, AWS Lake Formation can be used to create a data lake serving as a foundation for different domains. Via AWS Lake Formation, each domain can gain access to the data lake as well as Redshift data shares. Defining data-access policies allows control over which data each domain can access, and makes it possible to ensure any team only has access to the data it needs. Through integration with AWS Glue, AWS Lake Formation also provides data cleansing and transformation functions for creating data products.
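A data-access policy of this kind could look roughly as follows. The helper below builds the request for a table-level grant in the shape expected by the Lake Formation `grant_permissions` call in boto3; the function name `build_grant_request`, the role ARN, and the database and table names are hypothetical. The actual API call is left as a comment so that the sketch runs without AWS credentials.

```python
from typing import Dict, Iterable


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: Iterable[str] = ("SELECT",)) -> Dict:
    """Build the arguments for a table-level Lake Formation grant.

    Only the listed permissions on the one named table are granted, so a
    domain team receives access to exactly the data it needs.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


if __name__ == "__main__":
    # Hypothetical consumer role and data product, for illustration only.
    request = build_grant_request(
        "arn:aws:iam::111122223333:role/marketing-analyst",
        database="sales_lake",
        table="orders",
    )
    print(request)
    # With boto3 installed and credentials configured, the grant itself
    # would be issued along these lines:
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call makes the policy easy to review and test before anything is changed in the account.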

Amazon DataZone:

Amazon DataZone is a data management service which incorporates existing services and functions such as AWS Glue and AWS Lake Formation. It gives data producers a user-friendly way to manage and control access to their data, so that governance and compliance can be implemented at scale. Via an integrated data-analysis platform, data producers and consumers can subscribe to data, view metadata, and share and analyze data.

In a data-mesh architecture, Amazon DataZone can serve as an integrated and centralized platform for data discovery, access control and data governance. Domains can manage their resources, such as quality-assured data assets, AWS accounts and data sources, in a virtual space created specifically for them. In addition, data can be enriched with business context, making it understandable and discoverable. Defined workflows support the automated publication and consumption of data and ensure that the specified access rights are observed in the process. Amazon DataZone projects also allow users to create virtual, isolated areas where user groups can analyze defined data sets using approved analysis tools.
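The publish, subscribe and approve workflow described here can be pictured with a small in-memory model. This is a toy sketch, not the Amazon DataZone API: the `Catalog` class, its methods, and the domain, project and asset names are all invented for illustration. It only shows the idea that a consuming project gains access once the owning domain has approved its subscription request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Catalog:
    """Toy model of a central governance catalog (hypothetical)."""
    listings: Dict[str, str] = field(default_factory=dict)         # asset -> owning domain
    pending: List[Tuple[str, str]] = field(default_factory=list)   # (project, asset)
    grants: Set[Tuple[str, str]] = field(default_factory=set)      # approved (project, asset)

    def publish(self, domain: str, asset: str) -> None:
        """A producing domain lists a data asset so that it can be discovered."""
        self.listings[asset] = domain

    def subscribe(self, project: str, asset: str) -> None:
        """A consuming project requests access to a published asset."""
        if asset not in self.listings:
            raise KeyError(f"{asset} is not published")
        self.pending.append((project, asset))

    def approve(self, domain: str, project: str, asset: str) -> None:
        """Only the owning domain may approve a pending request."""
        if self.listings.get(asset) != domain:
            raise PermissionError(f"{domain} does not own {asset}")
        self.pending.remove((project, asset))
        self.grants.add((project, asset))

    def can_read(self, project: str, asset: str) -> bool:
        return (project, asset) in self.grants


if __name__ == "__main__":
    catalog = Catalog()
    catalog.publish("sales", "orders")
    catalog.subscribe("marketing-project", "orders")
    print(catalog.can_read("marketing-project", "orders"))  # before approval
    catalog.approve("sales", "marketing-project", "orders")
    print(catalog.can_read("marketing-project", "orders"))  # after approval
```

In a real setup the access grant would be fulfilled by the underlying services rather than by a set in memory; the model merely makes the order of the workflow steps explicit.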

 

 

Whether a traditional, central data platform or a data mesh is planned, clients wanting to implement data sharing often face technical and organizational challenges such as orienting processes and responsibilities according to SLAs of shared data, as well as ensuring data protection, interoperability and data quality. Here we can provide you with support as well as promote cooperation and knowledge acquisition at your organization. Do not hesitate to contact us for an initial, non-binding chat!

We look forward to talking to you

 

 

Your Contact
Tobias Lange
Consultant
Tobias has a passion for data architectures and is dedicated to developing innovative data solutions with a people-first approach that focuses on usability and empowerment. In his spare time, he enjoys playing padel tennis and is always up for a coffee-making conversation.
#DataArchitecture #DataPlatform #NeverStopLearning