Lakehouse & Delta Lake
Based on the open Delta Lake format, the data warehouse can be built in the cloud directly on top of the data lake. This eliminates data silos, inconsistencies between the DWH and the data lake, and the exploding costs and extra effort of loading data into closed systems. Data lake + data warehouse = lakehouse. With Spark as the underlying engine, open-source technology enables high-performance, cost-effective data processing.
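A minimal sketch of the idea, assuming PySpark with the open-source delta-spark package: a DataFrame is written straight to storage as a Delta table and read back like a warehouse table. The storage path and column names are placeholders, and on Databricks the pre-configured `spark` session can be used directly.

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession is already provided as `spark`; building one
# here keeps the sketch self-contained for local testing with delta-spark.
spark = (
    SparkSession.builder
    .appName("delta-lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Write a DataFrame as a Delta table directly on (object) storage.
# The path below is a placeholder for a cloud bucket or ADLS container.
orders = spark.createDataFrame(
    [(1, "2024-01-05", 120.0), (2, "2024-01-06", 89.5)],
    ["order_id", "order_date", "amount"],
)
orders.write.format("delta").mode("overwrite").save("/tmp/lakehouse/orders")

# The same files are immediately queryable like a warehouse table,
# with ACID guarantees and time travel provided by the Delta log.
spark.read.format("delta").load("/tmp/lakehouse/orders").show()
```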
DWH & data engineering
In a cloud DWH, the orchestration of jobs and loading processes plays a decisive role alongside the actual databases. Databricks offers Workflows for dynamic control of all processes within the platform, the Delta Live Tables framework for simple, declarative definition of data pipelines, and auto-scaling job clusters that keep workloads cost-effective.
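A hedged sketch of how such a pipeline might be declared with the Delta Live Tables Python API; the source path, table names and data-quality rule below are illustrative only, and the code runs inside a DLT pipeline where `dlt` and `spark` are available.

```python
import dlt
from pyspark.sql import functions as F

# Bronze layer: ingest raw JSON files as they arrive via Auto Loader.
# The volume path is a placeholder for your own landing zone.
@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders")
    )

# Silver layer: cleaned records with a simple data-quality expectation
# that silently drops rows with a non-positive amount.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return dlt.read_stream("orders_raw").withColumn(
        "order_date", F.to_date("order_date")
    )
```

The framework resolves the dependency between the two tables from the `dlt.read_stream` call, so the pipeline itself decides the execution order and can run on an auto-scaling job cluster.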
At least as important as data availability is the organizational framework that regulates how an enterprise handles its data: data governance. The platform's integrated tool, Unity Catalog, enables unified control of user access throughout the lakehouse and provides data lineage for structured and unstructured data across all layers as well as for BI and AI assets. This traceability simplifies compliance with legal regulations and makes data easier to find. In addition, Unity Catalog and its systematic collection of metadata enable further functionality such as audit logging, Delta Sharing and monitoring of all resources in the lakehouse.
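As an illustration, access in Unity Catalog is granted with standard SQL over its three-level namespace (catalog.schema.table). The catalog, schema, table and group names below are hypothetical, and `spark` is assumed to be the session object available in a Databricks notebook.

```python
# Grant a group read access to a single table; Unity Catalog records the
# grant centrally so it applies across all workspaces attached to the metastore.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")

# Grant usage on the enclosing schema and catalog so the table is reachable.
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")

# Review the current grants, e.g. as part of an audit.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show()
```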
Data science & AI
The efficient use of artificial intelligence (AI) plays an increasingly important role in enterprises across all sectors. With Databricks and Spark, data scientists can work in the language of their choice (Python, R, Scala or SQL) within the platform. Managed MLflow adds a range of features that simplify every stage of the machine-learning lifecycle, including generative AI and LLM use cases.
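A small, illustrative MLflow tracking run, using scikit-learn's bundled iris dataset as stand-in data: parameters, a metric and the fitted model are logged to a run. On Databricks the run lands in the workspace's managed tracking server; the run name and hyperparameters are arbitrary examples.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Everything logged inside the context manager is attached to one MLflow run,
# so experiments stay comparable and reproducible across the team.
with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestClassifier(n_estimators=100, max_depth=5)
    model.fit(X_train, y_train)

    mlflow.log_params({"n_estimators": 100, "max_depth": 5})
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```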