Data warehouse automation (Part 1: Introduction)

The automation of repeatedly recurring tasks is one of the most fundamental principles of the modern world. Henry Ford recognised resulting advantages, such as a falling error rate, shorter production cycles and consistent, uniform quality. These very advantages can be applied in data warehouse initiatives.

The advantages

Using data warehouse automation tools (DWA tools), the storage, maintenance and expansion of company-wide data can be performed with just a few clicks of the mouse, whereas previously it would have taken most of the entire development time. Modern DWA tools go a step further and offer profound impact and lineage functionalities as well as automated regression and quality tests, efficient loading routines, simplified deployments between environments and extensive generation of documentation.

To implement the aforementioned options, the information is stored in a database and managed by the DWA tool as metadata by means of source data, DWH structures and loading processes.

In the following figure, the fundamental topic areas of automation are presented in clear fashion once again:

The 6 facets of BI/DWH automation

DWA trends

Various DWA tools have already manifested themselves on the market. Essentially, two different trends can be distinguished that are followed (Eckerson, 2015, S. 5 ff): data-driven and model-driven approaches.

Data-driven DWA tools are used to automate heavily the physical creation of data warehouse structures. The developer can move around the data in a simplified manner and have SQL statements, procedures, scripts and other database structures generated automatically. On the other hand, the data-orientated presentation is not so suitable for departments, for which reason communication and coordination with the developers is performed mainly by means of the prototypical development of result sets:

Development interface data-driven DWA tool (QOSQO quipu)

The result sets can be viewed using BI tools or Excel.

Particularly in the case of well structured and easily accessible data, purely data-driven DWA tools can be used effectively and offer added value.

Model-driven DWA tools, conversely, pursue the objective of elevating complex data to a higher abstraction level and visualising it. In this way, it is aimed to allow both developers and departments to work with the data in a simplified manner. Using the GUI provided, data management processes can be developed intuitively, while pre-manufactured components are available for such tasks as creating dimensions, fact tables and hierarchies.

Development interface model-driven DWA tool (Magnitude Kalido)

In addition to the purely data and model-driven approaches, there are other hybrid DWA tools in which both approaches are supported. These include companies such as WhereScape, which, with RED, allows data-driven management of the DWH processes, but additionally, with 3D, offers a 3D model view. In 3D, processing steps can be defined and subsequently synchronised with the RED repository:

Development interface hybrid DWA tools (Wherescape 3D)

Development interface hybrid DWA tools (Wherescape RED)

Which DWA tool is ultimately the suitable one should be established based on other factors in addition to the conceptual differences of the individual tools described. Based on the areas presented in figure 1, where automation can be of benefit, other factors will be discussed in more detail in future blog entries. The aim is to provide an overview that can serve as a decision-making aid.