How To: Who Should Check ETL Applications?
After years of developing ETL applications, I can say that they are generally tested less rigorously than transactional systems.
What makes Multi-Touch Attribution (MTA) so unique? Unlike traditional models such as first- or last-click attribution, MTA provides a comprehensive view of the customer journey. Modern tools such as Google Analytics 4 and BigQuery, combined with advanced techniques such as Markov chains, enable data-driven decisions, efficiently and transparently. Here's how it works.
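To give a feel for the Markov-chain idea, here is a minimal sketch with invented journeys and channel names (not the GA4/BigQuery setup described in the article): each channel's contribution is estimated from its removal effect, i.e. how much the overall conversion probability drops when that channel is taken out of the transition graph.

```python
from collections import defaultdict

# Hypothetical customer journeys (made-up data): each path implicitly starts
# at START and ends in a conversion (CONV) or a drop-out (NULL).
paths = [
    ["Search", "Display", "CONV"],
    ["Search", "NULL"],
    ["Social", "Search", "CONV"],
    ["Display", "NULL"],
    ["Social", "Display", "Search", "CONV"],
]

def transition_probs(paths, removed=None):
    """First-order transition probabilities; `removed` reroutes one channel
    to NULL so that its removal effect can be measured."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        steps = ["START"] + ["NULL" if s == removed else s for s in path]
        for a, b in zip(steps, steps[1:]):
            counts[a][b] += 1
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {t: n / total for t, n in nxt.items()}
    return probs

def conversion_prob(probs, sweeps=200):
    """P(reaching CONV from START), via simple fixed-point iteration."""
    p = defaultdict(float)
    p["CONV"] = 1.0                       # absorbing states stay fixed
    for _ in range(sweeps):
        for state, nxt in probs.items():
            if state in ("CONV", "NULL"):
                continue
            p[state] = sum(w * p[t] for t, w in nxt.items())
    return p["START"]

base = conversion_prob(transition_probs(paths))
channels = {s for path in paths for s in path if s not in ("CONV", "NULL")}
removal_effect = {
    c: 1 - conversion_prob(transition_probs(paths, removed=c)) / base
    for c in channels
}
total_effect = sum(removal_effect.values())
attribution = {c: round(e / total_effect, 3) for c, e in removal_effect.items()}
print(attribution)   # share of conversions credited to each channel
```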
I’d like to once again delve into the issue of memory usage. Other blog posts, such as “SAP HANA – no more memory? Implement early unload!”, have shown how important it is to configure SAP BW objects correctly for their intended usage. In short: anything that is not queried regularly and immediately should be unloaded from memory early. To achieve this, you set the “early unload” priority attribute on the corresponding HANA database table. Since this setting is not part of the ABAP transport system (CTS), developers or operations must always ensure that the configuration is correct. Otherwise, data from the data acquisition layer that is only needed for staging also ends up occupying RAM, which burdens the system and is unnecessarily expensive.
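As an illustration of where this setting lives on the database side, here is a minimal sketch using SAP's hdbcli Python client; the connection details, schema, table names and the priority value 7 are assumptions, and in a BW system the attribute should normally be maintained via the BW object itself rather than by hand.

```python
# Sketch: inspect and adjust the unload priority of HANA tables via SQL.
# All connection details, schema and table names below are placeholders.
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015,
                     user="MONITORING_USER", password="***")
cur = conn.cursor()

# Which tables in the (assumed) BW schema still sit at the default priority?
cur.execute("""
    SELECT table_name, unload_priority
    FROM   sys.tables
    WHERE  schema_name = 'SAPBW'
      AND  table_name LIKE '/BIC/A%'     -- e.g. ADSO tables (placeholder filter)
""")
for table_name, priority in cur.fetchall():
    print(table_name, priority)

# Flag a staging-only table for early unload (7 is a commonly used priority
# for objects that do not need to stay resident in memory).
cur.execute('ALTER TABLE "SAPBW"."/BIC/AZSTAGE12" UNLOAD PRIORITY 7')

cur.close()
conn.close()
```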
The previous article in this series explained in detail a practical and effective approach to partition pruning. This easy-to-implement method can significantly reduce query times. However, as is often the case, some details need to be taken into account to ensure that the method is used efficiently and effectively. In this regard, we echo Theodor Fontane, who observed as far back as the 19th century that the magic always lies in the details.
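As a reminder of the basic mechanism (a generic sketch with invented table and column names, executed here via python-oracledb, and not the specific table layout proposed in this series): when the filter is placed on the partition key, the optimizer only has to touch the partitions that can contain matching rows.

```python
# Sketch: a range-partitioned history table and a query that benefits from
# partition pruning. Names, connection details and layout are illustrative.
import oracledb

conn = oracledb.connect(user="dwh", password="***", dsn="dwh-host/dwhpdb")
cur = conn.cursor()

# Monthly interval partitioning on the date column used in most filters.
cur.execute("""
    CREATE TABLE sales_hist (
        sale_id   NUMBER,
        sale_date DATE,
        amount    NUMBER
    )
    PARTITION BY RANGE (sale_date)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    (PARTITION p_init VALUES LESS THAN (DATE '2024-01-01'))
""")

# Because the filter is on the partition key, only the partitions covering
# March 2024 are scanned; all others are pruned.
cur.execute("""
    SELECT SUM(amount)
    FROM   sales_hist
    WHERE  sale_date >= DATE '2024-03-01'
      AND  sale_date <  DATE '2024-04-01'
""")
print(cur.fetchone()[0])   # None here, since the sketch table is empty
```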
In these times of digitalisation, it is particularly important to be able to draw on reliable databases in order to eliminate errors at the source and enable a focused and precise way of working. The staging area is a solution to exactly this type of challenge.
Designers and architects often underestimate the need for a staging area in the database environment, as they consider it a waste of space, effort and development time. Developing a staging area certainly requires space and effort, but this pays off over the whole life cycle of the database.
After outlining the conventional methods for storing historical data in the first post of this blog series, I would like to introduce a more effective approach to partitioning a historical table in this second part.
R is one of the most popular open-source programming languages for predictive analytics. One of its upsides is the abundance of modeling choices provided by more than 10,000 user-created packages on the Comprehensive R Archive Network (CRAN). On the downside, package-specific syntax choices (a much bigger problem in R than in, say, Python) impede the adoption of new models. The caret package attempts to streamline the process of creating predictive models by providing a uniform interface to various training and prediction functions. Caret's data preparation, feature selection, and model tuning functionalities facilitate the process of building and evaluating predictive models. This blog post focuses on model tuning and selection and shows how to tackle common model-building challenges with caret.
The automation of recurring tasks is one of the most fundamental principles of the modern world. Henry Ford recognised the resulting advantages, such as a falling error rate, shorter production cycles and consistent, uniform quality. These very advantages can be applied to data warehouse initiatives.
We now know how to select the correct data, which type of internal table we should use for lookups, and how to ensure that we only read through relevant datasets.
In practice, however, it is still often the case that you must select a large and/or not precisely defined amount of data from the database, which then has to be aggregated according to specific rules so that it can be read with high performance.
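The pattern behind this can be sketched in a few lines. The following Python stand-in (the series itself works with ABAP internal tables, and all field names here are invented) aggregates a broad selection once into a hash-keyed structure so that each subsequent lookup is a single keyed access.

```python
# Sketch: aggregate a large, loosely bounded selection once into a hash-keyed
# structure, so that later lookups are O(1) instead of rescanning the data.
from collections import defaultdict

# Pretend this came from a broad database selection we could not narrow further.
selected_rows = [
    {"order": "4711", "item": "10", "quantity": 5, "amount": 100.0},
    {"order": "4711", "item": "20", "quantity": 2, "amount": 40.0},
    {"order": "4712", "item": "10", "quantity": 1, "amount": 99.0},
]

# Aggregation rule: one entry per order, with summed quantity and amount.
aggregated = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
for row in selected_rows:
    entry = aggregated[row["order"]]
    entry["quantity"] += row["quantity"]
    entry["amount"] += row["amount"]

# Lookup during the transformation: one keyed access per source record.
print(aggregated["4711"])   # {'quantity': 7, 'amount': 140.0}
```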
As described in the previous blog entry, the Oracle Data Integrator (ODI) offers an integrated solution for keeping a history of data with the SCD (slowly changing dimension) methodology. On closer inspection, when a set of records is actually loaded into a target table using the SCD integration knowledge module (IKM), it becomes apparent that ODI uses certain default values for the end of a record's validity period.
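To make the idea of an end-of-validity default tangible, here is a generic SCD2 sketch in Python; the far-future sentinel date and the column and key names are placeholders and do not claim to match the values the IKM actually generates.

```python
# Sketch of the SCD2 close/open logic and the far-future "end of validity"
# sentinel that marks the current record. The sentinel here is a placeholder;
# the value ODI actually writes depends on the IKM options in use.
from datetime import datetime

FAR_FUTURE = datetime(2400, 1, 1)   # placeholder default for "still valid"

def apply_scd2(history, key, new_attributes, load_ts):
    """Close the current version of `key` (if any) and open a new one."""
    for row in history:
        if row["key"] == key and row["valid_to"] == FAR_FUTURE:
            row["valid_to"] = load_ts        # end the old version's validity
            row["current_flag"] = 0
    history.append({"key": key, **new_attributes,
                    "valid_from": load_ts, "valid_to": FAR_FUTURE,
                    "current_flag": 1})

history = []
apply_scd2(history, "C-100", {"segment": "A"}, datetime(2024, 1, 1))
apply_scd2(history, "C-100", {"segment": "B"}, datetime(2024, 6, 1))
for row in history:
    print(row)
```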
In this article, I propose a way of physically organizing historical tables that makes it possible to use partition pruning effectively to optimize query performance. The approach is designed specifically for data warehouses: it accepts relatively complicated data loads in exchange for efficient selections.
Having dealt with the relevant selection techniques and the various types of internal tables, we have already covered the most important performance optimisations for lookups in our BW transformations.
However, this does not completely cover the topic, because so far we have assumed that our lookup tables only contain the relevant records in the first place. But how can we ensure this?
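One common way to ensure this, shown here as a Python stand-in rather than the ABAP coding discussed in the article, is to collect the keys that actually occur in the current data package and restrict the lookup selection to exactly those keys; all table and field names below are invented.

```python
# Sketch: restrict the lookup selection to keys that actually occur in the
# current data package, so the lookup table never holds irrelevant records.
source_package = [
    {"material": "M-001", "quantity": 10},
    {"material": "M-002", "quantity": 4},
    {"material": "M-001", "quantity": 7},
]

# 1. Collect the distinct keys we will actually look up.
needed_keys = {row["material"] for row in source_package}

# 2. Read only those keys into the lookup structure (placeholder data source;
#    in reality this would be a selection restricted to `needed_keys`).
def read_material_attributes(keys):
    all_rows = {"M-001": {"group": "RAW"}, "M-002": {"group": "FIN"},
                "M-999": {"group": "OBS"}}      # M-999 is never requested
    return {k: all_rows[k] for k in keys if k in all_rows}

lookup = read_material_attributes(needed_keys)

# 3. Enrich the package with a single keyed access per record.
for row in source_package:
    row["material_group"] = lookup[row["material"]]["group"]
print(source_package)
```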
The good performance of a HANA database stems from its consistent design as an in-memory database, as well as from modern compression and column-store algorithms. This means that the database has to read comparatively little data when calculating aggregations over large data volumes and can perform this task exceptionally quickly directly in main memory.
However, these benefits can very quickly be rendered moot if the design of the data model is below par. In that case, major gains in runtime and agility are lost, for the HANA database as well as for its users.
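To picture the column-store advantage mentioned above, here is a deliberately simplified Python sketch; it is purely conceptual and does not reflect HANA's actual implementation, which adds dictionary encoding, compression and much more.

```python
# Simplified illustration of why a columnar layout helps aggregations:
# summing one measure touches a single contiguous column instead of every
# field of every row.

# Row-wise layout: the aggregation has to walk through complete records.
rows = [
    {"customer": "C1", "region": "EMEA", "revenue": 120.0, "discount": 0.1},
    {"customer": "C2", "region": "APAC", "revenue": 80.0,  "discount": 0.0},
    {"customer": "C3", "region": "EMEA", "revenue": 200.0, "discount": 0.2},
]
total_row_wise = sum(r["revenue"] for r in rows)

# Column-wise layout: the same aggregation reads exactly one array.
columns = {
    "customer": ["C1", "C2", "C3"],
    "region":   ["EMEA", "APAC", "EMEA"],
    "revenue":  [120.0, 80.0, 200.0],
    "discount": [0.1, 0.0, 0.2],
}
total_column_wise = sum(columns["revenue"])

assert total_row_wise == total_column_wise == 400.0
```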