
Data migration with Exasol – from DWH to high-performance cloud platform


Having tested Exasol's "lift and shift" migration approach, b.telligent consultant Simon Faatz explains step by step how to migrate to the cloud with confidence.

The owners of the "Left-Hand Store" are frustrated. Yesterday, they sent a discount code to selected customers. Today, they want to find out whether the campaign has paid off. If they discover that sales of left-handed can openers have increased, they will send the discount code to all their other customers. But the shop owners don't know whether the campaign has worked, because they are still waiting for the report. The database is overloaded and taking far too long; the data marts arrive late, and waiting times in the reporting tool are excessive. Under these circumstances, there is little point in carrying out A/B testing or other time-critical analyses, let alone attempting to build future campaigns on report findings.

Many companies are familiar with the problem: their database, originally a lean and agile data model, has grown into a data warehouse behemoth. The opportunities that modern data warehouses offer, at least in theory, are therefore a long way off. The daily load window is becoming increasingly tight, not to mention ad-hoc loads during the day. Quite apart from the slow data processing, integrating new sources is becoming increasingly complex and expensive. And at some point, every performance optimization reaches its physical limits.

Ideally, scoring models or the data marts upon which they are based will contain more than just data from a CRM system and the current stock levels. Weather data, for example, can help to promote seasonal products; mass data from a dedicated app can be used to tailor products more precisely to customer needs, and a link to Twitter can reveal trends that are relevant to the target group.

In order to make use of all these sources and data, a company needs a performant data warehouse with a good architecture or, at the very least, a modern, scalable technology. Exasol is ideally suited for this purpose and comes as standard with a set of tools that simplify migration.

The first step: deploying Exasol in the cloud

Granted: it can be complex and expensive to migrate a database that has grown over time to include many sources, data and reports. Fortunately, Exasol provides step-by-step support for its users in migrating and adapting their existing data warehouse.

First, select the cloud of your choice: AWS, Azure, or Google Cloud are currently available. For this article, we have chosen Azure as an example. Once you have created your Azure account, it takes just a few clicks to get the Exasol database up and running.

Have you thought of everything? How to successfully move data and views

The next step is to move your existing data to the cloud. Before you can do this, however, you must set up the necessary connections. Exasol supports all common database interfaces as standard; for our migration, we will use the JDBC driver, which establishes the connection to the source database.
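
In Exasol, such a connection is defined as a named connection object in plain SQL. Below is a minimal sketch; the connection string and credentials are placeholders for your own source system:

    -- Hypothetical JDBC connection object pointing at the existing on-premises DWH.
    -- Host, port, database name and credentials are placeholders.
    CREATE OR REPLACE CONNECTION OLD_DWH_JDBC
        TO 'jdbc:postgresql://old-dwh.example.com:5432/dwh'
        USER 'migration_user'
        IDENTIFIED BY 'secret_password';

The matching JDBC driver must be registered in Exasol beforehand; the connection object can then be reused by every IMPORT statement and virtual schema that follows.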

It is now time to deal with the structure. Before you can start loading data, the DDL for the table definitions and views must be created in the new cloud DWH. Create a database schema first. To generate the DDL automatically, simply use the migration scripts from the Exasol Git repository, which read the metadata from the source system. In the repository you will find further scripts and instructions for all popular databases, from MySQL and Postgres to SAP HANA. This saves a lot of coding work! Nevertheless, once you have completed your preliminary project or POC, it goes without saying that you should test everything thoroughly.
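
Once the generated DDL has been executed, the data itself can be pulled over the JDBC connection with Exasol's IMPORT statement. A minimal sketch, reusing the hypothetical OLD_DWH_JDBC connection from above; the schema and table names are again placeholders:

    -- Load one table's contents from the source over the JDBC connection.
    -- RETAIL.SALES must already exist (created by the generated DDL);
    -- all object names here are illustrative.
    IMPORT INTO RETAIL.SALES
        FROM JDBC AT OLD_DWH_JDBC
        STATEMENT 'SELECT * FROM public.sales';

One IMPORT per table is enough for the initial load; the migration scripts typically generate these statements together with the DDL.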

Keeping the show on the road: ETL and reporting

If you already have an ETL tool, you can simply point its connection at the new cloud database and thereby retain the existing "old" ETL logic.

If you haven't used an ETL tool before, or if you wish to keep your "old" database, you will find Exasol's "virtual schemas" extremely useful. These form an abstraction layer that allows you to integrate external data sources into the Exasol platform. The data source appears as a separate schema that you can query on the Exasol platform with regular SQL commands. This significantly improves performance and speeds up analysis of the data.
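
As a sketch of what this looks like in practice, the following assumes a JDBC adapter script has already been installed; the adapter name and the exact properties are assumptions and vary depending on the adapter you use:

    -- Hypothetical example: expose the old DWH as a virtual schema.
    -- ADAPTER.JDBC_ADAPTER and the property names depend on the adapter in use.
    CREATE VIRTUAL SCHEMA OLD_DWH_VS
        USING ADAPTER.JDBC_ADAPTER
        WITH
            CONNECTION_NAME = 'OLD_DWH_JDBC'
            SCHEMA_NAME     = 'public';

    -- The remote tables can then be queried like local Exasol tables:
    SELECT COUNT(*) FROM OLD_DWH_VS.SALES;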

The virtual schema is particularly worthwhile for companies that want to use many standardized and short-lived data sources. The benefits of the cloud make themselves felt here too: there is great flexibility in terms of DWH size and performance. Exploring new data sources has never been faster – or easier.

Conclusion: POC

Before deciding to migrate to the cloud, a company should implement a POC and test all important components. Exasol is not only an excellent, high-performance database in the cloud but it also provides a host of helpful tools, as this article has shown:

  • Moving to the cloud is fast and uncomplicated. Existing data sources can be connected to Exasol very quickly via connection objects.
  • The structure (DDL) is generated directly from the source, allowing the data to be migrated straight away in the next step.
  • All logic is retained. The reporting tool can be migrated to the new database without the need for further adjustments.

This greatly reduces both the cost and the effort involved in carrying out a migration compared to alternative solutions.

What makes moving to the cloud with Exasol particularly appealing is the option of scaling (up or out) and the improved performance provided by a self-optimizing database. The ETL and reporting tools are largely unaffected by the migration and can carry on as normal with the same structure once the connections have been repointed.


Has this piqued your curiosity about working with Exasol and migrating your DWH to the cloud? If you have any questions, my colleagues and I would be happy to answer them.

Your Contact
Simon Faatz-Riccò
Senior Consultant
Simon is a passionate full stack BI developer and has been consulting along the entire data value chain for over seven years. He is always happy to see well-engineered CI/CD pipelines and logical data models and is delighted whenever a DWH rejects incorrect data.
#AutomateTheBoringStuff #FullStackBI #dbt