Blog Posts: Data Science & AI

3.11.2025

Text Embeddings and Vector Search: Optimizing Retrieval in RAG Systems

You’ve implemented RAG – but what comes next? Embeddings are the foundation of any RAG system. In this post, we’ll walk you through a CV matching use case to show how analyzing text embeddings can make vector search more effective, and retrieval in GenAI projects more accurate, explainable, and fair.

7.3.2025

8.5.2025

Snowflake Document AI – Easily Extract Data From Unstructured Documents

With Snowflake Document AI, you can easily extract information from documents, such as invoices or handwritten documents, directly within the data platform. Document AI is easy to use via a graphical user interface, via code in a pipeline, or integrated into a Streamlit application. In this article, we explain the feature, describe how it integrates into the platform, and present some interesting applications.

28.10.2019

8.5.2025

Neural Averaging Ensembles for Tabular Data With TensorFlow 2.0

Neural Networks for Tabular Data: Ensemble Learning Without Trees

Neural networks are applied to just about any kind of data (images, audio, text, video, graphs, ...). Only for tabular data are tree-based ensembles such as random forests and gradient-boosted trees still much more popular. If you want to replace these successful classics with neural networks, ensemble learning may still be a key idea. This blog post explains why. It is complemented by a notebook in which you can follow the practical details.

4.9.2024

8.5.2025

Sizing and Scaling Azure AI Search

Azure AI Search, Microsoft’s top serverless option for the retrieval part of RAG, has its own sizing, scaling, and pricing logic. While it hides many of the complexities of server-based solutions, it demands specific knowledge of its configuration options.

30.7.2024

8.5.2025

Efficient Distance Joins in Polars

Polars: Develop Faster, Execute Faster

Polars, the Pandas challenger written in Rust, is much faster, not only in executing code but also in developing it. Pandas has always suffered from an API that "grew historically" in many places. Polars is different: its API was designed to be logically consistent from the outset, and that consistency is carefully maintained with every release (sometimes at the expense of backwards compatibility), which makes development significantly faster. Polars can often easily replace Pandas: for example, in Ibis Analytics projects and, of course, for all kinds of daily data preparation tasks. Polars’ superior performance is also helpful in interactive environments like Power BI.

13.6.2024

8.5.2025

How Mature Is Your ML Approach?

Machine Learning Operations (MLOps) is a practice for collaboration and communication between data scientists and operations professionals that helps manage the lifecycle of Machine Learning (ML) models in production. It applies DevOps principles to the ML lifecycle to streamline and automate the process from model development to deployment and monitoring. The aim of MLOps is to deploy and scale ML models faster, in a structured and efficient manner.

3.4.2024

8.5.2025

Automated Image Processing: A Standard Architecture

The PoC is complete, a production-ready model has been trained, and the showcase has inspired all stakeholders. But for business cases to be realized with the model, it (and the related processing) must be embedded in the existing (cloud) landscape.

8.6.2022

8.5.2025

LightGBM On Vertex AI

In Google Cloud, Vertex AI is the MLOps framework. It is very flexible, and you can basically use any modelling framework you like. However, some frameworks are a bit easier to use than others: TensorFlow, XGBoost and scikit-learn are supported with prebuilt images, which are very helpful. This blog post shows how you can train and deploy models built with other frameworks. We use a LightGBM model as an example, but the workflow can easily be transferred to any other modelling package.

8.11.2021

8.5.2025

How To Install Ray Under Windows

Ray is increasingly popular in the machine learning community. Getting it up and running under Windows can be tricky, however. This blog post tells you how.
