Data Science & AI

Quantile Regression With Gradient Boosted Trees

When we do simple descriptive data exploration, we are seldom content with analyzing mean values only. More often, we take a closer look at the distribution: histograms, quantiles, and the like. Mean values alone often lead to erroneous conclusions and keep important information hidden. But if this is the case, why do we forget about it as soon as we build predictive models? These usually aim only at mean values - and they lie.
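
One way to obtain quantile predictions from gradient boosted trees, sketched here with scikit-learn on toy data (not necessarily the exact setup used in the article), is to fit one model per quantile with a quantile loss:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Toy data standing in for a real use case
X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

# One model per quantile: 10th percentile, median, 90th percentile
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

# Each model predicts its conditional quantile, yielding a prediction
# interval instead of a single mean value
lower, median, upper = (models[q].predict(X[:5]) for q in (0.1, 0.5, 0.9))
```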

Large Language Models – An Overview Of The Model Landscape

Since the release of ChatGPT and the attention it has drawn to large language models, we have seen a rapid stream of new model releases and a fast-evolving market around the use of LLMs. How well a model suits a given business context depends heavily on the use case. In this blog post, we take a closer look at the currently most important models and compare them against enterprise-relevant criteria, so that you can keep a better overview.

Data Science for Kids: How To Win at “Guess Who?”

The other day, I played "Guess Who?", the classic game for children aged about 6 to 9, with my six-year-old son. While we were playing, we both tried to work out the best way to win. This article series is the result of our search for an effective game plan. Part 1 is aimed at the whole family. OK - let's find out how to win!

Deliver Projects Faster With Python Ibis Analytics

Moving from a successful proof of concept (PoC) for a data-analysis pipeline into production often proves to be a long road. Ibis makes it possible to simplify this process and thus deliver value faster.

After a data-analysis pipeline has been developed locally in Python, the code often needs to be rewritten before it can run in production. But does it really have to be that way? Created by Wes McKinney, the lead author of the Python pandas library, the Ibis library offers a fascinating way to use the same data-processing code in both development and production environments, enabling analytics teams to reach production faster. This blog post shows how it works.
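
As a rough sketch of the idea (table and column names are invented), the same Ibis expression can run against a local DuckDB database during development and against a production backend later by swapping only the connection:

```python
import ibis
from ibis import _

# Development: a local DuckDB database. For production, only this
# connection line would change (e.g. to a BigQuery or Postgres backend).
con = ibis.duckdb.connect("local_dev.ddb")

orders = con.table("orders")  # hypothetical table

# Backend-agnostic expression: filtering, grouping, and aggregation
revenue_per_customer = (
    orders.filter(_.status == "completed")
    .group_by("customer_id")
    .aggregate(total_revenue=_.amount.sum())
)

# Execution is deferred until the result is actually requested
df = revenue_per_customer.execute()
```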

Brief Guide to Using Generative AI and LLMs

Ever since ChatGPT was introduced in late 2022, we have all been thrilled by the possibilities of generative AI and large language models (LLMs). What intrigues people is the incredible ease of generating high-quality text and getting responses to questions, code fragments, and more. You simply write a prompt, which is a text input, feed it to ChatGPT's API, and voilà, you have a response.
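
As an illustrative sketch (the model name and prompt are placeholders), such a call with the official OpenAI Python client might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any available chat model works
    messages=[
        {
            "role": "user",
            "content": "Write a two-sentence summary of what large language models are.",
        }
    ],
)

print(response.choices[0].message.content)
```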

We are still very much in a generative AI hype cycle, where the benefits of a technology are typically overstated. For businesses, it is important to avoid the attendant pitfalls and to understand when and how best to use ChatGPT or generative AI solutions. In this blog, we look beyond the hype and show you an approach for evaluating and implementing LLM-based generative AI use cases.

Caret: A Cornucopia of Functions For Doing Predictive Analytics In R

R is one of the most popular open-source programming languages for predictive analytics. One of its upsides is the abundance of modeling choices provided by more than 10,000 user-created packages on the Comprehensive R Archive Network (CRAN). On the downside, package-specific syntax (a much bigger problem in R than in, say, Python) makes it harder to adopt new models. The caret package attempts to streamline the process of creating predictive models by providing a uniform interface to various training and prediction functions. Caret's data preparation, feature selection, and model tuning functionalities facilitate building and evaluating predictive models. This blog post focuses on model tuning and selection and shows how to tackle common model-building challenges with caret.

Recommender Systems – Part 3: Personalized Recommender Systems, ML and Evaluation

Algorithms for Personalized Recommendations

Users do not always leave behind enough personal information along their customer journey. For instance, newly acquired customers or existing customers may browse an e-commerce website without being logged in. In such cases, non-personalized recommender systems, for example those that propose products frequently purchased together, still give companies a way to make recommendations. However, the more closely recommendations are tailored to the individual customer, the better.
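
As a minimal sketch of such a non-personalized approach, using invented order data, one can simply count how often product pairs occur in the same order and recommend the most frequent partners of a given product:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical order data: one row per (order, product)
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 2, 2, 3, 3],
    "product":  ["shoes", "socks", "shoes", "socks", "laces", "shoes", "laces"],
})

# Count how often each product pair appears in the same order
pair_counts = Counter()
for _, items in orders.groupby("order_id")["product"]:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# "Frequently purchased together" with a given product, e.g. "shoes"
together_with_shoes = Counter()
for (a, b), n in pair_counts.items():
    if a == "shoes":
        together_with_shoes[b] += n
    elif b == "shoes":
        together_with_shoes[a] += n

print(together_with_shoes.most_common())  # [('socks', 2), ('laces', 2)]
```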

Use of Private Python Packages in Vertex AI - 3

As data scientists, we regularly train different machine-learning models in the cloud. Here you can find out how to structure your model training with the help of Python packages. Although each model has its own specific purpose, some code snippets are ultimately copied from one project to another. In my case, this code is often for reading data from a database or for a pre-processing step. By collecting frequently used functions in one place, Python packages are ideal for avoiding this kind of code duplication, which offers many advantages for maintaining and testing code.

In this blog article, we will see how a Python package can be used in GCP and integrated into a Vertex AI training job.
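
To give a rough idea (project, bucket, package, and container URIs are all placeholders), the google-cloud-aiplatform SDK can submit a training job that runs a module from your own Python package roughly like this:

```python
from google.cloud import aiplatform

# Placeholder project, region, and staging bucket
aiplatform.init(
    project="my-project",
    location="europe-west1",
    staging_bucket="gs://my-bucket",
)

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name="train-my-model",
    # Source distribution of the private package, uploaded to Cloud Storage beforehand
    python_package_gcs_uri="gs://my-bucket/packages/my_training_pkg-0.1.0.tar.gz",
    # Module inside the package that starts the training
    python_module_name="my_training_pkg.task",
    # Pre-built training container (placeholder URI)
    container_uri="europe-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",
)

job.run(replica_count=1, machine_type="n1-standard-4")
```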

Howto: Splitting Files With Standard Python Scripts

Ready-Made Data Sets That Burst the Limits

I am frequently given raw data for analysis which, when uncompressed, can easily comprise files of half a gigabyte or more. From about one gigabyte upwards, desktop statistics tools begin to struggle. There are, of course, options for selecting only some of the columns, loading only the first 10,000 lines, and so on.

But what should you do when you only want to take a random sample from the data provided? You should never rely on the file being randomly sorted: the database export may already have introduced systematic ordering effects. It may also be that you only want to analyze a tenth of a grouping, such as the purchases made by every tenth customer. For this, the complete file has to be read, since otherwise it is impossible to ensure that all purchases of the selected customers are taken into account.
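
As a small sketch of this idea (file name and column position are invented), a plain standard-library script can stream the file line by line and keep every tenth customer's complete purchase history:

```python
import csv
import zlib

# Hypothetical input: a large CSV with the customer ID in the first column
with open("purchases.csv", newline="") as src, open("sample.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)

    writer.writerow(next(reader))  # copy the header row

    for row in reader:
        customer_id = row[0]
        # A stable hash of the customer ID selects roughly every tenth customer,
        # independent of how the exported file happens to be sorted, while
        # keeping all purchases of a selected customer together.
        if zlib.crc32(customer_id.encode()) % 10 == 0:
            writer.writerow(row)
```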