Data Science & AI

Howto: Easy Web Scraping With Python

Overwhelming Offer in the Webshop

Two weeks ago, a frequently used online mail-order company, whose name is reminiscent of a river in South America, drew my attention to a promotion via a friendly email: three music CDs from a large selection were offered to me for 15€.

I still enjoy buying music on physical media, so I decided to take a closer look at the offer. It turned out that approx. 9,000 CDs were listed on about 400 pages in the online shop. The shop lets you sort the offers by popularity or by customer rating. However, sorting by popularity in descending order brings up many titles that do not quite match my age group. Sorting by customer rating, on the other hand, reveals that the shop treats ratings in an unweighted manner: a CD with a single 5-star rating is listed above a CD with 4.9 stars from 1,000 ratings.
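
The ranking problem can be illustrated with a short sketch. It is not taken from the article; it assumes a Bayesian average, which is one common way to weight a rating by the number of votes behind it. The prior mean of 4.0 stars and the prior weight of 10 ratings are arbitrary illustrative values, and the function name `weighted_rating` is hypothetical.

```python
def weighted_rating(mean_stars, n_ratings, prior_mean=4.0, prior_weight=10):
    """Shrink a mean rating toward prior_mean; fewer ratings means more shrinkage.

    prior_mean and prior_weight are assumed values for illustration only.
    """
    return (prior_weight * prior_mean + n_ratings * mean_stars) / (prior_weight + n_ratings)


# The two CDs from the example: (title, mean stars, number of ratings)
cds = [
    ("CD with a single rating", 5.0, 1),
    ("CD with 1,000 ratings", 4.9, 1000),
]

# Sort by the weighted score instead of the raw mean
ranked = sorted(cds, key=lambda cd: weighted_rating(cd[1], cd[2]), reverse=True)
for title, stars, n in ranked:
    print(f"{title}: raw {stars:.1f}, weighted {weighted_rating(stars, n):.2f}")
```

With these assumed values, the CD with 1,000 ratings moves ahead of the CD with a single 5-star rating.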

High Performance (Mental) Exercise With R

This article briefly addresses the following three questions at a high level:

  • What goes through a data-driven person's mind when they hear a claim?
  • Which tool is more practical for data analyses: R, Python, Java, MATLAB?
  • Can sporting disciplines be the next application area for data analyses and machine learning?

Missing Values in Logistic Regression

Alongside decision trees, logistic regression is the workhorse of modelling when it comes to predicting whether an event will occur. Fortunately, both methods are designed so that basically any kind of predictor can be used, whether dichotomous categories, multi-level categories or continuous variables on an interval scale. Logistic regression in particular, however, has no built-in way to deal reasonably with missing values. In social science research or market research, one often makes do with limiting analyses to complete cases.
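
Limiting an analysis to complete cases (listwise deletion) can be sketched in a few lines. The records and field names below are invented for illustration and do not come from the article; the point is only to show how quickly usable rows disappear when each row must be complete.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "bought": 1},
    {"age": None, "income": 48000, "bought": 0},
    {"age": 51, "income": None, "bought": 1},
    {"age": 29, "income": 39000, "bought": 0},
    {"age": 45, "income": 61000, "bought": None},
]


def complete_cases(rows):
    """Keep only rows in which no field is missing (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row.values())]


kept = complete_cases(records)
share_lost = 1 - len(kept) / len(records)
print(f"{len(kept)} of {len(records)} rows kept, {share_lost:.0%} discarded")
```

In this toy data set each variable is missing only once, yet more than half of the rows are discarded before any model is fitted.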

Looking Into the Data Science Toolbox

In this blog entry, let us take a look into our toolbox together. The topic offers enough material for more than one entry, and we will return to it time and again in this blog.

The Monty Hall Problem in 10 Python Lines

Background of the Problem

Many will remember the game show "Let's Make a Deal" from the 90s, in which contestants had to choose one of three gates. The prize was always hidden behind one gate, while blanks lurked behind the other two: the Zonk or, in the USA with presenter Monty Hall, goats. At the start, the contestant chooses the gate behind which they believe the prize to be hidden. The presenter can then try to lure the contestant to another gate by offering cash. He can also open gates to heighten the suspense.
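
The game lends itself to a quick simulation. The sketch below is not the article's own code; it assumes the classic formulation of the problem, in which the presenter always opens one gate with a blank that the contestant did not pick and then offers a switch. The function names `play` and `win_rate` are chosen for illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    gates = [0, 1, 2]
    prize = rng.choice(gates)
    choice = rng.choice(gates)
    # Assumption: the presenter opens a gate that is neither chosen nor the prize.
    opened = rng.choice([g for g in gates if g != choice and g != prize])
    if switch:
        # Move to the one gate that is neither the first choice nor the opened one.
        choice = next(g for g in gates if g != choice and g != opened)
    return choice == prize


def win_rate(switch, rounds=100_000, seed=42):
    """Estimate the share of rounds won with or without switching."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Under these assumptions, staying wins in roughly one third of the rounds and switching in roughly two thirds.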

The Basic Ideas behind Recommendation Systems

What to consider before starting the Development

Recommendation systems are a crucial part of many digital business models. This blog post concisely answers two foundational questions:

  1. Who should care about recommendation systems and why?
  2. What are the primary flavors of recommendation systems? How much work is it to implement them?

In this article, I focus on giving a solid overview.

R Tips and Tricks - Part 1

R is the Open-Source All-rounder with a Difficult Learning Curve

Approximately three years ago, I switched from a commercial statistics solution (similar to SPSS) to R. I can now say with conviction that I don't need another tool for advanced analytics. Especially in combination with the IDE RStudio, the software has reached a level of maturity that allows it to be used in large data science projects without any concerns.

There is, however, no point in pretending that one can simply install R and get started immediately. The learning curve is comparatively steep, partly because the variety of packages means there are multiple ways to do the same thing. I was frequently annoyed when, in the middle of an analysis, a trivial step tripped me up and I had to research how to solve the problem in R before continuing. Therefore, in this introduction (hopefully with many more parts to follow), I would like to present some tips and tricks that I would have appreciated knowing when I started.

A Basket Full of Snakes: Python Modules for Data Science

Anyone who knows my earlier blog posts knows that I am a big fan of both R and Python in my daily work.

As powerful as R is for data analysis and modeling, motivation fades just as quickly when number crunching pushes RAM to its limit.

In this context, a nice server installation with a lot of metal (e.g. 96 GB of RAM) works wonders.

As this option is not always available, I have made a virtue of necessity and turned to the more performant Python-based alternatives to R, especially since I have been using Python for ETL and data preparation for a long time.

Computer Vision 101: How Machines Learn To See

Whether in warehousing, production or customer service, very different business processes all involve images that need to be analyzed and evaluated. However, manual evaluation of these images is time-consuming and error-prone. These procedures can be automated with the help of computer vision, i.e. the machine analysis and processing of images. Thanks to highly mature methods, machines are now able to carry out even complicated analyses.