Data Science & AI

Boosting For The Naive Bayes Classifier

Neuroscience and machine learning overlap in many areas. One of them is combining several learning episodes, each with only modest success, into a single merged and stronger model for a particular task. In machine learning, this process is called "boosting". Developing solutions of this kind is of particular interest to the IT industry, so the article gives a short introduction to machine learning that presents the basic ideas and the application of the naïve Bayes classifier in R.

Reinforcement Learning, Bayesian Statistics And Tensorflow Probability: A Child's Game (part 1)

Reinforcement learning has a reputation for being extremely data-hungry, so much so that it can realistically be trained only on simulation-generated data, e.g. from a computer game. We discuss how Bayesian statistics can remedy this, using a small, easily accessible example. In the second part of this blog series, we see how this can be done in practice using TensorFlow Probability, a hot new tool from Google.

Best Practice for SQL-Statements in Python

Thanks to a mandatory interface for database connectors, the "Python Database API Specification v2.0" (PEP 249), all current connectors have been developed so that database connections and SQL statements for data retrieval and data transactions can be issued with the same commands. Results are returned in more or less the same format everywhere, and it is in this area that the most severe deviations from the intended standardisation seem to occur.
But this should not deter anyone from using Python scripts as a flexible way to automate database operations.
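
The common pattern that PEP 249 describes (connect, open a cursor, execute, fetch) can be sketched with the standard-library `sqlite3` connector. This is only an illustration under stated assumptions, not code from the article: the `customers` table, its columns and both function names are hypothetical. The `?` placeholder is sqlite3's parameter style; other connectors may use a different one, which they declare in their `paramstyle` attribute.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up customers table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    # executemany runs the same statement once per parameter tuple.
    cur.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ada", "Berlin"), ("Grace", "Hamburg"), ("Linus", "Berlin")],
    )
    conn.commit()
    cur.close()
    return conn


def fetch_customers_by_city(conn, city):
    """Return (id, name) rows for one city.

    The value is passed as a bound parameter instead of being pasted
    into the SQL string, which avoids SQL injection.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT id, name FROM customers WHERE city = ? ORDER BY id",
            (city,),
        )
        return cur.fetchall()
    finally:
        cur.close()
```

With another DB-API connector, the calls to `cursor()`, `execute()`, `fetchall()` and `commit()` should look the same; what typically changes is the `connect()` arguments and the placeholder syntax.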

Very Best Practice: Working With Paths In Python - Part 2

The Same Problem: Listing Folders and Drives

In the last blog post, we used a recursive function of fewer than 10 lines to scan folders and evaluate files by modification date and size.
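
For readers who have not seen the earlier post, a recursive scan of this kind might look roughly like the following. It is a minimal sketch, not the function from that post: the name `scan_folder` and the choice of `os.scandir` are assumptions made for illustration.

```python
import os
from datetime import datetime


def scan_folder(path):
    """Recursively yield (file path, size in bytes, modification time)."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Descend into the subfolder and pass its results through.
                yield from scan_folder(entry.path)
            elif entry.is_file(follow_symlinks=False):
                info = entry.stat(follow_symlinks=False)
                yield entry.path, info.st_size, datetime.fromtimestamp(info.st_mtime)
```

Because the function is a generator, the results can be evaluated afterwards as needed, for example `sorted(scan_folder("."), key=lambda item: item[1], reverse=True)` to list the largest files first.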

Now I’m going to raise the bar somewhat for this example by showing even better alternatives.

Best Practice: Working With Paths In Python (Part 1)

The problem: listing folders and drives

Recently, while working on a project, a colleague asked whether one could list the contents of drives in Python. Of course you can. And since this isn’t at all complicated, I’d like to use this case to illustrate key best practices for working with paths on drives.

Development Of A Powerful Data Science Team

Data science has become increasingly professionalized and standardized in recent years. The often intrinsically motivated data tinkerer, who fills the "analysis" niche in his business with deep company-internal data and process know-how, is reaching his limits.

Increasing demands, especially as all industries focus more strongly on the customer, force businesses to professionalize their data science structures. This includes knowledge, available data sources and their preparation, and the data science products already used in the business.

Time Series Analysis Made Easy – Completely Without Analysis Tool

Starting Situation

The controlling division of a telecommunications business is to be supported in forecasting the monthly development of gross adds. "Gross adds" is the key figure that reports gross new customer growth within a defined period; lost customers are not taken into account. It is used primarily in the telecommunications industry and reflects the number of newly concluded contracts (postpaid and prepaid).

Reformatting SPSS Value Labels for Output

Use Cases

Categorical variables can be stored in SPSS either as the original text, which results in a substantial loss of performance with large amounts of data, or as numerical codes with labels. The second way is not only drastically faster but also the right way: although it makes the SPSS syntax more difficult to read, it also makes the code completely immune to changes in notation.

Selfmade SPSS Frequency Analyses in R

I have been an intensive SPSS user since my time as a psychology student. Across all versions during this time, I was accompanied by the simple, brief commands for displaying descriptive statistics. These short commands quickly become second nature and thus make it possible to inspect data quickly.

Currently, my tool focus is on R. It is an excellent alternative, but despite extensive experience with this open source tool, I still find that it lacks the usability of SPSS. I simply miss my short commands. However, it is relatively easy to add simple SPSS-like commands as functions in R yourself.