In the New Year: An ‘ode’ to logistic regression

Data scientists never make new year speeches or look back at the year gone by. The rate of development of Spark, AI, etc. is too fast to let us relax for a moment and reflect inwardly. Maybe the profession is also too rational for a sentimental review of the past. If I were forced to write something dramatic during this period, I’d certainly choose logistic regression!

Logistic regression is our workhorse and fixed star, our building block, common denominator, classic example, and safe harbor. If I were permitted to take three models to an uninhabited island – you guessed it, I’d undoubtedly carry logistic regression as one of them!

It shines here in all its beauty:
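For reference, this is the model in its standard textbook form. The symbols are notation assumed here, not taken from the original post: x is the feature vector, β₀ the intercept, β the coefficient vector.

```latex
P(y = 1 \mid x) \;=\; \sigma\!\left(\beta_0 + \beta^\top x\right) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}
```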

Logistic regression – the workhorse

Logistic regression is a workhorse that does its job without much of a fuss. The spotlight is on the rock stars of our trade, such as recurrent neural networks and gradient boosted trees. They win data science competitions, trigger AI revolutions and, according to Elon Musk, even threaten our civilization. Nevertheless, in the engine rooms of industry something keeps running quietly and diligently: logistic regression. And for good reason:

Quick prototyping:

Development cycles in agile data science projects are becoming ever more frequent, often with new data sources being acquired and interpreted one after another. The low computational complexity of logistic regression helps shorten the modeling process significantly. Moreover, its results serve as a benchmark for developing more complex models. This lowers the risk of finishing a project and ending up empty-handed.

Feature engineering and data understanding in the limelight:

Neural networks and decision trees are tempting because they take care of feature engineering at the same time. The data scientist in question sits back and lets the learning algorithm handle the task. But danger may be lurking just around the corner: data science means gaining knowledge from data, which calls for taking the data apart in the truest sense of the phrase. Logistic regression enforces this thoroughness and, in our experience, leads to valuable insights into data-generating processes, model validity, and interdependencies.

Interpretability and communication:

Learning functional relationships from data mostly means balancing predictive power against interpretability. More flexible models generally improve the predictions, but unfortunately reduce their interpretability. Stakeholders are frequently interested primarily in actionable insights that can be turned into concrete guidelines. This is where logistic regression comes into its own: because its output can be read as the probability of an event, it gives understandable insights into the complex web of relationships among the variables.
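One common way to read a fitted logistic regression is through odds ratios: exponentiating a coefficient gives the factor by which the odds change when that feature rises by one unit. The following is a minimal sketch of that reading. The coefficient values and feature names are invented purely for illustration and do not come from any real model.

```python
import math

# Hypothetical fitted coefficients, invented purely for illustration.
intercept = -2.0
coefficients = {"age_decades": 0.4, "is_existing_customer": 1.1}

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features):
    """Probability of the event for one observation (dict of feature values)."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return sigmoid(z)

def odds_ratios(coefs):
    """exp(beta): multiplicative change in the odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items()}

ratios = odds_ratios(coefficients)
for name, ratio in ratios.items():
    print(f"{name}: a one-unit increase multiplies the odds by {ratio:.2f}")
```

A statement such as "being an existing customer roughly triples the odds" is the kind of result that can be handed to stakeholders directly.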

Communication is a key element of every data science project. Complex projects are often perceived as black boxes and met with little trust. Hence, the interpretability of logistic regression boosts the acceptance of project findings.

Basic component of advanced analytics or “If you want to fly, learn to run first”

Logistic regression analysis is ideal for setting up and advancing new data science departments. It can be used to explain many concepts, such as gradient descent, maximum likelihood estimation, and regularization. It is also very often used as the entry point into machine learning. Its self-contained inference framework also lets one build a bridge across to traditional statistics.
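The three concepts named above fit into a few lines of plain Python. The sketch below fits a logistic regression by gradient descent on the negative log-likelihood with an L2 penalty. The function name, the hyperparameter values, and the toy data are assumptions made for illustration; in practice one would use an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(X, y, learning_rate=0.1, l2=0.01, epochs=2000):
    """Batch gradient descent on the L2-penalized mean negative log-likelihood."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, xi)))
            error = p - yi  # derivative of the negative log-likelihood w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        for j in range(n_features):
            # The l2 term shrinks the weights toward zero (regularization).
            weights[j] -= learning_rate * (grad_w[j] / n_samples + l2 * weights[j])
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

# Toy data, made up for illustration: one centered feature, label 1 for positive values.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
predictions = [1 if sigmoid(bias + weights[0] * xi[0]) >= 0.5 else 0 for xi in X]
```

The update rule is the whole story: the gradient of the log-likelihood reduces to "prediction minus label, times feature", and the penalty term is what keeps the weights finite on perfectly separable data like this toy set.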

Logistic regression and AI

Logistic regression recently demonstrated its versatility and didactic value in the development of neural networks. Deep neural networks can be viewed as chains of logistic regression units connected in series. Logistic regression itself is a minuscule neural network, as illustrated below:
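In code, the same idea is a single neuron: a weighted sum of the inputs followed by a sigmoid. This is a minimal sketch; the weights and inputs are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum plus sigmoid activation: the prediction formula of logistic regression."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

# Hypothetical weights and inputs, chosen only for illustration.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```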

If we now connect a host of such neurons in a row and let their outputs serve as inputs for other neurons, we end up with the basic structure of a neural network:
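A minimal sketch of that structure, assuming one hidden layer with two sigmoid neurons feeding a single output neuron. The layer sizes and all weights are hypothetical and untrained; the point is only how outputs become inputs.

```python
import math

def sigmoid(z):
    """Logistic function used as the activation of every neuron."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One logistic regression unit: weighted sum followed by a sigmoid."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Feed the inputs through a hidden layer, then feed its outputs to the output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return neuron(hidden, output_weights, output_bias)

# Hypothetical, untrained weights chosen only for illustration.
prediction = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.25], [-1.0, 0.75]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.5, -2.0],
    output_bias=0.2,
)
```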

In the context of a neural network, the sigmoid function at the heart of logistic regression takes on the role of an activation function. The outputs of these activation functions (the arrowheads in the diagram) are in turn fed into further functions, which is as impressive as ever. As soon as one understands how logistic regression works, one quickly comprehends neural networks.

Logistic regression – the fixed star

If you wish to cope with the cycles of hype surrounding data science, AI, or machine learning, you had better delve into logistic regression. Trends come and go, but logistic regression will stay. It has been shining for over five decades in the data science sky and lighting our path. This text is dedicated to logistic regression.

In this spirit, your data science colleagues at b.telligent wish you a HAPPY and EXCITING NEW YEAR!