TensorFlow Probability, entropy maximization and dockerizing – the first day of PAW 2019

Data science is the art of separating the signal from the noise. That's why, as a data scientist, I like to go to conferences where the program is presented by genuine subject experts. No sales pitches and no meaningless buzzword bingo for me, thank you; give me in-depth, honest talks by people with years of experience. At Predictive Analytics World, such presentations were not hard to find – on TensorFlow Probability as well as on other fascinating topics.

Expert knowledge and entropy

Dominik Ballreich gave a thought-provoking talk on incorporating expert knowledge in the predictions of complex algorithms. In a Bayesian setting, he showed why it made sense to integrate expert knowledge into model-based forecasts. Relative entropy (better known as Kullback-Leibler divergence) was the tool that made the model-based raw forecast as close to expert knowledge as possible. There was a strong focus on mathematics, dealt with in impressive depth and including a discussion of Kalman and particle filters.
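To make the role of relative entropy a little more concrete, here is a minimal sketch in Python. It is not the method from the talk. It assumes, purely for illustration, that the raw forecast is a discrete distribution and that the expert only states an expected value. Among all distributions with that expected value, it then picks the one with the smallest Kullback-Leibler divergence from the raw forecast. The function names, the numbers and the bisection search are my own choices.

```python
import math

def kl_divergence(q, p):
    """Relative entropy KL(q || p) of two discrete distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def mean(values, q):
    """Expected value of `values` under the distribution q."""
    return sum(v * qi for v, qi in zip(values, q))

def tilt(values, p, lam):
    """Exponentially tilted distribution q_i proportional to p_i * exp(lam * values_i)."""
    logw = [math.log(pi) + lam * v for pi, v in zip(p, values)]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]

def closest_to_forecast(values, p, expert_mean, lo=-50.0, hi=50.0, iters=100):
    """Distribution with mean `expert_mean` that minimizes KL(q || p).

    The minimizer is an exponential tilt of p; its mean increases with lam,
    so the right lam can be found by bisection.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(values, tilt(values, p, mid)) < expert_mean:
            lo = mid
        else:
            hi = mid
    return tilt(values, p, 0.5 * (lo + hi))

# Hypothetical numbers: a raw forecast with mean 2.0 and an expert who expects 2.5.
values = [0, 1, 2, 3, 4]
raw_forecast = [0.1, 0.2, 0.4, 0.2, 0.1]
adjusted = closest_to_forecast(values, raw_forecast, expert_mean=2.5)
print("adjusted forecast:", [round(a, 3) for a in adjusted])
print("KL(adjusted || raw):", kl_divergence(adjusted, raw_forecast))
```

The adjusted forecast honors the expert's expected value while changing the raw forecast as little as possible in the relative-entropy sense. How much weight the expert's statement should get is exactly the kind of question this sketch does not answer.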

Other fundamental problems that always arise when trying to harness expert knowledge received less attention, for example: how do I weight the reliability of this knowledge? The proposed solution, simply to trust the expert's knowledge, is not always enough. The talk did not look in any depth at the even more fundamental question of how to formalize, in a Bayesian analysis, the imprecise notions that experts frequently put forward. However, the flexibility of the approach suggested here allows for a variety of different formalization options. The mathematical part of the work looks extremely well done, but we still need some good ideas for the interfaces to the non-mathematical part of the equation.

Dr. Michael Allgöwer, Dr. Sebastian Petry and Max Kurthen at Predictive Analytics World 2019

Urgent dockerizing

Benedikt Mangold’s talk also had a technical focus, although he was less concerned with mathematical methods. Instead, he explained how Docker, REST APIs, etc. can help to preserve the results of data science PoCs so that they can be used at any time and tested for reuse in other projects.

The solution presented will only be worthwhile for a small number of companies because it requires fairly substantial technical resources. Not only that, but only companies with a relatively large number of teams working on similar data will be able to transfer the results of previous PoCs. Nevertheless, it is an interesting idea to consider when the usual parameters do not apply. This was highlighted in the post-presentation discussion, which focused strongly on the cost benefits of Docker and REST APIs as well as the available alternatives.
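For readers who have not seen a PoC wrapped in a REST API, the following sketch shows roughly what that can look like. It is not taken from the talk and uses only the Python standard library. The `predict` function is a hypothetical stand-in for a trained PoC model, and the weights, the JSON format and the port are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Hypothetical stand-in for a PoC model: a fixed linear score."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))

def handle_payload(raw_body):
    """Turn a JSON request body into a (status_code, response_dict) pair."""
    try:
        features = json.loads(raw_body)["features"]
        return 200, {"prediction": predict([float(x) for x in features])}
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON like {\"features\": [1, 2, 3]}"}

class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's prediction as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Start the service; blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service, and a script like this, together with the model artifacts and pinned library versions, is the kind of thing one would package into a Docker image so that the PoC can still be run long after the project has ended.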

TensorFlow Probability

I have saved my favorite presentation of the day for the end of this blog post. Sigrid Keydana discussed how to use TensorFlow Probability to quantify the uncertainty of forecasts based on neural networks. I have already dealt with TensorFlow Probability in a previous two-part blog post, and Sigrid Keydana's presentation was along similar lines. While I used TensorFlow Probability as a tool for Bayesian inference without involving neural networks, her presentation focused on how it can be used to combine Bayesian statistics and neural networks in a meaningful way, in particular how neural models that only generate point estimates can be extended with a quantification of prediction uncertainty. She discussed several approaches and linked the statistical and conceptual considerations to the technical implementation in TensorFlow Probability. What surprised me was how simple the technical implementation can be. The conceptual part of this work is not so simple and draws heavily on the research of the past few years, which the presentation summarized in a wonderfully clear overview. I can recommend Sigrid Keydana's contributions to RStudio's TensorFlow blog to anyone who wishes they had been there. This is some of the best material currently available on TensorFlow Probability, regardless of whether you prefer R or Python.
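One of the basic ideas behind quantifying prediction uncertainty is to let the model output the parameters of a distribution instead of a single number, and to train it by minimizing the negative log-likelihood. The sketch below illustrates only this idea. It deliberately uses plain NumPy and a linear model in place of TensorFlow Probability and a neural network, so it is not code from the talk, and the synthetic data, learning rate and number of steps are my own assumptions.

```python
import numpy as np

# Synthetic data: a linear trend with Gaussian noise of standard deviation 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# The model predicts a distribution N(w * x + b, exp(s)^2), not just a point.
w, b, s = 0.0, 0.0, 0.0
learning_rate = 0.05

for _ in range(2000):
    sigma = np.exp(s)
    z = (y - (w * x + b)) / sigma  # standardized residuals
    # Gradients of the mean negative log-likelihood s + 0.5 * z**2.
    grad_w = np.mean(-z * x / sigma)
    grad_b = np.mean(-z / sigma)
    grad_s = np.mean(1.0 - z ** 2)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    s -= learning_rate * grad_s

sigma = float(np.exp(s))
mean_pred = w * x + b
# Approximate 95% prediction interval from the learned noise level.
lower, upper = mean_pred - 1.96 * sigma, mean_pred + 1.96 * sigma
coverage = float(np.mean((y >= lower) & (y <= upper)))
print(f"w={w:.2f}, b={b:.2f}, sigma={sigma:.2f}, coverage={coverage:.2f}")
```

This captures only the noise in the data (a single learned standard deviation) and says nothing about uncertainty in the weights themselves, which is where the Bayesian treatment discussed in the presentation comes in.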

These are my impressions of the first day of Predictive Analytics World 2019. Tomorrow, my colleague Maximilian Kurthen will report on the second day.
