Customer Journey Analytics: Modelling Based on Markov Chains

Master's Thesis Honored by Berufsverband Deutscher Markt- und Sozialforscher e. V. (German Market Research Industry Association)

A majority of the German population uses the internet to research and purchase products. In 2014 alone, online sales in Germany amounted to approx. 42 billion EUR. Acquiring and retaining customers via the internet is therefore becoming a strategic aspect of business management. A better understanding of internet users facilitates, for example, more in-depth personalization of websites and more targeted tailoring of advertising content. The scientific interest focuses primarily on the actual structure of user behavior and its underlying determinants. In 2013/2014, I worked on modelling internet usage behavior in the travel market at the Chair of Statistics and Econometrics at the University of Erlangen-Nuremberg, in cooperation with GfK SE Marketing Science. In spring, the resulting thesis was honored as the best thesis of the year by the German Market Research Industry Association.

A Previously Unexplored Data Source

The thesis uses a GfK SE data pool that has not previously been used in this context. The data pool records web page contacts within the travel market and enriches them with extensive additional information (booking and individual characteristics). It combines data from an observational source (GfK Media Efficiency) and a survey source (GfK TravelScope). Besides these characteristics, the model also covers research by customers who eventually bought a product offline.

Modelling after Extensive Variable Selection

After extensive data preparation, a Markov chain is modelled. The starting point for modelling heterogeneity is the transition probability, which is considered separately for each stage of the journey and each user. Variable selection procedures identify the variables with a high potential contribution to explaining this heterogeneity and reduce the large number of coefficients that would otherwise have to be estimated. Stepwise regression and Lasso estimation are used. Cross-validation shows, however, that Lasso estimation is better suited to selecting significant covariates in this context.
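
The selection effect of the Lasso can be illustrated with a small sketch. The exact model specification of the thesis is not given here, so the following assumes, purely for illustration, a binary logistic regression for a single transition (does the next page belong to a given website category or not), fitted by proximal gradient descent with an L1 penalty. The covariate names, the toy data and the penalty value `lam` are hypothetical; in practice the penalty would be chosen by cross-validation.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=1.0, n_iter=500):
    """Proximal gradient descent for L1-penalised logistic regression.

    Minimises mean log-loss + lam * sum(|coef|); the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    intercept, coef = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Predicted probabilities under the current coefficients.
        probs = [
            1.0 / (1.0 + math.exp(-(intercept + sum(c * x for c, x in zip(coef, row)))))
            for row in X
        ]
        resid = [p_i - y_i for p_i, y_i in zip(probs, y)]
        grad_intercept = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Plain gradient step for the intercept, gradient step plus
        # soft-thresholding (the L1 proximal operator) for the coefficients.
        intercept -= step * grad_intercept
        coef = [soft_threshold(c - step * g, step * lam) for c, g in zip(coef, grad)]
    return intercept, coef


# Hypothetical toy data: each row is one transition with two covariates,
# [booked_flight, noise_flag], both coded -1/+1. The label is 1 if the next
# page is an airline site. noise_flag is unrelated to the label by construction.
X, y = [], []
for booked_flight, n_pos, n_neg in [(1, 3, 1), (-1, 1, 3)]:
    for noise_flag in (-1, 1):
        for label, count in ((1, n_pos), (0, n_neg)):
            X += [[booked_flight, noise_flag]] * count
            y += [label] * count

intercept, coef = fit_l1_logistic(X, y, lam=0.05)
selected = [name for name, c in zip(["booked_flight", "noise_flag"], coef) if c != 0.0]
print(selected)
```

In this toy setting the penalty shrinks the coefficient of the uninformative covariate to exactly zero, while the informative one stays positive; this is the property that makes the Lasso usable as a variable selection procedure.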

Findings for Practice

Among others, the following findings can be reported:

The customer journey can be modelled by a first-order Markov chain, i.e. the next page visited depends on the currently visited website.
Furthermore, users do not differ from one another in their information search behavior as such; rather, the course of the web journey depends on its actual target, the purchased product. The information search for a flight thus differs significantly from that for a hotel booking or a package holiday.
Within journeys, users frequently switch back and forth between (usually) two website categories. This suggests that customers often purchase a bundle of travel products (e.g. flight and hotel) and research them simultaneously.
The difference between information processes that end in an online purchase and those that end in an offline purchase is smaller than might be assumed: only the duration of the journeys differs, while their structure is similar.
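
As a minimal sketch of what a first-order Markov chain on journeys means in practice, the following estimates transition probabilities from observed page sequences, separately per purchased product, and counts back-and-forth switches between two categories. The website categories, the example journeys and both helper functions are hypothetical and serve only to illustrate the idea; they are not taken from the thesis.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Maximum-likelihood estimate of first-order transition probabilities.

    P(next = b | current = a) = count(a -> b) / count(a -> any category).
    """
    counts = defaultdict(Counter)
    for journey in journeys:
        for current, nxt in zip(journey, journey[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


def count_alternations(journey):
    """Count A -> B -> A patterns with A != B, i.e. switching back and forth."""
    return sum(
        1 for a, b, c in zip(journey, journey[1:], journey[2:]) if a == c and a != b
    )


# Hypothetical journeys, keyed by the product that was eventually purchased.
journeys_by_product = {
    "flight": [["search", "airline", "search", "airline"], ["airline", "airline"]],
    "package": [["search", "operator", "hotel", "operator", "hotel"]],
}

# One transition matrix per purchased product, so that they can be compared.
matrices = {
    product: transition_probabilities(journeys)
    for product, journeys in journeys_by_product.items()
}
print(matrices["flight"]["airline"])
print(count_alternations(journeys_by_product["package"][0]))
```

Estimating one matrix per purchased product mirrors the finding that the journey depends on its target rather than on the individual user; comparing the matrices across products is then a simple way to see where the search behavior differs.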

These findings have important implications for market research practice and for advertisers. They provide guidance on the optimal use of website personalization and cross-selling measures. In addition, they lay a foundation for further research and, in particular, for data preparation and pre-processing in the customer journey context.