Donald Trump and Kosinski's bomb (Part 2)

In the first part of this blog article we learned the basics of Kosinski's method: for example, how Kosinski obtains his data (from Facebook "likes") and how he analyzes it. The OCEAN personality traits play an important role here, although we do need to differentiate between "traits" and temporary "states". Cambridge Analytica took advantage of Kosinski's method and used it, for example, in the US presidential election campaign for the benefit of Donald Trump. The question now is: what can we learn from a Cambridge Analytica questionnaire, and how accurately can its results reveal a person's personality?

Personality traits, political orientation and sexual satisfaction

When you complete the Cambridge Analytica questionnaire, you find a few questions that don't belong in an OCEAN questionnaire. This is because the method described doesn't only work with an OCEAN questionnaire. In fact, you can ask about almost anything that a half-decent questionnaire can measure and then use Facebook likes to try to predict it. Political orientation, for example. Or satisfaction with your life. Or your "financial IQ". Or your taste in music. Or your sexual satisfaction. There are virtually no limits to the imagination. If you'd like to try it for yourself, you'll find a good selection on this University of Cambridge web page. The methodology is always the same: if someone takes an XY test and agrees to make their Facebook likes accessible, I can try to predict the XY characteristic measured by the test on the basis of the likes. The findings are not always particularly profound. (Likes for Obama, for example, predict that you have Democratic sympathies. Who would have guessed?) Taken together, however, some of the results are quite scary: a long list of interpreted likes which, as a whole, give a very accurate description of the people to whom they belong.
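To make "the methodology is always the same" concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple like-averaging model: each like is scored by the mean XY result of the test takers who gave it, and a new person's XY score is predicted as the mean score of their known likes. This is a stand-in for illustration, not Kosinski's actual model, and all function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_like_scores(training_users):
    """Map each like to the mean XY score of the training users who gave it.

    training_users: list of (set_of_likes, xy_score) pairs, where xy_score is
    the result of a (hypothetical) XY questionnaire.
    """
    scores_per_like = defaultdict(list)
    for likes, score in training_users:
        for like in likes:
            scores_per_like[like].append(score)
    return {like: mean(scores) for like, scores in scores_per_like.items()}


def predict_trait(likes, like_scores, default):
    """Predict an XY score from likes alone by averaging the known like scores.

    Falls back to `default` (e.g. the population mean) when none of the
    person's likes appeared in the training data.
    """
    known = [like_scores[like] for like in likes if like in like_scores]
    return mean(known) if known else default


# Hypothetical training data: likes plus a political-orientation score
# (0 = strongly Republican, 1 = strongly Democratic), invented for illustration.
training = [
    ({"Barack Obama", "NPR"}, 0.9),
    ({"Barack Obama", "Hiking"}, 0.8),
    ({"Fox News", "Hiking"}, 0.2),
    ({"Fox News", "NRA"}, 0.1),
]
like_scores = train_like_scores(training)
population_mean = mean(score for _, score in training)

# A person who never took the test, known only by their likes.
print(predict_trait({"Barack Obama", "NPR"}, like_scores, population_mean))
```

Swap in any other questionnaire result for the score and the code stays the same, which is exactly why the approach generalizes to "almost anything".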

How accurate is it?

How accurate is it? Enough to give you the creeps? Kosinski's article mentioned above provides a few examples. Among the things he discusses are the OCEAN characteristics we touched on earlier. Kosinski compares the reliability with which he can predict the test results from likes (quantified as the correlation between the test results and the results of an analysis of the likes) with the test-retest reliability of the questionnaire used. For four of the five established personality dimensions he achieves almost half of the test-retest reliability of the questionnaire. For the "openness" personality dimension he even achieves a remarkable 78% of the test-retest reliability. That certainly sounds impressive.
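The comparison boils down to two numbers: the Pearson correlation between questionnaire scores and like-based predictions, and that correlation expressed as a fraction of the questionnaire's test-retest reliability. A minimal sketch in plain Python follows; the function names and every number in it are hypothetical, chosen for illustration, and are not Kosinski's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def share_of_benchmark(prediction_r, test_retest_r):
    """Prediction accuracy as a fraction of a questionnaire's test-retest
    reliability (both given as Pearson correlation coefficients)."""
    return prediction_r / test_retest_r


# In a real study, prediction_r would be pearson(test_scores, predictions)
# over all participants; here it is simply assumed to be 0.40.
prediction_r = 0.40

# The same accuracy looks very different depending on the benchmark chosen.
print(f"{share_of_benchmark(prediction_r, 0.55):.0%}")  # prints 73% (hypothetical short questionnaire)
print(f"{share_of_benchmark(prediction_r, 0.85):.0%}")  # prints 47% (hypothetical long questionnaire)
```

The percentage is only as meaningful as the denominator: a less reliable benchmark makes the same prediction look better.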

However, you can easily overlook one important point in Kosinski's study. The questionnaire he uses to determine the personality traits is a very short variant containing just 20 questions, significantly fewer than the 100+ questions in a "full" test. Unsurprisingly, the test-retest reliability of this short questionnaire, which he uses as the benchmark for his own results, is lower than that of a full test. The diagram compares the results achieved by Kosinski, the test-retest reliability of the questionnaire he uses, and the test-retest reliability of a long OCEAN questionnaire (which you will find here; the original author, Robert R. McCrae, is one of the founding fathers of the OCEAN model). For each of the five dimensions of the OCEAN model, it shows three values, each a Pearson correlation coefficient:

Kosinski's method of Facebook likes (dark bars)
the short questionnaire used by Kosinski (central bars)
a long OCEAN questionnaire as described in the publication by McCrae et al. linked above (light bars)
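Why is the lower reliability of a short questionnaire unsurprising? A standard psychometric rule of thumb, the Spearman-Brown prophecy formula, predicts how reliability changes with test length: a test lengthened by a factor k has a predicted reliability of k·r / (1 + (k − 1)·r). The sketch below assumes the added items are parallel to the existing ones, so treat it as a rough guide rather than an exact prediction; the function name and the numbers are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Spearman-Brown prophecy formula: k * r / (1 + (k - 1) * r).
    Assumes the added (or removed) items are parallel to the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical: a 20-item questionnaire with reliability 0.6,
# lengthened to 100 items (k = 100 / 20 = 5).
print(round(spearman_brown(0.6, 100 / 20), 3))  # prints 0.882
```

Run in reverse (k < 1), the same formula shows how much reliability a test loses when it is cut down to a few items per dimension.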

You would certainly not be misrepresenting Kosinski if you were to conclude that this questionnaire sets the bar as low as possible as a benchmark for his method. If you compare the dark bars in the diagram not with the middle ones, as Kosinski did in his article, but with the light ones, it is clear that the accuracy of his method comes nowhere near that of a long OCEAN questionnaire.

Kosinski's results therefore permit a much less precise assessment of personality traits than it might at first appear. To be fair, his analyses were not designed for maximum prediction accuracy but rather as a kind of scientific proof of concept.

Conclusion: be amazed now, feel queasy later

Kosinski's methods therefore did not decide the US election. But they do have the potential to play an important part in future elections. Things get really insidious when these methods are combined with fake news, which is what happened during the US presidential election campaign. Nevertheless, my own feeling is that the most insidious part is actually the fake news, even without the psychological targeting.

Incidentally, Michal Kosinski has not only shown that, as he modestly puts it, the bomb exists; he has also uploaded to the internet a complete set of instructions for building it. If you want to build your own version of the bomb at home, you will find the instructions here.