Donald Trump and Kosinski's bomb (Part 1)

An article published in the Swiss magazine "Das Magazin" at the beginning of the year caused a storm of discussion in the media. The discussion focused on one question: had Kosinski's methods, applied by a shadowy company called Cambridge Analytica, decided the result of the American presidential election? The focus of the debate soon shifted from Kosinski and his research to Cambridge Analytica, and in particular to the figure of Alexander Nix. Nix has a reputation as a brilliant salesman, and the amount of interest he managed to generate in Cambridge Analytica suggests that the reputation is justified.

Was Cambridge Analytica responsible for Donald Trump's victory?

The question of whether Cambridge Analytica's involvement decided the election is easier to answer than it might first appear. Donald Trump's collaboration with Cambridge Analytica did not begin until June, near the end of the primaries; in other words, it lasted less than half a year. During this time, Cambridge Analytica was just one of several specialist firms supporting Trump, and these firms were often at loggerheads with one another. That is not enough time to prepare a campaign of this kind properly, and consequently not long enough for Cambridge Analytica to have become a decisive factor.

Will Kosinski's methods decide future elections?

Much more interesting than the influence on the last US election is the place where the discussion originated: Michal Kosinski and his research. What methods has he developed? How effective are they? Behind these questions lies one key question: what role will Kosinski's methods play in future elections – and not just in the USA?

Unlike Nix, Kosinski is a researcher, and as such he publishes his work, so it is worth reading. A good starting point is an article published in 2013 whose title neatly summarizes his findings: "Private traits and attributes are predictable from digital records of human behavior". In it, Kosinski describes the data he used and how he analyzed it.

How did Kosinski obtain data with information about the personalities of ten thousand people?

The data collection method is as simple as it is ingenious: offer a personality test on Facebook. Anyone curious enough to see their results "pays" for them with their data. Kosinski's test subjects could voluntarily donate their data to science; those who chose not to still received their test results.

Anyone who completes this kind of test and also makes their data available (their Facebook "likes", for example) delivers a more or less detailed insight into their own personality, depending on the test.

How can personality be expressed in numbers?

Just what kind of insights are we talking about here? The problem with genuine psychological test methods is that they are very complex to develop and usually take a long time to administer. Moreover, what they actually measure is often open to interpretation. The situation is similar with intelligence: when asked what intelligence is, psychologists like to answer that intelligence is what the intelligence test measures.

Personality is no different: there are no universally recognized parameters for measuring it. The closest we have are the Big Five, or OCEAN, personality traits, to which both Kosinski and Cambridge Analytica refer. The OCEAN characteristics were not discovered by carefully dissecting brains or scrupulously observing human beings in their natural habitat. Instead, researchers put a very large number of questions to a very large number of interviewees and evaluated the answers with extensive factor-analytic calculations. The end result is five question clusters. Each cluster groups questions whose answers are highly correlated with one another (to oversimplify: if I know one answer, I know them all) but largely independent of the answers in the other clusters.
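The idea of a cluster can be illustrated with a small simulation. The sketch below is purely hypothetical: it invents six questionnaire items, lets items 0 to 2 depend on one latent trait and items 3 to 5 on a second, independent one, and then groups items whose answers correlate strongly. Grouping by a correlation threshold is a crude stand-in for real factor analysis, which estimates latent factors and loadings rather than simply linking correlated items; the item model, the noise level and the threshold are all assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_answers(n_people=2000, seed=1):
    """Hypothetical data: items 0-2 are driven by latent trait A, items 3-5
    by an independent latent trait B; every answer adds its own noise."""
    rng = random.Random(seed)
    items = [[] for _ in range(6)]
    for _ in range(n_people):
        trait_a, trait_b = rng.gauss(0, 1), rng.gauss(0, 1)
        for i in range(6):
            latent = trait_a if i < 3 else trait_b
            items[i].append(latent + rng.gauss(0, 0.5))
    return items


def cluster_items(items, threshold=0.5):
    """Crude stand-in for factor analysis: link items whose answers correlate
    above `threshold` and return the connected groups of item indices."""
    n = len(items)
    corr = [[pearson(items[i], items[j]) for j in range(n)] for i in range(n)]
    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and corr[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(group))
    return clusters


items = simulate_answers()
print(cluster_items(items))  # [[0, 1, 2], [3, 4, 5]]
```

Under these assumptions, answers within a cluster correlate at about 0.8 while answers from different clusters correlate at roughly zero, which is exactly the "if I know one answer, I know them all" pattern in miniature.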

When you look at these clusters, you find that the questions within a cluster are also related in content. For example, the first cluster contains questions that all have to do with openness to new experiences; the "O" in OCEAN stands for "openness". The other four dimensions were arrived at in the same way. In addition to openness, there are conscientiousness, extraversion, agreeableness, and emotional stability. This last dimension is often named after its opposite pole, "neuroticism", hence the "N" in OCEAN. These labels for the clusters are therefore interpretations; they cannot be calculated, nor can they be proven correct. They are merely plausible.

Stable personality trait or snapshot of a current mood?

You cannot really prove that "openness", for example, is the correct interpretation of certain answers. What you can demonstrate empirically, however, is the stability of the results you get when you measure a person's personality using the OCEAN model. This is why the five dimensions of the OCEAN model are referred to as traits: they are stable characteristics that change very little over time.
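Stability of this kind is usually quantified as test-retest reliability: measure the same people twice and correlate the two sets of scores. The following minimal sketch assumes a simple additive model (hypothetical, in the spirit of classical test theory) in which each observed score is a stable trait component plus a state component drawn fresh on every occasion. Under that model the expected retest correlation is the trait's share of the total variance, so a score dominated by the trait stays stable, while one dominated by the momentary state does not.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def retest_correlation(trait_sd, state_sd, n_people=5000, seed=7):
    """Hypothetical additive model: observed score = stable trait + state.
    The trait value is the same on both occasions; the state is redrawn
    each time. Returns the correlation between the two occasions."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_people):
        trait = rng.gauss(0, trait_sd)
        first.append(trait + rng.gauss(0, state_sd))
        second.append(trait + rng.gauss(0, state_sd))
    return pearson(first, second)


# Expected value under the model: trait_sd**2 / (trait_sd**2 + state_sd**2)
print(retest_correlation(trait_sd=1.0, state_sd=0.5))  # close to 0.8: trait dominates
print(retest_correlation(trait_sd=0.2, state_sd=1.0))  # near zero: mood dominates
```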

The Cambridge Analytica website also uses the term "traits". There, you can have your own personality measured relatively quickly with about 50 questions. However, this is where claims of a "scientific procedure" part ways with genuine scientific practice. To measure a trait seriously and achieve stability, you need a large number of questions. My answers to web-questionnaire items such as "I waste my time" or "I find it difficult to get down to work" would certainly vary from one day to the next, or between Monday morning and Friday afternoon. Psychological scale construction reduces this effect by using several slightly different questions with similar content. These variations are not arbitrary; they must be examined empirically to ensure that they really do cover similar content. Only a questionnaire with over 100 questions comes close to producing reliable measurements that give comparable results when repeated six months later. Only then can you start to predict actual behavior, since the broader scales contain enough questions to detect subfacets.
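Why more questions help can be made concrete with the Spearman-Brown formula, a standard result from classical test theory: if a single item has reliability r, a scale of k parallel items has predicted reliability kr / (1 + (k - 1)r). The sketch below computes the formula and checks it against a simulation. It assumes, hypothetically, that each answer is the person's stable trait plus day-to-day noise that is independent from item to item, giving a single-item reliability of 0.2; a mood that pushes all answers in the same direction on a given day would not average out this way. The numbers are illustrative and are not meant to reproduce the figure of over 100 questions.

```python
import random


def spearman_brown(r_single, k):
    """Predicted reliability of a scale of k parallel items, given the
    reliability r_single of a single item (Spearman-Brown formula)."""
    return k * r_single / (1 + (k - 1) * r_single)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulated_retest(k, noise_sd=2.0, n_people=3000, seed=3):
    """Hypothetical model: each answer = stable trait (sd 1) + day-to-day
    noise that is independent across items and sessions. A scale score is
    the mean of k answers; returns the correlation across two sessions."""
    rng = random.Random(seed)
    session1, session2 = [], []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        session1.append(sum(trait + rng.gauss(0, noise_sd) for _ in range(k)) / k)
        session2.append(sum(trait + rng.gauss(0, noise_sd) for _ in range(k)) / k)
    return pearson(session1, session2)


# With noise_sd=2.0 a single item has reliability 1 / (1 + 2.0**2) = 0.2.
for k in (1, 10, 50, 100):
    print(k, round(spearman_brown(0.2, k), 3), round(simulated_retest(k), 3))
```

With these assumptions the predicted reliability climbs from 0.2 for one item to about 0.71 for ten items and about 0.96 for a hundred, and the simulated retest correlations track the formula closely.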

What Cambridge Analytica is measuring here, then, is states rather than traits: the current state of mind of the person answering the questions, not facets of their long-term personality structure. If you target your advertising on this basis, you run the risk of addressing moods that disappeared some time ago. This does not mean that such advertising cannot influence opinions or voting behavior. But one should not be misled by the claim that it is the product of many years of psychological research.

What information can you gather from a Cambridge Analytica questionnaire? How accurately can the results describe a person's personality? And what does all of this mean for future election campaigns? I'll answer all of these questions in the second part of this blog post.