In SPSS, categorical variables can be stored either as the original text strings, which costs considerable performance with large amounts of data, or as numerical codes with labels. The second way is not only much faster but also the more robust one: although it makes the SPSS syntax harder to read, it also makes the syntax immune to changes in the spelling of the labels.
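As a minimal sketch of the coded approach, assume a numeric variable marke whose codes 1 to 3 and brand labels are purely hypothetical. The labels are attached with VALUE LABELS, while the filter condition refers only to the code:

```spss
* Hypothetical codes and labels, for illustration only.
VALUE LABELS marke
  1 'Audi'
  2 'BMW'
  3 'Volkswagen'.

* The filter uses the code, so renaming a label later does not break it.
TEMPORARY.
SELECT IF (marke = 2).
FREQUENCIES VARIABLES=marke.
```

Because of TEMPORARY, the selection applies only to the next procedure and the dataset itself stays unchanged.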
A setting in SPSS determines whether the numerical codes, the labels or both are displayed in the result outputs, e.g. of the FREQUENCIES command. Each option has pros and cons:
- The labels by themselves are best if the output is embedded into a document as a complete table.
- Codes and labels together facilitate both exploratory data analysis and the development of the syntax: one can read off the codes directly, e.g. for filter conditions, and immediately sees their meaning next to them. However, if one copies the result e.g. into Excel for further work steps, code and label end up combined in one cell and can only be separated afterwards using formulas.
- Thus, the codes by themselves are best for further processing, but this format is hardly suitable for anything else.
Workaround in 'Options'
One can switch between the different formats in the options. Under Edit -> Options -> Output, there is the field 'structure labelling' on the left. Here, pull-down menus for the variable names and the variable values let one switch between labels, values/names, and both.
Best Practice Using Syntax
It is rather laborious to call up this menu item each time the settings need to change. It is easier to set the options with SET commands directly in the syntax.
* For values.
* Switch to "codes only".
SET TNUMBERS VALUES.
* Switch to "labels only".
SET TNUMBERS LABELS.
* Switch to both.
SET TNUMBERS BOTH.

* For variables.
* Switch to "column name only".
SET TVARS NAMES.
* Switch to "labels only".
SET TVARS LABELS.
* Switch to both.
SET TVARS BOTH.
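To check which display mode is currently active before changing it, the SHOW command can be used. A brief sketch, assuming SHOW accepts the same TNUMBERS and TVARS keywords as SET:

```spss
* Display the current settings for value and variable labelling.
SHOW TNUMBERS TVARS.
```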
This way, one can quickly switch the notation for a single output in the middle of a syntax file and then switch back:
FREQ spalteA spalteB spalteC.
SET TNUMBERS BOTH.
FREQ spalte_special.
SET TNUMBERS VALUES.
FREQ spalteD spalteE spalteF.
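Instead of resetting the mode by hand after the special output, the previous settings can be saved and restored. A sketch using the PRESERVE and RESTORE commands, with the variable name taken from the example above:

```spss
* Save all current SET settings.
PRESERVE.
SET TNUMBERS BOTH.
FREQUENCIES VARIABLES=spalte_special.
* Return to the settings saved by PRESERVE.
RESTORE.
```

The advantage is that the syntax does not need to know which mode was active beforehand.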
Here is a specific example for automobile brands. The column 'marke' (brand) in the dataset contains automobile brands as numerical codes with labels.
SET TNUMBERS VALUES.
FREQ marke.
SET TNUMBERS BOTH.
FREQ marke.
SET TNUMBERS LABELS.
FREQ marke.
The code executed above leads to the following three alternative output formats: