Jeromy Anglim's Blog: Psychology and Statistics


Friday, October 23, 2009

Syntax Tips for Efficient Variable Selection in SPSS

This post discusses a few tricks for using SPSS syntax efficiently. The suggestions aim to make variable selection faster and less error-prone. Specifically, it gives an example of how to run a reliability analysis efficiently on a set of scales. The post is aimed at researchers using SPSS who are just starting to learn about the importance of syntax.

OVERVIEW
The Problem: 
I have previously discussed how selecting variables for analysis can be slow and error-prone. This is particularly the case in psychology, where analyses often require 50 or more variables to be selected. Unless the variables happen to be in sequential order, manually selecting variable names from the standard SPSS selection dialogue is tedious and prone to mistakes. If you want to re-run the analysis or change an old analysis slightly, the whole process has to be repeated.


A Solution:
One solution, used by many experienced SPSS users, involves first creating a syntax template and then creating a string of variable names. The details are as follows:

1) Create Syntax Template
This step involves setting up the core elements of the syntax required for an analysis. This is easily done by going to the menu for the required analysis (e.g., reliability analysis, factor analysis, descriptive statistics, compute statement, etc.). Put in a couple of arbitrary variables and set up all the options you want. Then press "Paste" in the dialogue box. This produces a syntax template in which the only remaining task is to replace the arbitrary variables with the variables you want for the analysis.

2) Create String of Variable Names
The string of variable names can be generated in a variety of ways.

  • One way is to keep lists of variable names stored as sets in a text file. These can then be copied and pasted when required. 
  • A second, and probably better, way is to use something like an Excel spreadsheet where all variable names are stored in a table, with columns that allow subsets of variables to be selected. See the approach that I use here.
Depending on how the string of variable names is stored and the requirements of the SPSS syntax, you may have to apply a few extra steps.
  • If your variable names are stored in a column in Excel, you may want to rearrange them into a single line separated by spaces. One way to do this is to paste the column into Word and replace (Control + H) new lines (^p) with a space (" ").
  • If you are putting the variables into a function within a COMPUTE statement (e.g., MEAN), you may need to separate the variables with commas. Paste in the variable names, highlight them, and replace (Control + H; or Edit - Replace) each space with a comma followed by a space.
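If you would rather not go through Word, the same rearranging can be scripted. The following is a minimal Python sketch and is not part of SPSS itself. The function names `to_space_separated` and `to_comma_separated` are hypothetical, and the sketch assumes the names contain no spaces or commas, which holds for SPSS variable names.

```python
# Hypothetical helpers that replicate the Word find-and-replace steps.
# Assumes variable names contain no spaces or commas (true of SPSS names).

def _split_names(text):
    # Treat commas like whitespace, then split on any run of whitespace,
    # so a pasted Excel column, a space-separated line, or a comma list all work.
    return text.replace(",", " ").split()


def to_space_separated(text):
    """Return names on one line separated by spaces (e.g., for /VARIABLES=)."""
    return " ".join(_split_names(text))


def to_comma_separated(text):
    """Return names separated by a comma and a space (e.g., for MEAN() in COMPUTE)."""
    return ", ".join(_split_names(text))


if __name__ == "__main__":
    # A column of names as it might be pasted from Excel (illustrative only).
    excel_column = "ipipReversed1\nipipReversed6\nipipReversed11\n"
    print(to_space_separated(excel_column))
    print(to_comma_separated(excel_column))
```

Because commas are treated like spaces, `to_space_separated` also strips the commas out of a pasted COMPUTE argument list. That is the step needed in the case study below.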

CASE STUDY - RELIABILITY ANALYSIS
Aim: The aim of this analysis is to run a reliability analysis on the five scales of a 50-item version of the IPIP (International Personality Item Pool).

1) Create Syntax Template
1.1 Go to the menu: Analyze - Scale - Reliability Analysis (put in a couple of variables; it does not matter which) and specify any options (e.g., Statistics - Scale if item deleted).
1.2 Press Paste.
This should yield syntax that looks something like this:
RELIABILITY
  /VARIABLES=personality1 personality2 personality3
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.
Notes:
  • "personality1", "personality2", and "personality3" are just arbitrary variables that were in my data file. For the analysis to be correct, they need to be replaced with the appropriate variables.
  • SPSS keywords are in capitals. The above syntax is a single command: it starts with a command name (RELIABILITY) and ends with a dot. Subcommands, which group related options, begin with a forward slash followed by the subcommand name (e.g., /MODEL).
  • To learn more about the syntax, press F1 when your cursor is on the syntax.
  • The reason for adopting the above process is to assist you if you have not already learnt the reliability syntax.
1.3 Tidy Up The Syntax Ready for Pasting
Delete the arbitrary variables and replace the generic scale label ('ALL VARIABLES') with the name of the scale. You may also want to reduce the number of lines taken up by the syntax.
RELIABILITY  /VARIABLES=
  /SCALE('Extraversion') ALL  /MODEL=ALPHA  /SUMMARY=TOTAL.
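Once the template has been tidied like this, filling it in is just text substitution, so it can also be done with a short script. The sketch below shows one possible way in Python. The `{names}` and `{scale}` placeholders and the `fill_template` function are illustrative additions, not anything SPSS provides. The output is meant to be pasted into a syntax window.

```python
import textwrap

# The tidied template from step 1.3, with hypothetical placeholders
# {names} and {scale} marking the parts that change between scales.
TEMPLATE = (
    "RELIABILITY  /VARIABLES={names}\n"
    "  /SCALE('{scale}') ALL  /MODEL=ALPHA  /SUMMARY=TOTAL."
)


def fill_template(scale, names, width=60):
    """Substitute a scale label and a list of variable names into TEMPLATE."""
    # Wrap long variable lists over several lines to keep the syntax readable;
    # continuation lines are indented by one space.
    wrapped = textwrap.fill(" ".join(names), width=width, subsequent_indent=" ")
    return TEMPLATE.format(names=wrapped, scale=scale)


if __name__ == "__main__":
    print(fill_template("Extraversion", ["ipipReversed1", "ipipReversed6", "ipipReversed11"]))
```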

2) Create String of Variable Names
You can get the string of variable names from anywhere that they are easily and reliably stored.
For example, if you had already used a COMPUTE statement to score the scale, you could copy the variable names from there:
COMPUTE ipipReversedExtraversionMean = mean(ipipReversed1, 
 ipipReversed6, ipipReversed11, ipipReversed16, 
 ipipReversed21, ipipReversed26, ipipReversed31, 
 ipipReversed36, ipipReversed41, ipipReversed46).

The above COMPUTE statement could be used to copy and paste the variable names into the RELIABILITY syntax. You would, of course, have to remove all the commas (e.g., highlight the names and go to Edit - Replace: find "," and replace with nothing). This would yield something like this:
RELIABILITY  /VARIABLES=ipipReversed1 
 ipipReversed6 ipipReversed11 ipipReversed16 
 ipipReversed21 ipipReversed26 ipipReversed31 
 ipipReversed36 ipipReversed41 ipipReversed46
  /SCALE('Extraversion') ALL  /MODEL=ALPHA  /SUMMARY=TOTAL.

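This could then be run (i.e., highlight it and press Run, or Control + R) to produce the reliability output. This includes Cronbach's alpha for the scale and, because of /SUMMARY=TOTAL, item-total statistics such as Cronbach's alpha if item deleted.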

The only remaining step would be to repeat the above process for the remaining scales.
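For the remaining scales, only the scale label and the list of items change. The Python sketch below generates all five commands in one go. It assumes that, as in the Extraversion example, the 50 items cycle through the five scales, so each scale uses every fifth item, starting at items 1 to 5 respectively. It also assumes that a reversed version `ipipReversedN` exists for every item and that the scale labels are as listed. Check all of these assumptions against your own scoring key and data file.

```python
# Assumed scale order: scale k (1-5) uses items k, k+5, ..., k+45.
# Check this against your own scoring key before relying on it.
SCALES = [
    "Extraversion",
    "Agreeableness",
    "Conscientiousness",
    "Emotional Stability",
    "Intellect",
]


def item_names(start, n_items=50, n_scales=5, prefix="ipipReversed"):
    """Variable names for every n_scales-th item, beginning at item `start`."""
    return [f"{prefix}{i}" for i in range(start, n_items + 1, n_scales)]


def reliability_syntax(scale, names):
    """One RELIABILITY command in the same form as the tidied template."""
    return (
        f"RELIABILITY  /VARIABLES={' '.join(names)}\n"
        f"  /SCALE('{scale}') ALL  /MODEL=ALPHA  /SUMMARY=TOTAL."
    )


def all_scales_syntax():
    """Syntax for every scale, separated by blank lines, ready to paste."""
    commands = [
        reliability_syntax(scale, item_names(start))
        for start, scale in enumerate(SCALES, start=1)
    ]
    return "\n\n".join(commands)


if __name__ == "__main__":
    print(all_scales_syntax())
```

Pasting the printed text into a syntax window gives one RELIABILITY command per scale, and these can then be run together.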

A more general option for variable selection is to store the variable names in some form of database and extract them from there. See this post for ideas about storing variable names in an Excel table and then extracting them based on a filter.

In particular, it can be useful to store psychological tests as Excel tables and use these for variable selection. The following are a few tips on doing this:

  • Set up the table so that each row is an item
  • Columns should include: Item Number, Item Text, whether the item should be reversed, Scale, Subscale, and Variable Name. Other columns may also be useful, such as whether an item should be retained if you are refining the scale.
  • If you set up the cells as a table it is easier to filter and sort while maintaining table integrity.
  • Sometimes it is useful to generate the variable name from information you already have. For example, to create the variable names for the above items, you could write a formula in an extra column: ="ipipReversed"&[Cell with the Item Number]. 
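The same filtering can be done outside Excel once the item table has been saved as a CSV file. The sketch below uses Python's standard `csv` module. The column names (`ItemNumber`, `Scale`, `Retain`), the rows, and the `variable_names` function are hypothetical. They only illustrate the idea of selecting items by scale and building names the way the Excel formula does.

```python
import csv
import io

# A small stand-in for the item table saved as CSV. The column names and
# rows are hypothetical; in practice you would open your exported file.
ITEM_TABLE = """ItemNumber,Scale,Retain
1,Extraversion,yes
2,Agreeableness,yes
6,Extraversion,yes
11,Extraversion,no
"""


def variable_names(table_text, scale, prefix="ipipReversed", retained_only=False):
    """Return variable names for items on `scale`, in table order."""
    names = []
    for row in csv.DictReader(io.StringIO(table_text)):
        if row["Scale"] != scale:
            continue
        if retained_only and row["Retain"] != "yes":
            continue
        # Mirrors the Excel formula ="ipipReversed"&[Cell with the Item Number].
        names.append(prefix + row["ItemNumber"])
    return names


if __name__ == "__main__":
    print(" ".join(variable_names(ITEM_TABLE, "Extraversion")))
    print(" ".join(variable_names(ITEM_TABLE, "Extraversion", retained_only=True)))
```

To read a real export, you could pass the contents of `open(path, newline="")` to `csv.DictReader` instead of the in-memory string.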
