FAQ/missing - CBU statistics Wiki

Revision 8 as of 2010-05-18 14:20:38

How do I handle missing data in SPSS?

Here are two macros for handling missing values in SPSS. Suppose we have 50 variables, named aq1 to aq50, in consecutive columns. The macro below flags the complete cases (ind=1) and filters the data so that only these cases are used.

compute ind=1.
exe.

define !inmiss ( !pos !tokens(1)
                          / !pos !tokens(1)) .
!do !i=!1 !to !2.
if missing(!concat(aq,!i)) ind=ind*0.
!doend.
!enddefine.

!inmiss 1 50.
exe.
USE ALL.
COMPUTE filter_$=(ind=1).
VARIABLE LABEL filter_$ 'ind=1 (FILTER)'.
VALUE LABELS filter_$  0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
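
The same complete-case filter can be sketched without a macro, using the NMISS function to count the missing values per case. This is only an illustrative alternative, assuming aq1 to aq50 are adjacent in the file so that TO picks up all 50 variables; the variable name complete is hypothetical.

```spss
* Alternative sketch: flag complete cases without a macro.
* NMISS counts the missing values across the listed variables for each case.
COMPUTE complete = (NMISS(aq1 TO aq50) = 0).
FILTER BY complete.
EXECUTE.
```

Cases with complete=1 have no missing values on any of the 50 items and are the only ones retained by the filter.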

The macro below replaces the missing values in aq1 to aq50 with the variable mean, storing the results in new variables aq1a to aq50a.

define !inmiss ( !pos !tokens(1)
                          / !pos !tokens(1)) .
!do !i=!1 !to !2.
rmv /!concat(aq,!i,a)=smean(!concat(aq,!i)).
!doend.
!enddefine.

!inmiss 1 50.
exe.

Because the items are dichotomous, and so can take only two values, we could consider rounding the imputed means so that they take values that can actually occur. For the 50 variables aq1a to aq50a, the syntax below rounds the imputed values to the nearest whole number and places the results in variables y1 to y50.

do repeat r=aq1a to aq50a /y = y1 to y50.
compute y=rnd(r).
end repeat.
exe.
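
It may help to see what this rounding implies. Assuming the items are coded 0 and 1 (an assumption, as the coding is not stated above), an item's mean is the proportion of 1s, so an item with mean 0.7 has every missing value filled in with rnd(0.7)=1, the more common response. A quick check along these lines:

```spss
* Item means before imputation (proportion of 1s if the items are coded 0/1).
DESCRIPTIVES VARIABLES=aq1 TO aq50 /STATISTICS=MEAN.
* After rounding, the minimum and maximum of each y should be values that can occur.
DESCRIPTIVES VARIABLES=y1 TO y50 /STATISTICS=MEAN MIN MAX.
```

Any minimum or maximum outside the two valid codes would show that the rounding had not worked as intended.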

[:FAQ/emalgm: More complex approaches to missing values] Multivariate procedures usually use only complete cases, with an accompanying loss of power. There are two ways to address this: estimating the missing values from the existing data (as above, where we used the variable means) or using random effects models.

Howell gives a comprehensive overview and illustration of these techniques [http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html here.]

Howell suggests that a better way to estimate missing values on a variable is to use a more complex approach than variable means, namely the EM (expectation-maximisation) algorithm. This is available under Analyze > Missing Value Analysis from version 13 of SPSS, using PROC MI in SAS, or in stand-alone freeware which can be downloaded from [http://www.stat.psu.edu/~jls/misoftwa.html here.] All these procedures produce a 'filled-in' (or imputed) data set in which the original missing values are replaced by values estimated from the observed data. The analysis can then be carried out using this filled-in data set.
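
As a rough sketch of the SPSS route, the menu choice above pastes an MVA command. The example below assumes the Missing Values add-on module is installed and that the EM subcommand accepts an OUTFILE keyword for saving the filled-in data, so check it against the Command Syntax Reference for your version. The variables x1 to x10 and the file path are hypothetical; quantitative variables are used because EM estimation in this procedure is designed for them.

```spss
* Sketch only: x1 TO x10 and the output path are hypothetical.
* Estimates missing values by EM and saves a filled-in copy of the data.
MVA VARIABLES=x1 TO x10
  /EM(OUTFILE='C:\temp\em_filled.sav').
```

The saved file can then be opened and analysed in place of the original data set.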

Howell does point out that there is no 'unique' estimate of the missing values, and that multiple imputations (3-5 data sets 'filled in' with estimates of the missing values) should be analysed to assess the consistency of the results. He does not, however, suggest in most cases, notably for analysis of variance, how these results should be combined and, for this reason, prefers using random effects models for missing values in analysis of variance.

Random effects models use all cases, irrespective of whether they contain missing values, and so have a unique solution. They are available in most statistical packages, such as SPSS (MIXED), SAS (PROC MIXED) and R (lme in the nlme package). They are useful for analysis of variance where it is wished to generalise results from the factors considered.
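
A minimal sketch of such a model in SPSS, for a repeated measures design, might look as follows. It assumes the data are in long format (one row per subject per occasion); the variable names id, time and score are hypothetical. A subject with a missing score on one occasion loses only that row, and the remaining occasions still contribute to the analysis.

```spss
* Sketch only: id, time and score are hypothetical variable names.
* Random intercept for each subject; time is a fixed within-subject factor.
MIXED score BY time
  /FIXED=time
  /RANDOM=INTERCEPT | SUBJECT(id) COVTYPE(VC)
  /METHOD=REML
  /PRINT=SOLUTION TESTCOV.
```

This contrasts with the complete-case approach at the top of the page, which would drop the whole subject.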