FAQ/missing - CBU statistics Wiki

Revision 9 as of 2010-05-18 14:30:22


How do I handle missing data in SPSS?

Missing values are a problem in multivariate analyses because they reduce the number of usable cases: any case with incomplete information on the variables in the analysis is automatically dropped (listwise deletion). One simple approach is to 'fill in' the missing values with the variable means. The macros below show how to do this in SPSS. The approach assumes that the values are missing completely at random (MCAR), so that the missing values are not expected to differ systematically from those that were recorded.
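To see how quickly listwise deletion can shrink a sample, here is a small Python sketch (an illustration outside SPSS). It assumes, hypothetically, that each item is missing independently with the same probability; a case survives listwise deletion only if every item is observed, so the expected share of complete cases is (1 - p)^k for k items.

```python
# Hypothetical illustration: each of n_items is missing independently (MCAR)
# with probability p_missing, and a case is kept by listwise deletion only if
# all n_items are observed.

def complete_case_fraction(p_missing: float, n_items: int) -> float:
    """Expected share of cases kept by listwise deletion under independent MCAR."""
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must lie in [0, 1]")
    return (1.0 - p_missing) ** n_items


if __name__ == "__main__":
    # Assumed rate of 2% missing per item across 50 items (not real data).
    print(round(complete_case_fraction(0.02, 50), 3))  # 0.364
```

So even a modest 2% missing rate per item would leave only about a third of the cases for a 50-item analysis.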

There are, however, more sophisticated approaches to handling missing values, notably the EM algorithm and mixed (random effects) models, which are described [:FAQ/emalgm: here.] These approaches have become popular and are now available in most statistical packages.

Below are two macros for SPSS. Suppose we have 50 variables named aq1 to aq50, stored in consecutive columns. The first macro flags the complete cases (those with no missing value on any of aq1 to aq50) and filters out all other cases.

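* Start with every case flagged as complete (ind = 1).
compute ind=1.
exe.

* The macro takes two positional arguments, the first and last item numbers,
* and sets ind to 0 for any case with a missing value on any of those items.
define !inmiss ( !pos !tokens(1)
                          / !pos !tokens(1)) .
!do !i=!1 !to !2.
if missing(!concat(aq,!i)) ind=0.
!doend.
!enddefine.

!inmiss 1 50.
exe.

* Filter so that only the complete cases (ind = 1) are used in analyses.
USE ALL.
COMPUTE filter_$=(ind=1).
VARIABLE LABEL filter_$ 'ind=1 (FILTER)'.
VALUE LABELS filter_$  0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .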
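The complete-case logic can be cross-checked outside SPSS. The Python sketch below is an illustration only: the data layout (one list of item scores per case) and the use of None to stand for an SPSS missing value are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional

# Assumption: each row holds the item scores for one case, and None stands in
# for an SPSS missing value (system- or user-missing).
Row = List[Optional[float]]


def complete_case_flags(rows: List[Row]) -> List[int]:
    """1 if the case has no missing items, else 0 (mirrors the ind variable)."""
    return [0 if any(v is None for v in row) else 1 for row in rows]


def complete_cases(rows: List[Row]) -> List[Row]:
    """Keep only the cases that FILTER BY filter_$ would leave selected."""
    return [row for row, flag in zip(rows, complete_case_flags(rows)) if flag == 1]


if __name__ == "__main__":
    data = [[1, 0, 1], [None, 1, 0], [0, 0, None], [1, 1, 1]]
    print(complete_case_flags(data))  # [1, 0, 0, 1]
    print(len(complete_cases(data)))  # 2
```

Unlike this sketch, SPSS's FILTER does not delete the unselected cases; it only excludes them from subsequent analyses until the filter is switched off.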

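The second macro replaces each missing value with the mean of that variable. The results are stored in new variables aq1a to aq50a, so the original variables aq1 to aq50 are left unchanged.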

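* If the complete-case filter above is still active, switch it off so that
* the means are computed from, and imputed into, all cases.
FILTER OFF.
USE ALL.

* For each item, RMV creates a new variable aq<i>a in which missing values
* are replaced by the mean of the observed values of aq<i>.
define !inmiss ( !pos !tokens(1)
                          / !pos !tokens(1)) .
!do !i=!1 !to !2.
rmv /!concat(aq,!i,a)=smean(!concat(aq,!i)).
!doend.
!enddefine.

!inmiss 1 50.
exe.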
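As a cross-check of what the RMV ... = SMEAN(...) step does to each column, here is a minimal Python sketch of column-mean imputation. It assumes the same hypothetical layout as a plain list of rows with None marking a missing value; it is not SPSS code.

```python
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumption)


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its
    column, as RMV ... = SMEAN(...) does. Returns new rows; input is untouched."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else float(v) for j, v in enumerate(row)]
            for row in rows]


if __name__ == "__main__":
    data = [[1, 0], [None, 1], [0, None], [1, 1]]
    print(mean_impute(data))
```

With 0/1 items the imputed values (here 2/3 in both columns) are generally not values the items can actually take.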

Because the items are dichotomous (each can take only two values), we could round the imputed means so that they take values that can actually occur. For the 50 imputed variables aq1a to aq50a, the syntax below rounds each value to the nearest integer with RND and stores the results in new variables y1 to y50.

* Round each imputed variable to the nearest integer (RND rounds halves away from zero).
do repeat r=aq1a to aq50a /y = y1 to y50.
compute y=rnd(r).
end repeat.
exe.
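Anyone reproducing the rounding step in another language should check its rounding rule. SPSS's RND rounds halves away from zero, whereas Python's built-in round() rounds halves to even (round(0.5) is 0). The Python sketch below, with hypothetical function names, mimics RND explicitly so that an imputed mean of exactly 0.5 becomes 1 as it would in SPSS.

```python
import math
from typing import List


def spss_rnd(x: float) -> float:
    """Round to the nearest integer with halves away from zero, like SPSS RND(x).
    Python's built-in round() would instead round halves to even."""
    a = abs(x)
    whole = math.floor(a)
    # Compare the fractional part directly to avoid the floor(a + 0.5) pitfall.
    result = whole + 1 if a - whole >= 0.5 else whole
    return math.copysign(result, x)


def round_imputed(rows: List[List[float]]) -> List[List[float]]:
    """Round every value, mirroring compute y = rnd(r) inside DO REPEAT."""
    return [[spss_rnd(v) for v in row] for row in rows]


if __name__ == "__main__":
    print(spss_rnd(0.5), round(0.5))  # 1.0 0
    print(round_imputed([[1.0, 2 / 3], [0.3, 0.0]]))  # [[1.0, 1.0], [0.0, 0.0]]
```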