Differences between revisions 5 and 7 (spanning 2 versions)

The EM algorithm and mixed (random effects) model approaches to missing values

Multivariate procedures usually use only complete cases, with an accompanying loss of power. There are two ways to address this: estimating the missing values from the existing data (e.g. using the variable means), or using random effects models.

Howell gives a comprehensive and accessible overview and illustration of all these techniques [http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html here.] This is well worth a read for getting a feel for the issues involved and how they can be addressed!

Howell, in particular, suggests that a better way to estimate missing values on a variable is to use a more complex approach than variable means, namely the EM algorithm. This is available under Analyze > Missing Value Analysis from version 13 of SPSS, via PROC MI and PROC MIANALYZE in SAS, or in stand-alone freeware (NORM) which can be downloaded from [http://www.stat.psu.edu/~jls/misoftwa.html here.] The EM algorithm produces a 'filled in' (or imputed) data set in which the original missing values are replaced by values estimated from the original data. The analysis can then be carried out using this filled-in data set. Note that each missing data estimate, in addition to using parameter estimates based on the original data, also adds in a random error term to account for sampling variability, which means we get different imputed values each time we perform the estimation.
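The idea of a parameter-based estimate plus a random error term can be illustrated with a minimal Python sketch. This is not the EM algorithm itself, nor what SPSS, SAS or NORM do internally; it is a simplified single-predictor regression imputation, and the function name, variable names and data values are all hypothetical.

```python
import random

def stochastic_impute(x, y, seed=None):
    """Fill in missing y values (None) with a regression prediction plus random error."""
    rng = random.Random(seed)
    # Estimate the regression of y on x from the complete cases only.
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(complete)
    mean_x = sum(xi for xi, _ in complete) / n
    mean_y = sum(yi for _, yi in complete) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in complete)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation (n - 2 degrees of freedom) sets the size of the error term.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in complete)
    resid_sd = (sse / (n - 2)) ** 0.5
    # Observed values are kept; missing ones get prediction + random normal error.
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0.0, resid_sd)
        for xi, yi in zip(x, y)
    ]

# Hypothetical data: two values of y are missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
filled_a = stochastic_impute(x, y, seed=1)
filled_b = stochastic_impute(x, y, seed=2)
```

Because of the random error term, `filled_a` and `filled_b` agree on the observed values but differ on the imputed ones, which is why a single filled-in data set understates the uncertainty.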

To account for sampling variability, Howell points out that multiple imputations are required. In practice this means that multiple 'filled in' data sets (typically 3 to 5) should be analysed to assess the consistency of the results across missing value estimates. Unfortunately, Howell does not suggest how these results should be combined, other than for a multiple regression analysis, and for this reason prefers using random effects models for missing values in analysis of variance. He does note in his example that the F tests on each of three imputed data sets from a repeated measures analysis of variance are very similar. One might consider, for example, an average F ratio, averaged across the F ratios obtained for each imputed data set.
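For the regression case, one widely used way of combining results across imputed data sets is Rubin's rules. The text above does not say which method Howell uses, so the following Python sketch is offered only as an assumption about how such pooling might be done; the function name and the coefficient and standard error values are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical regression coefficient and standard error from three imputed data sets.
coefs = [0.52, 0.48, 0.55]
std_errors = [0.10, 0.11, 0.10]
pooled, total_var = pool_rubin(coefs, [se ** 2 for se in std_errors])
pooled_se = total_var ** 0.5
```

The pooled standard error is larger than the average of the individual standard errors whenever the estimates differ between imputed data sets, reflecting the extra uncertainty due to the missing values.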

Random effects models, unlike the standard 'fixed effects' analysis of variance, use all cases irrespective of whether they contain missing values and therefore have a unique solution. They are available in most statistical packages, such as SPSS (MIXED), SAS (PROC MIXED) and R (lme in the nlme package). They are particularly useful for analysis of variance where it is wished to generalise results from the factors considered.

In the unusual situation where missingness is due to the impossibility of an event occurring, e.g. asking a person about their sibling's occupation when they have no siblings or are not 'in touch' with them, a simpler dummy variable adjustment procedure (Cohen and Cohen, 2003) may suffice (Allison, 2002). This procedure simply uses the variable mean to fill in the missing value, but then includes an additional indicator variable as a covariate in the analysis, taking a value of 1 where a missing value occurs and 0 otherwise.
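The dummy adjustment procedure just described can be sketched in Python as follows. The function name and the sibling scores are hypothetical, and the sketch only prepares the two covariates; the analysis itself is not shown.

```python
def dummy_adjust(values):
    """Mean-fill missing values (None) and build a 0/1 missingness indicator."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)                 # mean of the observed values
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]  # 1 marks a filled-in value
    return filled, indicator

# Hypothetical occupational score for each person's sibling; None = no sibling.
sibling_score = [30.0, None, 50.0, 40.0, None]
filled, missing_flag = dummy_adjust(sibling_score)
# filled       -> [30.0, 40.0, 50.0, 40.0, 40.0]
# missing_flag -> [0, 1, 0, 0, 1]
```

Both `filled` and `missing_flag` would then be entered as covariates in the analysis.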

References

Allison, P. (2002) Monograph on Missing Data (Sage paper #136).

Cohen, J. and Cohen, P. (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. London: Lawrence Erlbaum.

Howell, D.C. (2008) The analysis of missing data. In Outhwaite, W. & Turner, S. Handbook of Social Science Methodology. London: Sage.

Schafer, J.L. and Olsen, M.K. (1998) Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545–571.

-  ⇤ ← Revision 5 as of 2010-05-18 14:39:24 → 
  Size: 2303
  Editor: PeterWatson
  Comment:
+   ← Revision 7 as of 2010-05-18 15:13:11 → ⇥
  Size: 3719
  Editor: PeterWatson
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 5:
-Howell gives a comprehensive overview and illustration of all these techniques 
[http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html here.]
+Howell gives a comprehensive and accessible overview and illustration of all these techniques 
[http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html here.] This is well worth a read for getting a feel for the issues involved and how they can be addressed!
 Line 8:
-Howell, in particular, suggests that a better way to estimate missing values on a variable is by using a more complex approach than variable means, namely the EM algorithm. This can be used under analyse>missing value analysis from version 13 of SPSS or using PROC MIANALYSE in SAS or stand-alone freeware which can be downloaded from 
[http://www.stat.psu.edu/~jls/misoftwa.html here.] The EM algorithm produces a 'filled in' (or imputed) data set with values estimated using the original data replacing the original missing values. The analysis can then be carried out using this filled-in data set.
+Howell, in particular, suggests that a better way to estimate missing values on a variable is by using a more complex approach than variable means, namely the EM algorithm. This can be used under analyse>missing value analysis from version 13 of SPSS or using PROC MI and PROC MIANALYSE in SAS or stand-alone freeware (NORM) which can be downloaded from 
[http://www.stat.psu.edu/~jls/misoftwa.html here.] The EM algorithm produces a 'filled in' (or imputed) data set with values estimated using the original data replacing the original missing values. The analysis can then be carried out using this filled-in data set. Note each missing data estimate in addition to using parameter estimates based on the original data also adds in a random error term which means we get different missing values each time we perform the estimation to account for sampling variability.
 Line 11:
-Howell does point out that there is no 'unique' estimate of missing values and that multiple imputations are required. In practice this means that multiple 'filled-in' data sets (typically 3 to 5 data sets) should be analysed to assess the consistency of the results across missing value estimates. Howell doesn't, unfortunately, suggest in most cases, notably for analysis of variance, how these results should be combined and, for this reason, prefers using random effects models for missing values in analysis of variance. He does notice in his example that the F tests on each of three imputed data sets from a repeated measures analysis of variance are very similar. One might consider an average f ratio, for example, averaged across F ratios obtained for each imputed data set.
+To account for sampling variability Howell points out that multiple imputations are required. In practice this means that multiple 'filled in' data sets (typically 3 to 5 data sets) should be analysed to assess the consistency of the results across missing value estimates. Howell doesn't, unfortunately, suggest other than for a multiple regression analysis, how these results should be combined and, for this reason, prefers using random effects models for missing values in analysis of variance. He does notice in his example that the F tests on each of three imputed data sets from a repeated measures analysis of variance are very similar. One might consider an average F ratio, for example, averaged across F ratios obtained for each imputed data set.
 Line 14:
+In the unusual situation where missingness is due to an impossibility of an event occurring e.g. asking a person about their sibling's occupation when they have no siblings or are not 'in touch' with them then a more dummy adjustment procedure (Cohen and Cohen, 2003) may suffice (Allison, 2002). This procedure simply uses the variable mean to fill in the missing value but then includes a variable as a covariate in the analysis taking a value of 0 except where a missing value occurs where it takes a value of '1'. 

__References__

Allison P (2002) Monograph on Missing Data (Sage paper # 136) .

Cohen, J and Cohen, P (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum:London.

Howell, D.C. (2008) The analysis of missing data. In Outhwaite, W. & Turner, S.
Handbook of Social Science Methodology. London: Sage.

Schafer and Olson (1998)Multiple imputation for multivariate missing-data
problems: A data analyst’s perspective, ''Multivariate Behavioral Research'', '''33'''545–571.
