Differences between revisions 7 and 8

Which output criteria should I use when using the casewise statistics option with the Normal discriminant method in SPSS?

The discriminant procedure, which is optimal when the groups follow a Normal distribution, uses the Mahalanobis distance (d) as a summary measure of the difference both between the groups and within each group. The Mahalanobis distance can also be [:FAQ/mahal: used] as a means of identifying multivariate outliers.

The casewise statistics option in SPSS outputs two conditional probabilities in its tables: P(G=g conditional on D=d) and P(D>d conditional on G=g). The latter is given for the predicted group only, while the former is also given for the other groups. Here g represents the group of interest and d represents the Mahalanobis distance of a particular case from a group centre, as reported in the output.

P(G=g conditional on D=d) is the posterior probability that a case belongs to group g given that case's Mahalanobis distance; the predicted group is the one for which this probability is a maximum. The Mahalanobis distance for a particular case represents how typical that case is of the other cases in group g: it measures the standardised distance of the case from the centre of the group.

For a particular group, g, SPSS computes P(G=g conditional on D=d) as

$$ P(g) \frac{\exp(-0.5\, d(g)^{2})}{\sum_{\mbox{groups}} P(\mbox{group}) \exp(-0.5\, d(\mbox{group})^{2})} $$

where d(g) is the Mahalanobis distance for the g-th group and P(g) is the proportion of cases who are in group g.
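The posterior formula above can be sketched in a few lines of Python. This is a minimal illustration rather than SPSS's own code: the function name `posterior_probabilities` and the dictionary inputs are hypothetical, and the distances are assumed to be unsquared, as in the formula.

```python
import math


def posterior_probabilities(distances, priors):
    """Return P(G=g conditional on D=d) for every group.

    distances: dict mapping group label -> Mahalanobis distance d(g) of one
               case from that group's centre (assumed unsquared).
    priors:    dict mapping group label -> P(g), the proportion of cases in g.
    Both argument names are illustrative assumptions, not SPSS terminology.
    """
    # Numerator of the formula for each group: P(g) * exp(-0.5 * d(g)^2)
    weights = {
        g: priors[g] * math.exp(-0.5 * distances[g] ** 2) for g in distances
    }
    # Denominator: the same quantity summed over all groups
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Made-up example: one case, two equally sized groups
example = posterior_probabilities({"A": 1.0, "B": 2.0}, {"A": 0.5, "B": 0.5})
predicted_group = max(example, key=example.get)
```

In this made-up example the case is closer to the centre of group A, so A receives the larger posterior probability and is the predicted group.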

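Further, if a case belongs to group g and the p predictors are multivariate Normal, the squared Mahalanobis distance of that case from the group centre approximately follows a chi-square distribution on p degrees of freedom. It follows that the group typicality probability, the probability of observing a case from a particular group with a greater Mahalanobis distance than the observed case, is

P(D>d conditional on G=g) = 1 - $$\chi^{2}(d(g)^{2}, p)$$,

where $$\chi^{2}(x, p)$$ denotes the cumulative distribution function of a chi-square distribution on p degrees of freedom evaluated at x.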
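The typicality probability can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not SPSS's implementation: `chi2_sf` and `typicality` are hypothetical helper names, the chi-square tail is obtained from a simple series for the incomplete gamma function (which loses accuracy for very large distances), and d is taken to be the unsquared distance. If your output reports the squared distance, pass it to `chi2_sf` directly.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square on df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * exp(-z) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    # Divide by Gamma(a) to get the chi-square cumulative probability
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def typicality(d, p):
    """P(D>d conditional on G=g) for an unsquared Mahalanobis distance d and p predictors."""
    return chi2_sf(d ** 2, p)


# Made-up example: distance 2.0 from the group centre, 3 predictors
prob = typicality(2.0, 3)
```

A small value of `prob` would indicate that the case lies unusually far from the centre of the group, i.e. that it is atypical of that group.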

The interpretation of both of the above probabilities stems from the intuitive result that the further away a case is from a particular group centre (i.e. the more atypical it is with respect to that group), the less likely it is to be predicted to be in that group.

-  Revision 7 as of 2009-01-30 10:48:21
  Size: 2163
  Editor: PeterWatson
  Comment:
+  Revision 8 as of 2009-01-30 10:50:35
  Size: 2239
  Editor: PeterWatson
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 12:
-$$ \frac{\exp(-0.5\, d(g)^{2})}{\sum_{\mbox{groups}} \exp(-0.5\, d(\mbox{group})^{2})} $$
+$$ P(g) \frac{\exp(-0.5\, d(g)^{2})}{\sum_{\mbox{groups}} P(\mbox{group}) \exp(-0.5\, d(\mbox{group})^{2})} $$
 Line 14:
-where d(g) is the Mahalanobis distance for the g-th group.
+where d(g) is the Mahalanobis distance for the g-th group and P(g) is the proportion of cases who are in group g.

MRC CBU Wiki
