FAQ/NDcasestat - CBU statistics Wiki

Revision 15 as of 2009-01-30 11:25:25


Which output criteria should I use when using the casewise statistics option with the Normal discriminant method in SPSS?

The discriminant procedure, which is optimal when the groups follow a Normal distribution, uses the Mahalanobis distance (d) as a summary measure of the difference both between the groups and within each group. The Mahalanobis distance can also be [:FAQ/mahal: used] as a means of identifying multivariate outliers.

The casewise statistics option in SPSS reports two conditional probabilities in its tables: P(G=g conditional on D=d) and P(D>d conditional on G=g). The latter is given for the predicted group, while the former is also given for the other groups. Here g represents the group of interest and d represents the outputted Mahalanobis distance from a group centre for a particular case.

P(G=g conditional on D=d) is the posterior probability of a case falling in group g, based on that case's Mahalanobis distance. The predicted group is the one for which this probability is a maximum. The Mahalanobis distance for a particular case represents how typical that case is with respect to other cases in the group, g: it measures the standardised distance of the case from the centre of the group.
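As a minimal sketch of what "standardised distance from the centre of the group" means, the Python below computes the squared Mahalanobis distance for a case measured on two predictors. The function name and the example numbers are hypothetical, and the covariance matrix is assumed to be the within-group covariance matrix used by the discriminant analysis; it is an illustration, not SPSS's own code.

```python
def squared_mahalanobis_2d(case, centre, cov):
    """Squared Mahalanobis distance of a two-predictor case from a group centre."""
    dx = case[0] - centre[0]
    dy = case[1] - centre[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    # Quadratic form (dx, dy) * inverse(cov) * (dx, dy)',
    # using inverse(cov) = [[d, -b], [-c, a]] / det for a 2x2 matrix.
    return (dx * (d * dx - b * dy) + dy * (a * dy - c * dx)) / det


# Hypothetical group centre and covariance matrix (not from a real data set).
centre = (10.0, 20.0)
cov = ((4.0, 0.0), (0.0, 1.0))
d2 = squared_mahalanobis_2d((12.0, 21.0), centre, cov)
```

With these made-up values the case lies one standard deviation from the centre on each predictor, so the squared distance is 2.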

For a particular group, g, SPSS computes P(G=g conditional on D=d) as

$$ \frac{\mbox{P(g) } \exp(-0.5 \mbox{ d(g)}^{2})}{\sum_{\mbox{groups}} \mbox{P(group) } \exp(-0.5 \mbox{ d(group)}^{2})} $$

where d(g) is the Mahalanobis distance for the g-th group, P(g) is the proportion of cases who are in group g and exp() is the exponential function.
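The posterior formula can be sketched in a few lines of Python. This is an illustration only: the function name and the example distances and proportions are hypothetical, and the inputs are assumed to be the squared distances d(g)^2 together with the group proportions P(g).

```python
import math


def posterior_probabilities(sq_distances, priors):
    """Return P(G=g conditional on D=d) for every group.

    sq_distances: squared Mahalanobis distances d(g)^2, one per group.
    priors: proportions P(g) of cases in each group, in the same order.
    """
    if len(sq_distances) != len(priors):
        raise ValueError("need one squared distance and one prior per group")
    # Numerator of the formula for each group: P(g) * exp(-0.5 * d(g)^2).
    weights = [p * math.exp(-0.5 * d2) for d2, p in zip(sq_distances, priors)]
    total = sum(weights)  # denominator: the sum over all groups
    return [w / total for w in weights]


# Hypothetical case: squared distances of 1.0 and 4.0 from two equally sized groups.
posteriors = posterior_probabilities([1.0, 4.0], [0.5, 0.5])
# The predicted group is the one with the largest posterior probability.
predicted_group = max(range(len(posteriors)), key=posteriors.__getitem__)
```

For these made-up inputs the posterior probabilities are roughly 0.82 and 0.18, so the case would be predicted to belong to the first group.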

The posterior group probability, P(G=g given D=d), of being in a particular group given the case features is usually of interest in a clinical setting, where the focus is on identifying cases who fall into risk groups (e.g. for dementia or cancer) from a set of measures, perhaps partly obtained from a questionnaire. The magnitude of the posterior probability for the predicted group indicates how sharp the prediction is: a probability near 1 indicates a clear prediction and one nearer 1/ng indicates no clear prediction, where ng is the total number of groups.

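Further, the squared Mahalanobis distance based on p predictors follows a chi-square distribution on p degrees of freedom when the case genuinely belongs to the group in question. It follows that the group typicality probability, i.e. the probability of observing a case from a particular group with a greater Mahalanobis distance than the observed case, is

P(D>d conditional on G=g) = 1 - $$\chi^{2}(\mbox{d(g)}^{2},\mbox{p})$$,

where the chi-square term denotes the cumulative chi-square distribution function on p degrees of freedom evaluated at the squared distance.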
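The typicality probability can be sketched in Python as the upper tail of a chi-square distribution. The sketch assumes the distribution function is evaluated at the squared distance d(g)^2, as in the posterior formula, and uses only the standard library; where SciPy is available, `scipy.stats.chi2.sf` should give the same result. The function names and example values are hypothetical.

```python
import math


def chi2_upper_tail(x, p):
    """P(X > x) for a chi-square variable X on p degrees of freedom (p >= 1)."""
    if p < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for p = 2 (even p) or p = 1 (odd p) ...
    if p % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then add two degrees of freedom at a time using the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < p / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def typicality_probability(sq_distance, n_predictors):
    """P(D>d conditional on G=g) from the squared distance d(g)^2."""
    return chi2_upper_tail(sq_distance, n_predictors)


# Hypothetical case: squared distance of 2.0 from its group centre, two predictors.
typicality = typicality_probability(2.0, 2)
```

For this made-up case the typicality probability is exp(-1), about 0.37, so a case this far from the centre would not be unusual for the group.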

The above probability is only outputted for the predicted group in SPSS, although it can be requested for other groups. It is not as commonly quoted, in clinical studies at least, as the posterior group probability.

The interpretation of both of the above probabilities stems from the intuitive result that the further away a case is from a particular group centre (i.e. the more atypical it is with respect to that group), the less likely it is to be predicted to be in that group.

(Details of these probabilities are also given in SPSS manual 7.5 which is available in the CBU stack area)