== Discriminant analysis thresholds ==

If a single measurement is used for clinical diagnosis, we can estimate threshold scores on that measure that indicate how likely an individual is to belong to a patient group. Suppose we have pilot data comprising the measurement and a group label of “control” or “patient”. We can use these data to derive score thresholds corresponding to various probabilities of belonging to the patient group, and to assess how well those thresholds perform.

To do this we perform a discriminant analysis. SPSS offers two suitable procedures: the Normal discriminant (ND), located under classify:discriminant, and binary logistic regression (LR). Both relate the measurement score (independent variable) to group membership (dependent variable).

The score with probability, p, of being from a patient group may be estimated as
{{{
SCORE FOR P = (LOG(P/(1-P)) - CONSTANT)/B
where LOG is the natural logarithm, and CONSTANT and B are the constant and coefficient terms in the outputted discriminant function.
}}}
For ND the discriminant function is reported as the canonical discriminant function; for LR it is the regression equation presented in the output.
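
As a minimal sketch of the formula above, the Python functions below convert a probability into a threshold score and back again. The function names and the example values of the constant and B are hypothetical, chosen purely for illustration; they are not taken from any SPSS output.

```python
import math


def score_for_p(p, constant, b):
    """Score at which the fitted probability of being a patient equals p.

    Inverts log(p / (1 - p)) = constant + b * score.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if b == 0:
        raise ValueError("b must be non-zero")
    return (math.log(p / (1.0 - p)) - constant) / b


def prob_for_score(score, constant, b):
    """Forward direction: probability of patient membership at a given score."""
    return 1.0 / (1.0 + math.exp(-(constant + b * score)))


# Hypothetical coefficients, for illustration only.
constant, b = -3.0, 0.5
thresholds = {k / 10: score_for_p(k / 10, constant, b) for k in range(1, 10)}

for p, score in thresholds.items():
    print(f"p = {p:.1f}: score = {score:.2f}")
```

At p = 0.5 the log-odds term is zero, so the threshold reduces to -CONSTANT/B.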

The Normal discriminant method is, however, not good at handling skewed predictor scores and can give misleading thresholds if outliers are present. This is illustrated using the data below, which contain an obvious outlier. The table gives the number of controls and patients observed at each score.

||||<33% style="TEXT-ALIGN: center"> Score || Control || Patient ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 1 ||3  || 0 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 2 || 2  || 0 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 3 ||1  || 1 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 6 ||1  || 1 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 9 ||1  || 2 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 10 ||1  || 3 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 12 ||0  || 4 ||
||||<33% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 300 ||0  || 1 ||
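
To see why the outlier matters, it can help to expand the frequency table into one score per subject and compare summary statistics with and without the score of 300. The sketch below does this in Python; the variable and function names are hypothetical. The relevance to ND is that a Normal discriminant is built from group means and variances, both of which are sensitive to extreme values.

```python
import statistics

# Frequency table from the text: score -> (number of controls, number of patients)
FREQ = {1: (3, 0), 2: (2, 0), 3: (1, 1), 6: (1, 1),
        9: (1, 2), 10: (1, 3), 12: (0, 4), 300: (0, 1)}


def expand(freq, group):
    """Return one score per subject for group 0 (control) or 1 (patient)."""
    scores = []
    for score, counts in freq.items():
        scores.extend([score] * counts[group])
    return scores


controls = expand(FREQ, 0)
patients = expand(FREQ, 1)
patients_trimmed = [s for s in patients if s != 300]  # outlier dropped

for label, scores in [("controls", controls),
                      ("patients", patients),
                      ("patients without outlier", patients_trimmed)]:
    print(f"{label}: n = {len(scores)}, "
          f"mean = {statistics.mean(scores):.2f}, "
          f"median = {statistics.median(scores):.1f}, "
          f"sd = {statistics.stdev(scores):.2f}")
```

The patient median is unaffected by the single extreme score, whereas the patient mean and standard deviation change substantially when it is dropped.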

Applying the above equation to the output from fitting each discriminant function gives the following scores for each probability, p, of coming from the patient group:

||||||||||||||||||||||<10% style="TEXT-ALIGN: center">   ||||||||||||||||||<90% style="TEXT-ALIGN: center">Probability in Patient Group ||
|||||||||||||||||||||| Model || 0.1 || 0.2 || 0.3 || 0.4 || 0.5 || 0.6 || 0.7 || 0.8 || 0.9 ||
|||||||||||||||||||||| ND ||-116.83 ||-66.14 ||-32.46 ||-4.84 || 20.5 || 45.84 || 73.46 || 107.14 || 157.83 ||
|||||||||||||||||||||| LR || 1.55 || 3.36 || 4.57 || 5.56 || 6.47 || 7.37 || 8.36 || 9.57 || 11.38 ||

The Normal discriminant (ND) gives threshold scores that do not reflect the data (for example, negative scores at low probabilities although no observed score is below 1) because the fit is distorted by the outlier. The logistic procedure gives much more reasonable results and is more robust to outliers. Both methods give similar answers if the outlier is removed.
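
The logistic thresholds can be checked outside SPSS. The following is a minimal sketch, assuming the table above is the complete data set, that fits a one-predictor logistic regression by Newton-Raphson with step halving in plain Python and then applies the threshold formula. It is an illustrative stand-in, not SPSS's own algorithm, and all function names are hypothetical. The resulting thresholds can be compared with the LR row of the table, and the fit is repeated with the outlier removed.

```python
import math

# Frequency table from the text: score -> (number of controls, number of patients)
FREQ = {1: (3, 0), 2: (2, 0), 3: (1, 1), 6: (1, 1),
        9: (1, 2), 10: (1, 3), 12: (0, 4), 300: (0, 1)}


def expand(freq):
    """Per-subject scores and group labels (1 = patient, 0 = control)."""
    xs, ys = [], []
    for score, (n_control, n_patient) in freq.items():
        xs += [float(score)] * (n_control + n_patient)
        ys += [0] * n_control + [1] * n_patient
    return xs, ys


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_lik(c, b, xs, ys):
    """Bernoulli log-likelihood of the model logit(p) = c + b * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        s = (c + b * x) if y == 1 else -(c + b * x)
        total -= max(-s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total


def fit_logistic(xs, ys, max_iter=200, tol=1e-12):
    """Maximum likelihood estimates (constant, B) by damped Newton-Raphson."""
    c = b = 0.0
    ll = log_lik(c, b, xs, ys)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(c + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to the constant
            g1 += (y - p) * x      # gradient with respect to B
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        dc = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        step = 1.0
        new_ll = log_lik(c + dc, b + db, xs, ys)
        while new_ll < ll and step > 1e-10:   # halve the step until no worse
            step /= 2.0
            new_ll = log_lik(c + step * dc, b + step * db, xs, ys)
        if new_ll < ll:
            break                              # no improving step found
        c, b = c + step * dc, b + step * db
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return c, b


def thresholds(c, b):
    """Scores for p = 0.1, ..., 0.9 using (log(p/(1-p)) - constant) / B."""
    return {k / 10: (math.log((k / 10) / (1 - k / 10)) - c) / b
            for k in range(1, 10)}


xs, ys = expand(FREQ)
c_all, b_all = fit_logistic(xs, ys)

kept = [(x, y) for x, y in zip(xs, ys) if x != 300.0]   # outlier removed
c_trim, b_trim = fit_logistic([x for x, _ in kept], [y for _, y in kept])

for p, score in thresholds(c_all, b_all).items():
    print(f"p = {p:.1f}: score = {score:.2f}")
```

Because the fitted probability at a score of 300 is essentially 1, that observation contributes almost nothing to the likelihood equations, which is one way to see why the logistic thresholds barely move when it is dropped.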