FAQ/roc - CBU statistics Wiki

Plotting ROC curves

The Receiver Operating Characteristic (ROC) curve is a graph that illustrates how well a set of predictor variables, measured on a number of cases, predicts the group to which each case belongs.

In the example below, data are collected to assess how well a case's test score and sex predict whether the case is a control or a patient (the group).

Group	Score	Sex
1	12	1
1	15	2
1	23	1
1	16	2
1	10	2
0	24	1
0	34	1
0	21	1
0	25	2
0	9	2

Binary logistic regression can be used to estimate group membership from test score and sex, and these estimates can be compared with the observed "true" group using a classification table. Correct and incorrect classification probabilities may be obtained from this table. Two of these diagnostics, sensitivity and 1 - specificity, may then be plotted as a ROC curve (available in the graph menu) from the predicted group membership probabilities, which use score and sex as predictors and can be saved from the logistic regression procedure. The syntax in the box below does the ROC analysis. The area under the ROC curve is also used as a discrimination diagnostic. For a model that predicts at least as well as chance, the area under the curve ranges from 0.50 to 1.00; the nearer to 1, the better the discrimination. There are rules of thumb based on deciles. These are reproduced in the table.

Area		Rating (point system)
0.50-0.60		Fail
0.60-0.70		Poor
0.70-0.80		Fair
0.80-0.90		Good
0.90-1.00		Excellent
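
To make the construction of the curve and its area concrete, here is a minimal Python sketch (not SPSS) that sweeps a cut-off over the example scores, records sensitivity against 1 - specificity at each cut-off, and sums the area with the trapezoidal rule. It rests on two assumptions that are not in the analysis above: it uses the raw score alone rather than the predicted probabilities from the logistic regression, and it classifies a case as group 1 when its score is at or below the cut-off, because group 1 tends to have the lower scores in the example data. The function names are illustrative only.

```python
# Example data from the table above as (group, score); sex is ignored here.
data = [(1, 12), (1, 15), (1, 23), (1, 16), (1, 10),
        (0, 24), (0, 34), (0, 21), (0, 25), (0, 9)]


def roc_points(cases):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off.

    Assumption: a case is called group 1 when score <= cut-off.
    """
    n_pos = sum(1 for g, _ in cases if g == 1)
    n_neg = len(cases) - n_pos
    # Cut-off below every score: no case is classified as group 1.
    points = [(0.0, 0.0)]
    for cut in sorted({s for _, s in cases}):
        true_pos = sum(1 for g, s in cases if g == 1 and s <= cut)
        false_pos = sum(1 for g, s in cases if g == 0 and s <= cut)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def trapezoid_area(points):
    """Area under the piecewise-linear curve through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(data)
auc = trapezoid_area(points)
print(round(auc, 2))  # 0.76
```

With score as the only predictor the area is 0.76, which falls in the "Fair" band of the table. The area from the logistic regression on score and sex may differ.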

Hosmer and Lemeshow (2000) suggest that areas under the ROC curve of 0.70 to 0.80 are 'acceptable', 0.80 to 0.90 'excellent' and 0.90 or above 'outstanding'. They point out that an area under the ROC curve of 0.50 suggests no discrimination between the outcome groups, as this corresponds to chance, e.g. simply tossing a coin to decide group membership.
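
The two sets of labels use some of the same words for different ranges (an area of 0.85 is 'Good' in the table but 'excellent' for Hosmer and Lemeshow), so it can help to look both up together. The Python sketch below is only an illustration: the function name is hypothetical, and because the bands share their end points it assumes that a boundary value such as 0.80 belongs to the higher band. It returns no Hosmer and Lemeshow label below 0.70, since none is quoted above for that range.

```python
def describe_area(area):
    """Return (rule-of-thumb rating, Hosmer-Lemeshow label) for an area.

    Assumption: lower bounds are inclusive, so 0.80 is rated 'Good'.
    """
    if not 0.5 <= area <= 1.0:
        raise ValueError("area is expected to lie between 0.50 and 1.00")

    rule_of_thumb = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Fair"),
                     (0.6, "Poor"), (0.5, "Fail")]
    hosmer_lemeshow = [(0.9, "outstanding"), (0.8, "excellent"),
                       (0.7, "acceptable")]

    # Take the first band whose lower bound the area reaches.
    rating = next(label for low, label in rule_of_thumb if area >= low)
    hl_label = next((label for low, label in hosmer_lemeshow if area >= low),
                    None)
    return rating, hl_label


print(describe_area(0.85))  # ('Good', 'excellent')
```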

There is no ROC analysis for more than two groups, but an assessment of fit could be carried out by obtaining a classification table of predicted versus observed groups from a multinomial or ordinal logistic regression procedure.

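* Fit the binary logistic regression and save the predicted probabilities as pred.
LOGISTIC REGRESSION VAR=group
  /METHOD=ENTER score sex
  /SAVE PRED (pred)
  /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

* Plot the ROC curve for pred, treating group = 1 as the positive state.
ROC pred by group(1)
  /MISSING = EXCLUDE
  /PLOT = CURVE
  /PRINT = SE COORDINATES.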

The bigger the area under the ROC curve, the better the prediction. The ROC analysis may also be done by inputting specificities and sensitivities into an SPSS macro. The area under the ROC curve is also equal to the nonparametric nonoverlap of all pairs (NAP) criterion (Parker and Vannest, 2009). The above code also generates a standard error for each area under the curve, which can be used in a z-test to compare two areas under ROC curves, using z = (difference in areas) / sqrt(se1² + se2²), where se1 and se2 are the standard errors of the two ROC areas, as used for example by the MedCalc calculator.
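
The z-test above is simple enough to check by hand. The Python sketch below computes z and a two-sided p-value from the standard normal distribution. It assumes the two areas come from independent samples, which is what adding the two squared standard errors implies. The function name and the input values are hypothetical and are not output from the example data.

```python
import math


def compare_areas(area1, se1, area2, se2):
    """z-test for the difference between two ROC areas.

    Assumption: the two areas are estimated from independent samples.
    """
    z = (area1 - area2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical areas and standard errors, for illustration only.
z, p_value = compare_areas(0.85, 0.03, 0.75, 0.04)
print(round(z, 2), round(p_value, 3))
```

Here the difference of 0.10 is divided by sqrt(0.03² + 0.04²) = 0.05, giving z = 2.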

Parker and Vannest also illustrate with a simple worked example that the area under the ROC curve may be equivalently derived from the output from a Mann-Whitney test.
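
The equivalence can be checked on the example data with a short Python sketch. It computes the Mann-Whitney U statistic from rank sums and compares U / (n1 × n2) with the proportion of all group 1 and group 0 pairs in which the group 1 case has the lower score, which is the pair-counting idea behind NAP. As assumptions of this sketch rather than of the analysis above, it uses the raw score alone (not the predicted probabilities) and treats low scores as indicating group 1. The function names are illustrative.

```python
group1 = [12, 15, 23, 16, 10]  # scores for group 1 in the example table
group0 = [24, 34, 21, 25, 9]   # scores for group 0


def proportion_of_pairs(low, high):
    """Share of (low, high) pairs where the `low` case scores lower.

    Ties count as half a pair.
    """
    wins = 0.0
    for a in low:
        for b in high:
            if a < b:
                wins += 1
            elif a == b:
                wins += 0.5
    return wins / (len(low) * len(high))


def mann_whitney_u(x, y):
    """U statistic for sample y: rank sum of y minus n_y(n_y + 1) / 2."""
    pooled = sorted(x + y)

    def midrank(value):
        first = pooled.index(value) + 1  # 1-based rank of first occurrence
        count = pooled.count(value)
        return first + (count - 1) / 2   # average rank across ties

    rank_sum = sum(midrank(v) for v in y)
    return rank_sum - len(y) * (len(y) + 1) / 2


u = mann_whitney_u(group1, group0)
area_from_u = u / (len(group1) * len(group0))
area_from_pairs = proportion_of_pairs(group1, group0)
print(u, area_from_u, area_from_pairs)
```

For these data U is 19 out of 25 possible pairs, so both routes give an area of 0.76.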

References

Hosmer DW and Lemeshow SL (2000). Applied Logistic Regression. 2nd Edition. Wiley: New York. In CBSU library. A third edition is due to be published in 2013.

Parker RI and Vannest KJ (2009). An improved effect size for single case research: Non-overlap of all pairs (NAP). Behavior Therapy 40(4) 357-367.
