## Kappa statistic evaluation in SPSS

Kappa and Phi are recommended as indices of inter-rater agreement; Grant MJ, Button CM and Snook B (2017) recommend their use for binary data (see here).
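As a rough illustration of the two indices for binary ratings, the following Python sketch computes Cohen's kappa and phi from a 2 × 2 table of counts. The function name `kappa_and_phi`, the cell labels `a` to `d` and the example counts are assumptions made for illustration; this is not the SPSS syntax referred to on this page.

```python
import math


def kappa_and_phi(a, b, c, d):
    """Cohen's kappa and phi for a 2 x 2 agreement table of counts.

    a: both raters say "yes"      b: rater 1 "yes", rater 2 "no"
    c: rater 1 "no", rater 2 "yes"  d: both raters say "no"
    (hypothetical cell labels chosen for this sketch)
    """
    n = a + b + c + d
    row_yes, row_no = a + b, c + d   # rater 1 marginal totals
    col_yes, col_no = a + c, b + d   # rater 2 marginal totals

    # With a zero marginal (a rater who never varies) phi's denominator
    # is zero, so this sketch refuses such tables.
    if 0 in (row_yes, row_no, col_yes, col_no):
        raise ValueError("each rater must use both categories")

    p_o = (a + d) / n                                      # observed agreement
    p_e = (row_yes * col_yes + row_no * col_no) / n ** 2   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    phi = (a * d - b * c) / math.sqrt(row_yes * row_no * col_yes * col_no)
    return kappa, phi


# Made-up counts for 100 items rated by two raters.
kappa, phi = kappa_and_phi(40, 10, 5, 45)
print(f"kappa = {kappa:.3f}, phi = {phi:.3f}")
```

The two values are usually close when the raters' marginal totals are similar, as they are in this made-up table.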

SPSS syntax available:

**Note:** Reliability as defined by correlation coefficients (such as Kappa) requires variation in the scores to achieve a determinate result. If you have a program which produces a determinate result when the scores of one of the coders are constant, the bug is in that program, not in SPSS. Each rater must therefore give at least two different ratings.

There is also a weighted kappa, which allows different weights to be attached to different misclassifications. Warrens (2011) shows that weighted kappa is an example of a more general test of randomness. The paper by von Eye and von Eye (2005) gives a comprehensive account of kappa and its variants. These include a variant by Brennan and Prediger (1981) (computed using either this on-line calculator, which also computes Cohen's kappa, or this spreadsheet), which compares observed agreement with that expected under a uniform distribution over the categories. This enables kappa to attain its maximum value of 1 when the number of ratings in each category is not fixed. Von Eye and von Eye's paper suggests, however, that this measure can give a misleadingly high value if the raters give different numbers of category ratings.
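The two variants described above can be sketched in Python as follows. This is a minimal illustration, not the on-line calculator or spreadsheet mentioned above: the function name `agreement_indices`, the use of disagreement weights (0 on the diagonal) and the example table are all assumptions. Weighted kappa is computed as one minus the ratio of observed to chance-expected weighted disagreement, and the Brennan and Prediger coefficient replaces the chance agreement with 1/k for k categories.

```python
def agreement_indices(table, weights=None):
    """Return (weighted kappa, Brennan-Prediger kappa) for a k x k table.

    table[i][j]   : number of items rater 1 put in category i and rater 2 in j
    weights[i][j] : disagreement weight, 0 on the diagonal (assumed convention);
                    the default 0/1 weights give unweighted Cohen's kappa
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    if weights is None:
        weights = [[0 if i == j else 1 for j in range(k)] for i in range(k)]

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weights[i][j] * table[i][j] / n for i, j in cells)
    expected = sum(weights[i][j] * row_p[i] * col_p[j] for i, j in cells)
    weighted_kappa = 1 - observed / expected

    # Brennan and Prediger: chance agreement is 1/k (uniform over categories)
    p_o = sum(table[i][i] for i in range(k)) / n
    brennan_prediger = (p_o - 1 / k) / (1 - 1 / k)
    return weighted_kappa, brennan_prediger


# Made-up 3-category table for 60 items.
table = [[20, 5, 0],
         [3, 15, 2],
         [1, 4, 10]]

# Linear weights: a disagreement two categories apart counts twice as much.
linear = [[abs(i - j) for j in range(3)] for i in range(3)]

cohen, bp = agreement_indices(table)
linear_kappa, _ = agreement_indices(table, linear)
print(cohen, bp, linear_kappa)
```

Passing no weights recovers ordinary Cohen's kappa, so the same function can be used to compare the unweighted, weighted and Brennan and Prediger values on one table.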

Bornmann and Daniel (2009) use bootstrapping (see example R code here) to estimate the standard errors of both Cohen's kappa and Brennan and Prediger's kappa and find very close agreement between the two. On-line calculators for kappa are also available.
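The general idea of a bootstrap standard error for kappa can be sketched in Python as below. This is not Bornmann and Daniel's procedure or the R code linked above; it assumes a simple nonparametric bootstrap that resamples rated items with replacement, and the function names, the number of resamples, the seed and the data are all made up for illustration.

```python
import random
import statistics


def cohen_kappa(pairs):
    """Cohen's kappa for a list of (rater 1, rater 2) ratings; None if undefined."""
    n = len(pairs)
    categories = {rating for pair in pairs for rating in pair}
    p_o = sum(r1 == r2 for r1, r2 in pairs) / n
    p_e = sum(
        (sum(r1 == c for r1, _ in pairs) / n) * (sum(r2 == c for _, r2 in pairs) / n)
        for c in categories
    )
    if p_e == 1:  # both raters constant: kappa is 0/0
        return None
    return (p_o - p_e) / (1 - p_e)


def brennan_prediger(pairs, k):
    """Brennan and Prediger's kappa with k categories (chance agreement 1/k)."""
    p_o = sum(r1 == r2 for r1, r2 in pairs) / len(pairs)
    return (p_o - 1 / k) / (1 - 1 / k)


def bootstrap_se(pairs, statistic, n_boot=1000, seed=0):
    """Standard deviation of the statistic over resamples of the rated items."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(pairs, k=len(pairs))  # sample items with replacement
        value = statistic(resample)
        if value is not None:  # skip resamples where the statistic is undefined
            estimates.append(value)
    return statistics.stdev(estimates)


# Made-up binary ratings of 100 items by two raters.
pairs = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 5 + [(0, 0)] * 45

se_cohen = bootstrap_se(pairs, cohen_kappa)
se_bp = bootstrap_se(pairs, lambda sample: brennan_prediger(sample, k=2))
print(f"SE(Cohen) = {se_cohen:.3f}, SE(Brennan-Prediger) = {se_bp:.3f}")
```

Using the same seed for both calls means both coefficients are evaluated on the same resamples, which makes the two standard errors directly comparable.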

This PDF document, downloaded from a web page hosted by the University of York (located here), gives an overview of kappa. Kappa can also be computed using the wkappa function in R.

## References

Bornmann L and Daniel H-D (2009). The luck of the referee draw: the effect of exchanging reviews. *Learned Publishing* **22** 117–125.

Brennan RL and Prediger DJ (1981). Coefficient kappa: Some uses, misuses, and alternatives. *Educational and Psychological Measurement* **41** 687–699.

Grant MJ, Button CM and Snook B (2017). An Evaluation of Interrater Reliability Measures on Binary Tasks Using d-Prime. *Applied Psychological Measurement* **41(4)** 264–276.

von Eye A and von Eye M (2005). Can One Use Cohen's Kappa to Examine Disagreement? *Methodology* **1(4)** 129–142.

Warrens MJ (2011). Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa. *British Journal of Mathematical and Statistical Psychology* **64(2)** 355–365.