== Kappa statistic evaluation in SPSS ==

Kappa and Phi are recommended as inter-rater agreement indices, with Grant, Button and Snook (2017) recommending their use for binary data (see [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978587/|here]]).

SPSS syntax is available for:

 * [[FAQ/kappa/kappans|Non-square tables where one rater does not give all possible ratings]]
 * [[FAQ/kappa/multiple|More than 2 raters]]
 * [[FAQ/ad|An inter-rater measure based on Euclidean distances]]

'''Note:''' Reliability as defined by correlation coefficients (such as kappa) requires variation in the scores to achieve a determinate result. If you have a program which produces a determinate result when the scores of one of the coders are constant, the bug is in that program, not in SPSS. Each rater must give at least two different ratings.

 * [[FAQ/kappa/magnitude|Benchmarks for suggesting what makes a high kappa]]

There is also a weighted kappa, which allows different weights to be attached to different misclassifications. Warrens (2011) shows that, for 2 × 2 tables, a number of chance-corrected agreement measures coincide with weighted kappa.

This [[attachment:kappa.pdf|paper]] by von Eye and von Eye (2005) gives a comprehensive insight into kappa and its variants. These include a variant due to Brennan and Prediger (1981) (computed using either this [[http://justusrandolph.net/kappa/|on-line calculator]], which also computes Cohen's kappa, or this [[attachment:bpkappa.xls|spreadsheet]]) which enables kappa to attain its maximum value of 1 by comparing observed agreement with a uniform distribution when the number of ratings in each category is not fixed in advance; a short worked sketch is given at the foot of this page. Von Eye and von Eye's paper suggests, however, that this measure can give a misleadingly high value if the raters give different numbers of category ratings. [[attachment:bdkap.pdf|Bornmann and Daniel (2009)]] use bootstrapping (see example R code [[FAQ/BPkapCIR|here]]) to estimate the standard errors of both Cohen's kappa and the Brennan and Prediger kappa and find very close agreement between the two; a minimal bootstrap sketch also appears at the foot of this page.

On-line calculators for kappa are also [[http://www.medcalc.org/manual/kappa.php|available]]. This [[attachment:kappayork.pdf|PDF document]], downloaded from a web page hosted by the University of York and located [[http://www-users.york.ac.uk/~mb55/msc/clinimet/week4/kappash2.pdf|here]], gives an overview of kappa. Kappa can also be computed in R using the wkappa function in the psych package; a short example is sketched below.

__References__

Bornmann L and Daniel H-D (2009). The luck of the referee draw: the effect of exchanging reviews. ''Learned Publishing'' '''22''' 117–125.

Brennan RL and Prediger DJ (1981). Coefficient kappa: Some uses, misuses, and alternatives. ''Educational and Psychological Measurement'' '''41''' 687–699.

Grant MJ, Button CM and Snook B (2017). An Evaluation of Interrater Reliability Measures on Binary Tasks Using d-Prime. ''Applied Psychological Measurement'' '''41(4)''' 264–276.

von Eye A and von Eye M (2005). Can One Use Cohen's Kappa to Examine Disagreement? ''Methodology'' '''1(4)''' 129–142.

Warrens MJ (2011). Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa. ''British Journal of Mathematical and Statistical Psychology'' '''64(2)''' 355–365.
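
__Example R code sketches__

The following is a minimal sketch of computing Cohen's kappa and weighted kappa in R with the cohen.kappa function from the psych package (the same package that provides the wkappa function mentioned above). The example ratings are made up for illustration and are not taken from any of the sources cited on this page.

{{{
# Minimal sketch: Cohen's kappa and weighted kappa via the psych package.
# The ratings below are made-up illustrative data.
# install.packages("psych")   # if the package is not already installed
library(psych)

# Two raters assigning 10 subjects to three ordered categories
ratings <- data.frame(
  rater1 = c(1, 2, 2, 1, 3, 2, 1, 3, 3, 2),
  rater2 = c(1, 2, 3, 1, 3, 2, 2, 3, 3, 2)
)

# cohen.kappa() accepts the raw two-column ratings and reports both
# unweighted (Cohen's) kappa and weighted kappa with confidence intervals
cohen.kappa(ratings)
}}}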
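
The Brennan and Prediger (1981) coefficient discussed above replaces Cohen's marginal-based chance agreement with a fixed chance agreement of 1/k, where k is the number of categories. A minimal base R sketch is given below; the function names and the 2 × 2 table of counts are illustrative assumptions rather than anything taken from the spreadsheet or calculator linked above.

{{{
# Brennan-Prediger coefficient: chance agreement fixed at 1/k (uniform),
# compared with Cohen's kappa, whose chance agreement comes from the margins.
# Function names and example counts are illustrative.
bp_kappa <- function(tab) {
  tab <- as.matrix(tab)
  k   <- nrow(tab)                          # number of categories (square table)
  po  <- sum(diag(tab)) / sum(tab)          # observed proportion of agreement
  (po - 1 / k) / (1 - 1 / k)
}

cohen_kappa <- function(tab) {
  tab <- as.matrix(tab) / sum(tab)
  po  <- sum(diag(tab))
  pe  <- sum(rowSums(tab) * colSums(tab))   # chance agreement from the margins
  (po - pe) / (1 - pe)
}

# Cross-classification of two raters' binary judgements (made-up counts)
tab <- matrix(c(70, 10,
                 5, 15), nrow = 2, byrow = TRUE)
bp_kappa(tab)       # 0.70
cohen_kappa(tab)    # about 0.57: the unbalanced margins lower Cohen's kappa
}}}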
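
In the same spirit as the bootstrap approach of Bornmann and Daniel (2009), the sketch below resamples subjects with replacement to obtain bootstrap standard errors for both coefficients. It reuses the bp_kappa() and cohen_kappa() functions from the previous sketch, and the simulated binary ratings are again purely illustrative.

{{{
# Nonparametric bootstrap of the standard errors of Cohen's kappa and the
# Brennan-Prediger coefficient, using simulated binary ratings for two raters.
set.seed(123)
rater1 <- sample(1:2, 100, replace = TRUE)
rater2 <- ifelse(runif(100) < 0.8, rater1, sample(1:2, 100, replace = TRUE))

boot_kappas <- replicate(2000, {
  i   <- sample(seq_along(rater1), replace = TRUE)        # resample subjects
  tab <- table(factor(rater1[i], levels = 1:2),
               factor(rater2[i], levels = 1:2))
  c(cohen = cohen_kappa(tab), bp = bp_kappa(tab))
})

apply(boot_kappas, 1, sd)   # bootstrap standard errors of the two coefficients
}}}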