Kappa statistic evaluation in SPSS
SPSS syntax available:
- [:FAQ/kappa/kappans:Non-square tables where one rater does not give all possible ratings]
- [:FAQ/kappa/multiple:More than 2 raters]
- [:FAQ/ad:An inter-rater measure based on Euclidean distances]
- [:FAQ/kappa/magnitude:Benchmarks for suggesting what makes a high kappa]
Note: Reliability as defined by correlation coefficients (such as kappa) requires variation in the scores to achieve a determinate result. If you have a program which produces a determinate result when the scores of one of the coders are constant, the bug is in that program, not in SPSS. Each rater must therefore use at least two distinct ratings.
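To see why, here is a minimal base-R sketch (the helper function and data are illustrative, not taken from any of the linked syntax): when both coders give constant scores, observed and chance-expected agreement both equal 1, so Cohen's kappa reduces to 0/0, which R reports as NaN.
{{{
# Cohen's kappa from two rating vectors (illustrative helper, base R only)
cohen_kappa <- function(r1, r2) {
  lev <- sort(unique(c(r1, r2)))            # pool the categories both raters used
  p   <- prop.table(table(factor(r1, lev), factor(r2, lev)))
  po  <- sum(diag(p))                       # observed agreement
  pe  <- sum(rowSums(p) * colSums(p))       # agreement expected from the marginals
  (po - pe) / (1 - pe)
}
cohen_kappa(c(1, 2, 1, 2), c(1, 2, 2, 2))   # ordinary case: 0.5
cohen_kappa(rep(1, 10), rep(c(1, 2), 5))    # one coder constant: exactly 0
cohen_kappa(rep(1, 10), rep(1, 10))         # both coders constant: NaN (0/0)
}}}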
There is also a weighted kappa, which allows different weights to be attached to different misclassifications. Warrens (2011) shows that several chance-corrected measures for 2 × 2 tables coincide with weighted kappa. This [attachment:kappa.pdf paper] by von Eye and von Eye (2005) gives a comprehensive overview of kappa and its variants. These include a variant by Brennan and Prediger (1981) (computed using either this [http://justusrandolph.net/kappa/ on-line calculator], which also computes Cohen's kappa, or this [attachment:bpkappa.xls spreadsheet]), which compares observed agreement with a uniform chance distribution and so allows kappa to attain its maximum value of 1 even when the raters' marginal category frequencies are not fixed. Von Eye and von Eye's paper suggests, however, that this measure can give a misleadingly high value if the raters give different numbers of category ratings.
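As a rough sketch of the difference (reusing the illustrative cohen_kappa helper above; bp_kappa and the data are likewise ours, not the linked calculator's code), the Brennan and Prediger variant simply replaces the marginal-based chance term with 1/k for k categories, i.e. the agreement expected under a uniform distribution:
{{{
# Brennan-Prediger kappa: chance agreement taken as 1/k (uniform over k categories)
bp_kappa <- function(r1, r2, k = length(unique(c(r1, r2)))) {
  po <- mean(r1 == r2)                      # observed agreement
  (po - 1/k) / (1 - 1/k)
}
r1 <- c(1, 1, 2, 2, 3, 3, 1, 2)
r2 <- c(1, 1, 2, 2, 3, 1, 1, 2)
cohen_kappa(r1, r2)                         # ~0.80, chance term from the raters' marginals
bp_kappa(r1, r2)                            # ~0.81, chance term fixed at 1/3
}}}
Because the uniform chance term ignores how often each rater actually used each category, the two statistics can diverge when the raters' marginal distributions differ, which is the situation von Eye and von Eye flag as potentially misleading.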
[attachment:bdkap.pdf Bornmann and Daniel (2009)] use bootstrap resampling (see example R code [:FAQ/BPkapCIR: here]) to estimate the standard errors of both Cohen's and Brennan and Prediger's kappas and find very close agreement between the two.
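For readers who want to try this without following the link, a minimal bootstrap sketch in base R (not the linked code) resamples the rated subjects with replacement and takes the standard deviation of the replicated kappas as the standard-error estimate:
{{{
# Bootstrap SE of a kappa statistic: resample subjects, recompute, take the SD
set.seed(42)
boot_se <- function(r1, r2, stat, B = 2000) {
  n    <- length(r1)
  reps <- replicate(B, {
    i <- sample.int(n, n, replace = TRUE)   # resample subjects, keeping rating pairs intact
    stat(r1[i], r2[i])
  })
  sd(reps, na.rm = TRUE)                    # drop the rare no-variation (NaN) resamples
}
boot_se(r1, r2, cohen_kappa)
boot_se(r1, r2, function(a, b) bp_kappa(a, b, k = 3))  # hold k at its full-sample value
}}}
Holding k fixed matters: if a resample happens to omit a category, recomputing k from the resampled data would silently change the chance term.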
References
Bornmann L & Daniel H-D (2009). The luck of the referee draw: the effect of exchanging reviews. Learned Publishing 22 117–125.
Brennan RL & Prediger DJ (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement 41 687–699.
von Eye A & von Eye M (2005). Can One Use Cohen's Kappa to Examine Disagreement? Methodology 1(4) 129–142.
Warrens MJ (2011). Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa. British Journal of Mathematical and Statistical Psychology 64(2) 355–365.