== Kappa statistic evaluation in SPSS ==

Kappa and Phi are recommended as inter-rater reliability diagnostics, including for binary data (see [[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978587/|here]]).
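
Kappa (and, for a 2 x 2 table, phi) can also be requested directly from the built-in CROSSTABS procedure when the ratings are held as one variable per rater. A minimal sketch, assuming hypothetical rating variables named rater1 and rater2:

{{{
* Minimal sketch: kappa and phi from CROSSTABS.
* rater1 and rater2 are hypothetical variable names holding each rater's
* ratings, one case per rated subject.
crosstabs
  /tables=rater1 by rater2
  /statistics=kappa phi.
}}}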

SPSS syntax is also available for:

 * [[FAQ/kappa/kappans|Non-square tables where one rater does not give all possible ratings]]
 * [[FAQ/kappa/multiple|More than 2 raters]]
 * [[FAQ/ad|An inter-rater measure based on Euclidean distances]]

The syntax below computes Cohen's kappa for a two-way rectangular table of ratings. Cut and paste it and adjust the data input as required.

{{{
* Example data input template
*
*                       Rater 2
*               Mild  Moderate  Severe
* Rater 1
*   Mild           5         5       0
*   Moderate       3         6       0
*   Severe         1         1       0
set format f10.5.
data list free
/ r1 r2 freq.
begin data
1 1 5
2 1 3
3 1 1
1 2 5
2 2 6
3 2 1
end data.
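* Each line between begin data and end data gives rater 1's rating, rater 2's rating and the cell count.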
*
* Syntax for rectangular tables to compute kappa
* (David Nichols, ASSESS Newsletter 1996;
* recommended on p.104 of the SPSS Reference Manual, 1990).
*
* The program computes Cohen's kappa for agreement between a pair of raters
* for a two-way rectangular table of ratings (i.e. at least 2 ratings given by both raters).
*
* Gives kappa and the asymptotic standard error of Everitt (1996),
* p.292, Making Sense of Statistics in Psychology.
*
preserve.
set printback=off mprint=off.
save outfile='kap0.sav'.
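* kapparec arguments: vars = the two rating variables, num = the cell count (frequency) variable.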
define kapparec (vars=!tokens(2) /num=!tokens(1) ).
count ms__=!vars !num (missing).
select if ms__=0.
matrix.
get x /var=!vars.
get ff /var=!num.
compute c=mmax(x).
compute y=make(c,2,0).
compute w=make(c,1,0).
compute sume=make(c,1,0).
compute ans=make(1,3,0).
loop i=1 to nrow(x).
loop k=1 to c.
do if x(i,1)=k.
compute y(k,1)=y(k,1)+ff(i,1).
end if.
do if x(i,2)=k.
compute y(k,2)=y(k,2)+ff(i,1).
end if.
do if (x(i,1) eq k and x(i,2) eq k).
compute w(k,1)=w(k,1)+ff(i,1).
end if.
end loop.
end loop.
loop k=1 to c.
compute sume(k,1)= y(k,1) * y(k,2) / csum(ff).
end loop.
compute kstat= ( csum(w) - csum(sume) ) / (csum(ff) - csum(sume)).
loop k=1 to c.
compute ans(1,1)=(csum(ff)-csum(sume)) / csum(ff).
compute ans(1,1)=ans(1,1)-(y(k,1)+y(k,2))*(csum(ff)-csum(w)) / (csum(ff))**2.
compute ans(1,1)=(w(k,1) / csum(ff))*ans(1,1)*ans(1,1).
compute ans(1,2)=ans(1,2)+ans(1,1).
end loop.
loop k=1 to c.
loop j=1 to c.
loop i=1 to nrow(x).
do if (x(i,1) eq k and x(i,2) eq j and x(i,1) ne x(i,2)).
compute ans(1,3)=ans(1,3)+ff(i,1)/csum(ff)*((y(k,2)/csum(ff))+(y(j,1)/csum(ff)))**(2).
end if.
end loop.
end loop.
end loop.
compute ans(1,3)=ans(1,3)*(1-(csum(w)/csum(ff)))**2.
compute ase=(csum(w)*csum(sume))/(csum(ff)*csum(ff)).
compute ase=ase-2*(csum(sume)/csum(ff))+(csum(w)/csum(ff)).
compute ase=ase**2.
compute ase=ans(1,3)-ase.
compute ase=ans(1,2)+ase.
compute ase=sqrt(ase*(1/(csum(ff)*(1-(csum(sume)/csum(ff)))**4))).
compute z=kstat/ase.
compute sig=1-chicdf(z**2,1).
save {kstat,ase,z,sig} /outfile='ka__tmp3.sav'
     /variables=kstat,ase,z,sig.
end matrix.
get file='ka__tmp3.sav'.
formats all (f11.8).
variable labels kstat 'Kappa' /ase 'ASE' /z 'Z-Value' /sig 'P-Value'.
report format=list automatic align(center)
  /variables=kstat ase z sig
  /title "Estimated Kappa, Asymptotic Standard Error,"
         "and Test of Null Hypothesis of 0 Population Value".
get file='kap0.sav'.
!enddefine.
restore.
kapparec vars=r1 r2 num=freq.
}}}

'''Note:''' Reliability as defined by correlation coefficients (such as Kappa) requires variation in the scores to achieve a determinate result. If a program produces a determinate result when the scores of one of the coders are constant, the bug is in that program, not in SPSS. Each rater must use at least two different rating categories.

 * [[FAQ/kappa/magnitude|Benchmarks for suggesting what makes a high kappa]]

There is also a weighted kappa which allows different weights to be attached to different misclassifications. Warrens (2011) shows that weighted kappa is an example of a more general test of randomness. This [[attachment:kappa.pdf|paper]] by Von Eye and Von Eye (2005) gives a comprehensive insight into kappa and its variants. These include a variant by Brennan and Prediger (1981) (computed using either this [[http://justusrandolph.net/kappa/|on-line calculator]], which also computes Cohen's kappa, or this [[attachment:bpkappa.xls|spreadsheet]]) which takes chance agreement to be a uniform distribution over the categories and so enables kappa to attain its maximum value of 1 when the number of category ratings is not fixed. Von Eye and Von Eye's paper suggests, however, that this measure can give a misleadingly high value if the raters use different numbers of rating categories.
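
As a rough illustration of the Brennan and Prediger variant: with k categories it takes chance agreement to be 1/k, so the coefficient is (po - 1/k)/(1 - 1/k), where po is the observed proportion of agreement. A minimal SPSS sketch, with po and k typed in by hand from the example table above (11 agreements out of 21 ratings, 3 categories):

{{{
* Sketch of the Brennan and Prediger (1981) coefficient:
* chance agreement is taken as uniform (1/k), so kappa = (po - 1/k)/(1 - 1/k),
* where po = observed proportion of agreement and k = number of categories.
* The values below come from the example table above (11/21 agreements, 3 categories).
data list free / po k.
begin data
0.5238 3
end data.
compute bpkappa = (po - 1/k) / (1 - 1/k).
formats bpkappa (f10.5).
list.
}}}

For the example table above this gives about 0.29, compared with about 0.14 for Cohen's kappa.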

[[attachment:bdkap.pdf|Bornmann and Daniel (2009)]] use bootstrapping (see example R code [[FAQ/BPkapCIR|here]]) to estimate the standard errors of both Cohen's and Brennan and Prediger's kappas and find very close agreement between the two. On-line calculators for kappa are also [[http://www.medcalc.org/manual/kappa.php|available]].

This [[attachment:kappayork.pdf|PDF document]] gives an overview of kappa; it was downloaded from a web page hosted by the University of York, located [[http://www-users.york.ac.uk/~mb55/msc/clinimet/week4/kappash2.pdf|here]]. Kappa can also be computed using the wkappa function in R.

__References__

Bornmann L & Daniel H-D (2009). The luck of the referee draw: the effect of exchanging reviews. ''Learned Publishing'' '''22''' 117–125.

Brennan RL & Prediger DJ (1981). Coefficient kappa: Some uses, misuses, and alternatives. ''Educational and Psychological Measurement'' '''41''' 687–699.

von Eye A & von Eye M (2005). Can One Use Cohen's Kappa to Examine Disagreement? ''Methodology'' '''1(4)''' 129–142.

Warrens MJ (2011). Chance-corrected measures for 2 × 2 tables that coincide with weighted kappa. ''British Journal of Mathematical and Statistical Psychology'' '''64(2)''' 355–365.
