P-rep, its definition and problems associated with its use
Killeen (2005) introduced a statistic, which he termed p-rep, that he interpreted as the probability that the standardized difference between two groups (Cohen's d) would be in the same direction across an infinite number of replications of an experiment. In terms of discriminability, Irwin (2009) also proves that p-rep is equivalent to the area under a ROC curve.
The usefulness of p-rep has subsequently been discussed and criticised by various authors. Iverson et al (2010) discuss extending p-rep to take into account prior beliefs about the hypothesis being tested. No fewer than six papers in issue 2 of volume 15 (2010) of Psychological Methods discuss the pros and cons of Killeen's paper.
In particular, Cumming (2010) criticises p-rep in its current form for relying on the flawed accept/reject dichotomy associated with hypothesis testing, and notes that a leading journal (Psychological Science) no longer requires p-rep to be reported. He also notes that p-rep in its present form can be expressed as a mere transformation of the p-value.
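The point that p-rep is a transformation of the p-value can be illustrated with a minimal sketch. It assumes the commonly cited form p-rep = Φ(z / √2), where z is the standard normal quantile corresponding to a one-tailed p-value; this formula is not stated on this page, and the function name p_rep_from_p is hypothetical. The same expression, Φ(d′ / √2), is the area under an equal-variance normal ROC curve, which is consistent with the equivalence noted by Irwin (2009).

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def p_rep_from_p(p_one_tailed: float) -> float:
    """Convert a one-tailed p-value to p-rep.

    Assumes the commonly cited form p-rep = Phi(z / sqrt(2)),
    where z = Phi^{-1}(1 - p). This is an illustrative assumption,
    not a formula taken from this page.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # z-score corresponding to the observed one-tailed p-value
    z = std_normal.inv_cdf(1.0 - p_one_tailed)
    # Dividing by sqrt(2) reflects the doubled variance when comparing
    # an observed effect with the effect in a replication
    return std_normal.cdf(z / sqrt(2.0))


if __name__ == "__main__":
    for p in (0.5, 0.10, 0.05, 0.01):
        print(f"one-tailed p = {p:.2f}  ->  p-rep = {p_rep_from_p(p):.3f}")
```

Because the mapping is strictly monotonic, p-rep under this assumption carries no information beyond the p-value itself: a smaller p always gives a larger p-rep, and p = 0.5 gives p-rep = 0.5.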
Serlin (2010) echoes Senn (2002) and goes further, arguing that p-rep is flawed because it incorrectly assumes that one can provide epistemological support for a theory. He states that, philosophically, a theory can only ever be disproved, by finding a case which contradicts it (disproof by counterexample). He concludes that p-rep does not provide much in the way of statistical or scientific illumination.
Trafimow et al (2010) further show that p-rep is unlikely to be close to the true replication probability unless the population effect magnitude and sample size are uncommonly large.
(Note: PDF versions of the Psychological Methods papers are available free to CBSU users via ScienceDirect.)
Cumming G (2010) Replication, p-rep, and confidence intervals: comment prompted by Iverson, Wagenmakers, and Lee (2010); Lecoutre, Lecoutre, and Poitevineau (2010); and Maraun and Gabriel (2010). Psychological Methods, 15, 192-198.
Irwin RJ (2009) Equivalence of the statistics for replicability and area under the ROC curve. British Journal of Mathematical and Statistical Psychology, 62, 485-487.
Iverson GJ, Wagenmakers EJ and Lee MD (2010) A model-averaging approach to replication: The case of p-rep. Psychological Methods, 15, 172-181.
Killeen PR (2005) An alternative to null hypothesis significance tests. Psychological Science, 16, 345-353.
Senn SN (2002) Letter to the editor. Statistics in Medicine, 21, 2437-2444.
Serlin RC (2010) Regarding p-rep: comment prompted by Iverson, Wagenmakers, and Lee (2010); Lecoutre, Lecoutre, and Poitevineau (2010); and Maraun and Gabriel (2010). Psychological Methods, 15, 203-208.
Trafimow D, MacDonald JA, Rice S and Clason DL (2010) How often is p-rep close to the true replication probability? Psychological Methods, 15(3), 300-307.