Diff for "Synopsis2008" - CBU statistics Wiki
Differences between revisions 8 and 9
Revision 8 as of 2007-09-21 16:08:36
Size: 17755
Comment:
Revision 9 as of 2007-09-21 16:16:41
Size: 5749
Comment:
Deletions are marked like this. Additions are marked like this.
Line 168: Line 168:
Line 168 (deleted):  1. '''Post-hoc tests, multiple comparisons, contrasts and handling interactions'''
Line 168 (added):  1. '''What to do following an ANOVA'''
Line 170: Line 170:

10: Post-hoc tests, Multiple comparisons, Contrasts and handling Interactions
What to do following an ANOVA
Ian Nimmo-Smith

http://imaging.mrc-cbu.cam.ac.uk/statswiki/StatsCourse2006
Aims and Objectives
Why do we use follow-up tests?
Different ways to follow up an ANOVA
Planned vs. Post Hoc Tests
Contrasts and Comparisons
Choosing and Coding Contrasts
Handling Interactions
Example: Priming experiment
Between subjects design
Priming factor
Number correct
Example: Priming experiment (II)
ANOVA: F(3,36)=6.062
P=0.002**
So what?
Why Use Follow-Up Tests?
The F-ratio tells us only that the experiment had a positive outcome
i.e. group means were different.
It does not tell us specifically which group means differ from which.
We need additional tests to find out where the group differences lie.
How?
A full toolkit
A: Standard Errors of Differences
B: Multiple t-tests
C: Orthogonal Contrasts/Comparisons
D: Post Hoc Tests
E: Trend Analysis
F: Unpacking interactions
A: Standard Errors as Yardsticks
A: Standard Errors from SPSS
A: Plotting Standard Errors
A: Plotting Standard Errors of the Differences
A: Plotting 95% Confidence Intervals of Differences
B: Multiple t-tests
B: ‘LSD’ option
B: ‘LSD’ option
The problem with doing several null-hypothesis tests
Each test is watching out for a rare event with prevalence α [the Type I Error Rate].
The more tests you do, the more likely you are to observe a rare event.
If there are N tests of size α, then the expected number of Type I Errors is N × α.
With α = 0.05 we can expect 5 in every 100 tests to reject their Null Hypotheses ‘by chance’.
This phenomenon is known as Error Rate Inflation.
Multiple Comparisons: Watch your Error Rate!
Post-Hoc vs A Priori Hypotheses
Comparison
between a pair of conditions/means
Contrast
between two or more conditions/means
Type I Error Rates
Per Comparison (PC) error rate (αPC)
probability of making a Type I Error on a single Comparison
Family-wise (FW) error rate (αFW)
probability of making at least one Type I error in a family (or set) of comparisons (also known as the Experimentwise error rate)
αPC ≤ αFW ≤ c × αPC, or exactly αFW = 1 − (1 − αPC)^c for independent comparisons
where c is the number of comparisons
Problem of Multiple Comparisons
A numerical example of Error Rate inflation
Suppose we run C independent significance tests, each of size α.
And suppose all the null hypotheses are true.
The probability (α*) of at least one significant result (Type I error) is bigger.
α* = 1 − (1 − α)^C
α = 0.05
C = 6 (say, comparable to all pairwise contrasts between 4 conditions)
α* = 0.26 !!!
So the Familywise Error Rate is 26%, though each individual test has Error Rate 5%.
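The inflation figures above can be checked with a few lines of Python (a sketch; the function name is mine, not part of the course materials):

```python
# Familywise error rate for C independent tests, each of size alpha:
# P(at least one Type I error) = 1 - (1 - alpha)^C
def familywise_rate(alpha, c):
    return 1 - (1 - alpha) ** c

# 6 pairwise comparisons among 4 conditions, alpha = 0.05
print(round(familywise_rate(0.05, 6), 2))   # 0.26
```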
What is to be done?
Various Approaches
Orthogonal Contrasts or Comparisons
Planned Comparisons vs. Post Hoc Comparisons
Orthogonal Contrasts/Comparisons
Hypothesis driven
Planned a priori
Usually accepted that nominal significance can be followed (i.e. no need for adjustment)
Rationale: we are really interested in each comparison/contrast on its own merits. We have no wish to make pronouncements at the Familywise level.
A Priori (= Planned)
Typically there is a rationale which identifies a small number of (sub)-hypotheses which led to the formulation and design of the experiment.
These correspond to Planned or A Priori comparisons
So long as there is no overlap (non-orthogonality) between the comparisons
Post Hoc Tests
Not Planned (no hypothesis)
also known as a posteriori tests
E.g. Compare all pairs of means
Planned Comparisons or Contrasts
Basic Idea:
The variability explained by the Model is due to subjects being assigned to different groups.
This variability can be broken down further to test specific hypotheses about ways in which groups might differ.
We break down the variance according to hypotheses made a priori (before the experiment).
Rules When Choosing Contrasts
Independent: contrasts must not interfere with each other (they must test unique hypotheses).
Simplest approach compares one chunk of groups with another chunk
At most K-1: You should always end up with one less contrast than the number of groups.
How to choose Contrasts?
In many experiments we have one or more control groups.
The logic of control groups dictates that we expect them to be different from some of the groups that we’ve manipulated.
The first contrast will often be to compare any control groups (chunk 1) with any experimental conditions (chunk 2).
Contrast 1
Between subject experiment
One-way ANOVA
Control vs. Experimental
Control - (Semantic+Lexical+Phobic)/3
(1,-1/3,-1/3,-1/3) or (3,-1,-1,-1)

Contrasts 2 and 3
Phobic versus Non-phobic priming
(Semantic+Lexical)/2 - Phobic
(0,1,1,-2)
Semantic vs Lexical
Semantic - Lexical
(0,1,-1,0)
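The three weight vectors above can be checked against the contrast-coding rules (weights sum to zero; distinct contrasts are orthogonal). A small sketch, with group means invented purely for illustration:

```python
# Planned contrast weight vectors for the priming experiment,
# ordered (Control, Semantic, Lexical, Phobic).
contrasts = [
    (3, -1, -1, -1),   # Control vs. Experimental
    (0, 1, 1, -2),     # Phobic vs. Non-phobic priming
    (0, 1, -1, 0),     # Semantic vs. Lexical
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each contrast's weights sum to zero ...
assert all(sum(w) == 0 for w in contrasts)
# ... and every pair of contrasts is orthogonal (zero dot product)
assert all(dot(contrasts[i], contrasts[j]) == 0
           for i in range(3) for j in range(i + 1, 3))

# A contrast value is the weighted sum of group means
means = (10.0, 14.0, 13.0, 11.0)    # hypothetical numbers
print(dot(contrasts[0], means))      # 3*10 - 14 - 13 - 11 = -8.0
```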
One-way ANOVA contrasts
GLM Univariate syntax
Output from contrast analysis
Rules for Coding Planned Contrasts
Rule 1
Groups coded with positive weights will be compared to groups coded with negative weights.
Rule 2
The sum of weights for a contrast must be zero.
Rule 3
If a group is not involved in a contrast, assign it a weight of zero.
More Rules …
Rule 4
For a given contrast, the weights assigned to the group(s) in one chunk of variation should be equal to the number of groups in the opposite chunk of variation.
Rule 5
If a group is singled out in a comparison, then that group should not be used in any subsequent contrasts.
Partitioning the Variance
Contrasts in GLM (I)
Off-the-shelf only via the menus
but can use ‘Special’ via syntax
‘Deviation’
compare each level with average of preceding
(1,-1,0,0)
(1,1,-2,0)
(1,1,1,-3)
Contrasts in GLM (II)
‘Simple’
compare each level with either the first or the last
(1,-1,0,0) (1,0,0,-1)
(1,0,-1,0) or (0,1,0,-1)
(1,0,0,-1) (0,0,1,-1)
Contrasts in GLM (III)
‘Helmert’ and ‘Repeated’
compare each level with mean of previous or subsequent levels
(1,-1,0, 0) (3,-1,-1,-1)
(2,-1,-1,0) or (0,2,-1,-1)
(3,-1,-1,-1) (0,0,1,-1)
Contrasts in GLM (IV)
‘Polynomial’
Divide up effects into Linear, Quadratic, Cubic … contrasts
Appropriate when considering a Trend Analysis over time or some other covariate factor
Non-orthogonal a priori contrasts
To correct or not to correct?
Typically, if the number of planned comparisons is small (up to K-1), Bonferroni type corrections are not applied.
Post Hoc Tests (I)
Compare each mean against all others.
In general terms they use a stricter criterion to accept an effect as significant.
Hence, control the Familywise error rate.
Simplest example is the Bonferroni method: divide the desired Familywise Error Rate α by the number of comparisons c and use α* = α/c as the Bonferroni-corrected Per Comparison Error Rate.
With 2 means, α=0.05, c=1, then α*=0.050
With 3 means, α=0.05, c=3, then α*=0.017
With 4 means, α=0.05, c=6, then α*=0.008
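These values are just α/c with c = k(k−1)/2 pairwise comparisons among k means; a quick sketch:

```python
# Bonferroni-corrected per-comparison alpha for all pairwise
# comparisons among k means: c = k*(k-1)/2, alpha* = alpha / c.
alpha = 0.05
for k in (2, 3, 4):
    c = k * (k - 1) // 2
    print(f"k={k}: c={c}, alpha*={alpha / c:.3f}")
```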
Post Hoc Tests (II)
What to include?
All experimental treatments vs. a control treatment
All possible pairwise comparisons
All possible contrasts
These all need different handling.
Simple t
Fisher's Least Significant Difference: LSD
unprotected
Fisher's protected t
Only proceed if the omnibus F test is significant.
Controls the familywise error rate, but may miss some needles in the haystack.
The significance of the overall F
An overall significant F test is not a pre-requisite for doing planned comparisons.
There still remains a considerable amount of confusion, owing to earlier statements to the contrary.
Not least in the minds of some reviewers
Bonferroni formulae
Seeking -familywise of 
Bonferroni t or Dunn's test
Set -per-comparison = /c
Bonferroni correction
conservative
Dunn-Sidak
Set -per-comparison = 1-(1-)1/c
improved (exact, less conservative) Bonferroni correction which usually makes very little difference
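Comparing the two corrections numerically (a sketch):

```python
# Per-comparison alphas targeting a familywise rate of 0.05 over
# c = 6 comparisons: Bonferroni alpha/c vs. the exact Dunn-Sidak value.
alpha, c = 0.05, 6
bonferroni = alpha / c
sidak = 1 - (1 - alpha) ** (1 / c)
print(f"Bonferroni: {bonferroni:.5f}")   # 0.00833
print(f"Dunn-Sidak: {sidak:.5f}")        # 0.00851 -- slightly less strict
```

As the slide says, the difference is usually negligible in practice.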
Multi-stage Bonferroni procedures (I)
For controlling family-wise error rate with a set of comparisons less than all pairwise comparisons.
Partitions the target family-wise α in different ways amongst the comparisons.
Multi-stage Bonferroni procedures (II)
Holm procedure
Larzelere and Mulaik procedure
Can be applied to subsets of correlation from a correlation matrix.
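The Holm procedure can be sketched as a step-down loop (function name mine): P-values are tested in ascending order against the successively less strict thresholds α/m, α/(m−1), ..., stopping at the first non-rejection.

```python
def holm(pvalues, alpha=0.05):
    """Holm step-down: test P-values in ascending order against
    alpha/m, alpha/(m-1), ...; stop at the first non-rejection."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):
            rejected[i] = True
        else:
            break
    return rejected

# 0.01 <= 0.05/3 is rejected; 0.03 > 0.05/2 stops the procedure
print(holm([0.04, 0.01, 0.03]))   # [False, True, False]
```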
Limitations of Bonferroni
These procedures are based on the ‘worst case’ assumption that all the tests are independent.
Beware SPSS’s ‘Bonferroni adjusted’ P value
http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/SpssBonferroni
The problem: SPSS has sought to preserve the convention ‘look at the P value to find out whether the test indicates the Null Hypothesis should be rejected’
SPSS quotes artificial ‘Bonferroni Adjusted P Values’ rather than advising of the appropriate Bonferroni-corrected α
Can end up with oddities like P=1!!!!
(whenever c × P exceeds 1, the SPSS Bonferroni-adjusted P is capped at 1)
What to do about SPSS and Bonferroni t
Avoid!
Either
use LSD and work out the Bonferroni-corrected α yourself
Or
use Sidak adjusted P’s




Studentized Range Statistic
Based on dependence of the ‘range’ of a set of means on the number of means that are being considered.
Larger number of means correlates with bigger range
Tables of Q(r, k, df)
r = number of means in the current range
k = number of means overall
df = degrees of freedom in mean square error
Newman-Keuls Procedure
Critical difference is a function of r and k (and the degrees of freedom)
Strong tradition but some recent controversy.
A 'layered' test which adjusts the critical distance as a function of the number of means in the range being considered at each stage.
The family-wise error rate may not actually be held at α.
Constant critical distance procedures
Tukey's Test: HSD
Honestly Significant Difference
Like Newman-Keuls, but uses the largest (outermost) critical distance throughout.
Ryan's Procedure: REGWQ
Ryan: r= k /r
Einot and Gabriel: αr = 1 − (1 − α)^(r/k)
Scheffé's test
All possible comparisons
Most conservative
Post Hoc Tests: Options in SPSS
SPSS has 18 types of Post Hoc Test!
Post Hoc Tests: Recommendations
Field (2000):
Assumptions met: REGWQ or Tukey HSD.
Safe Option: Bonferroni (but note the problem with SPSS’s approach to adjusted P values).
Unequal Sample Sizes: Gabriel’s (small), Hochberg’s GT2 (large).
Unequal Variances: Games-Howell.

Control of False Discovery Rate (FDR)
Recent alternative to controlling FW error rate
FDR is the expected proportion of false rejections among the rejected null hypotheses
If we have rejected a set of R null hypotheses, and V of these are wrongly rejected, then the FDR= V/R (FDR=0 if R=0).
We will know R but don’t know V.
False Discovery Rate (FDR)
An alternative approach to the trade-off between Type I and Type II errors
Logic
a Type I error becomes less serious the greater the number of genuine effects in the family of tests
Now being used in imaging and ERP studies
FDR Example
Suppose we do 10 t-tests and observe their P values:
0.021, 0.001, 0.017, 0.041, 0.005, 0.036, 0.042, 0.023, 0.07, 0.1
Sort P values in ascending order
0.001, 0.005, 0.017, 0.021, 0.023, 0.036, 0.041, 0.042, 0.07, 0.1
Compare with the 10 prototype P-values, α scaled by rank, namely (1/10)α, (2/10)α, ..., (10/10)α:
 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050
 Get the differences (observed minus prototype):
 -0.004 -0.005 0.002 0.001 -0.002 0.006 0.006 0.002 0.025 0.050
The largest observed P-value which is smaller than its corresponding prototype is 0.023.
0.021, 0.001, 0.017, 0.041, 0.005, 0.036, 0.042, 0.023, 0.07, 0.1
The five tests for which P ≤ 0.023 (0.021, 0.001, 0.017, 0.005 and 0.023) are declared significant with FDR α=0.05.
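This worked example follows the Benjamini-Hochberg procedure, which can be sketched in a few lines (function name mine):

```python
def bh_cutoff(pvalues, alpha=0.05):
    """Benjamini-Hochberg FDR: the cutoff is the largest sorted
    P-value that is <= its prototype (i/m) * alpha."""
    m = len(pvalues)
    cutoff = 0.0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / m:
            cutoff = p
    return cutoff

pvals = [0.021, 0.001, 0.017, 0.041, 0.005, 0.036, 0.042, 0.023, 0.07, 0.1]
cut = bh_cutoff(pvals)
print(cut)                              # 0.023
print([p for p in pvals if p <= cut])   # the five significant tests
```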
Unpacking Interactions
Components of Interaction
doing sub-ANOVA or contrasts
E.g. if factors A and B interact, then look at 2 by 2 ANOVA for a pair of levels of A combined with a pair of levels of B
Simple Main Effects
by doing sub-ANOVAs followed up by multiple comparisons
Cannot use reverse argument to claim presence of interaction
Can use EMMEANS/COMPARE in SPSS
Within Subject Factors? (I)
Problems of ‘sphericity’ re-emerge
Need for hand-calculations as SPSS has an attitude problem
Ask Ian about the ‘mc’ program and other statistical aids in the CBU statistical software collection.
Work is in progress to get these updated for the new statistics wiki.
Within Subject Factors? (II)
Calculate new variables from within-subject contrasts and analyse them separately
Extends the idea of doing lots of paired t-tests
In SPSS can be done by MMATRIX option in GLM using Syntax
GLM and MMATRIX
GLM and MMATRIX
Thank you - That’s All Folks!
Peter Watson





Andy Field:
http://www.cogs.sussex.ac.uk/users/andyf/teaching/statistics.htm
  * Why do we use follow-up tests?
  * Different ways to follow up an ANOVA
  * Planned vs. Post Hoc Tests
  * Choosing and Coding Contrasts
  * Handling Interactions
  * Standard Errors of Differences
  * Multiple t-tests
  * Post Hoc Tests
  * Trend Analysis
  * Unpacking interactions
  * Multiple Comparisons: Watch your Error Rate!
  * Post-Hoc vs A Priori Hypotheses
  * Comparisons and Contrasts
  * Family-wise (FW) error rate
  * Experimentwise error rate
  * Orthogonal Contrasts or Comparisons
  * Planned Comparisons vs. Post Hoc Comparisons
  * Orthogonal Contrasts/Comparisons
  * Planned Comparisons or Contrasts
  * Contrasts in GLM
  * Post Hoc Tests
  * Control of False Discovery Rate (FDR)
  * Simple Main Effects

Synopsis of the Graduate Statistics Course 2007

  1. The Anatomy of Statistics: Models, Hypotheses, Significance and Power

    • Experiments, Data, Models and Parameters
    • Probability vs. Statistics
    • Hypotheses and Inference
    • The Likelihood Function
    • Estimation and Inferences
    • Maximum Likelihood Estimate (MLE)
    • Schools of Statistical Inference
      • Ronald Aylmer FISHER
      • Jerzy NEYMAN and Egon PEARSON
      • Rev. Thomas BAYES
    • R A Fisher: P values and Significance Tests
    • Neyman and Pearson: Hypothesis Tests
    • Type I & Type II Errors

    • Size and Power
  2. Exploratory Data Analysis (EDA)

    • What is it?
    • Skew and kurtosis: definitions and magnitude rules of thumb
    • Pictorial representations - in particular histograms, boxplots and stem and leaf displays
    • Effect of outliers
    • Power transformations
    • Rank transformations
  3. Categorical Data Analysis

    • The Naming of Parts
    • Categorical Data
    • Frequency Tables
    • The Chi-Squared Goodness-of-Fit Test
    • The Chi-squared Distribution
    • The Binomial Test
    • The Chi-squared test for association
    • Simpson, Cohen and McNemar

    • SPSS procedures that help
      • Frequencies
      • Crosstabs
      • Chi-square
      • Binomial
    • Types of Data
      • Quantitative
      • Qualitative
      • Nominal
      • Ordinal
    • Frequency Table
    • Bar chart
    • Cross-classification or Contingency Table
    • Simple use of SPSS Crosstabs
    • Goodness of Fit Chi-squared Test
    • Chance performance and the Binomial Test
    • Confidence Intervals for Binomial Proportions
    • Pearson’s Chi-squared
    • Yates’ Continuity Correction
    • Fisher’s Exact Test
    • Odds and Odds Ratios
    • Log Odds and Log Odds ratios
    • Sensitivity and Specificity
    • Signal Detection Theory
    • Simpson’s Paradox
    • Measures of agreement: Cohen's Kappa
    • Measures of change: McNemar’s Test

    • Association or Independence: Chi-squared test of association
    • Comparing two or more classified samples
  4. Regression

    • What is it?
    • Expressing correlations (simple regression) in vector form
    • Scatterplots
    • Assumptions in regression
    • Restriction of range of a correlation
    • Comparing pairs of correlations
    • Multiple regression
    • Least squares
    • Residual plots
    • Stepwise methods
    • Synergy
    • Collinearity
  5. Between subjects analysis of variance

    • What is it used for?
    • Main effects
    • Interactions
    • Simple effects
    • Plotting effects
    • Implementation in SPSS
    • Effect size
    • Model specification
    • Latin squares
    • Balance
    • Venn diagram depiction of sources of variation
  6. The General Linear Model and complex designs including Analysis of Covariance

    • GLM and Simple Linear Regression
    • The Design Matrix
    • Least Squares
    • ANOVA and GLM
    • Types of Sums of Squares
    • Multiple Regression as GLM
    • Multiple Regression as a sequence of GLMs in SPSS
    • The two Groups t-test as a GLM
    • One-way ANOVA as GLM
    • Multi-factor Model
      • Additive (no interaction)
      • Non-additive (interaction)
    • Analysis of Covariance
      • Simple regression
        • 1 intercept
        • 1 slope
      • Parallel regressions
        • multiple intercepts
        • 1 slope
      • Non-parallel regressions
        • multiple intercepts
        • multiple slopes
    • Sequences of GLMs in ANCOVA
  7. Power analysis

    • Hypothesis testing
    • Boosting power
    • Effect sizes: definitions, magnitudes
    • Power evaluation methods: description and implementation using examples
      • nomogram
      • power calculators
      • SPSS macros
      • spreadsheets
      • power curves
      • tables
      • quick formula
  8. Repeated Measures and Mixed Model ANOVA

    • Two sample t-Test vs. Paired t-Test
    • Repeated Measures as an extension of paired measures
    • Single factor Within-Subject design
    • Sphericity
    • Two (or more) factors Within-Subject design
    • Mixed designs combining Within- and Between-Subject factors
    • Mixed Models, e.g. both Subjects & Items as Random Effects factors

    • The ‘Language as Fixed Effects’ Controversy
    • Testing for Normality
    • Single degree of freedom approach
  9. Latent variable modelling – factor analysis and all that!

    • Path diagrams – a regression example
    • Comparing correlations
    • Exploratory factor analysis
    • Assumptions of factor analysis
    • Reliability testing (Cronbach’s alpha)
    • Fit criteria in exploratory factor analysis
    • Rotations
    • Interpreting factor loadings
    • Confirmatory factor models
    • Fit criteria in confirmatory factor analysis
    • Equivalence of correlated and uncorrelated models
    • Cross validation as a means of assessing fit for different models
    • Parsimony : determining the most important items in a factor analysis
  10. What to do following an ANOVA

    • Why do we use follow-up tests?
    • Different ways to follow up an ANOVA
    • Planned vs. Post Hoc Tests
    • Choosing and Coding Contrasts
    • Handling Interactions
    • Standard Errors of Differences
    • Multiple t-tests
    • Post Hoc Tests
    • Trend Analysis
    • Unpacking interactions
    • Multiple Comparisons: Watch your Error Rate!
    • Post-Hoc vs A Priori Hypotheses
    • Comparisons and Contrasts
    • Family-wise (FW) error rate
    • Experimentwise error rate
    • Orthogonal Contrasts or Comparisons
    • Planned Comparisons vs. Post Hoc Comparisons
    • Orthogonal Contrasts/Comparisons
    • Planned Comparisons or Contrasts
    • Contrasts in GLM
    • Post Hoc Tests
    • Control of False Discovery Rate (FDR)
    • Simple Main Effects

Synopsis2008 (last edited 2013-03-08 10:17:15 by localhost)