Open R and load the packages that will be needed (lattice for the quantile-quantile plots and lme4 for the mixed-effects models; the library() calls are shown below at the point where each package is first used). Import the data into R's workspace with

{{{
n = read.csv("/Users/mcolthea/Desktop/RFolder/Nora/NoraData.csv")
}}}

Now the file called n is the whole raw data file. Check that it looks OK by reading the first 10 rows of the file:

{{{
> head(n, n = 10)
}}}

Looks OK. Now, we aren't going to want to include errors when we analyse RTs, so the first thing we want to do is create a file that contains only the correct responses:

{{{
> nc = n[n$accuracy=="correct",]
}}}

This new file nc is the file that has just the correct responses. What was the overall error rate?

{{{
> (nrow(n)-nrow(nc))/nrow(n)
[1] 0.0519774
}}}

5.2% of responses were errors.

One other thing is typically done before data analysis starts: eliminating any responses where the RT is suspiciously long or suspiciously short, i.e. "outliers". This is often called "data-trimming". Here you are eliminating noise from the data and so making subsequent statistical analyses more sensitive.

You can visualize the outliers in your data by inspecting the quantile-quantile plot for each subject. Here's how you do this in R. First, set things up so that you won't have too many plots on each page; to get to the next page, you press Return:

{{{
> devAskNewPage(ask=TRUE)
}}}

Now display the quantile-quantile plots:

{{{
> library(lattice)   # qqmath() comes from the lattice package
> qqmath(~RT|Subject.ID, data = nc, layout=c(3,3,4))
}}}

The layout argument says each page should hold 3 columns and 3 rows of panels (one panel per subject), spread over 4 pages. Nice clean data, not many outliers, but there are a few obvious ones. The best way to look at these is to sort each subject's RTs from lowest to highest and look at the sorted file. The data can be sorted in R with this command:

{{{
> ncs = nc[order(nc$Subject.ID, nc$RT),]
}}}

which sorts by Subject.ID and, within each subject, sorts by RT. Trimming the data is more easily done in Excel, so export this sorted file:

{{{
> write.csv(ncs, "/Users/mcolthea/Desktop/RFolder/Nora/ncs.csv")
}}}

and then open it in Excel. Do the trimming in Excel, call the trimmed file ncst, and import it back into R:

{{{
> ncst = read.csv("/Users/mcolthea/Desktop/RFolder/Nora/ncst.csv")
}}}

What proportion of the correct responses has been discarded?

{{{
> (nrow(ncs)-nrow(ncst))/nrow(ncs)
[1] 0.01102503
}}}

1.1% of the data have been trimmed. Look at the quantile-quantile plots again:

{{{
> devAskNewPage(ask=TRUE)
> qqmath(~RT|Subject.ID, data = ncst, layout=c(3,3,4))
}}}

The obvious outliers are gone. Now data analysis can begin, using this trimmed data.

==== Analysing the data ====

What is the mean RT for each condition?

{{{
> tapply(ncst$RT, ncst$itemtype, mean)
   count     mass 
514.0556 523.8704 
}}}

Is this mean difference of 9.8 msec significant?

{{{
> library(lme4)   # lmer() comes from the lme4 package
> ncst.lmer = lmer(RT ~ itemtype + (1|Subject.ID) + (1|itemtype), data = ncst)
}}}

You can see the result of the analysis by typing

{{{
> ncst.lmer
}}}

and if you do, you get

{{{
Linear mixed model fit by REML 
Formula: RT ~ itemtype + (1 | Subject.ID) + (1 | itemtype) 
   Data: ncst 
   AIC   BIC logLik deviance REMLdev
 39988 40018 -19989    39990   39978
Random effects:
 Groups     Name        Variance Std.Dev.
 Subject.ID (Intercept) 3262.700 57.1201 
 itemtype   (Intercept)   14.890  3.8588 
 Residual               9682.982 98.4021 
Number of obs: 3319, groups: Subject.ID, 30; itemtype, 2

Fixed effects:
             Estimate Std. Error t value
(Intercept)   513.274     11.295   45.44
itemtypemass   10.128      6.297    1.61

Correlation of Fixed Effects:
            (Intr)
itemtypemss -0.265
}}}
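(An aside: if you just want the estimates themselves rather than the whole printout, lme4's extractor functions fixef() and ranef() pull them out of the fitted model. Here is a minimal sketch.)

{{{
> # fixef() gives the fixed-effect estimates: the intercept is the estimated
> # mean RT for count items, and itemtypemass is the estimated mass - count
> # difference in msec
> fixef(ncst.lmer)
> # ranef() gives the estimated adjustment for each level of each grouping
> # factor (here, each subject and each item type)
> ranef(ncst.lmer)
}}}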
What does this all mean? The t-value for the item-type effect is 1.61, which would not be significant; but even if it were, we haven't yet taken into account the potential confounding factor of LOG_Surface.freq. So add that factor to the model:

{{{
> ncst.lmerA = lmer(RT ~ itemtype + (1|Subject.ID) + (1|Item) + LOG_Surface.freq, data = ncst)
}}}

And what about LOG_stem.freq?

{{{
> ncst.lmerB = lmer(RT ~ itemtype + (1|Subject.ID) + (1|Item) + LOG_Surface.freq + LOG_stem.freq, data = ncst)
}}}

Inspect the fixed effects of each of these models in the same way as before, by typing the model's name. It looks as if there is no mass/count effect and no surface frequency effect, but there is a stem frequency effect. Since we have a subset of the items that are matched on stem frequency, analysing that subset next would be a good move.

Another thing we can do is build a set of different models incorporating different factors.

{{{
> ncst.lmerC = lmer(RT ~ (1|Subject.ID) + (1|Item) + LOG_stem.freq, data = ncst)
}}}

includes only the stem frequency factor.

{{{
> ncst.lmerD = lmer(RT ~ (1|Subject.ID) + (1|Item) + LOG_stem.freq + itemtype, data = ncst)
}}}

adds a second factor, item type. Does the second model fit better than the first?

{{{
> anova(ncst.lmerC, ncst.lmerD)
Data: ncst
Models:
ncst.lmerC: RT ~ (1 | Subject.ID) + (1 | Item) + LOG_stem.freq
ncst.lmerD: RT ~ (1 | Subject.ID) + (1 | Item) + LOG_stem.freq + itemtype
           Df   AIC   BIC logLik  Chisq Chi Df Pr(>Chisq)
ncst.lmerC  5 39802 39832 -19896                         
ncst.lmerD  6 39804 39840 -19896 0.3083      1     0.5787
}}}

No. The non-significant chi-squared shows that a model that includes both item type and stem frequency does not produce a better fit than a model that includes only stem frequency.

Here's a model that has surface frequency and item type:

{{{
> ncst.lmerG = lmer(RT ~ (1|Subject.ID) + (1|Item) + LOG_Surface.freq + itemtype, data = ncst)
}}}

If we add stem frequency to this model,

{{{
> ncst.lmerF = lmer(RT ~ (1|Subject.ID) + (1|Item) + LOG_stem.freq + LOG_Surface.freq + itemtype, data = ncst)
}}}

do we get a better fit? That is, does model F perform better than model G?

{{{
> anova(ncst.lmerG, ncst.lmerF)
Data: ncst
Models:
ncst.lmerG: RT ~ (1 | Subject.ID) + (1 | Item) + LOG_Surface.freq + itemtype
ncst.lmerF: RT ~ (1 | Subject.ID) + (1 | Item) + LOG_stem.freq + LOG_Surface.freq + 
ncst.lmerF:     itemtype
           Df   AIC   BIC logLik  Chisq Chi Df Pr(>Chisq)    
ncst.lmerG  6 39817 39854 -19903                             
ncst.lmerF  7 39805 39848 -19896 14.448      1  0.0001441 ***
}}}

Yes. F is a significant improvement over G. Note how the AIC and BIC measures of fit are lower for F: lower values mean a closer fit. So adding stem frequency gives us a better model, but adding item type or surface frequency or both does not improve the fit of the model to the data.
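The C-versus-D comparison above establishes the item-type part of that conclusion. If you want to check the surface-frequency part in the same way, you could compare the stem-frequency-only model with one that adds surface frequency. A minimal sketch (the model name ncst.lmerE is just a label for this example, not one of the models fitted above):

{{{
> # does adding surface frequency improve on the stem-frequency-only model C?
> ncst.lmerE = lmer(RT ~ (1|Subject.ID) + (1|Item) + LOG_stem.freq + LOG_Surface.freq, data = ncst)
> anova(ncst.lmerC, ncst.lmerE)
}}}

A non-significant chi-squared here would confirm that surface frequency adds nothing once stem frequency is in the model.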