Variance of a transformed mean

It is sometimes necessary to transform data before performing any analysis, for example to downweight the influence of outliers. Taking the reciprocal of reaction times is one transform used for this purpose.

Let m' denote the mean on the transformed scale, F(m), where the transformed values have variance $$s'^2$$ in a sample of size n. The variance of the back-transformed mean (i.e. on the original scale) is given below, obtained using the delta method.
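For reference, the delta-method approximation behind each formula can be sketched as follows. The symbol g, standing for the back-transformation $$\mbox{F}^{-1}$$, is introduced here for brevity and is not part of the original notation; the approximation assumes g is differentiable at m' and that the variance of the transformed mean is $$s'^2/n$$.

$$\mbox{Var}\left[g(m')\right] \approx \left[g'(m')\right]^2 \frac{s'^2}{n}$$

For the reciprocal transform, for instance, $$g(m') = 1/m'$$ and $$g'(m') = -1/m'^2$$, so the approximate variance is $$s'^2 / (m'^4 n)$$.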

| F(m) | $$\mbox{F}^{-1}(\mbox{m'})$$ | Variance of $$\mbox{F}^{-1}(\mbox{m'})$$ |
|---|---|---|
| $$\ln(m)$$ | $$e^{m'}$$ | $$e^{2m'} s'^2 / n$$ |
| $$1/m$$ | $$1/m'$$ | $$s'^2 / (m'^4 n)$$ |
| $$\sqrt{m}$$ | $$m'^2$$ | $$(2m's')^2 / n$$ |
| $$2 \arcsin \sqrt{m}$$ | $$\sin^2(m'/2)$$ | $$(\cos(m'/2)\sin(m'/2))^2 s'^2 / n$$ |

Note that for the arcsine transform, packages such as SPSS perform the calculation in radians rather than degrees (the default on calculators).
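The four back-transformed variances can be computed with a short Python sketch like the one below. The function name `backtransformed_variance` and its transform labels are hypothetical, chosen only for illustration; the code assumes `s2_t` is the variance of the transformed values, so that `s2_t / n` is the variance of the transformed mean. Python's `math.sin` and `math.cos` work in radians, in line with the note above.

```python
import math


def backtransformed_variance(transform, m_t, s2_t, n):
    """Delta-method variance of a back-transformed mean (illustrative sketch).

    transform: "ln", "reciprocal", "sqrt" or "arcsine" (hypothetical labels)
    m_t:       mean on the transformed scale (m')
    s2_t:      variance of the transformed values (s'^2)
    n:         sample size
    """
    # Derivative of each back-transformation F^{-1} evaluated at m'.
    derivatives = {
        "ln": lambda m: math.exp(m),              # d/dm' of exp(m')
        "reciprocal": lambda m: -1.0 / m**2,      # d/dm' of 1/m'
        "sqrt": lambda m: 2.0 * m,                # d/dm' of m'^2
        "arcsine": lambda m: math.sin(m / 2) * math.cos(m / 2),  # d/dm' of sin^2(m'/2)
    }
    d = derivatives[transform](m_t)
    # Squared derivative times the variance of the transformed mean.
    return d**2 * s2_t / n


# Made-up inputs: square-root scale mean 3, variance 4, n = 16.
print(backtransformed_variance("sqrt", 3.0, 4.0, 16))
```

With these made-up inputs the square-root case evaluates (2 × 3)² × 4 / 16, which matches the third row of the table.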

Ordinarily, when using power transforms, we transform before taking the mean, e.g. taking logs of the raw data and then averaging the logged values, rather than averaging the raw data first and taking the log of the resulting mean (see the Exploratory Data Analysis Graduate Statistics talk here).
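The difference between the two orders of operation can be illustrated with a minimal Python sketch. The three raw values are made up purely for illustration.

```python
import math

# Hypothetical raw data, chosen only to illustrate the order of operations.
raw = [2.0, 4.0, 8.0]

# Usual approach: transform each value first, then average.
mean_of_logs = sum(math.log(x) for x in raw) / len(raw)

# Alternative: average the raw values first, then transform the mean.
log_of_mean = math.log(sum(raw) / len(raw))

# Back-transforming the mean of the logs gives the geometric mean.
geometric_mean = math.exp(mean_of_logs)

print(mean_of_logs, log_of_mean, geometric_mean)
```

For these values the mean of the logs is smaller than the log of the mean, and back-transforming it gives the geometric mean (about 4) rather than the arithmetic mean.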

Note that we usually use ln (log to the base e), which is preferred to log10 for interpretability - see here. If this link is broken, the details are reproduced here.

As an example of the ease of interpreting the ln (natural log) function, suppose we use reading score to predict ln(writing score) and reading score is found to have a regression coefficient of 0.0066305. This indicates that for a ten-unit increase in reading score, we expect to see about a 6.9% increase in writing score, since exp(0.0066305*10) = 1.0685526.
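The arithmetic in this example can be checked with a small Python sketch. The helper name `percent_change` is hypothetical, and it assumes the outcome was modelled on the natural-log scale.

```python
import math


def percent_change(coef, delta_x=1.0):
    """Percentage change in y for a delta_x increase in x,
    assuming ln(y) was regressed on x with slope coef."""
    return (math.exp(coef * delta_x) - 1.0) * 100.0


# Coefficient from the reading/writing example, ten-unit increase.
print(round(percent_change(0.0066305, 10), 2))  # 6.86
```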

A simpler example of interpreting ln scores in a regression of ln(y) on x

Suppose we regress ln(y) on x and find that x has a regression coefficient of 0.06. Then

predicted ln(y2) = 0.06*(x+1) and predicted ln(y1) = 0.06*x (ignoring the intercept, which cancels in the difference), so the difference between predicted ln(y2) and ln(y1) equals 0.06.

It then follows that ln(y2) - ln(y1) = ln(y2/y1) = 0.06 and so y2/y1 = exp(0.06) ≈ 1.06. We therefore have the useful result that a coefficient of 0.06 for x in a regression of ln(y) on x corresponds to approximately a 6% increase in y for a unit increase in x. This is not the case using log10, since a coefficient of 0.06 on the log10 scale corresponds to a ratio of 10^0.06 ≈ 1.15 rather than 1.06.
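The contrast between the two log bases can be sketched in Python as follows. The variable names are illustrative only, and the last line assumes the same data were refitted with log10(y) as the outcome.

```python
import math

b = 0.06  # regression coefficient from the example

# Natural-log outcome: ratio y2/y1 for a unit increase in x.
ratio_ln = math.exp(b)      # about 1.062, close to 1 + b

# If the same coefficient came from a log10(y) model instead:
ratio_log10 = 10 ** b       # about 1.148, not close to 1 + b

# Coefficient a log10(y) model would report for the same 6% effect.
b_log10_equiv = b / math.log(10)

print(ratio_ln, ratio_log10, b_log10_equiv)
```

The "coefficient ≈ proportional change" shortcut relies on exp(b) ≈ 1 + b for small b, which has no direct counterpart in base 10.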