FAQ/bootmed - CBU statistics Wiki

Comparing medians using bootstrapping in SPSS

An add-on module for carrying out bootstrapping is available in SPSS 19 and later, although you may need to purchase it separately from the base software. The syntax below, however, can be used in any version of SPSS with just the base software. It computes a 95% bootstrap confidence interval for the difference between two unpaired group medians (in this case males and females) from two skewed data sets, one per gender, both contained in this SPSS file.

The syntax can be run from an SPSS syntax window. It combines a bootstrapping macro, the FREQUENCIES procedure to produce the medians, and the Output Management System (OMS) to send the output from the procedure to SPSS data files. These files can then be accessed and manipulated to produce the median difference for each pair of bootstrap samples. The 2.5th and 97.5th percentiles of these differences are the limits of the 95% confidence interval.

Note that the file names of the input data (skewed_data.sav) and of the output data files are referred to several times in the syntax and will need changing as required. The group variable name (SEX) and response variable name (time_o2) may also need to be changed, as may the values identifying the two groups being compared (3 and 4). You can also increase the number of bootstrap samples, which is set to 50 by 'samples=50' in each of the two macro calls below; for a percentile interval a much larger number, such as 1000 or more, is commonly recommended.
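
Because the same folder is typed out in every file specification, one way to reduce the number of edits is to define the folder once with FILE HANDLE and build each file name from it. The lines below are a minimal sketch and not part of the original syntax. The handle name bootdir and the folder C:\bootstrap_demo are hypothetical, and the sketch assumes an SPSS version that accepts a file handle pointing to a folder.

```spss
* Hypothetical handle name and folder - replace the folder with the
* one that holds skewed_data.sav on your machine.

FILE HANDLE bootdir /NAME='C:\bootstrap_demo'.

* The handle then stands in for the folder in a file specification.

GET FILE='bootdir/skewed_data.sav'.
SAVE OUTFILE='bootdir/skewed_data1.sav'.
```

With a handle like this in place, the other GET FILE, SAVE OUTFILE, OMS OUTFILE, AGGREGATE OUTFILE and MATCH FILES specifications could be shortened in the same way, so that moving the analysis to another folder would mean changing a single line.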

****** oms_bootstrapping******************
***
*** Adapted from an example file from SPSS by Rhiannon Whitaker on 
*** 3/8/04; Further adapted by Daphne Russell and Chris Whitaker on 
*** 4/4/08 and including code from Ray Levesque's website 
*** (www.spsstools.net)

*******************************************
*** This syntax takes a data set (skewed_data.sav, opened by GET FILE below) and
*** resamples with replacement a variable (in this case time_o2) from
*** all non-missing time_o2 values (N=939). It draws a specified number
*** of samples (using macro mean_bootstrap) and takes medians. The medians
*** of each sample are dumped (using OMS and FREQUENCIES) into two
*** files for males and females separately. These are then combined
*** using the MATCH FILES command.
*** Plots and other features can then be extracted from the combined
*** gender data using usual SPSS syntax.
*************************************************
* one thing to note - always make sure there is at least
* one blank line between comments (not used by SPSS
* and preceded by an asterisk) and the SPSS syntax

GET FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data.sav' .

PRESERVE.
SET TVARS NAMES.

* DESTINATION VIEWER=NO keeps the output out of the Viewer window

OMS /DESTINATION VIEWER=NO.

* identifies the Statistics table (containing the median) produced by the FREQUENCIES command

OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Statistics']
/DESTINATION FORMAT=SAV OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t11.sav'
/COLUMNS DIMNAMES=['Variables' 'Statistics'].

* the macro below has two arguments - number of bootstrap samples
* and the variable to be sampled; notice the macro is defined
* WITHIN the OMS OMSEND sandwich; this is a handy macro
* to run with any SPSS procedure to obtain sampling distributions
* from bootstrapping.
*************************************************************

DEFINE mean_bootstrap (samples=!TOKENS(1)
                      /indvar=!TOKENS(1)).
COMPUTE dummyvar=1.
AGGREGATE
 /OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\aggrtemp1.sav'
 /BREAK=dummyvar
 /filesize=N.
MATCH FILES FILE=* /TABLE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\aggrtemp1.sav'
 /BY dummyvar.
!DO !other=1 !TO !samples.
SET SEED RANDOM.
WEIGHT OFF.
FILTER OFF.
DO IF $casenum=1.
COMPUTE #samplesize=filesize.
COMPUTE #filesize=filesize.
END IF.
DO IF (#samplesize>0 and filesize>0).
COMPUTE sampleWeight=rv.binom(#samplesize,1/#filesize).
COMPUTE #samplesize=#samplesize-sampleWeight.
COMPUTE #filesize=#filesize-1.
ELSE.
COMPUTE sampleWeight=0.
end if.
weight by sampleWeight.
FILTER BY sampleWeight.
FREQUENCIES VARIABLES=!indvar
  /FORMAT NOTABLE
  /PERCENTILES 50.0.
!DOEND.
!ENDDEFINE.

* we select males with non-missing values of time_o2
*************************************************************

GET FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data.sav'.
FILTER OFF.
USE ALL.
select if(sex = 3 & ~MISSING(time_o2)).
exe.
SAVE OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data1.SAV'.
GET FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data1.SAV'.

* fifty random samples of male time_o2s are requested

mean_bootstrap samples=50 indvar=time_o2.

OMSEND.

GET FILE 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t11.sav'.
SAVE OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t12.sav'.
EXE.

* repeat sampling for females
***********************************************

PRESERVE.
SET TVARS NAMES.
OMS /DESTINATION VIEWER=NO.

OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Statistics']
/DESTINATION FORMAT=SAV OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t13.sav'
/COLUMNS DIMNAMES=['Variables' 'Statistics'].

GET FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data.sav'.
FILTER OFF.
USE ALL.
select if(sex = 4 & ~MISSING(time_o2)).
exe.
SAVE OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data1.SAV'.
GET FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\skewed_data1.SAV'.

mean_bootstrap samples=50 indvar=time_o2.

OMSEND.

* back-up female time_o2 data before combining with males

GET FILE 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t13.sav'.
SAVE OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t14.sav'.
EXE.

* combine male and female time_o2s and tidy files by
* removing superfluous variables and shortening
* variable names produced by OMS
********************************************

GET FILE 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t14.sav'.
MATCH FILES /FILE=*
/RENAME(Command_ Subtype_ Label_ time_o2_Valid time_o2_Missing time_o2_50=d0 d1 d2 d3
time_o2_M time_o2_MedM)
/FILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\t12.sav'
/Rename (Command_  Subtype_ Label_ time_o2_Valid= d4 d5 d6 d7)
/DROP=d0 d1 d2 d3 d4 d5 d6 d7.
EXE.
SAVE OUTFILE='C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\PPT_FILES\CBU_TALKS2003&5\mfcomb.sav'.
exe.

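* DIFF is the female median (time_o2_MedM, from t14.sav) minus the male
* median (time_o2_50, from t12.sav) for each pair of bootstrap samples

COMPUTE DIFF= TIME_o2_medM - time_o2_50.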
FREQUENCIES VARIABLES = DIFF
 /FORMAT NOTABLE
 /PERCENTILES 1.0 2.5 50.0 97.5 99.0 100.0
 /HIST NORMAL.
EXE.