Diff for "FAQ/Messy" - CBU statistics Wiki
location: Diff for "FAQ/Messy"
Differences between revisions 2 and 6 (spanning 4 versions)
Revision 2 as of 2011-02-10 10:33:52
Size: 3710
Editor: PeterWatson
Comment:
Revision 6 as of 2011-02-10 11:20:38
Size: 2853
Editor: PeterWatson
Comment:
Deletions are marked like this. Additions are marked like this.
Line 5: Line 5:
||||<50% style="TEXT-ALIGN: center"> '''Date of Birth (DOB)''' || '''Date on System'''||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 17.02.40 || 12.05.00 ||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 02.08.73 || 18.01.05 ||
||||||<50% style="TEXT-ALIGN: center"> '''Date of Birth (DOB)''' || '''Date on System'''|| '''Date Left study'''
||||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 17.02.40 || 12.05.00 || ||
||||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> 02.08.73 || 18.01.05 || 24.03.08 ||
Line 9: Line 9:
Supposing we have a list of particular messages which can be categorised into various types and rates of form with rates abbreviated to W (per week), D (per day), Y (per year) and M (per month) and wish to tabulate frequencies per week of each message type. Patient data is often entered in the form of dates. For example date of birth and date that a particular person started a study and, if applicable, left it.
We may wish to work out how long a person was a participant in a study. To do this we first of all need a cut-off date which is either the date when the study ended (studyEdate) for continuing participants or the date of drop out (DateFinished) for people who left the study.
Line 11: Line 12:
||||<50% style="TEXT-ALIGN: center"> '''Type''' || '''Rate'''||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> Appointment || 3-W ||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> appointment || 2-D ||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> chore || 1Y ||
||||<50% style="VERTICAL-ALIGN: top; TEXT-ALIGN: center"> Medication || 2-M ||

Notice that type names start with either capitals or lower case letters (which SPSS would regard as separate categories) and times are expressed both with and without hyphens.

The first thing we need to do is make sure types with the same spelling are all regarded as representing the same group regardless of case. This can be done using the string, substring and upcase commands.
There are various commands for telling SPSS what the number in a column represents a date. The DATE.DMY command defines a date in the spreadsheet.
Line 22: Line 15:
STRING TYPE2 (A10).
COMPUTE TYPE2 = UPCASE(SUBSTR(TYPE,1,5)).
COMPUTE studyEdate = DATE.DMY(30/11/2010).
Line 27: Line 19:
We then need to create a single column representing the number of times a particular message was sent in a week. This is done by decoupling the number from each rate and saving this as a numerical variable. Existing columns of data which represent dates may be flagged to SPSS using various formats (found by clicking the variable view tab and 'Type' next to the column of interest. Data may also be defined as dates using syntax.
Line 30: Line 22:
STRING COUNT(A1).
COMPUTE COUNT=SUBSTR(FREQUENCY,1,1).
EXE.
FORMATS dob DateFinished DateonSystem studyEdate (DATE8).
EXECUTE.
}}}
Line 34: Line 26:
RECODE COUNT (CONVERT) INTO COUNTN. We can the work our duration time in the study using the DATEDIF command which works out the difference in days or months between two dates. We firstly thought need to tell SPSS to use the study end date for people who did not drop out or the date they finished if they did. Since these two dates are entered in separate columns we can do this using the SYSMIS command which tells SPSS if there is no date finished date to use the study end date as the cut-off date. We can now compute the number of days in the study for dropouts and non-dropouts.

{{{
COMPUTE DFMISS = SYSMIS(DATEFINISHED).
IF (DFMISS EQ 1) usertime = DATEDIF(studyEdate, DateonSystem, "months")/12.
IF (DFMISS EQ 1) usertime2 = DATEDIF(studyEdate, DateonSystem, "days").

IF (DFMISS EQ 0) usfintime = DATEDIF(DateFinished, DateonSystem, "months")/12.
IF (DFMISS EQ 0) usfintime2 = DATEDIF(DateFinished, DateonSystem, "days").
}}}

We can also work out the ages of the participants when they entered the study in a similar way using the TRUNC function to round down.

{{{
COMPUTE age = TRUNC(DATEDIF(DateonSystem, DOB, "days")/365.25).
COMPUTE yearsuse = usertime2/365.25.
Line 38: Line 45:
The number of times each particular message is sent per week can then be computed by looking for a particular character string and multiplying the number we have just obtained by a quantity dependent on this string. For example a message signified by '4-W' = 4x7 = 28 messages sent per week.

This is achieved using the DO IF command.
We can now obtain summary statistics for durations of (non-)dropouts and age.
Line 43: Line 48:
DO IF INDEX(UPCASE(FREQUENCY),"D ")>0.
COMPUTE NPW = COUNTN*7.
COMPUTE DAILY =1.
ELSE IF INDEX(UPCASE(FREQUENCY),"W ")>0.
COMPUTE NPW = COUNTN.
COMPUTE DAILY=0.
ELSE IF INDEX(UPCASE(FREQUENCY),"-Y ")>0.
COMPUTE NPW = COUNTN/52.
COMPUTE DAILY=0.
ELSE IF INDEX(UPCASE(FREQUENCY),"YEAR ")>0.
COMPUTE NPW = COUNTN/52.
COMPUTE DAILY=0.
ELSE IF INDEX(UPCASE(FREQUENCY),"M ")>0.
COMPUTE NPW = COUNTN/4.
COMPUTE DAILY=0.
ELSE IF INDEX(UPCASE(FREQUENCY),"OFF")>0.
COMPUTE NPW = COUNTN/52.
COMPUTE DAILY=0.
END IF.
EXE.
DESCRIPTIVES
  VARIABLES=usertime2 USFINTIME2 AGE
  /STATISTICS=MEAN STDDEV MIN MAX .
Line 65: Line 53:
* LOCATE AND ADD MANUALLY (N=12) NON-STANDARD FREQUENCY ENTRIES

SORT CASES BY NPW.
EXE.

* OBTAIN FREQUENCIES

weight by NPW.
frequencies vars=type2.
exe.

weight by NPW.
frequencies vars=Client_ID.
exe.

WEIGHT OFF.

* DEFINE PAGER OR MOBILE MESSAGE USER

COMPUTE pager_client = 1.
exe.


IF (Client_ID eq 2 OR Client_ID eq 6 or Client_ID eq 7 or Client_ID eq 9 or Client_ID eq 10 OR Client_ID eq 12
or Client_ID eq 13 or Client_ID eq 14 or Client_ID eq 17 or Client_ID eq 18 or Client_ID eq 25 OR
Client_ID eq 30 or Client_ID eq 31 or Client_ID eq 32 or Client_ID eq 38 or Client_ID eq 39 or Client_ID eq 40)
pager_client = 0.
exe.
 
* SORT MESSAGE TYPE SEPARATELY FOR PAGER AND MOBILE USE

SORT CASES BY PAGER_CLIENT.
EXE.
SPLIT FILE BY PAGER_CLIENT.

weight by NPW.
frequencies vars=type2.
exe.

WEIGHT OFF.

SPLIT FILE OFF.

 * [:FAQ/Messy/rates: Obtaining frequencies from string variables]

How do I handle messy data in SPSS to produce duration times from dates and frequencies from strings?

Supposing we have a list of individual dates of form below

||||||<50% style="TEXT-ALIGN: center"> Date of Birth (DOB) || Date on System|| Date Left study

17.02.40

12.05.00

02.08.73

18.01.05

24.03.08

Patient data is often entered in the form of dates. For example date of birth and date that a particular person started a study and, if applicable, left it. We may wish to work out how long a person was a participant in a study. To do this we first of all need a cut-off date which is either the date when the study ended (studyEdate) for continuing participants or the date of drop out (DateFinished) for people who left the study.

There are various commands for telling SPSS what the number in a column represents a date. The DATE.DMY command defines a date in the spreadsheet.

COMPUTE studyEdate = DATE.DMY(30/11/2010).
EXE.

Existing columns of data which represent dates may be flagged to SPSS using various formats (found by clicking the variable view tab and 'Type' next to the column of interest. Data may also be defined as dates using syntax.

FORMATS dob DateFinished DateonSystem studyEdate (DATE8).
EXECUTE.

We can the work our duration time in the study using the DATEDIF command which works out the difference in days or months between two dates. We firstly thought need to tell SPSS to use the study end date for people who did not drop out or the date they finished if they did. Since these two dates are entered in separate columns we can do this using the SYSMIS command which tells SPSS if there is no date finished date to use the study end date as the cut-off date. We can now compute the number of days in the study for dropouts and non-dropouts.

COMPUTE DFMISS = SYSMIS(DATEFINISHED).
IF (DFMISS EQ 1) usertime = DATEDIF(studyEdate, DateonSystem, "months")/12.
IF (DFMISS EQ 1) usertime2 = DATEDIF(studyEdate, DateonSystem, "days").

IF (DFMISS EQ 0) usfintime = DATEDIF(DateFinished, DateonSystem, "months")/12.
IF (DFMISS EQ 0) usfintime2 = DATEDIF(DateFinished, DateonSystem, "days").

We can also work out the ages of the participants when they entered the study in a similar way using the TRUNC function to round down.

COMPUTE age = TRUNC(DATEDIF(DateonSystem, DOB, "days")/365.25).
COMPUTE yearsuse = usertime2/365.25.
EXE.

We can now obtain summary statistics for durations of (non-)dropouts and age.

DESCRIPTIVES
  VARIABLES=usertime2 USFINTIME2 AGE
  /STATISTICS=MEAN STDDEV MIN MAX .
  • [:FAQ/Messy/rates: Obtaining frequencies from string variables]

None: FAQ/Messy (last edited 2013-03-08 10:17:24 by localhost)