How do I handle messy data in SPSS to produce duration times from dates and frequencies from strings?
Supposing we have a list of individual dates of form below
Date of Birth (DOB) |
Date on System |
|
17.02.40 |
12.05.00 |
|
02.08.73 |
18.01.05 |
Supposing we have a list of particular messages which can be categorised into various types and rates of form with rates abbreviated to W (per week), D (per day), Y (per year) and M (per month) and wish to tabulate frequencies per week of each message type.
Type |
Rate |
|
Appointment |
3-W |
|
appointment |
2-D |
|
chore |
1Y |
|
Medication |
2-M |
Notice that type names start with either capitals or lower case letters (which SPSS would regard as separate categories) and times are expressed both with and without hyphens.
The first thing we need to do is make sure types with the same spelling are all regarded as representing the same group regardless of case. This can be done using the string, substring and upcase commands.
STRING TYPE2 (A10). COMPUTE TYPE2 = UPCASE(SUBSTR(TYPE,1,5)). EXE.
We then need to create a single column representing the number of times a particular message was sent in a week. This is done by decoupling the number from each rate and saving this as a numerical variable.
STRING COUNT(A1). COMPUTE COUNT=SUBSTR(FREQUENCY,1,1). EXE. RECODE COUNT (CONVERT) INTO COUNTN. EXE.
The number of times each particular message is sent per week can then be computed by looking for a particular character string and multiplying the number we have just obtained by a quantity dependent on this string. For example a message signified by '4-W' = 4x7 = 28 messages sent per week.
This is achieved using the DO IF command.
DO IF INDEX(UPCASE(FREQUENCY),"D ")>0. COMPUTE NPW = COUNTN*7. COMPUTE DAILY =1. ELSE IF INDEX(UPCASE(FREQUENCY),"W ")>0. COMPUTE NPW = COUNTN. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"-Y ")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"YEAR ")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"M ")>0. COMPUTE NPW = COUNTN/4. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"OFF")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. END IF. EXE.
* LOCATE AND ADD MANUALLY (N=12) NON-STANDARD FREQUENCY ENTRIES
SORT CASES BY NPW. EXE.
* OBTAIN FREQUENCIES
weight by NPW. frequencies vars=type2. exe.
weight by NPW. frequencies vars=Client_ID. exe.
WEIGHT OFF.
* DEFINE PAGER OR MOBILE MESSAGE USER
COMPUTE pager_client = 1. exe.
IF (Client_ID eq 2 OR Client_ID eq 6 or Client_ID eq 7 or Client_ID eq 9 or Client_ID eq 10 OR Client_ID eq 12 or Client_ID eq 13 or Client_ID eq 14 or Client_ID eq 17 or Client_ID eq 18 or Client_ID eq 25 OR Client_ID eq 30 or Client_ID eq 31 or Client_ID eq 32 or Client_ID eq 38 or Client_ID eq 39 or Client_ID eq 40) pager_client = 0. exe.
* SORT MESSAGE TYPE SEPARATELY FOR PAGER AND MOBILE USE
SORT CASES BY PAGER_CLIENT. EXE. SPLIT FILE BY PAGER_CLIENT.
weight by NPW. frequencies vars=type2. exe.
WEIGHT OFF.
SPLIT FILE OFF.