3710
Comment:
|
4571
|
Deletions are marked like this. | Additions are marked like this. |
Line 8: | Line 8: |
[:FAQ/Messy/rates: Obtaining frequencies from string variables] |
|
Line 40: | Line 42: |
This is achieved using the DO IF command. | This is achieved using the DO IF command to create the number of times each message is sent per week (NPW) . |
Line 65: | Line 67: |
* LOCATE AND ADD MANUALLY (N=12) NON-STANDARD FREQUENCY ENTRIES | There may be a few cases where a rate has been expressed in a non-standard way (such as 2-D Mon-Thurs) has been entered and these will need to be entered manually. These can be identified by using the SORT CASES command which will sort the number per week (NPW) variable such that the cases at the top of the file will represent the non-standard rates. |
Line 67: | Line 69: |
{{{ | |
Line 69: | Line 72: |
}}} | |
Line 70: | Line 74: |
* OBTAIN FREQUENCIES | We need to do one final thing before we can obtain frequencies for each message type. If we ask for frequencies we will only obtain the number of different messages of a particular type. To take into account repeated messages we use the weight command to tell SPSS how many times per week each message is repeated. |
Line 72: | Line 76: |
{{{ | |
Line 75: | Line 80: |
}}} | |
Line 76: | Line 82: |
We can also obtain the number of messages sent per week per client in the same way. {{{ |
|
Line 79: | Line 87: |
}}} When we have finished we swithc off the weight so that each row now returns to representing just one frequency. {{{ |
|
Line 81: | Line 91: |
}}} | |
Line 84: | Line 95: |
COMPUTE pager_client = 1. exe. |
We can also obtain two-way frequency classifications for subgroups of types of data (e.g. for mobile users and pager users) using the SORT CASES and SPLIT FILE commands which reformat the data so that it duplicates whatever syntax commands we issue. As witht her WEIGHT command we also need to switch this facility off when we have finished using it. |
Line 87: | Line 97: |
IF (Client_ID eq 2 OR Client_ID eq 6 or Client_ID eq 7 or Client_ID eq 9 or Client_ID eq 10 OR Client_ID eq 12 or Client_ID eq 13 or Client_ID eq 14 or Client_ID eq 17 or Client_ID eq 18 or Client_ID eq 25 OR Client_ID eq 30 or Client_ID eq 31 or Client_ID eq 32 or Client_ID eq 38 or Client_ID eq 39 or Client_ID eq 40) pager_client = 0. exe. * SORT MESSAGE TYPE SEPARATELY FOR PAGER AND MOBILE USE |
{{{ |
Line 105: | Line 107: |
Line 107: | Line 108: |
}}} |
How do I handle messy data in SPSS to produce duration times from dates and frequencies from strings?
Supposing we have a list of individual dates of form below
Date of Birth (DOB) |
Date on System |
|
17.02.40 |
12.05.00 |
|
02.08.73 |
18.01.05 |
[:FAQ/Messy/rates: Obtaining frequencies from string variables]
Supposing we have a list of particular messages which can be categorised into various types and rates of form with rates abbreviated to W (per week), D (per day), Y (per year) and M (per month) and wish to tabulate frequencies per week of each message type.
Type |
Rate |
|
Appointment |
3-W |
|
appointment |
2-D |
|
chore |
1Y |
|
Medication |
2-M |
Notice that type names start with either capitals or lower case letters (which SPSS would regard as separate categories) and times are expressed both with and without hyphens.
The first thing we need to do is make sure types with the same spelling are all regarded as representing the same group regardless of case. This can be done using the string, substring and upcase commands.
STRING TYPE2 (A10). COMPUTE TYPE2 = UPCASE(SUBSTR(TYPE,1,5)). EXE.
We then need to create a single column representing the number of times a particular message was sent in a week. This is done by decoupling the number from each rate and saving this as a numerical variable.
STRING COUNT(A1). COMPUTE COUNT=SUBSTR(FREQUENCY,1,1). EXE. RECODE COUNT (CONVERT) INTO COUNTN. EXE.
The number of times each particular message is sent per week can then be computed by looking for a particular character string and multiplying the number we have just obtained by a quantity dependent on this string. For example a message signified by '4-W' = 4x7 = 28 messages sent per week.
This is achieved using the DO IF command to create the number of times each message is sent per week (NPW) .
DO IF INDEX(UPCASE(FREQUENCY),"D ")>0. COMPUTE NPW = COUNTN*7. COMPUTE DAILY =1. ELSE IF INDEX(UPCASE(FREQUENCY),"W ")>0. COMPUTE NPW = COUNTN. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"-Y ")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"YEAR ")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"M ")>0. COMPUTE NPW = COUNTN/4. COMPUTE DAILY=0. ELSE IF INDEX(UPCASE(FREQUENCY),"OFF")>0. COMPUTE NPW = COUNTN/52. COMPUTE DAILY=0. END IF. EXE.
There may be a few cases where a rate has been expressed in a non-standard way (such as 2-D Mon-Thurs) has been entered and these will need to be entered manually. These can be identified by using the SORT CASES command which will sort the number per week (NPW) variable such that the cases at the top of the file will represent the non-standard rates.
SORT CASES BY NPW. EXE.
We need to do one final thing before we can obtain frequencies for each message type. If we ask for frequencies we will only obtain the number of different messages of a particular type. To take into account repeated messages we use the weight command to tell SPSS how many times per week each message is repeated.
weight by NPW. frequencies vars=type2. exe.
We can also obtain the number of messages sent per week per client in the same way.
weight by NPW. frequencies vars=Client_ID. exe.
When we have finished we swithc off the weight so that each row now returns to representing just one frequency.
WEIGHT OFF.
* DEFINE PAGER OR MOBILE MESSAGE USER
We can also obtain two-way frequency classifications for subgroups of types of data (e.g. for mobile users and pager users) using the SORT CASES and SPLIT FILE commands which reformat the data so that it duplicates whatever syntax commands we issue. As witht her WEIGHT command we also need to switch this facility off when we have finished using it.
SORT CASES BY PAGER_CLIENT. EXE. SPLIT FILE BY PAGER_CLIENT. weight by NPW. frequencies vars=type2. exe. WEIGHT OFF. SPLIT FILE OFF.