Revision 1 as of 2007-03-06 14:38:31
Dummy variables
(taken from an SPSS mailing list)
Since the values of a categorical variable do not convey numeric information, such a variable should not be entered directly into a regression model as if it were numeric. Instead, each value of the categorical variable can be represented in the model with an indicator variable. An indicator (or dummy) variable contains only the values 1 and 0, with a value of 1 indicating that the associated observation has the given categorical value and 0 indicating that it does not.
For example, let the variable LANG take on three levels (British, French, and German) that were originally coded as 1, 2, and 3, respectively. To include this categorical variable in a regression model, create an indicator variable for each level of LANG.
In SPSS, you must first create the three new variables and give each an initial value, here zero. An IF command then replaces these zeros with ones for the appropriate observations. The syntax is:
COMPUTE british=0.
COMPUTE french=0.
COMPUTE german=0.
IF lang=1 british=1.
IF lang=2 french=1.
IF lang=3 german=1.
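A more compact way to build the same indicators is sketched below. It is offered as an alternative, not as the mailing-list author's method, and relies on SPSS evaluating a logical expression to 1 when true and 0 when false. It assumes `lang` is a numeric variable coded 1, 2, 3 as above; the FREQUENCIES check at the end is an added suggestion.

```spss
* Each logical expression yields 1 if true and 0 if false.
* If lang is missing, the result is system-missing rather than 0.
COMPUTE british = (lang = 1).
COMPUTE french = (lang = 2).
COMPUTE german = (lang = 3).
EXECUTE.

* Optional check that each indicator contains only 0s and 1s.
FREQUENCIES VARIABLES = british french german.
```

One practical difference: with the COMPUTE/IF approach, a case whose lang is missing keeps 0 on all three indicators, so it would be treated as a member of whichever group is left out of the model. With the logical-expression form, such cases are system-missing and are dropped from the regression under the default listwise deletion.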
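Now any two of the three new variables may be included in the regression model. All three cannot be entered together with the constant term, because they always sum to 1 and are therefore perfectly collinear with it. It does not matter which two you choose: once you know who is in any two of the three groups, you know who is in the third.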
For example:
REGRESSION VARIABLES = yvar british french german xv4 xv5
  /DEPENDENT = yvar
  /METHOD = ENTER british french xv4.
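The sum of squares explained by the set of indicator variables, and hence the overall fit of the model, is the same regardless of which two you enter. However, the individual parameter estimates will differ depending on which two are used, because each coefficient is measured relative to the omitted (reference) category.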
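To see why the fit is unchanged while the coefficients move, the two codings can be written out. This is a sketch, assuming every case belongs to exactly one of the three groups; the symbols \beta_0 to \beta_3 are notation introduced here, not names from SPSS output.

```latex
% Model entered in the example above (german is the omitted group):
y = \beta_0 + \beta_1\,\text{british} + \beta_2\,\text{french}
    + \beta_3\,\text{xv4} + \varepsilon

% Because british = 1 - french - german, substituting gives the
% equivalent model with british omitted instead:
y = (\beta_0 + \beta_1) + (\beta_2 - \beta_1)\,\text{french}
    + (-\beta_1)\,\text{german} + \beta_3\,\text{xv4} + \varepsilon
```

Both forms produce identical fitted values, so the regression and residual sums of squares are identical. In each form the constant is the predicted value for the omitted group at xv4 = 0, and each indicator coefficient is the difference between its group and the omitted group, adjusted for xv4.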
For more information, see the REGRESSION chapter in any SPSS Reference Guide.