FAQ/saspol - CBU statistics Wiki

SAS macro for computing polychoric correlations

The following macro, which can be downloaded from the SAS website, may also be copied and pasted into a file with the extension .sas. That file can then be included in a SAS program, for example ahead of a factor analysis, using syntax such as the following.

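/* Read the tab-delimited item data into WORK.WHEATON */
PROC IMPORT OUT= WORK.WHEATON 
            DATAFILE= "C:\Documents and Settings\peterw\Desktop\My Documents\EQS FILES STAI\TRAIB_CC.TXT"
            DBMS=TAB REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;  /* finish the import so WORK.WHEATON exists before the macro runs */

/* Define the %POLYCHOR macro from the saved file */
%inc 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\JOE HERBERT\POLYCHORIC MACRO.sas'; 

/* Polychoric correlations among the five items, written to data set DIST */
%polychor(data=WHEATON,var=T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25,out=dist)    
proc print data=dist; run;  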
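
Since the macro is intended here for use in a factor analysis, the following is a minimal sketch of how the resulting matrix might be passed on. It assumes the call above wrote a TYPE=CORR data set named DIST. METHOD=ULS follows the recommendation in the macro header, while NFACTORS=1 is purely an illustrative choice that does not come from this page.

```sas
/* Sketch only: factor analysis on the polychoric correlation matrix. */
/* Assumes DIST is the TYPE=CORR data set created by %polychor above. */
/* NFACTORS=1 is a hypothetical setting chosen for illustration.      */
proc factor data=dist method=uls nfactors=1;
   var T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25;
run;
```

As the macro header points out, results obtained this way are suitable for descriptive purposes only.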

The polychoric correlation macro is given below. The example above assumes that it has been saved in a file called POLYCHORIC MACRO.sas.

/************************************************************************

                               %POLYCHOR macro
                                 Version 1.2

      DISCLAIMER:
        THIS INFORMATION IS PROVIDED BY SAS INSTITUTE INC. AS A SERVICE TO
        ITS USERS.  IT IS PROVIDED "AS IS".  THERE ARE NO WARRANTIES,
        EXPRESSED OR IMPLIED, AS TO MERCHANTABILITY OR FITNESS FOR A
        PARTICULAR PURPOSE REGARDING THE ACCURACY OF THE MATERIALS OR CODE
        CONTAINED HEREIN.

      PURPOSE:
        The POLYCHOR macro creates a SAS data set containing a correlation
        matrix of polychoric correlations or a distance matrix based on
        polychoric correlations.

      REQUIRES:
        %POLYCHOR requires only Version 6.07 or later of base SAS Software.

      USAGE:
        Before calling the POLYCHOR macro, you must first define the macro in
        your current SAS session. You can do this either by copying this file
        into the SAS program editor and submitting it, or by using a %INCLUDE
        statement containing the path and filename of this file on your system.

        Once the macro is defined, call the macro using the desired options.
        See the section below for an example.

        The options and allowable values are:

           DATA=   SAS data set to be analyzed.  If the DATA= option is not
                   supplied, the most recently created SAS data set is
                   used.

           VAR=    Polychoric or tetrachoric correlations will be computed
                   for every pair of variables listed in the VAR= option.
                   Individual variable names, separated by blanks, must be
                   specified.  By default, all numeric variables found in
                   the data set will be used.  See LIMITATIONS below for
                   time considerations.

           OUT=    Specifies the name of the output data set that will
                   contain the correlation or distance matrix.  By default,
                   the output data set is named _PLCORR.

           TYPE=   Specifies the type of matrix to be created.  If
                   TYPE=CORR (the default), then a correlation matrix is
                   computed and the output data set is assigned a data set
                   type of CORR.  If TYPE=DISTANCE, then a distance matrix
                   is computed and the output data set is assigned a data
                   set type of DISTANCE.

      PRINTED OUTPUT:
        No printed output is generated by the %POLYCHOR macro.

      DETAILS:
        The PLCORR option in the FREQ procedure is used iteratively to
        compute the polychoric correlation for each pair of variables.  If
        both variables in a pair are binary (that is, they take on only two
        distinct values), then the correlation computed by the PLCORR
        option is usually referred to as the tetrachoric correlation.

        The individual correlation coefficients are then assembled into
        either a TYPE=CORR data set containing a matrix of polychoric
        correlations, or a TYPE=DISTANCE data set containing a matrix of
        dissimilarity values.  The dissimilarity value used is computed as:

               1 - plcorr**2

        where plcorr is the polychoric correlation.

        The resulting data set can be used for descriptive analyses only in
        either the FACTOR or the CALIS procedure (specify METHOD=ULS in
        either procedure) if the correlation matrix is computed.  If the
        maximum likelihood method (METHOD=ML) is used, note that none of
        the hypothesis tests will be valid, and the polychoric correlation
        matrix may be indefinite with small samples.  The distance matrix
        can be used in the CLUSTER procedure (however, the CCC value is not
        valid) or the MDS procedure.

        See the Appendix, "Special SAS Data Sets" in the SAS/STAT User's
        Guide for a description of TYPE=CORR and DISTANCE data sets.

      MISSING VALUES:
        Observations with missing values are omitted from the computation
        of correlations.  However, when computing the polychoric
        correlation between two variables, if an observation's values for
        these two variables are not missing, then the observation is used
        regardless of any missing values the observation may have on other
        variables.

      LIMITATIONS:
        LIMITED ERROR CHECKING IS DONE.  If the DATA= option is specified,
        be sure the named data set exists.  If DATA= is not specified, a
        data set must have been created previously in the current SAS
        session.  Be sure that the variables specified in the VAR= option
        exist on that data set.  Running PROC CONTENTS on the data set
        prior to using this macro is recommended for verifying the data set
        name and the names of variables.

        The time required to compute the correlation or distance matrix
        increases quadratically as the number of variables increases.  Up
        to 999 variables are allowed, but the time required for more than
        100 variables may be exorbitant.

      EXAMPLE:

          data ordinal;
             array x{5} x1-x5;
             do n=1 to 20;
                do i=1 to 5;
                   x{i}=rantbl(238423,.1,.2,.4,.2,.1);
                end;
                keep x1-x5;
                output;
             end;
             run;

          * If not already defined in your current SAS session, define the
          * POLYCHOR macro before calling it by putting the path and
          * filename of your copy of this file in the %INCLUDE statement
          * below.  Example:  %inc 'c:\mysasfiles\polychor.sas';
          *****************************************************************;

          %inc '';

          * Create and print a TYPE=CORR data set named _PLCORR containing
          * a matrix of polychoric correlations among all variables in the
          * data set ORDINAL.
          *****************************************************************;

          %polychor()
          proc print; run;

          * Create and print a TYPE=DISTANCE data set named DIST containing
          * a dissimilarity matrix using variables X1, X2, and X5.
          *****************************************************************;

          %polychor(data=ordinal,var=x1 x2 x5,out=dist,type=distance)
          proc print; run;

************************************************************************/
%macro polychor(                                                           
          data=_last_,                                                    
          var=_numeric_,                                                  
         out=_plcorr,                                                    
          type=corr                                                       
          );                                                              
                                                                          
   options nonotes nostimer;                                              
   %if &data=_last_ %then %let data=&syslast;                             
                                                                          
   /* Verify that TYPE=CORR or DISTANCE */                                
   %if %upcase(&type) ne CORR and %upcase(&type) ne DISTANCE %then %do;   
     %put ERROR: TYPE= must be CORR or DISTANCE.;                         
     %goto exit;                                                          
   %end;                                                                  
                                                                          
   data _null_;                                                           
    set &data;                                                            
    array x{*} &var;                                                      
    length name $8.;                                                      
    if _n_=1 then                                                         
    do i=1 to dim(x);                                                     
      call vname(x{i} , name);                                            
      call symput('_v'||trim(left(put(i,4.))) , name);                    
    end;                                                                  
    p=dim(x);                                                             
    call symput('_p',trim(left(put(p,4.))));                              
    run;                                                                  
                                                                         
   %do _i=1 %to &_p;                                                      
   %do _j=&_i+1 %to &_p;                                                  
     proc freq data=&data noprint;                                        
       tables &&_v&_i * &&_v&_j / plcorr;                                 
       output out=_tmp plcorr;                                            
       run;                                                               
     data _null_;                                                         
       set _tmp;                                                          
       value=    %if %upcase(&type)=CORR %then _plcorr_;                  
                 %if %upcase(&type)=DISTANCE %then 1-_plcorr_**2;         
       ;                                                                  
       call symput("p&_i._&_j" , value);                                  
       run;                                                               
   %end;                                                                  
   %end;                                                                  
                                                                          
   data &out                                                              
     %if %upcase(&type)=CORR %then %do;                                   
       ;                                                                  
       _type_='CORR';                                                     
       length _name_ $8.;                                                 
     %end;                                                                
     %if %upcase(&type)=DISTANCE %then %str( (type=distance); );          
                                                                          
     /* Create matrix */                                                  
     array x{*}     %do i=1 %to &_p;                                      
                        &&_v&i                                            
                    %end;                                                 
       ;                                                                  
     do i=1 to dim(x);                                                    
       do j=1 to i;                                                       
                                                                          
         /* Set diagonal values */                                        
         if i=j then x{j}=   %if %upcase(&type)=CORR %then 1;             
                             %if %upcase(&type)=DISTANCE %then 0;         
        ;                                                                
                                                                          
        /* Set lower triangular values */                                
         else                                                             
         x{j}=symget("p"||trim(left(put(j,4.)))||"_"||trim(left(put(i,4.))));
       end;                                                               
                                                                          
       /* Create _NAME_ variable for CORR data sets */                    
       %if %upcase(&type)=CORR %then                                      
         %str( _name_=symget("_v"||trim(left(put(i,4.)))); );             
       drop i j;                                                          
       output;                                                            
     end;                                                                 
     run;                                                                 
                                                                          
   /* Add _TYPE_=MEAN, STD and N observations to CORR data sets */        
   %if %upcase(&type)=CORR %then %do;                                     
    proc summary data=&data;                                             
       var &var;                                                          
       output out=_simple (drop=_type_ _freq_ rename=(_stat_=_type_));    
       run;                                                               
     data &out (type=corr);                                               
       set _simple (where=(_type_ in ('MEAN','STD','N'))) &out;           
       run;                                                               
   %end;                                                                  
                                                                          
   %if &syserr=0 %then                                                    
   %if %upcase(&type)=CORR %then %do;                                     
     %put;                                                                
     %put POLYCHOR: Polychoric correlation matrix was output to data set %upcase(&out).;
     %put;                                                                
   %end;                                                                  
   %else %do;                                                             
     %put;                                                                
     %put POLYCHOR: Distance matrix based on polychoric correlations was output;
     %put %str(          to data set %upcase(&out).);                     
     %put;                                                                
   %end;                                                                  
                                                                          
   %exit:                                                                 
   options notes stimer;                                                  
   %mend polychor;
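
As a rough illustration of the TYPE=DISTANCE option described in the macro header, the sketch below first evaluates the dissimilarity 1 - plcorr**2 for a made-up polychoric correlation of 0.6, which gives 0.64, and then clusters the five items from the usage example at the top of the page. The data set name PDIST, the value 0.6 and the choice of METHOD=AVERAGE are assumptions made for illustration and do not come from the original page. It also assumes the data have been imported as WORK.WHEATON and that the macro has already been defined in the session.

```sas
/* Dissimilarity for a hypothetical polychoric correlation of 0.6 */
data _null_;
   plcorr = 0.6;
   dissim = 1 - plcorr**2;   /* same formula the macro uses for TYPE=DISTANCE */
   put dissim=;              /* writes the value to the SAS log */
run;

/* Distance matrix for the five items; PDIST is an illustrative name */
%polychor(data=WHEATON, var=T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25,
          out=pdist, type=distance)

/* Cluster the items; METHOD=AVERAGE is an assumed choice */
proc cluster data=pdist method=average;
run;
```

As the header notes, the CCC value reported by PROC CLUSTER is not valid for a distance matrix of this kind.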