FAQ/saspol - CBU statistics Wiki

SAS macro for computing polychoric correlations

The following macro, which can be downloaded from the SAS website, computes polychoric correlations. Copy and paste it into a SAS file (*.sas), then include that file and call the macro, using syntax such as that below, to produce a correlation matrix for use in a factor analysis in SAS.

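/* Read the tab-delimited data file; variable names are taken from row 1 */
PROC IMPORT OUT= WORK.WHEATON 
            DATAFILE= "C:\Documents and Settings\peterw\Desktop\My Documents\EQS FILES STAI\TRAIB_CC.TXT"
            DBMS=TAB REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;  /* end the import step so WORK.WHEATON exists before the macro is called */

/* Define the macro, then compute polychoric correlations among the five items. */
/* TYPE= is not given, so OUT=dist holds a correlation (TYPE=CORR) matrix.      */
%inc 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\JOE HERBERT\POLYCHORIC MACRO.sas'; 
%polychor(var=T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25,out=dist)    
proc print; run;  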
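
Because the stated aim is a factor analysis, the following is a minimal sketch of that next step; it is not part of the SAS-supplied macro. It assumes the macro has already been included and WORK.WHEATON imported as above. The output data set name polymat and the single-factor choice (NFACTORS=1) are illustrative assumptions, and METHOD=ULS follows the recommendation in the macro's own header comments.

```sas
/* Sketch only: polymat and nfactors=1 are assumptions for illustration. */

/* Write the polychoric correlations to a TYPE=CORR data set */
%polychor(data=WORK.WHEATON,
          var=T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25,
          out=polymat,
          type=corr)

/* PROC FACTOR reads the TYPE=CORR data set in place of raw data.      */
/* Unweighted least squares (ULS) is used because, per the macro       */
/* header, hypothesis tests from METHOD=ML are not valid here.         */
proc factor data=polymat method=uls nfactors=1;
   var T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25;
run;
```

As the macro header notes, results from a polychoric matrix used in this way should be treated as descriptive only.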

The polychoric correlation macro is given below. The example above assumes that it has been saved in a file called POLYCHORIC MACRO.sas.

/************************************************************************

                               %POLYCHOR macro
                                 Version 1.2

      DISCLAIMER:
        THIS INFORMATION IS PROVIDED BY SAS INSTITUTE INC. AS A SERVICE TO
        ITS USERS.  IT IS PROVIDED "AS IS".  THERE ARE NO WARRANTIES,
        EXPRESSED OR IMPLIED, AS TO MERCHANTABILITY OR FITNESS FOR A
        PARTICULAR PURPOSE REGARDING THE ACCURACY OF THE MATERIALS OR CODE
        CONTAINED HEREIN.

      PURPOSE:
        The POLYCHOR macro creates a SAS data set containing a correlation
        matrix of polychoric correlations or a distance matrix based on
        polychoric correlations.

      REQUIRES:
        %POLYCHOR requires only Version 6.07 or later of base SAS Software.

      USAGE:
        Before calling the POLYCHOR macro, you must first define the macro in
        your current SAS session. You can do this either by copying this file
        into the SAS program editor and submitting it, or by using a %INCLUDE
        statement containing the path and filename of this file on your system.

        Once the macro is defined, call the macro using the desired options.
        See the section below for an example.

        The options and allowable values are:

           DATA=   SAS data set to be analyzed.  If the DATA= option is not
                   supplied, the most recently created SAS data set is
                   used.

           VAR=    Polychoric or tetrachoric correlations will be computed
                   for every pair of variables listed in the VAR= option.
                   Individual variable names, separated by blanks, must be
                   specified.  By default, all numeric variables found in
                   the data set will be used.  See LIMITATIONS below for
                   time considerations.

           OUT=    Specifies the name of the output data set that will
                   contain the correlation or distance matrix.  By default,
                   the output data set is named _PLCORR.

           TYPE=   Specifies the type of matrix to be created.  If
                   TYPE=CORR (the default), then a correlation matrix is
                   computed and the output data set is assigned a data set
                   type of CORR.  If TYPE=DISTANCE, then a distance matrix
                   is computed and the output data set is assigned a data
                   set type of DISTANCE.

      PRINTED OUTPUT:
        No printed output is generated by the %POLYCHOR macro.

      DETAILS:
        The PLCORR option in the FREQ procedure is used iteratively to
        compute the polychoric correlation for each pair of variables.  If
        both variables in a pair are binary (that is, they take on only two
        distinct values), then the correlation computed by the PLCORR
        option is usually referred to as the tetrachoric correlation.

        The individual correlation coefficients are then assembled into
        either a TYPE=CORR data set containing a matrix of polychoric
        correlations, or a TYPE=DISTANCE data set containing a matrix of
        dissimilarity values.  The dissimilarity value used is computed as:

               1 - plcorr**2

        where plcorr is the polychoric correlation.

        The resulting data set can be used for descriptive analyses only in
        either the FACTOR or the CALIS procedure (specify METHOD=ULS in
        either procedure) if the correlation matrix is computed.  If the
        maximum likelihood method (METHOD=ML) is used, note that none of
        the hypothesis tests will be valid, and the polychoric correlation
        matrix may be indefinite with small samples.  The distance matrix
        can be used in the CLUSTER procedure (however, the CCC value is not
        valid) or the MDS procedure.

        See the Appendix, "Special SAS Data Sets" in the SAS/STAT User's
        Guide for a description of TYPE=CORR and DISTANCE data sets.

      MISSING VALUES:
        Observations with missing values are omitted from the computation
        of correlations.  However, when computing the polychoric
        correlation between two variables, if an observation's values for
        these two variables are not missing, then the observation is used
        regardless of any missing values the observation may have on other
        variables.

      LIMITATIONS:
        LIMITED ERROR CHECKING IS DONE.  If the DATA= option is specified,
        be sure the named data set exists.  If DATA= is not specified, a
        data set must have been created previously in the current SAS
        session.  Be sure that the variables specified in the VAR= option
        exist on that data set.  Running PROC CONTENTS on the data set
        prior to using this macro is recommended for verifying the data set
        name and the names of variables.

        The time required to compute the correlation or distance matrix
        increases quadratically as the number of variables increases.  Up
        to 999 variables are allowed, but the time required for more than
        100 variables may be exorbitant.

      EXAMPLE:

          data ordinal;
             array x{5} x1-x5;
             do n=1 to 20;
                do i=1 to 5;
                   x{i}=rantbl(238423,.1,.2,.4,.2,.1);
                end;
                keep x1-x5;
                output;
             end;
             run;

          * If not already defined in your current SAS session, define the
          * POLYCHOR macro before calling it by putting the path and
          * filename of your copy of this file in the %INCLUDE statement
          * below.  Example:  %inc 'c:\mysasfiles\polychor.sas';
          *****************************************************************;

          %inc '';

          * Create and print a TYPE=CORR data set named _PLCORR containing
          * a matrix of polychoric correlations among all variables in the
          * data set ORDINAL.
          *****************************************************************;

          %polychor()
          proc print; run;

          * Create and print a TYPE=DISTANCE data set named DIST containing
          * a dissimilarity matrix using variables X1, X2, and X5.
          *****************************************************************;

          %polychor(data=ordinal,var=x1 x2 x5,out=dist,type=distance)
          proc print; run;

************************************************************************/
%macro polychor(                                                           
          data=_last_,                                                    
          var=_numeric_,                                                  
         out=_plcorr,                                                    
          type=corr                                                       
          );                                                              
                                                                          
   options nonotes nostimer;                                              
   %if &data=_last_ %then %let data=&syslast;                             
                                                                          
   /* Verify that TYPE=CORR or DISTANCE */                                
   %if %upcase(&type) ne CORR and %upcase(&type) ne DISTANCE %then %do;   
     %put ERROR: TYPE= must be CORR or DISTANCE.;                         
     %goto exit;                                                          
   %end;                                                                  
                                                                          
   data _null_;                                                           
    set &data;                                                            
    array x{*} &var;                                                      
    length name $8.;                                                      
    if _n_=1 then                                                         
    do i=1 to dim(x);                                                     
      call vname(x{i} , name);                                            
      call symput('_v'||trim(left(put(i,4.))) , name);                    
    end;                                                                  
    p=dim(x);                                                             
    call symput('_p',trim(left(put(p,4.))));                              
    run;                                                                  
                                                                         
   %do _i=1 %to &_p;                                                      
   %do _j=&_i+1 %to &_p;                                                  
     proc freq data=&data noprint;                                        
       tables &&_v&_i * &&_v&_j / plcorr;                                 
       output out=_tmp plcorr;                                            
       run;                                                               
     data _null_;                                                         
       set _tmp;                                                          
       value=    %if %upcase(&type)=CORR %then _plcorr_;                  
                 %if %upcase(&type)=DISTANCE %then 1-_plcorr_**2;         
       ;                                                                  
       call symput("p&_i._&_j" , value);                                  
       run;                                                               
   %end;                                                                  
   %end;                                                                  
                                                                          
   data &out                                                              
     %if %upcase(&type)=CORR %then %do;                                   
       ;                                                                  
       _type_='CORR';                                                     
       length _name_ $8.;                                                 
     %end;                                                                
     %if %upcase(&type)=DISTANCE %then %str( (type=distance); );          
                                                                          
     /* Create matrix */                                                  
     array x{*}     %do i=1 %to &_p;                                      
                        &&_v&i                                            
                    %end;                                                 
       ;                                                                  
     do i=1 to dim(x);                                                    
       do j=1 to i;                                                       
                                                                          
         /* Set diagonal values */                                        
         if i=j then x{j}=   %if %upcase(&type)=CORR %then 1;             
                             %if %upcase(&type)=DISTANCE %then 0;         
        ;                                                                
                                                                          
        /* Set lower triangular values */                                
         else                                                             
         x{j}=symget("p"||trim(left(put(j,4.)))||"_"||trim(left(put(i,4.))));
       end;                                                               
                                                                          
       /* Create _NAME_ variable for CORR data sets */                    
       %if %upcase(&type)=CORR %then                                      
         %str( _name_=symget("_v"||trim(left(put(i,4.)))); );             
       drop i j;                                                          
       output;                                                            
     end;                                                                 
     run;                                                                 
                                                                          
   /* Add _TYPE_=MEAN, STD and N observations to CORR data sets */        
   %if %upcase(&type)=CORR %then %do;                                     
    proc summary data=&data;                                             
       var &var;                                                          
       output out=_simple (drop=_type_ _freq_ rename=(_stat_=_type_));    
       run;                                                               
     data &out (type=corr);                                               
       set _simple (where=(_type_ in ('MEAN','STD','N'))) &out;           
       run;                                                               
   %end;                                                                  
                                                                          
   %if &syserr=0 %then                                                    
   %if %upcase(&type)=CORR %then %do;                                     
     %put;                                                                
     %put POLYCHOR: Polychoric correlation matrix was output to data set %upcase(&out).;
     %put;                                                                
   %end;                                                                  
   %else %do;                                                             
     %put;                                                                
     %put POLYCHOR: Distance matrix based on polychoric correlations was output;
     %put %str(          to data set %upcase(&out).);                     
     %put;                                                                
   %end;                                                                  
                                                                          
   %exit:                                                                 
   options notes stimer;                                                  
   %mend polychor;
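
To make the macro's inner loop easier to follow, here is a minimal sketch of the step it repeats for each pair of variables: a PROC FREQ call with the PLCORR option, followed by the dissimilarity 1 - plcorr**2 described in the header. It assumes the data set WORK.WHEATON and the variables T_Q_21 and T_Q_22 from the usage example at the top of this page. The data set names pair_stats and pair_dist and the variable dissim are hypothetical names chosen for illustration.

```sas
/* Sketch only: pair_stats, pair_dist and dissim are hypothetical names. */

/* Polychoric correlation for a single pair of ordinal variables.      */
/* The OUTPUT statement stores the estimate in the variable _PLCORR_,  */
/* the same variable the macro reads from its _tmp data set.           */
proc freq data=WORK.WHEATON noprint;
   tables T_Q_21 * T_Q_22 / plcorr;
   output out=pair_stats plcorr;
run;

/* Dissimilarity value the macro stores when TYPE=DISTANCE is requested */
data pair_dist;
   set pair_stats;
   dissim = 1 - _plcorr_**2;
   keep _plcorr_ dissim;
run;

proc print data=pair_dist; run;
```

The macro runs this PROC FREQ step once for every pair of variables, which is why, as the header notes, run time grows quadratically with the number of variables.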

None: FAQ/saspol (last edited 2013-03-08 10:17:18 by localhost)