# Algorithms for matching and allocating randomly to groups during clinical trials

## Matching Algorithms

In observational studies it is often advantageous to match controls to cases before statistical analysis (e.g. by matching controls to patients admitted to hospital). Matching reduces confounding by the matching variables. Bland and Altman (1994) note that only individual (1-1) matching should be regarded as yielding a matched study.

Exact matching, however, may not always be possible. A more pragmatic approach, implemented in the R package 'Matching' (Sekhon, 2011; see also http://sekhon.berkeley.edu/matching), matches controls to cases by maximising the p-values from paired t-tests on the matching variables. Other 1-1 matching criteria can also be used. This method performs well compared to other methods (Diamond and Sekhon, 2013).

The 'GenMatch' algorithm in the 'Matching' package produces optimal matches, which can subsequently be tested for paired group differences using paired t-tests and McNemar tests. The example below simulates data representing age and gender and matches the cases with the controls on these values. Matching can be either with replacement (the default) or without replacement.

    install.packages("Matching")  # only needed once
    library(Matching)

    # Simulate 100 subjects: column 1 of X is gender (0/1), column 2 is age
    Y <- rnorm(100)                      # outcome (not used in the matching itself)
    X <- matrix(NA, 100, 2)
    x <- c(0, 1)
    gp <- rep(x, each = 50)              # group indicator: 50 controls (0), then 50 cases (1)
    X[, 1] <- rep(x, each = 2, length.out = 100)  # gender
    X[, 2] <- round(rnorm(100, 50, 3))            # age

    mat <- GenMatch(Tr = gp, X = X)                   # 1-1 matching with replacement
    mat <- GenMatch(Tr = gp, X = X, replace = FALSE)  # matching gp on X without replacement

    gen <- X[, 1]
    age <- X[, 2]
    id1 <- mat$matches[, 1]              # row numbers of the cases (gp == 1)
    id2 <- mat$matches[, 2]              # row numbers of their matched controls

    # Paired t-test on age
    lds <- age[id1]
    notlds <- age[id2]
    t.test(lds, notlds, paired = TRUE)
    # Can be statistically significant: a lot of non-lds are needed compared to lds,
    # because the last few matches have higher ages when there are fewer
    # controls left to choose from.

    # McNemar test on gender
    lds <- factor(gen[id1], levels = x)
    notlds <- factor(gen[id2], levels = x)
    mcnemar.test(lds, notlds)
    # Can also be statistically significant (gender differences).
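The remark in the example above, that later matches suffer when few controls remain, can be illustrated with a much simpler procedure in base R. The sketch below is a greedy nearest-neighbour match without replacement; it is not the genetic algorithm used by 'GenMatch'. The function name `greedy_match`, the simulated data and the 3:1 ratio of controls to cases are all assumptions made for illustration.

```r
set.seed(123)
n <- 50
# Simulated cases and a larger pool of controls (3 controls per case, an assumption)
cases <- data.frame(age = round(rnorm(n, 50, 3)),
                    gender = rep(c(0, 1), length.out = n))
controls <- data.frame(age = round(rnorm(3 * n, 50, 3)),
                       gender = rep(c(0, 1), length.out = 3 * n))

# Hypothetical helper: for each case in turn, take the unused control of the
# same gender whose age is closest (greedy 1-1 matching without replacement).
greedy_match <- function(cases, controls) {
  available <- rep(TRUE, nrow(controls))
  matched <- rep(NA_integer_, nrow(cases))
  for (i in seq_len(nrow(cases))) {
    candidates <- which(available & controls$gender == cases$gender[i])
    if (length(candidates) == 0) next  # no same-gender control left
    gaps <- abs(controls$age[candidates] - cases$age[i])
    pick <- candidates[which.min(gaps)]
    matched[i] <- pick
    available[pick] <- FALSE           # each control is used at most once
  }
  matched
}

idx <- greedy_match(cases, controls)
ok <- !is.na(idx)

# Absolute age gap within each matched pair; large gaps towards the end of
# the vector would indicate that good controls had already been used up.
age_gap <- abs(cases$age[ok] - controls$age[idx[ok]])
summary(age_gap)
```

Shrinking the control pool (for example to one control per case) should make the later age gaps grow, which is the behaviour the comment in the 'GenMatch' example describes.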

## Minimisation (Random allocation in a Clinical Trial)

When the researcher determines the groups, e.g. treatment/placebo allocation in a clinical trial, a randomised allocation procedure may be more appropriate. A popular technique for group allocation in clinical trials is minimisation, which allocates people to groups so as to best balance potentially confounding variables (stratifiers) such as age or gender.

Minimisation works by computing, within each factor, the imbalance that would result should the patient be allocated to a particular treatment group. The imbalances for the individual factors are added together to give the overall imbalance in the study. The treatment group that would minimise the imbalance can be chosen directly, or a random element may be added (perhaps giving a higher chance to the groups that will minimise the imbalance, or perhaps only giving a chance to groups that will minimise the imbalance). The technique allocates each new individual to a group as they enter the clinical trial.
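The steps just described can be sketched in base R. This is a minimal illustration under stated assumptions, not the procedure implemented by any particular package: the function name `minimise_allocate`, the factor names `gender` and `age_band`, the use of the range of group counts as the imbalance measure, and the probability `p_best = 0.8` of choosing a minimising group are all hypothetical choices.

```r
# Hypothetical helper: choose a group for one new patient by minimisation.
# `allocated` holds the patients already in the trial, with a `group` column
# and one column per stratifying factor; `new_patient` is a named list.
minimise_allocate <- function(allocated, new_patient,
                              groups = c("treatment", "placebo"),
                              factors = c("gender", "age_band"),
                              p_best = 0.8) {
  # Overall imbalance if the patient were added to each candidate group:
  # per factor, count patients sharing the new patient's level in each group,
  # add the new patient to the candidate, take the range, then sum over factors.
  imbalance <- sapply(groups, function(candidate) {
    total <- 0
    for (f in factors) {
      same_level <- allocated[[f]] == new_patient[[f]]
      counts <- sapply(groups, function(g) sum(same_level & allocated$group == g))
      counts[candidate] <- counts[candidate] + 1
      total <- total + (max(counts) - min(counts))
    }
    total
  })
  best <- groups[imbalance == min(imbalance)]
  # All groups tie: allocate purely at random
  if (length(best) == length(groups)) return(sample(groups, 1))
  # Otherwise favour a minimising group with probability p_best
  others <- setdiff(groups, best)
  if (runif(1) < p_best) sample(best, 1) else sample(others, 1)
}

# Usage: allocate 40 simulated patients one at a time as they enter the trial
set.seed(42)
trial <- data.frame(group = character(0), gender = character(0),
                    age_band = character(0), stringsAsFactors = FALSE)
for (i in 1:40) {
  patient <- list(gender = sample(c("F", "M"), 1),
                  age_band = sample(c("<50", ">=50"), 1))
  grp <- minimise_allocate(trial, patient)
  trial <- rbind(trial, data.frame(group = grp, gender = patient$gender,
                                   age_band = patient$age_band,
                                   stringsAsFactors = FALSE))
}
table(trial$group, trial$gender)
```

Setting `p_best = 1` gives the deterministic variant in which the minimising group is always chosen; values below 1 add the random element that keeps allocations unpredictable.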

Two popular stand-alone packages for performing minimisation are available. QMinim (Saghaei and Saghaei, 2011) is available both online and as free downloadable software. An older approach, implemented in the Minim software, is also free to download for use on MS-DOS.

## References

Bland JM and Altman DG (1994) Statistics notes: Matching. *BMJ* **309** 1128.

Pocock S and Simon R (1975) Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. *Biometrics* **31** 103-115. The authors also mention a minimisation method and, like QMinim, use a small random component to ensure that allocations remain unpredictable.

Saghaei M and Saghaei S (2011) Implementation of an open-source customizable minimization program for allocation of patients to parallel groups in clinical trials. *J. Biomedical Science and Engineering* **4** 734-739.

## References for the Package Matching in R

(From http://cran.r-project.org/web/packages/Matching/citation.html)

To cite 'Matching' in publications use:

Jasjeet S. Sekhon (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. *Journal of Statistical Software*, **42(7)**, 1-52. URL http://www.jstatsoft.org/v42/i07/.

To refer to the theory on which this package is based (full reference now available):

Alexis Diamond and Jasjeet S. Sekhon (2013). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. *Review of Economics and Statistics*, **95(3)**, 932-945.

Further developing the theory and practice:

Jasjeet Singh Sekhon and Richard D. Grieve (2012). A Matching Method For Improving Covariate Balance in Cost-Effectiveness Analyses. *Health Economics*, **21(6)**, 695-714.