The sequence example (continued): evaluating probabilities of numbers of replicates

The example is the one mentioned earlier, in which 92 stimuli are drawn with replacement to make a sequence of length 6. A stimulus with k replicates occurs k+1 times in the sequence. In what follows, (r Choose s) is the combination function, the number of ways of choosing s objects from r.

[From Laurence Shaw with some additional corrections suggested subsequently by Daniel Molinari]

As said, P(0 replicates) is given by (92x91x...x87)/$$92^6$$ (or equivalently 6!*(92 Choose 6)/$$92^6$$, which counts the ways of selecting 6 distinct objects from 92 choices and putting them in order, and divides by the total number of possible sequences).
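As a quick numerical check of the two equivalent forms, here is a minimal Python sketch. The variable names are illustrative only, and `math.perm` needs Python 3.8 or later.

```python
from math import comb, factorial, perm

N_STIMULI = 92  # number of distinct stimuli
SEQ_LEN = 6     # length of each sequence

total = N_STIMULI ** SEQ_LEN  # all possible sequences, 92^6

# Ordered selections of 6 distinct stimuli: 92 x 91 x ... x 87
ordered_distinct = perm(N_STIMULI, SEQ_LEN)

# Same count via combinations: choose the 6 stimuli, then order them
assert ordered_distinct == comb(N_STIMULI, SEQ_LEN) * factorial(SEQ_LEN)

p_zero = ordered_distinct / total
print(round(p_zero, 3))  # 0.847
```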

For P(1 replicate) we need to be more careful, as we could have 1, 2 or 3 instances of 1 replicate (AABCDE, AABBCD or AABBCC, in some permutation). We calculate the probability of each case separately and sum the probabilities.

P( 1 instance 1 replicate): We need 1 number to be repeated, for which we have 92 choices. We need 4 more distinct numbers, for which we have (91 Choose 4) choices. We need two places for the repeated number (6 Choose 2 = 15 choices), and the 4 distinct numbers can go in any order in the remaining places (4! choices). So P(1 instance 1 replicate) = 92*(91 Choose 4)*15*4!/$$92^6$$ = 0.146. A less elegant way of doing the same calculation is illustrated here.
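The four factors in this count can be multiplied out directly. The following is a small sketch of that arithmetic with illustrative variable names, not a general-purpose routine.

```python
from math import comb, factorial

total = 92 ** 6

repeated_number = 92            # which number appears twice
other_numbers = comb(91, 4)     # the 4 further distinct numbers
pair_positions = comb(6, 2)     # places for the repeated number (15)
orderings = factorial(4)        # order of the 4 distinct numbers (24)

favourable = repeated_number * other_numbers * pair_positions * orderings
p_one_pair = favourable / total
print(round(p_one_pair, 3))  # 0.146
```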

P( 2 instances 1 replicate): We need two numbers to be repeated (92 Choose 2), 2 more distinct numbers (90 Choose 2), two places for the first repeated number (6 Choose 2 = 15), two for the second repeated number (4 Choose 2 = 6), and the two distinct numbers can go either way around in the remaining places (2! = 2). Hence P( 2 instances 1 replicate) = (92 Choose 2)*(90 Choose 2)*15*6*2/$$92^6$$ = 0.005.
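The placement factors 15*6*2 are the multinomial coefficient 6!/(2!2!1!1!), which gives a handy cross-check. A brief sketch of this, with names chosen purely for illustration:

```python
from math import comb, factorial

total = 92 ** 6

# Placements: pair one, pair two, then the two singletons
placements = comb(6, 2) * comb(4, 2) * factorial(2)

# Cross-check against the multinomial coefficient 6!/(2! 2! 1! 1!)
assert placements == factorial(6) // (factorial(2) * factorial(2))

# Choice of the two repeated numbers and the two single numbers
choices = comb(92, 2) * comb(90, 2)

p_two_pairs = choices * placements / total
print(round(p_two_pairs, 3))  # 0.005
```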

P( 1 instance 2 replicates) = 0.002. A derivation of this is given here.
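One possible route to the 0.002 figure (a sketch under the same counting approach as above, not necessarily the linked derivation): pick the number that appears three times (92 choices), its three places (6 Choose 3 = 20), then fill the remaining three places in order with distinct other numbers (91x90x89).

```python
from math import comb, perm

total = 92 ** 6

triple_number = 92             # which number appears three times
triple_positions = comb(6, 3)  # where its three occurrences go (20)
others_in_order = perm(91, 3)  # 91 x 90 x 89 for the remaining places

p_one_triple = triple_number * triple_positions * others_in_order / total
print(round(p_one_triple, 3))  # 0.002
```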

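For P( 3 instances 1 replicate): We need 3 numbers to be repeated (92 Choose 3), two places for the first one (6 Choose 2 = 15), two for the second one (4 Choose 2 = 6), and the final number goes wherever is left. Because (92 Choose 3) picks the three numbers without regard to order, and the 15*6 = 90 placements already cover every arrangement of AABBCC (6!/(2!2!2!) = 90), no extra factor of 3! is needed. This gives P( 3 instances 1 replicate) = (92 Choose 3)*(6 Choose 2)*(4 Choose 2)/$$92^6$$ = 0.00002 (approx.), so this has little impact on the totals below.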

For 2 replicates you can also have 2 instances of 2 replicates (the AAABBB case). Its probability is (92 Choose 2)*(6 Choose 3)/$$92^6$$ = 0.0000001 (approx.), with no extra factor of 2! since the two numbers are chosen without regard to order, so again it has little impact on the calculation.

The sum of these probabilities equals 0.847 (0 replicates) + 0.146 (1 instance of a single replicate) + 0.005 (2 instances of 1 replicate) + 0.002 (1 instance of 2 replicates) = 1 to three decimal places, as expected. Other probabilities, such as multiple instances of 2 or more replicates (e.g. AAABBB) or 3 or more replicates of the same stimulus (e.g. AAAABC), are negligible.
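To confirm that nothing is missed, every possible repeat pattern can be enumerated as an integer partition of 6 (for example, (2, 1, 1, 1, 1) is AABCDE and (3, 3) is AAABBB). The sketch below is one way to do this. The helpers `partitions` and `pattern_count` are hypothetical names introduced here, not part of the original notes.

```python
from collections import Counter
from math import factorial, perm, prod

N_STIMULI, SEQ_LEN = 92, 6

def partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def pattern_count(counts, n_stimuli=N_STIMULI):
    """Number of sequences whose stimuli occur with these multiplicities."""
    length = sum(counts)
    # Ways to place the occurrences: a multinomial coefficient
    arrangements = factorial(length) // prod(factorial(c) for c in counts)
    # Ways to assign distinct stimuli to the groups, dividing out the
    # orderings of groups that share the same multiplicity
    same_size = prod(factorial(m) for m in Counter(counts).values())
    assignments = perm(n_stimuli, len(counts)) // same_size
    return arrangements * assignments

total = N_STIMULI ** SEQ_LEN
counts_by_pattern = {p: pattern_count(p) for p in partitions(SEQ_LEN)}

# Every sequence has exactly one pattern, so the counts must sum to 92^6
assert sum(counts_by_pattern.values()) == total

for pattern, count in counts_by_pattern.items():
    print(pattern, f"{count / total:.7f}")
```

Dividing by `same_size` is what stops patterns with equal-sized groups from being counted more than once.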

An example for a couple of these (one instance of 3 replicates and one instance of 4 replicates) is here. P(5 replicates) = 92/$$92^6$$.

An interesting point is that the same sequence may need to be counted under more than one heading, e.g. the sequence AABBBC contains both one repeat (of A), i.e. 1 instance of 1 replicate, and two repeats (of B), i.e. 1 instance of 2 replicates.

Laurence further commented:

This kind of situation (having multiple replicates of varying length within the same sequence) can still be calculated easily. Pick a number to be repeated twice and 3 spots for it (92*(6 Choose 3)), one to be repeated once and 2 spots for it (91*(3 Choose 2)), and a final number (90 choices), so the probability is 92*(6 Choose 3)*91*3*90/$$92^6$$ = 0.00007. Your final scenario is one number repeated once together with another number appearing four times (AAAABB), which would be 92*(6 Choose 4)*91/$$92^6$$ = 0.0000002, so they both have minimal impact.
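Both mixed cases can be evaluated in a couple of lines. This is a minimal sketch of the arithmetic just described, with illustrative variable names.

```python
from math import comb

total = 92 ** 6

# AAABBC: triple (92 numbers, 3 of 6 places), pair (91 numbers,
# 2 of the remaining 3 places), then a single (90 numbers)
p_triple_and_pair = 92 * comb(6, 3) * 91 * comb(3, 2) * 90 / total

# AAAABB: one number in 4 of 6 places, another number in the last 2
p_four_and_pair = 92 * comb(6, 4) * 91 / total

print(f"{p_triple_and_pair:.5f}")  # 0.00007
print(f"{p_four_and_pair:.7f}")    # 0.0000002
```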

This is a nice illustration that repeats in relatively short sequences are not uncommon, even when the number of possibilities for selection swamps the sequence length. The best-known example is probably the 'birthday problem'.
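The link to the birthday problem can be made concrete: the chance of at least one repeat is 1 minus the chance that all draws are distinct. The sketch below computes this exactly and checks it with a simple simulation. The function names, the seed and the number of trials are arbitrary choices made for illustration.

```python
import random
from math import perm

def p_any_repeat(n_choices, length):
    """Exact probability that a sequence drawn with replacement has a repeat."""
    return 1 - perm(n_choices, length) / n_choices ** length

def simulate_any_repeat(n_choices, length, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = rng.choices(range(n_choices), k=length)
        if len(set(seq)) < length:  # fewer distinct values than draws
            hits += 1
    return hits / trials

print(round(p_any_repeat(92, 6), 3))    # 0.153 (the stimulus example)
print(round(p_any_repeat(365, 23), 3))  # 0.507 (the classic birthday problem)
print(simulate_any_repeat(92, 6))       # should be close to the exact value
```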