Template:Multi-locus allele clusters

Multi-Locus Allele Clusters


Consider a single locus (blue) with two alleles, + and -, in a haploid population. The alleles show a differential geographical distribution: the + allele has a frequency of 70% in Population I and 30% in Population II.

To assign an individual to one of these populations using this single locus, we assign any individual carrying + to Population I, because the probability (p) that a + allele belongs to Population I is p = 0.7 (assuming both populations are equally represented). The probability (q) of incorrectly assigning this allele to Population I is q = 1 − p = 0.3. This amounts to a Bernoulli trial, because the answer to the question "is this the correct population?" is a simple yes or no. The outcome is therefore binomially distributed with a single trial.

When three loci per individual are taken into account, each with p = 0.7 for a + allele in Population I, the average number of + alleles per individual in Population I becomes kp = 2.1 (number of trials, k = 3, × probability for each allele, p = 0.7). In Population II the average is 0.9 (3 × 0.3) + alleles per individual. This is sometimes referred to as the population trait value. Because alleles are discrete entities, we can only assign an individual to a population based on the number of whole + alleles it carries. We therefore assign any individual with two or three + alleles to Population I, and any individual with one or zero + alleles to Population II.
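The assignment rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the string encoding of a genotype (one `+` or `-` character per locus) are assumptions made for the example, and an odd number of loci is assumed so that ties cannot occur.

```python
def expected_plus_alleles(n_loci, p_plus):
    """Mean number of + alleles per individual (k * p)."""
    return n_loci * p_plus


def assign_population(genotype):
    """Assign a haploid individual to Population I or II by majority of + alleles.

    `genotype` is a hypothetical encoding: a string with one '+' or '-' per locus.
    An odd number of loci is assumed, so a tie is impossible.
    """
    if len(genotype) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of loci")
    plus = genotype.count("+")
    minus = genotype.count("-")
    if plus + minus != len(genotype):
        raise ValueError("genotype may contain only '+' and '-'")
    return "I" if plus > minus else "II"


print(expected_plus_alleles(3, 0.7))  # about 2.1 for Population I
print(expected_plus_alleles(3, 0.3))  # about 0.9 for Population II
print(assign_population("++-"))       # two + alleles: Population I
print(assign_population("+--"))       # one + allele: Population II
```

With three loci the majority rule reproduces the threshold used in the text: two or three + alleles go to Population I, one or zero to Population II.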

The binomial distribution with three trials and a probability of 0.7 shows that the probability of an individual from Population I having a single + allele is 0.189, and the probability of having zero + alleles is 0.027. This gives a misclassification rate of 0.189 + 0.027 = 0.216, which is smaller than the rate of 0.3 for a single locus. The misclassification rate becomes much smaller as more loci are used: each new locus adds an extra independent test to the binomial distribution, decreasing the chance of misclassification.
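The arithmetic above can be checked, and extended to more loci, with a short Python sketch. It assumes independent loci that all share the same + allele frequency, as in the text; the function names are illustrative, and counting a tie as half an error for an even number of loci is an assumption of this sketch, not something the text specifies.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k + alleles among n independent loci."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def misclassification_rate(n_loci, p=0.7):
    """Chance that a Population I individual carries a minority of + alleles.

    Ties (possible only for an even number of loci) are counted as half
    an error; this tie-breaking choice is an assumption of the sketch.
    """
    rate = 0.0
    for k in range(n_loci + 1):
        prob = binomial_pmf(k, n_loci, p)
        if 2 * k < n_loci:
            rate += prob
        elif 2 * k == n_loci:
            rate += 0.5 * prob
    return rate


for n in (1, 3, 5, 11, 101):
    print(n, misclassification_rate(n))
```

For one locus the function returns 0.3 and for three loci 0.216 (up to floating-point rounding), matching the figures in the text; the rate keeps falling as further odd numbers of loci are added.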

Using modern computer software and the abundance of genetic data now available, it is possible not only to distinguish such correlations for hundreds or even thousands of alleles, which form clusters, but also to assign individuals to given populations with very little chance of error.[citation needed] Genes tend to vary clinally, however, and there are likely to be intermediate populations in the geographical areas between the sampled populations (Population III, for example, may lie equidistant from Population I and Population II). In that case Population III may display characteristics of both Population I and Population II and have intermediate frequencies for many of the alleles used for classification, making it more prone to misclassification.
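The effect of an intermediate population can be illustrated with a small Python sketch. The + allele frequency of 0.5 used for Population III is purely a hypothetical value chosen for illustration, as is the function name; independent loci with a shared frequency and an odd number of loci are assumed.

```python
from math import comb


def prob_assigned_to_population_one(n_loci, p_plus):
    """Chance that an individual carries a majority of + alleles.

    Assumes independent loci sharing the same + frequency and an odd
    number of loci, so that a tie cannot occur.
    """
    return sum(
        comb(n_loci, k) * p_plus**k * (1 - p_plus) ** (n_loci - k)
        for k in range(n_loci // 2 + 1, n_loci + 1)
    )


# Hypothetical + allele frequencies; only 0.7 and 0.3 come from the text.
frequencies = {"I": 0.7, "III (assumed)": 0.5, "II": 0.3}

for n in (3, 11, 101):
    for name, p in frequencies.items():
        print(n, name, prob_assigned_to_population_one(n, p))
```

Under these assumptions, individuals from Populations I and II are separated more cleanly as loci are added, while individuals from the hypothetical Population III remain split evenly between the two assignments no matter how many loci are used.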