The arrangement of loci along a chromosome, and across the whole genome, is called a linkage map. Loci on the same chromosome are linked: they tend to be inherited together, so they do not segregate independently. Inferring the positions of these loci from the observed genotypes is called linkage mapping or linkage analysis.
A locus (pl. loci) is the specific position on a chromosome where a marker or gene is located.
Position measures:
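To locate a specific locus we use either bp (base pairs) or cM (centimorgans). As a rough rule, 1 cM corresponds to about 1 Mb = \(10^6\) bp (a human genome-wide average), but the relation is not fixed: it varies along the genome because it depends on how often recombination occurs in each region.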
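Physical map: positions are given simply by counting base pairs.
Genetic map: positions are based on the number of crossovers. 1 M (morgan) is the length of a chromosome segment that produces, on average, 1 crossover per meiosis. In practice, these distances are obtained from estimates of the recombination fraction.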
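The relation between the recombination fraction \(r_{AC}\) and the map distance \(x_{AC}\) is not linear: map distances are additive along the chromosome, whereas recombination fractions are not (they can never exceed \(\frac{1}{2}\)). To translate recombination fractions into map distances (in morgans) we use map functions, such as the Haldane map function and the Kosambi map function.
In practice, we estimate the recombination fraction and use a map function to convert it into a map distance (in cM).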
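As an illustration, the two map functions named above can be written directly from their standard formulas: Haldane \(x = -\frac{1}{2}\ln(1-2r)\) and Kosambi \(x = \frac{1}{4}\ln\frac{1+2r}{1-2r}\), with \(x\) in morgans and \(0 \le r < \frac{1}{2}\). This is a minimal sketch; the function names are hypothetical and chosen only for this example.

```python
import math

# Hypothetical helper names; formulas are the standard Haldane and Kosambi map functions.

def haldane_r_to_x(r: float) -> float:
    """Haldane: map distance x (morgans) from recombination fraction r, 0 <= r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def haldane_x_to_r(x: float) -> float:
    """Inverse Haldane: r = (1 - exp(-2x)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))

def kosambi_r_to_x(r: float) -> float:
    """Kosambi: x = (1/4) ln((1 + 2r) / (1 - 2r))."""
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_x_to_r(x: float) -> float:
    """Inverse Kosambi: r = (1/2) tanh(2x)."""
    return 0.5 * math.tanh(2.0 * x)

if __name__ == "__main__":
    r = 0.1
    # Multiply by 100 to express morgans as centimorgans
    print(f"Haldane: {100 * haldane_r_to_x(r):.2f} cM")  # Haldane: 11.16 cM
    print(f"Kosambi: {100 * kosambi_r_to_x(r):.2f} cM")  # Kosambi: 10.14 cM
```

For small \(r\) both functions give \(x \approx r\); they diverge as \(r\) approaches \(\frac{1}{2}\), where the map distance grows without bound.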
To estimate \(r\) we use mating designs (controlled crosses, e.g. backcrosses); when these cannot be carried out, we can use pedigree data (records of the relationships between family members), but the estimation is much more difficult.
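A minimal sketch of the simplest case, assuming a backcross in which each offspring can be scored directly as recombinant or non-recombinant for loci A and B. Then the number of recombinants is binomial and the maximum likelihood estimate is \(\hat{r} = R/n\). The function name and the counts are hypothetical.

```python
import math

def estimate_r_backcross(n_recombinant: int, n_total: int) -> tuple[float, float]:
    """Return (r_hat, standard error) for a backcross; r_hat = R / n (binomial MLE).

    Assumes every offspring is classified as recombinant or not (hypothetical helper).
    """
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_recombinant <= n_total")
    # r cannot exceed 1/2, so the estimate is truncated at 0.5
    r_hat = min(n_recombinant / n_total, 0.5)
    se = math.sqrt(r_hat * (1.0 - r_hat) / n_total)
    return r_hat, se

if __name__ == "__main__":
    # Hypothetical counts: 18 recombinant offspring out of 100 scored
    r_hat, se = estimate_r_backcross(18, 100)
    print(f"r_hat = {r_hat:.2f}, SE = {se:.3f}")  # r_hat = 0.18, SE = 0.038
```

Other designs (F2 intercrosses, pedigrees) have more complex likelihoods and usually need numerical maximization.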
Once we have estimated \(r\) we can construct the linkage map. This consists of two steps: first we group the loci into chromosomes, and then we order the loci within each chromosome.
Grouping the loci is the “easy” step. We compare the pairwise LOD scores or the likelihood ratio test statistics (ADD LINK): if \(LOD_{AB}>3\), A and B are in the same linkage group.
Additionally, we can take \(\hat{r}\) into consideration: if \(LOD_{AB}>3\) and \(\hat{r}_{AB}<0.45\), A and B are in the same linkage group.
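The grouping rule above can be sketched as follows. It assumes (this is an assumption, not stated in the notes) that grouping is transitive: if A is linked to B and B to C, all three go into one group, even when the A–C test alone is not significant. Groups are then the connected components of the "linked" relation, found here with a small union-find. Names and data are hypothetical.

```python
def linkage_groups(loci, pairs, lod_min=3.0, r_max=0.45):
    """Group loci by the rule LOD > lod_min and r_hat < r_max (hypothetical helper).

    loci:  list of locus names
    pairs: dict mapping (A, B) -> (lod, r_hat) for each tested pair
    Returns a sorted list of groups, each a sorted list of loci.
    """
    parent = {locus: locus for locus in loci}

    def find(a):
        # Follow parent links to the group representative (with path halving)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), (lod, r_hat) in pairs.items():
        if lod > lod_min and r_hat < r_max:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(sorted(g) for g in groups.values())

if __name__ == "__main__":
    # Hypothetical pairwise results: (LOD, r_hat)
    pairs = {
        ("A", "B"): (12.0, 0.05),
        ("B", "C"): (5.1, 0.20),
        ("A", "C"): (2.0, 0.30),  # not significant on its own
        ("D", "E"): (8.0, 0.10),
        ("C", "D"): (1.2, 0.40),
    }
    print(linkage_groups(["A", "B", "C", "D", "E"], pairs))
    # [['A', 'B', 'C'], ['D', 'E']]
```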
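Note 1: Chromosome and linkage group are often used interchangeably. Ideally each linkage group corresponds to one chromosome, although with sparse markers a single chromosome may split into several linkage groups.
Note 2, likelihood ratio test: \(H_0: r_{AB}=\frac{1}{2}\) (the two loci are unlinked) versus \(H_1: r_{AB}<\frac{1}{2}\) (the two loci are linked).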
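For the same test, the LOD score is \(\log_{10}\frac{L(\hat{r})}{L(1/2)}\) and the likelihood ratio statistic is \(2\ln\frac{L(\hat{r})}{L(1/2)} = 2\ln(10)\,LOD \approx 4.6\,LOD\). A minimal sketch, again assuming a backcross with \(R\) recombinants out of \(n\) offspring (binomial likelihood); function names are hypothetical.

```python
import math

def backcross_lod(n_recombinant: int, n_total: int) -> float:
    """LOD = log10[L(r_hat) / L(1/2)] for a backcross (hypothetical helper)."""
    R, n = n_recombinant, n_total
    r_hat = min(R / n, 0.5)  # MLE truncated at 1/2

    def log10_lik(r: float) -> float:
        # Binomial log10-likelihood; the binomial coefficient cancels in the ratio.
        # Terms with zero count are skipped (convention 0 * log 0 = 0).
        ll = 0.0
        if R > 0:
            ll += R * math.log10(r)
        if n - R > 0:
            ll += (n - R) * math.log10(1.0 - r)
        return ll

    return log10_lik(r_hat) - log10_lik(0.5)

def lod_to_lrt(lod: float) -> float:
    """Likelihood ratio statistic: 2 ln(L1/L0) = 2 ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod

if __name__ == "__main__":
    lod = backcross_lod(18, 100)  # hypothetical counts
    print(f"LOD = {lod:.2f}, LRT = {lod_to_lrt(lod):.2f}")
```

With 18 recombinants out of 100 the LOD is about 9.63, well above the threshold of 3 used for grouping.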
To order the loci we first choose an “optimality” criterion and then search for the order that maximizes (or minimizes) it. Three criteria are commonly used:
Sum of adjacent recombination fractions, \(sar=\sum_{i=1}^{m-1} \hat{r}_{i(i+1)}\)
Sum of adjacent distances, \(sad=\sum_{i=1}^{m-1} \hat{x}_{i(i+1)}\)
Sum of adjacent likelihoods, \(sal=\sum_{i=1}^{m-1} L(\hat{r}_{i(i+1)})\)
where \(m\) is the number of loci in the linkage group and \(i\) and \(i+1\) are adjacent loci in the candidate order. For the correct order, \(sar\) and \(sad\) reach their minimum value and \(sal\) its maximum (see Note 3).
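To make the first two criteria concrete, here is a sketch that scores one candidate order from a symmetric matrix of pairwise estimates \(\hat{r}\). Using the Haldane map function to get \(\hat{x}\) for \(sad\) is an assumption for this example (Kosambi would work the same way); the matrix values are hypothetical.

```python
import math

def sar(order, r_hat):
    """Sum of adjacent recombination fractions for a locus order.

    r_hat: symmetric matrix (list of lists) of pairwise estimates.
    """
    return sum(r_hat[a][b] for a, b in zip(order, order[1:]))

def haldane(r):
    """Haldane map distance in morgans (assumed map function)."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def sad(order, r_hat, map_function=haldane):
    """Sum of adjacent map distances (morgans) for a locus order."""
    return sum(map_function(r_hat[a][b]) for a, b in zip(order, order[1:]))

if __name__ == "__main__":
    # Hypothetical estimates for loci 0, 1, 2
    R = [[0.00, 0.10, 0.22],
         [0.10, 0.00, 0.15],
         [0.22, 0.15, 0.00]]
    for order in ([0, 1, 2], [0, 2, 1], [1, 0, 2]):
        print(order, round(sar(order, R), 2), round(sad(order, R), 3))
```

Here the order 0-1-2 gives the smallest \(sar\) (0.25, against 0.37 and 0.32), so it would be chosen.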
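Note 3: We always use \(\hat{r}\) or \(\hat{x}\), which are only estimates of \(r\) and \(x\), so the order that reaches the optimum may not be the true order.
Note 4: \(sad\) and \(sal\) are additive, which is necessary for the branch and bound algorithm.
To search among all possible locus orders we use search algorithms. Exhaustive search is only feasible for a small number of loci, because \(m\) loci have \(m!/2\) distinct orders (an order and its reverse are equivalent).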
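The simplest search algorithm is exhaustive search: score every distinct order and keep the best one. This sketch minimizes \(sar\) and skips each reversed duplicate, so it visits \(m!/2\) orders; it is a baseline rather than the branch and bound algorithm, and the matrix below is hypothetical (built so that the true order is 2-0-3-1).

```python
import math
from itertools import permutations

def best_order_exhaustive(r_hat):
    """Return (order, sar) minimizing sar over all distinct locus orders.

    r_hat: symmetric matrix (list of lists) of pairwise recombination estimates.
    """
    m = len(r_hat)
    best_order, best_score = None, math.inf
    for order in permutations(range(m)):
        # An order and its reverse give the same sar; keep only one of them
        if m > 1 and order[0] > order[-1]:
            continue
        score = sum(r_hat[a][b] for a, b in zip(order, order[1:]))
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score

if __name__ == "__main__":
    # Hypothetical estimates: adjacent pairs in the order 2-0-3-1 have r = 0.10
    R = [[0.00, 0.18, 0.10, 0.10],
         [0.18, 0.00, 0.25, 0.10],
         [0.10, 0.25, 0.00, 0.18],
         [0.10, 0.10, 0.18, 0.00]]
    print(best_order_exhaustive(R))
```

The result is the order (1, 3, 0, 2), i.e. 2-0-3-1 read in reverse. Because the number of orders grows factorially, this approach quickly becomes infeasible, which is why smarter algorithms such as branch and bound are used.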