Xplorigin is a software tool for deciphering population ancestry of different regions along an individual's genome. The tool is based on a generalized hidden Markov model, trained on data from the International HapMap Project.
Analysis of population ancestry relies on differences in the frequency of variants between populations. The first methods to perform such analysis relied on ancestry-informative markers, which are selected because they show large frequency differences between populations. However, the most abundant source of genetic data today is whole genome arrays. The markers on these arrays are chosen for technological and linkage disequilibrium (LD) considerations, rather than for their informativeness with respect to ancestry. As a result, any particular marker shows an essentially arbitrary, and typically slight, frequency difference between populations.
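As a rough illustration of why a single array marker carries little ancestry information, the following sketch computes the log-likelihood ratio contributed by one observed allele under two populations. The function name `marker_llr` and all allele frequencies are hypothetical values chosen for illustration, not figures from Xplorigin or HapMap, and the final estimate assumes markers are independent, which is a simplification.

```python
import math

def marker_llr(allele, freq_a, freq_b):
    """Log-likelihood ratio (population A vs. B) for one observed allele.

    allele: 1 if the tracked allele was observed, 0 otherwise.
    freq_a, freq_b: frequency of the tracked allele in each population
    (hypothetical inputs, strictly between 0 and 1).
    """
    p_a = freq_a if allele == 1 else 1.0 - freq_a
    p_b = freq_b if allele == 1 else 1.0 - freq_b
    return math.log(p_a / p_b)

# Hypothetical ancestry-informative marker: large frequency difference.
aim_llr = marker_llr(1, 0.90, 0.10)

# Hypothetical array marker: slight frequency difference.
array_llr = marker_llr(1, 0.55, 0.45)

# Under the independence assumption, roughly this many array markers
# would be needed to match the evidence from the single AIM above.
markers_needed = aim_llr / array_llr
```

The per-marker evidence is small when the frequencies are close, which is why information has to be pooled across many markers.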
Xplorigin takes as input data from whole genome arrays and pools information across many consecutive markers to decide ancestry at each locus. The dependence of nearby markers on one another is a major obstacle to combining this information mathematically. Xplorigin therefore uses haplotypes within haplotype blocks as its atomic variant. This not only captures the correlation structure between markers, but also increases information content: the differences in haplotype frequencies across populations are typically greater (in terms of power to distinguish origin) than the differences in SNP allele frequencies.
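The idea of pooling evidence across consecutive haplotype blocks can be sketched with a plain hidden Markov model decoded by the Viterbi algorithm. This is a simplified illustration, not Xplorigin's actual generalized hidden Markov model: it assumes a single haploid sequence, a fixed ancestry-switch probability between adjacent blocks, and made-up haplotype frequencies. The names `viterbi_ancestry`, `switch_prob`, and `floor` are hypothetical.

```python
import math

def viterbi_ancestry(observed, block_freqs, populations,
                     switch_prob=0.05, floor=1e-6):
    """Return the most likely ancestry label for each haplotype block.

    observed: one haplotype string per block.
    block_freqs: block_freqs[i][pop][haplotype] = frequency of that
        haplotype in that population (hypothetical values).
    populations: list of at least two population labels.
    switch_prob: assumed probability that ancestry changes between
        adjacent blocks.
    floor: small probability used for haplotypes unseen in a population.
    """
    n_pops = len(populations)
    stay = math.log(1.0 - switch_prob)
    switch = math.log(switch_prob / (n_pops - 1))

    def emit(i, pop):
        return math.log(block_freqs[i][pop].get(observed[i], floor))

    def trans(prev, cur):
        return stay if prev == cur else switch

    # Uniform prior over populations at the first block.
    score = {p: math.log(1.0 / n_pops) + emit(0, p) for p in populations}
    backptrs = []
    for i in range(1, len(observed)):
        new_score, ptr = {}, {}
        for pop in populations:
            best_prev = max(populations,
                            key=lambda q: score[q] + trans(q, pop))
            new_score[pop] = (score[best_prev] + trans(best_prev, pop)
                              + emit(i, pop))
            ptr[pop] = best_prev
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state.
    path = [max(populations, key=lambda q: score[q])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical haplotype frequencies, reused for every block for brevity.
freqs = {"A": {"00": 0.6, "01": 0.3, "11": 0.1},
         "B": {"00": 0.1, "01": 0.3, "11": 0.6}}
observed = ["00", "00", "01", "11", "11", "11"]
path = viterbi_ancestry(observed, [freqs] * len(observed), ["A", "B"])
```

In this toy input the first two blocks favor population A and the last three favor population B, so the decoded path contains a single ancestry switch; the middle block is equally likely under both populations and is resolved by its neighbors.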