Project Overview
HAP takes as input a set of genotypes over a region, sampled
from a population, and returns the haplotype phase of each
individual's genotype. In our studies, we observed that
HAP is very accurate when at least a couple of dozen individuals are sampled.
The public version of HAP currently works with unrelated individuals, but an updated
version that also accepts mother, father and child trios as part of the input will be released soon.
In addition to phasing, HAP partitions the region into blocks
of correlated SNPs. The block partition is chosen so that
it minimizes the total number of tag SNPs.
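To make the idea of a tag-minimizing block partition concrete, here is a small sketch. It is not HAP's algorithm; it is an illustrative dynamic program under several assumptions: haplotypes are strings of '0'/'1', the tag SNPs of a block are the smallest set of SNP columns that distinguishes every distinct haplotype in that block (found by brute force), and a block is only admissible if it contains at most a hypothetical `max_haplotypes` distinct haplotypes. The function names `min_tag_snps` and `partition_blocks` are likewise hypothetical.

```python
from itertools import combinations


def min_tag_snps(haplotypes, start, end):
    """Fewest SNP columns in [start, end) that tell apart every distinct haplotype there."""
    window = {h[start:end] for h in haplotypes}  # distinct haplotypes within the block
    if len(window) <= 1:
        return 0  # a single haplotype needs no tag SNPs
    width = end - start
    for k in range(1, width + 1):
        for subset in combinations(range(width), k):
            projected = {tuple(h[c] for c in subset) for h in window}
            if len(projected) == len(window):
                return k  # these k columns distinguish all block haplotypes
    return width  # not reached: all columns always distinguish distinct haplotypes


def partition_blocks(haplotypes, max_haplotypes=4):
    """Split SNPs 0..n-1 into contiguous blocks minimizing the total number of tag SNPs.

    Assumption for illustration: a block is admissible only if it contains at most
    `max_haplotypes` distinct haplotypes (a hypothetical criterion, not HAP's).
    """
    if max_haplotypes < 2:
        raise ValueError("max_haplotypes must be at least 2 so single-SNP blocks are allowed")
    n = len(haplotypes[0])
    best = [0] + [float("inf")] * n  # best[j]: fewest tags covering SNPs [0, j)
    cut = [0] * (n + 1)              # cut[j]: start of the last block in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            if len({h[i:j] for h in haplotypes}) > max_haplotypes:
                continue  # block [i, j) is too diverse under the assumed criterion
            cost = best[i] + min_tag_snps(haplotypes, i, j)
            if cost < best[j]:
                best[j], cut[j] = cost, i
    blocks, j = [], n
    while j > 0:  # walk the cut points back to recover the blocks
        blocks.append((cut[j], j))
        j = cut[j]
    return best[n], blocks[::-1]


haps = ["0000", "0011", "1100", "1111"]
print(partition_blocks(haps))                    # (2, [(0, 4)])
print(partition_blocks(haps, max_haplotypes=2))  # (2, [(0, 2), (2, 4)])
```

The brute-force tag search is exponential in block width and is only meant for tiny examples; the point is the structure of the dynamic program, where each candidate last block `[i, j)` is scored by its own tag count plus the best partition of the SNPs before it.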
For each block, HAP also reports the results of statistical tests for
association between the block's haplotypes and a given phenotype.
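The text does not say which tests HAP runs, so the following is only a sketch of one common choice: a Pearson chi-square test on a 2x2 table that counts how often a given haplotype occurs among case chromosomes versus control chromosomes. The function name `haplotype_association` and its arguments are hypothetical. With one degree of freedom, the p-value is `erfc(sqrt(chi2 / 2))`, so only the standard library is needed.

```python
import math


def haplotype_association(case_with, case_total, ctrl_with, ctrl_total):
    """Pearson chi-square (1 df) for one haplotype: carried vs. not, cases vs. controls.

    Counts are numbers of chromosomes (haplotypes), e.g. case_with = case
    chromosomes carrying the haplotype out of case_total case chromosomes.
    """
    observed = (
        (case_with, case_total - case_with),
        (ctrl_with, ctrl_total - ctrl_with),
    )
    n = case_total + ctrl_total
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][k] + observed[1][k] for k in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / n
            if expected == 0:
                raise ValueError("test undefined: an entire row or column is empty")
            chi2 += (observed[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# 30 of 100 case chromosomes vs. 10 of 100 control chromosomes carry the haplotype
chi2, p = haplotype_association(30, 100, 10, 100)
print(round(chi2, 6))  # 12.5
```

In practice one would run such a test for each common haplotype in a block and correct for the number of tests performed.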
HAP leverages a new insight into the underlying structure of
haplotypes, which shows that SNPs are organized into highly correlated
"blocks" (Daly et al., 2001; Patil et al., 2001).
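One standard way to quantify how correlated two SNPs are is the linkage disequilibrium measure r^2, the squared correlation between the alleles at the two sites. The sketch below computes it from a list of haplotypes, assuming (as an illustration, not HAP's input format) that each haplotype is a string of '0'/'1'. The function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes, i, j):
    """Squared correlation r^2 between SNP i and SNP j over 0/1 haplotype strings."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP i
    p_j = sum(h[j] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP j
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n  # joint frequency
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom


haps = ["0000", "0011", "1100", "1111"]
print(ld_r2(haps, 0, 1))  # 1.0  (SNPs 0 and 1 always co-occur)
print(ld_r2(haps, 0, 2))  # 0.0  (SNPs 0 and 2 are independent)
```

In this toy data set, SNPs 0 and 1 form one perfectly correlated group and SNPs 2 and 3 another, which is the kind of block pattern the cited studies describe at much larger scale.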
HAP has been shown to achieve accuracy competitive with state-of-the-art software
(such as PHASE and HAPLOTYPER). At the same time, HAP is extremely fast
and can be used on very large data sets.
Genotypes, Haplotypes and SNPs - what are they?
Modeling human variation is critical to understanding the genetic basis of complex
diseases. Most of this variation can be characterized by single nucleotide
polymorphisms (SNPs), which are differences at a single nucleotide position.
The Human Genome Project has provided the genome sequence of a small set of
individuals. However, to fully understand the functions of the different parts
of the genome, we need a better understanding of how genomes differ from one
individual to another.
Each person's genome contains two copies of each chromosome, one
inherited from the father and the other from the mother. A person's
genotype specifies the pair of bases at each site, but does not
specify which base occurs on which chromosome. The sequence of bases along
each chromosome copy separately is called a haplotype. Determining the
haplotypes within a population is essential for understanding genetic
variation and the inheritance of complex diseases.
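The ambiguity that phasing resolves can be seen by listing every pair of haplotypes that explains a genotype. The sketch below assumes a common encoding for biallelic SNPs that is not taken from HAP's documentation: 0 and 1 mean homozygous for allele 0 or allele 1, and 2 means heterozygous. The function name `compatible_haplotype_pairs` is hypothetical. A genotype with h heterozygous sites has 2^(h-1) distinct unordered haplotype pairs (one pair when h = 0).

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with a 0/1/2-encoded genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    base = [str(g) for g in genotype]  # heterozygous entries are overwritten below
    if not het_sites:
        hap = "".join(base)
        return [(hap, hap)]  # fully homozygous: the phase is unambiguous
    pairs = []
    # Fix the first heterozygous site to allele 0 on the first haplotype,
    # so each unordered pair is produced exactly once.
    for bits in product("01", repeat=len(het_sites) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, ("0",) + bits):
            h1[site] = allele
            h2[site] = "1" if allele == "0" else "0"  # the other chromosome carries the other allele
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


print(compatible_haplotype_pairs([0, 2, 1, 2]))
# [('0010', '0111'), ('0011', '0110')]
```

With only two heterozygous sites there are already two candidate phasings; the number doubles with every additional heterozygous site, which is why phasing methods rely on patterns shared across the population rather than on each individual alone.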