
HAP input format:

Note: The format for the HAP webserver has changed. It is now simpler and easier to use. Any comments are welcome.

The input to the HAP webserver has three parts. The first part, the genotype information, is mandatory. The second part, the labels for the SNPs, is optional. The third part, the phenotype information, is also optional. Each part is entered into a separate box in the input form or uploaded from a file.

In the genotype information, each line of the input file corresponds to an individual. Each line contains a key identifying the individual, followed by a separator and then the genotype of the individual. The genotype for each individual is specified by a string of symbols representing the sequence of alleles. At each position, one of the following symbols appears:

  1. 'A','G','C' or 'T' representing a homozygous genotype.
  2. 'H' representing a heterozygous genotype.
  3. '?' representing a missing genotype.

HAP will infer the alleles for the heterozygous and missing genotypes based on the input data. However, HAP currently works only with biallelic SNPs, so at most two alleles may appear at each position. If only one type of homozygous genotype appears at a position, the program will assign the symbol "1" to the other allele. All individuals must have the same number of SNPs.
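
The rules above can be checked locally before uploading. The following is a minimal sketch, not part of HAP itself: the function name `parse_genotypes` is hypothetical, and it assumes that the key and the genotype string are separated by whitespace and that symbols are upper case.

```python
VALID_SYMBOLS = set("ACGTH?")  # homozygous bases, heterozygous, missing
BASES = set("ACGT")


def parse_genotypes(text):
    """Parse genotype lines into {key: genotype string} and check the stated rules.

    Assumption: each line is "<key> <genotype>" separated by whitespace.
    """
    genotypes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected a key and a genotype string")
        key, genotype = parts
        bad = set(genotype) - VALID_SYMBOLS
        if bad:
            raise ValueError(f"line {line_no}: unexpected symbols {sorted(bad)}")
        genotypes[key] = genotype

    # All individuals must have the same number of SNPs.
    lengths = {len(g) for g in genotypes.values()}
    if len(lengths) > 1:
        raise ValueError("all individuals must have the same number of SNPs")

    # Biallelic check: at most two distinct homozygous symbols per position.
    n_snps = lengths.pop() if lengths else 0
    for pos in range(n_snps):
        alleles = {g[pos] for g in genotypes.values()} & BASES
        if len(alleles) > 2:
            raise ValueError(f"SNP {pos + 1}: more than two alleles {sorted(alleles)}")
    return genotypes
```

For example, `parse_genotypes("ind1 AHG?\nind2 CHGT\n")` accepts both lines, while a third individual carrying `G` at the first position would be rejected as a third allele.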

In the phenotype information, each line of the input file corresponds to an individual. Each line contains a key identifying the individual, followed by a separator and then a number giving the phenotype value of that individual. This number can be a real number and should be written in plain decimal notation (e.g., 35.03 is fine but 3503E-2 is not valid).
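
A phenotype file can be screened for exponent notation in the same way. This is only a sketch under assumptions: the name `parse_phenotypes` is hypothetical, the key and the value are assumed to be whitespace-separated, and an optional sign and integer values such as `12` are assumed to be acceptable.

```python
import re

# Plain decimal notation: optional sign, digits, optional fractional part.
# Exponent forms such as 3503E-2 do not match.
PLAIN_DECIMAL = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def parse_phenotypes(text):
    """Parse phenotype lines into {key: float}, rejecting exponent notation."""
    phenotypes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected a key and a number")
        key, value = parts
        if not PLAIN_DECIMAL.fullmatch(value):
            raise ValueError(f"line {line_no}: {value!r} is not in plain decimal notation")
        phenotypes[key] = float(value)
    return phenotypes
```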

In the SNP labels information, each line of the input file contains a string which will be used as the identifier of the SNP. If there are 10 SNPs in the region, 10 lines are expected.

Example for input files

HAP quality control feature

After uploading your input file and submitting it to the HAP web server, HAP will produce a table that lists a row for each SNP. Each row contains the following fields:
  • KEEP: Mark the box in this field if you want the SNP to participate in your study. (By default, all SNPs participate.)
  • SNP: The ordinal number of the SNP.
  • Label: The SNP label if given, or otherwise simply SNPxx where xx is the SNP ordinal number.
  • Allele1, Allele2: The two alleles.
  • Frequency: The observed frequency of the minor allele.
  • HWE: A Hardy-Weinberg equilibrium test. This is a chi-squared test comparing the observed numbers of heterozygous and homozygous genotypes with the numbers expected under Hardy-Weinberg equilibrium. Large numbers in this column indicate departures from Hardy-Weinberg equilibrium.
  • ObsHet: The observed frequency of heterozygous genotypes at this SNP in the sample.
  • ExpHet: The expected frequency of heterozygous genotypes at this SNP under Hardy-Weinberg equilibrium, given the observed minor allele frequency.
  • Missing: The frequency of missing genotypes in this SNP.
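
To make the fields above concrete, here is a minimal sketch of how such statistics could be computed for a single SNP. It is an illustration, not HAP's own code: the function name `snp_qc` is hypothetical, the input is assumed to be the column of genotype symbols at one position (one symbol per individual), and the HWE statistic is assumed to be the usual three-class chi-squared over the two homozygous classes and the heterozygous class, which may differ from the exact statistic the server reports.

```python
def snp_qc(column):
    """Compute QC statistics for one SNP from its genotype symbols.

    column: string with one symbol per individual ('A','C','G','T','H','?').
    Assumption: at least one homozygous genotype is present.
    """
    bases = sorted(set(column) - {"H", "?"})
    if not 1 <= len(bases) <= 2:
        raise ValueError("expected one or two homozygous allele symbols")
    allele1 = bases[0]
    allele2 = bases[1] if len(bases) == 2 else "1"  # placeholder for the unseen allele

    n_total = len(column)
    n_missing = column.count("?")
    n_typed = n_total - n_missing
    n_het = column.count("H")
    n_hom1 = column.count(allele1)
    n_hom2 = n_typed - n_het - n_hom1

    # Allele frequencies among typed individuals.
    p = (2 * n_hom1 + n_het) / (2 * n_typed)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    observed = [n_hom1, n_het, n_hom2]
    expected = [n_typed * p * p, 2 * n_typed * p * q, n_typed * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    return {
        "allele1": allele1,
        "allele2": allele2,
        "frequency": min(p, q),       # minor allele frequency
        "hwe": chi2,                  # chi-squared statistic (assumed three-class form)
        "obs_het": n_het / n_typed,
        "exp_het": 2 * p * q,
        "missing": n_missing / n_total,
    }
```

In this sketch the heterozygosity values are computed over typed individuals only, while the missing rate is computed over all individuals; that choice is also an assumption.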

HAP output

Some of the HAP output is available in text format. Simply click on the link to TEXT FORMAT. The first table in the output describes the block partition. There is a row for each block, and the fields are self-explanatory. Note that HAP currently chooses the block partition that minimizes the number of tag SNPs needed to describe the data.

Haplotype predictions

The haplotype predictions are given in the following way. Every individual is represented by two strings, which correspond to its two haplotypes. The different blocks are separated by vertical lines. In each block, each haplotype is colored according to its sequence. Six colors are used: five colors for the five most common haplotypes, and black for any other haplotype.

Block information

For each block, HAP prints a few statistics. First, the frequency of each of the SNPs is given. Then the haplotype distribution (a count of the phased haplotypes) and the diplotype distribution are given. Finally, for each common haplotype or diplotype, the results of various statistical tests are given, where bold numbers correspond to statistically significant results.
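
The haplotype and diplotype distributions are simple counts over the phased data. As a sketch only, assuming the phased haplotypes of one block have already been read into a dictionary (the function name `block_distributions` and this input layout are hypothetical, not HAP's output format):

```python
from collections import Counter


def block_distributions(phased):
    """Count haplotypes and diplotypes within one block.

    phased: {individual key: (haplotype_a, haplotype_b)}, hypothetical layout.
    """
    hap_counts = Counter()
    dip_counts = Counter()
    for hap_a, hap_b in phased.values():
        hap_counts.update([hap_a, hap_b])
        # A diplotype is an unordered pair, so sort before counting.
        dip_counts[tuple(sorted((hap_a, hap_b)))] += 1
    return hap_counts, dip_counts
```

Calling `hap_counts.most_common(5)` on the result then gives the five most common haplotypes in the block.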