HAP input format:
Note: The format for the HAP webserver has
changed. It is now simpler and easier to use. Any comments are welcome.
The input to the HAP webserver has three parts. The first part,
information about the genotypes, is mandatory. The second part, the
labels for the SNPs, is optional. The third part, information about the
phenotypes, is also optional. Each part is either entered into a separate
box in the input form or uploaded from a file.
In the genotype information, each line of the input file corresponds to
an individual. Each line contains a key identifying the individual,
followed by a separator and then the genotype of the individual.
The genotype for each individual is specified by a string of symbols
representing the sequence of alleles. At each position, one of the
following possibilities appears:
- 'A','G','C' or 'T' representing a homozygous genotype.
- 'H' representing a heterozygous genotype.
- '?' representing a missing genotype.
HAP will infer the SNPs for the heterozygous and missing genotypes based
on the input data. However, HAP currently works only with biallelic SNPs,
so at most two alleles may appear at each position.
If only one type of homozygous genotype appears at a position, the program
assigns the symbol "1" to the other allele. All individuals must have the
same number of SNPs.
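The genotype rules above can be checked before submitting a file. The following is a minimal sketch, not HAP's own parser: the function name `parse_genotypes` is hypothetical, and it assumes that the key and the genotype string are separated by whitespace and that allele symbols are upper case.

```python
VALID_SYMBOLS = set("AGCTH?")  # homozygous alleles, heterozygous, missing


def parse_genotypes(text):
    """Parse genotype lines into {key: genotype_string} and check the rules.

    Assumption: key and genotype are separated by whitespace.
    """
    genotypes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected a key and a genotype")
        key, genotype = parts
        bad = set(genotype) - VALID_SYMBOLS
        if bad:
            raise ValueError(f"line {line_no}: invalid symbols {sorted(bad)}")
        genotypes[key] = genotype

    # All individuals must have the same number of SNPs.
    lengths = {len(g) for g in genotypes.values()}
    if len(lengths) > 1:
        raise ValueError("all individuals must have the same number of SNPs")

    # Only biallelic SNPs: at most two homozygous symbols per position.
    n_snps = lengths.pop() if lengths else 0
    for pos in range(n_snps):
        alleles = {g[pos] for g in genotypes.values()} - {"H", "?"}
        if len(alleles) > 2:
            raise ValueError(f"SNP {pos + 1}: more than two alleles")
    return genotypes
```

For example, `parse_genotypes("ind1 AHG?\nind2 GAGT")` accepts both lines, while an input in which three individuals carry 'A', 'C' and 'G' at the same position is rejected.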
In the phenotype information, each line of the input file corresponds
to an individual. Each line contains a key identifying the individual,
followed by a separator and then a number giving the quantitative value of
the phenotype. This number can be a real number and should be written in
standard floating-point notation without an exponent (e.g., 35.03 is fine
but 3503E-2 is not valid).
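A phenotype file can be checked in the same spirit. This is a sketch under stated assumptions, not HAP's implementation: the name `parse_phenotypes` and the regular expression are hypothetical, the key and the number are assumed to be separated by whitespace, and an optional leading sign is assumed to be acceptable.

```python
import re

# Plain decimal notation only: digits with an optional decimal point and
# an optional sign (an assumption), but no exponent such as "E-2".
PLAIN_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def parse_phenotypes(text):
    """Parse phenotype lines into {key: float}, rejecting exponent notation."""
    phenotypes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected a key and a number")
        key, value = parts
        if not PLAIN_DECIMAL.fullmatch(value):
            raise ValueError(f"line {line_no}: {value!r} is not plain decimal")
        phenotypes[key] = float(value)
    return phenotypes
```

With this check, `35.03` is accepted and `3503E-2` raises a `ValueError`, matching the example in the text.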
In the SNP labels information, each line of the input file contains a string
which will be used as the identifier of the SNP. If there are 10 SNPs in the
region, 10 lines are expected.
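The label count can be checked against the number of SNPs in the genotype strings. The sketch below is illustrative only: `load_snp_labels` is a hypothetical name, and the fallback labels follow the SNPxx pattern described for the quality control table, with the exact numbering format (1-based, no zero padding) being an assumption.

```python
def load_snp_labels(text, n_snps):
    """Return one label per SNP, or default labels if none are given."""
    labels = [line.strip() for line in text.splitlines() if line.strip()]
    if not labels:
        # Labels are optional; fall back to SNP1, SNP2, ... (assumed format).
        return [f"SNP{i}" for i in range(1, n_snps + 1)]
    if len(labels) != n_snps:
        raise ValueError(f"expected {n_snps} labels, found {len(labels)}")
    return labels
```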
Example for input files
HAP quality control feature
After uploading your input file and submitting it to the HAP web server,
HAP will produce a table that lists a row for each SNP. Each row contains
the following fields:
- KEEP: Mark the box in this field if you want the SNP to
participate in your study. By default, all SNPs participate.
- SNP: The ordinal number of the SNP.
- Label: The SNP label if given, or otherwise simply SNPxx where xx is the
SNP ordinal number.
- Allele1, Allele2: The two alleles.
- Frequency: The observed frequency of the minor allele.
- HWE: A Hardy-Weinberg Equilibrium test. This is a chi-squared
test comparing the observed numbers of heterozygous and homozygous
genotypes with the numbers expected under the assumption of
Hardy-Weinberg equilibrium. Large numbers in this column imply departures
from Hardy-Weinberg equilibrium.
- ObsHet: The observed frequency of heterozygous genotypes at this SNP in
the sample.
- ExpHet: The expected frequency of heterozygous genotypes at this SNP,
given the observed minor allele frequency and assuming Hardy-Weinberg
equilibrium.
- Missing: The frequency of missing genotypes at this SNP.
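The per-SNP fields above can be approximated from one column of genotype symbols. The following is a rough sketch, not HAP's code: `snp_qc` is a hypothetical name, frequencies are computed over non-missing genotypes (an assumption), and the HWE value is the standard three-class Pearson chi-squared statistic, which may differ from the exact statistic HAP reports.

```python
from collections import Counter


def snp_qc(column):
    """Compute QC statistics for one SNP from symbols such as 'AAHG?'."""
    counts = Counter(column)
    total = len(column)
    n_missing = counts.pop("?", 0)
    n_het = counts.pop("H", 0)
    alleles = sorted(counts)  # what remains are homozygous allele symbols
    if len(alleles) > 2:
        raise ValueError("only biallelic SNPs are supported")
    n_typed = total - n_missing
    if n_typed == 0:
        raise ValueError("no typed genotypes at this SNP")

    n_hom1 = counts[alleles[0]] if alleles else 0
    n_hom2 = counts[alleles[1]] if len(alleles) > 1 else 0

    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * n_hom1 + n_het) / (2 * n_typed)
    q = 1 - p

    # Expected genotype counts under Hardy-Weinberg: n*p^2, 2*n*p*q, n*q^2.
    observed = [n_hom1, n_het, n_hom2]
    expected = [n_typed * p * p, 2 * n_typed * p * q, n_typed * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    return {
        "Frequency": min(p, q),
        "HWE": chi2,
        "ObsHet": n_het / n_typed,
        "ExpHet": 2 * p * q,
        "Missing": n_missing / total,
    }
```

For the column `"AAHHGG??"`, this gives a minor allele frequency of 0.5, an expected heterozygosity of 0.5, an observed heterozygosity of one third, and a missing rate of 0.25.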
HAP output
You can get some of the output of HAP in a text format. Simply click on the
link to TEXT FORMAT.
The first table in the output describes the block partition. There is
a row for each block, and the different fields are self-explanatory. Note
that HAP currently partitions the blocks using the criterion of the minimum
number of tag SNPs that describe the data.
Haplotype predictions
The haplotype predictions are given as follows. Every individual is
represented by two strings, which correspond to its two haplotypes. The
different blocks are separated by vertical lines. In each block, each
haplotype is colored according to its sequence. Six colors are used: five
colors for the five most common haplotypes, and black for any other
haplotype.
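The coloring rule can be sketched as follows. This is an illustration only: `assign_colors` is a hypothetical name, the five palette names are placeholders rather than the colors HAP actually uses, and ties in frequency are broken by order of first appearance.

```python
from collections import Counter

# Placeholder names; the actual colors used by HAP are not specified here.
PALETTE = ["red", "blue", "green", "orange", "purple"]


def assign_colors(block_haplotypes):
    """Map each haplotype sequence in a block to a display color."""
    counts = Counter(block_haplotypes)
    colors = {}
    for rank, (haplotype, _count) in enumerate(counts.most_common()):
        # The five most common haplotypes get a color; the rest are black.
        colors[haplotype] = PALETTE[rank] if rank < len(PALETTE) else "black"
    return colors
```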
Block information
For each block, HAP prints a few statistics. First, the frequency of each
of the SNPs is given. Then the haplotype distribution (a count of the
phased haplotypes) and the diplotype distribution are given. Finally, for
each common haplotype or diplotype, a set of results for various
statistical tests is given, where bold numbers correspond to statistically
significant results.
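The haplotype and diplotype distributions are simple counts over the phased haplotypes of a block. A minimal sketch, assuming the phased data is available as a mapping from individual key to a pair of haplotype strings (the function name and this data layout are hypothetical):

```python
from collections import Counter


def block_distributions(phased):
    """Count haplotypes and diplotypes in one block.

    phased: {individual_key: (haplotype1, haplotype2)}
    """
    haplotype_counts = Counter()
    diplotype_counts = Counter()
    for hap1, hap2 in phased.values():
        # Each individual contributes two haplotypes and one diplotype.
        haplotype_counts.update([hap1, hap2])
        # Sort the pair so that (x, y) and (y, x) count as the same diplotype.
        diplotype_counts[tuple(sorted((hap1, hap2)))] += 1
    return haplotype_counts, diplotype_counts
```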