* 1000 Genomes Project
http://www.1000genomes.org/wiki/Analysis/variant-call-format
* samtools mpileup
http://samtools.sourceforge.net/mpileup.shtml
Tag Description
The two specifications define different tags for the INFO field of a VCF file.
http://www.1000genomes.org/wiki/Analysis/variant-call-format
- AA ancestral allele
- AC allele count in genotypes, for each ALT allele, in the same order as listed
- AF allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
- AN total number of alleles in called genotypes
- BQ RMS base quality at this position
- CIGAR cigar string describing how to align an alternate allele to the reference allele
- DB dbSNP membership
- DP combined depth across samples, e.g. DP=154
- END end position of the variant described in this record (esp. for CNVs)
- H2 membership in hapmap2
- MQ RMS mapping quality, e.g. MQ=52
- MQ0 Number of MAPQ == 0 reads covering this record
- NS Number of samples with data
- SB strand bias at this position
- SOMATIC indicates that the record is a somatic mutation, for cancer genomics
- VALIDATED validated by follow-up experiment
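As a rough illustration of how these tags appear in a record, here is a minimal Python sketch that splits an INFO string into a dictionary. The function name `parse_info` and the example string are made up for illustration (only `DP=154` and `MQ=52` come from the list above). Values are kept as strings, and tags without a value, such as `DB` or `H2`, are treated as flags.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict (hypothetical helper).

    key=value entries map to a string, or to a list of strings when the
    value is comma-separated; entries without '=' are flags mapped to True.
    """
    fields = {}
    if info in ("", "."):  # "." marks an empty INFO column
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        if not sep:
            fields[key] = True  # flag tag such as DB, H2, SOMATIC, VALIDATED
        else:
            parts = value.split(",")
            fields[key] = parts if len(parts) > 1 else value
    return fields


# Illustrative INFO string, not taken from a real file.
example = "NS=3;DP=154;AF=0.5,0.25;MQ=52;DB;H2"
info = parse_info(example)
print(info["DP"])
print(info["AF"])
print(info["DB"])
```

Values are left as strings on purpose: the type of each tag (integer, float, flag) is declared in the VCF header, which this sketch does not read.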
* samtools mpileup
http://samtools.sourceforge.net/mpileup.shtml
Tag Description
- I16 16 integers:
  - 1 #reference Q13 bases on the forward strand
  - 2 #reference Q13 bases on the reverse strand
  - 3 #non-ref Q13 bases on the forward strand
  - 4 #non-ref Q13 bases on the reverse strand
  - 5 sum of reference base qualities
  - 6 sum of squares of reference base qualities
  - 7 sum of non-ref base qualities
  - 8 sum of squares of non-ref base qualities
  - 9 sum of ref mapping qualities
  - 10 sum of squares of ref mapping qualities
  - 11 sum of non-ref mapping qualities
  - 12 sum of squares of non-ref mapping qualities
  - 13 sum of tail distance for ref bases
  - 14 sum of squares of tail distance for ref bases
  - 15 sum of tail distance for non-ref bases
  - 16 sum of squares of tail distance for non-ref bases
- INDEL Indicates that the variant is an INDEL.
- DP The number of reads covering or bridging POS.
- DP4 Number of 1) forward ref alleles; 2) reverse ref; 3) forward non-ref; 4) reverse non-ref alleles, used in variant calling. Sum can be smaller than DP because low-quality bases are not counted.
- PV4 P-values for 1) strand bias (exact test); 2) baseQ bias (t-test); 3) mapQ bias (t); 4) tail distance bias (t)
- FQ Consensus quality. If positive, FQ equals the phred-scaled probability of there being two or more different alleles. If negative, FQ equals the minus phred-scaled probability of all chromosomes being identical. Notably, given one sample, FQ is positive at hets and negative at homs.
- AF1 EM estimate of the site allele frequency of the strongest non-reference allele.
- CI95 Equal-tail (Bayesian) credible interval of the site allele frequency at the 95% level.
- PC2 Two values: the phred-scaled probabilities of the alternate allele frequency of group1 samples being larger (first value) or smaller (second value) than that of group2 samples.
- PCHI2 Posterior weighted chi^2 P-value between group1 and group2 samples. This P-value is conservative.
- QCHI2 Phred-scaled PCHI2
- RP Number of permutations yielding a smaller PCHI2
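To make the DP4 and I16 layouts concrete, the following Python sketch decodes both from their comma-separated values. The helper names and the example numbers are hypothetical. The mean base quality assumes that I16 item 5 sums the qualities of exactly the bases counted in items 1 and 2, which is how the descriptions above read.

```python
def decode_dp4(dp4):
    """Summarize a DP4 value: forward ref, reverse ref, forward non-ref, reverse non-ref."""
    fwd_ref, rev_ref, fwd_alt, rev_alt = (int(x) for x in dp4.split(","))
    ref = fwd_ref + rev_ref
    alt = fwd_alt + rev_alt
    total = ref + alt
    return {
        "ref": ref,
        "alt": alt,
        # Fraction of counted bases supporting a non-reference allele.
        "alt_fraction": alt / total if total else None,
    }


def mean_ref_base_quality(i16):
    """Mean reference base quality: I16 item 5 divided by items 1 + 2."""
    values = [float(x) for x in i16.split(",")]
    if len(values) != 16:
        raise ValueError("I16 must contain exactly 16 values")
    n_ref = values[0] + values[1]
    return values[4] / n_ref if n_ref else None


# Made-up example values, for illustration only.
dp4 = decode_dp4("10,8,3,3")
mean_q = mean_ref_base_quality(
    "10,8,3,3,540,16200,180,5400,900,45000,300,15000,400,9000,120,2500"
)
print(dp4)
print(mean_q)
```

Because DP4 excludes low-quality bases, `ref + alt` here can be smaller than the DP value of the same record.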