MD tag and cigar
10A5^AC6
REF: ATCGTAGCTAATTTGGACATCGGT
READ: ATCGTAGCTATTTTGG--ATCGGT
MD TAG: 10A5^AC6
CIGAR: 16M 2D 6M
READ: atcGTAGCTATTTTGGATA..GGT (ATCGTAGCTATTTTGGATAGGT)
MD TAG: 7A6C4
CIGAR: 3S 16M 2N 3M
READ: ATCGTAGCTAATTTGGACATCGGT (ATCGTGGAGCTAATTTGGACATCGGT)
CIGAR: 5M 2I 19M
MD TAG
The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string '10A5^AC6' means that, from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches, followed by a 2 bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
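The reading of '10A5^AC6' described above can be sketched in Python. This is only a minimal illustration, not code from samtools or any other library: the names `parse_md` and `md_reference_length` are hypothetical, and the sketch assumes the MD string contains only match counts, single mismatched reference bases, and `^`-prefixed deletions.

```python
import re

# One MD token is a run of matches (digits), a deletion ("^" plus the
# deleted reference bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md):
    """Split an MD string into (kind, value) events (hypothetical helper)."""
    events = []
    pos = 0
    for m in MD_TOKEN.finditer(md):
        if m.start() != pos:  # finditer silently skips junk, so check for gaps
            raise ValueError("unexpected character in MD string: %r" % md)
        pos = m.end()
        matches, deleted, mismatch = m.groups()
        if matches is not None:
            if int(matches) > 0:  # zero-length runs only separate other tokens
                events.append(("match", int(matches)))
        elif deleted is not None:
            events.append(("deletion", deleted.upper()))
        else:
            events.append(("mismatch", mismatch.upper()))
    if pos != len(md):
        raise ValueError("unexpected character in MD string: %r" % md)
    return events


def md_reference_length(events):
    """Number of reference bases described by the parsed MD events."""
    total = 0
    for kind, value in events:
        if kind == "match":
            total += value
        elif kind == "deletion":
            total += len(value)
        else:  # a mismatch covers exactly one reference base
            total += 1
    return total


events = parse_md("10A5^AC6")
print(events)
print(md_reference_length(events))
```

For the first example, the events add up to 24 reference bases, which is the length of the REF line shown at the top.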
CIGAR
M alignment match (can be a sequence match or mismatch)
I insertion to the reference
D deletion from the reference
N skipped region from the reference
S soft clipping (clipped sequences present in SEQ)
H hard clipping (clipped sequences NOT present in SEQ)
P padding (silent deletion from padded reference)
= sequence match
X sequence mismatch
H can only be present as the first and/or last operation.
S may only have H operations between them and the ends of the CIGAR string.
For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
Sum of lengths of the M/I/S/=/X operations ought to equal the length of SEQ.
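The rules above on H, S, and the SEQ length can be checked with a short Python sketch. The function names `parse_cigar` and `check_cigar` are hypothetical (they are not from pysam or samtools), and the sketch assumes the CIGAR string is written without spaces, as it appears in a SAM file.

```python
import re

# One CIGAR operation: a length followed by one of the nine operation codes.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs (hypothetical helper)."""
    ops = []
    pos = 0
    for m in CIGAR_OP.finditer(cigar):
        if m.start() != pos:  # reject anything the pattern skipped over
            raise ValueError("malformed CIGAR: %r" % cigar)
        pos = m.end()
        ops.append((int(m.group(1)), m.group(2)))
    if not ops or pos != len(cigar):
        raise ValueError("malformed CIGAR: %r" % cigar)
    return ops


def check_cigar(cigar, seq):
    """Return a list of rule violations; an empty list means no problem found."""
    ops = parse_cigar(cigar)
    codes = [op for _, op in ops]
    problems = []

    # H can only be present as the first and/or last operation.
    if "H" in codes[1:-1]:
        problems.append("H is not the first or last operation")

    # S may only have H operations between it and the ends of the CIGAR.
    core = codes[:]
    if core and core[0] == "H":
        core = core[1:]
    if core and core[-1] == "H":
        core = core[:-1]
    if "S" in core[1:-1]:
        problems.append("S is separated from the ends by non-H operations")

    # Sum of lengths of the M/I/S/=/X operations ought to equal len(SEQ).
    query_len = sum(n for n, op in ops if op in "MIS=X")
    if query_len != len(seq):
        problems.append("M/I/S/=/X lengths sum to %d but SEQ has %d bases"
                        % (query_len, len(seq)))
    return problems


print(check_cigar("16M2D6M", "ATCGTAGCTATTTTGGATCGGT"))
print(check_cigar("5M2I19M", "ATCGTGGAGCTAATTTGGACATCGGT"))
```

Both calls use the deletion and insertion examples from the top of this post, with the gap characters removed from the read to obtain SEQ.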