MD tag and CIGAR
10A5^AC6

REF:         ATCGTAGCTAATTTGGACATCGGT
READ:        ATCGTAGCTATTTTGG--ATCGGT
MD TAG:      10        A5   ^AC6
CIGAR:       16M             2D6M
READ:        atcGTAGCTATTTTGGATA..GGT (ATCGTAGCTATTTTGGATAGGT)
MD TAG:         7      A6     C4
CIGAR:       3S 16M             2N3M
READ:        ATCGTAGCTAATTTGGACATCGGT (ATCGTGGAGCTAATTTGGACATCGGT)
CIGAR:       5M   2I19M
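
The deletion and insertion examples above can be reproduced with a few lines of Python. This is only a rough sketch: the function name `cigar_and_md` and the input convention (two equal-length strings, with '-' in the read for a deletion and '-' in the reference for an insertion) are assumptions made for illustration, and soft clipping (S) and skipped regions (N) are not handled.

```python
def cigar_and_md(ref, read):
    """Derive (CIGAR, MD) from two gapped strings of equal length.

    Assumed convention: '-' in `read` marks a deleted reference base,
    '-' in `ref` marks a base inserted in the read.
    """
    if len(ref) != len(read):
        raise ValueError("gapped strings must have the same length")
    cigar = []       # list of [length, operation]
    md = []          # pieces of the MD string
    run = 0          # matching bases since the last MD event
    in_del = False   # True while inside a run of deleted bases
    for r, q in zip(ref.upper(), read.upper()):
        if r == "-" and q == "-":
            raise ValueError("column with a gap in both sequences")
        op = "I" if r == "-" else "D" if q == "-" else "M"
        # Extend the current CIGAR operation or start a new one.
        if cigar and cigar[-1][1] == op:
            cigar[-1][0] += 1
        else:
            cigar.append([1, op])
        if op == "M":
            if r == q:
                run += 1
            else:
                md.append(f"{run}{r}")   # mismatch: record the reference base
                run = 0
            in_del = False
        elif op == "D":
            if in_del:
                md.append(r)             # continue the current deletion
            else:
                md.append(f"{run}^{r}")  # start a new deletion
                run = 0
            in_del = True
        else:
            in_del = False               # insertions are not recorded in MD
    md.append(str(run))
    return "".join(f"{n}{op}" for n, op in cigar), "".join(md)


REF = "ATCGTAGCTAATTTGGACATCGGT"
print(cigar_and_md(REF, "ATCGTAGCTATTTTGG--ATCGGT"))
# ('16M2D6M', '10A5^AC6')
print(cigar_and_md("ATCGT--AGCTAATTTGGACATCGGT", "ATCGTGGAGCTAATTTGGACATCGGT"))
# ('5M2I19M', '24')
```

Note that the insertion leaves no trace in the MD string: the read matches the reference at all 24 aligned positions, so the MD is simply 24.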


MD TAG
The MD field aims to achieve SNP/indel calling without looking at the reference. For example, the string "10A5^AC6" means that, starting from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference that differs from the aligned read base; the next 5 reference bases are matches, followed by a 2 bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field should be consistent with the CIGAR string.
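
The reading of "10A5^AC6" described above can be sketched in Python as follows. The names `parse_md` and `md_reference_length` are made up for this illustration and do not belong to any library; the code assumes the MD string contains only digits, letters and '^'.

```python
import re

# One MD token is a run of matches (digits), a deletion ('^' plus bases),
# or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md):
    """Split an MD string into (kind, value) events in reference order."""
    events = []
    pos = 0
    for m in MD_TOKEN.finditer(md):
        if m.start() != pos:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            n = int(m.group(1))
            if n:  # a 0 only separates two adjacent events
                events.append(("match", n))
        elif m.group(2) is not None:
            events.append(("deletion", m.group(2).upper()))
        else:
            events.append(("mismatch", m.group(3).upper()))
    if pos != len(md):
        raise ValueError(f"unexpected character at offset {pos}")
    return events


def md_reference_length(md):
    """Number of reference bases described by the MD string."""
    return sum(v if kind == "match" else len(v) for kind, v in parse_md(md))


print(parse_md("10A5^AC6"))
# [('match', 10), ('mismatch', 'A'), ('match', 5), ('deletion', 'AC'), ('match', 6)]
print(md_reference_length("10A5^AC6"))
# 24
```

The 24 reference bases agree with the CIGAR string 16M2D6M of the same alignment (16 + 2 + 6 = 24), which is one simple way to check that the two fields match.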

CIGAR

M     alignment match (can be a sequence match or mismatch)
I     insertion to the reference
D     deletion from the reference
N     skipped region from the reference
S     soft clipping (clipped sequences present in SEQ)
H     hard clipping (clipped sequences NOT present in SEQ)
P     padding (silent deletion from padded reference)
=     sequence match
X     sequence mismatch

H can only be present as the first and/or last operation.
S may only have H operations between them and the ends of the CIGAR string.
For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
The sum of the lengths of the M/I/S/=/X operations should equal the length of SEQ.
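
The rules above about H, S and the length of SEQ can be checked mechanically. Below is a minimal sketch; `parse_cigar` and `check_cigar` are hypothetical helper names rather than functions from an existing library, and the unavailable CIGAR value "*" is not handled.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SEQ_OPS = set("MIS=X")  # operations whose lengths add up to len(SEQ)


def parse_cigar(cigar):
    """Return the CIGAR string as a list of (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuilding the string catches stray characters that findall skips.
    if not ops or "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def check_cigar(cigar, seq):
    """Return a list of rule violations (empty if all three rules hold)."""
    ops = parse_cigar(cigar)
    letters = [op for _, op in ops]
    problems = []
    # Rule 1: H can only be the first and/or last operation.
    if "H" in letters[1:-1]:
        problems.append("H is not the first or last operation")
    # Rule 2: after dropping an H at either end, S must be at an end.
    core = letters[1:] if letters[0] == "H" else letters
    core = core[:-1] if core and core[-1] == "H" else core
    if "S" in core[1:-1]:
        problems.append("S is separated from the end by something other than H")
    # Rule 3: M/I/S/=/X lengths must add up to the length of SEQ.
    seq_len = sum(n for n, op in ops if op in SEQ_OPS)
    if seq_len != len(seq):
        problems.append(f"operations cover {seq_len} bases but SEQ has {len(seq)}")
    return problems


print(parse_cigar("16M2D6M"))
# [(16, 'M'), (2, 'D'), (6, 'M')]
print(check_cigar("16M2D6M", "ATCGTAGCTATTTTGGATCGGT"))
# []
print(check_cigar("5M2I19M", "ATCGTGGAGCTAATTTGGACATCGGT"))
# []
print(check_cigar("2M3S5M", "ACGTACGTAC"))
# ['S is separated from the end by something other than H']
```

In the first check the 2D operation contributes nothing to the read length: 16 + 6 = 22, which is the length of the read with the two deleted bases left out.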
Posted by 옥탑방람보