MD tag and CIGAR
10A5^AC6
REF: ATCGTAGCTAATTTGGACATCGGT
READ: ATCGTAGCTATTTTGG--ATCGGT
MD TAG: 10A5^AC6
CIGAR: 16M2D6M
READ: atcGTAGCTATTTTGGATA..GGT (ATCGTAGCTATTTTGGATAGGT)
MD TAG: 7A6C4
CIGAR: 3S16M2N3M
READ: ATCGTAGCTAATTTGGACATCGGT (ATCGTGGAGCTAATTTGGACATCGGT)
CIGAR: 5M2I19M
MD TAG
The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string `10A5^AC6' means that, from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches, followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
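The reading of `10A5^AC6' described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference parser: the function names parse_md and md_reference_length are made up for this note, and the sketch assumes the string holds only digits, bases, and ^ markers.

```python
import re

# One MD token: a run of matches (digits), a deletion ("^" plus the
# deleted reference bases), or a single mismatched reference base.
_MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md):
    """Split an MD string into ('match', n), ('mismatch', base) and
    ('deletion', bases) tuples. Hypothetical helper for illustration."""
    ops = []
    pos = 0
    while pos < len(md):
        m = _MD_TOKEN.match(md, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos} in {md!r}")
        if m.group(1) is not None:
            n = int(m.group(1))
            if n > 0:  # a 0 only separates two adjacent events
                ops.append(("match", n))
        elif m.group(2) is not None:
            ops.append(("deletion", m.group(2).upper()))
        else:
            ops.append(("mismatch", m.group(3).upper()))
        pos = m.end()
    return ops


def md_reference_length(ops):
    """Number of reference bases the MD string accounts for."""
    total = 0
    for kind, value in ops:
        total += value if kind == "match" else len(value)
    return total


ops = parse_md("10A5^AC6")
print(ops)
print(md_reference_length(ops))
```

For `10A5^AC6' the reference length comes out as 10 + 1 + 5 + 2 + 6 = 24, the same as the M and D lengths of the CIGAR 16M2D6M (16 + 2 + 6), which is one simple way to check that the two fields agree.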
CIGAR
M alignment match (can be a sequence match or mismatch)
I insertion to the reference
D deletion from the reference
N skipped region from the reference
S soft clipping (clipped sequences present in SEQ)
H hard clipping (clipped sequences NOT present in SEQ)
P padding (silent deletion from padded reference)
= sequence match
X sequence mismatch
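As a rough sketch, the operations listed above can be split out of a CIGAR string with a regular expression. The names parse_cigar and consumed_lengths are made up for this note. The grouping of operations into those that consume read bases and those that consume reference bases follows the SAM specification's table, not the list above.

```python
import re

# Operations that advance along the read (SEQ) and along the reference.
# Assumption: this grouping is taken from the SAM specification.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs, e.g. [(16, 'M'), ...]."""
    ops = []
    pos = 0
    while pos < len(cigar):
        m = _CIGAR_OP.match(cigar, pos)
        if m is None:
            raise ValueError(f"bad CIGAR at offset {pos}: {cigar!r}")
        ops.append((int(m.group(1)), m.group(2)))
        pos = m.end()
    return ops


def consumed_lengths(ops):
    """Return (read bases, reference bases) covered by the operations."""
    query = sum(n for n, op in ops if op in CONSUMES_QUERY)
    reference = sum(n for n, op in ops if op in CONSUMES_REFERENCE)
    return query, reference


for cigar in ("16M2D6M", "3S16M2N3M", "5M2I19M"):
    print(cigar, consumed_lengths(parse_cigar(cigar)))
```

For the three example alignments this gives (22, 24), (22, 21) and (26, 24): a deletion or skipped region lengthens the reference span only, an insertion or soft clip lengthens the read only.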
H can only be present as the first and/or last operation.
S may only have H operations between them and the ends of the CIGAR string.
For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
Sum of lengths of the M/I/S/=/X operations ought to equal the length of SEQ.
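The three structural rules above (H placement, S placement, and the SEQ length sum) lend themselves to a small checker. What follows is a hedged sketch: check_cigar is a hypothetical name, it tests only these three rules, and it reports problems as plain strings.

```python
import re


def check_cigar(cigar, seq):
    """Return a list of rule violations; an empty list means the CIGAR
    passed the three checks. Illustrative helper, not a full validator."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        return ["not a well-formed CIGAR string"]
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    problems = []
    last = len(ops) - 1
    for i, (_, op) in enumerate(ops):
        # H can only be the first and/or last operation.
        if op == "H" and i not in (0, last):
            problems.append("H is not the first or last operation")
        # S may only have H operations between it and one end of the string.
        if op == "S":
            left_ok = all(o == "H" for _, o in ops[:i])
            right_ok = all(o == "H" for _, o in ops[i + 1:])
            if not (left_ok or right_ok):
                problems.append("S is separated from both ends by non-H operations")

    # The M/I/S/=/X lengths must add up to the length of SEQ.
    query_len = sum(n for n, op in ops if op in "MIS=X")
    if query_len != len(seq):
        problems.append(f"M/I/S/=/X lengths sum to {query_len}, SEQ has {len(seq)} bases")
    return problems


print(check_cigar("5M2I19M", "ATCGTGGAGCTAATTTGGACATCGGT"))
print(check_cigar("5M3S5M", "A" * 13))
```

The first call uses the insertion example from the top of these notes (5 + 2 + 19 = 26 bases in SEQ) and returns an empty list; the second has a soft clip in the middle of the alignment and is reported as a violation.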