Posts in the 'Bioinformatics/Biological data analysis' category (page 3)

30 posts in 'Bioinformatics/Biological data analysis'

2011.08.23 [base quality] Base Quality in BAM
2011.08.11 [java] Calling system commands directly from Java
2011.08.10 [picard] CollectGcBiasMetrics.jar
2011.08.08 [pygr] Downloading and installing Pygr
2011.08.08 [python] Installing Python (local account)
2011.08.05 [bam] MD tag and cigar
2011.08.03 [python] Converting a relative path to an absolute path
2011.05.25 [picard] "MAPQ should be 0 for unmapped read" in MarkDuplicates.jar
2011.05.25 [picard] Marking duplicates with Picard
2011.05.24 [python] How to get the file name of the source currently being run

[base quality] Base Quality in BAM

Bioinformatics/Biological data analysis 2011. 8. 23. 11:28

For a SOLiD BAM file, the base quality is obtained by taking ord() of the quality character (e.g. ord('A')) and subtracting 33. (Range: 0 to 40.)

QUAL: ASCII of base QUALity plus 33 (same as the quality string in the Sanger FASTQ format). A base quality is the phred-scaled base error probability which equals -10 log10 Pr{base is wrong}. This field can be a `*' when quality is not stored. If not a `*', SEQ must not be a `*' and the length of the quality string ought to equal the length of SEQ.
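The decoding rule above can be sketched in a few lines of Python. This is only an illustration: the function name decode_phred33 is hypothetical, and it assumes the quality string uses the Phred+33 encoding described in the quoted text.

```python
def decode_phred33(qual):
    """Convert a Phred+33 quality string into a list of integer scores."""
    if qual == "*":
        # '*' means the qualities were not stored
        return []
    # Each character encodes one base quality as ASCII code minus 33
    return [ord(ch) - 33 for ch in qual]


print(decode_phred33("!5I"))  # [0, 20, 40]
```

With this encoding, ord('A') is 65, so the character 'A' stands for a base quality of 32.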

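All of these follow the phred quality score concept that sequencers have used for a long time: 10 means a 10% error probability, 20 means 1%, and 30 means 0.1%. For example, if a sequencer is said to have achieved 99.99% accuracy, that means most of the data (reads) it produced were at QV40 or above.

Each instrument derives its scores from prior training (experience?): it learns in advance how accurate a base call or color call is for a given signal under its own mechanism. As far as I know, this is usually done by preparing data from a variety of species and running experiments on several machines of the same model to build a kind of score table. So although the concept is the same, directly comparing QVs from different platforms is somewhat risky, and some platforms may report QVs that look better than others.

QV is an important clue for a first-pass evaluation after sequencing, but it does not show the accuracy of the final sequence. Note that, unlike other NGS platforms, SOLiD does not filter on QV and outputs everything as raw data. Since a complex genome may contain regions where QV is unusually low, the point is to carry that information into the analysis rather than lose it entirely.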
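The relation between a quality value and its error probability can be checked with a small Python sketch. The function names are hypothetical; the formula is the standard phred definition, Q = -10 log10(P).

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a phred quality value."""
    return 10 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Phred quality value implied by an error probability."""
    return -10 * math.log10(p)


# QV10, QV20, QV30 and QV40 correspond to 10%, 1%, 0.1% and 0.01% error
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```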

Posted by 옥탑방람보

[java] Calling system commands directly from Java

Bioinformatics/Biological data analysis 2011. 8. 11. 09:32

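When Java has to run a system or external command, the command may keep writing to standard out or standard error,
and the process can then fail to exit normally and hang like a zombie.

This happens because Java takes over the command's standard out/error directly;
once more than a certain amount of unread data accumulates, the process stops making progress.

So in this case standard out/error should be dumped to files.
Add the redirection of standard out/error to files directly to the command that is run on the system.

e.g.) bcftools -bcvg test.bam > standard.out 2> standard.error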
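The same precaution can be sketched in Python rather than Java (a substitution made only for illustration; the post itself is about Java). The helper name run_to_files is hypothetical. It hands the child process open files for stdout/stderr instead of pipes, so there is no pipe buffer that the parent would have to drain.

```python
import subprocess


def run_to_files(cmd, out_path, err_path):
    """Run cmd (a list of arguments) with stdout/stderr written to files.

    The child writes to files rather than to pipes owned by the parent,
    so it cannot block on a full pipe buffer.
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        # call() waits for the command to finish and returns its exit code
        return subprocess.call(cmd, stdout=out, stderr=err)
```

A call would look like run_to_files(["some_command", "input.bam"], "standard.out", "standard.error"), where some_command stands in for whatever external tool is being run.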

[picard] CollectGcBiasMetrics.jar

Bioinformatics/Biological data analysis 2011. 8. 10. 17:42

The sequence order in the reference must match the chromosome order in the BAM.
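One way to check this before running the tool is to compare the @SQ lines of the BAM header with the reference's FASTA index. The sketch below is an assumption-laden illustration, not part of Picard: the function names are hypothetical, and it expects the header as text (for example the output of `samtools view -H`) and the contents of a .fai file (as written by `samtools faidx`).

```python
def contigs_from_sam_header(header_text):
    """Return sequence names from the @SQ lines, in header order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_from_fai(fai_text):
    """Return sequence names from a FASTA index (.fai), in file order."""
    return [ln.split("\t")[0] for ln in fai_text.splitlines() if ln.strip()]


def same_contig_order(header_text, fai_text):
    """True if the BAM header and the reference list contigs identically."""
    return contigs_from_sam_header(header_text) == contigs_from_fai(fai_text)
```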

[pygr] Downloading and installing Pygr

Bioinformatics/Biological data analysis 2011. 8. 8. 12:58

1. Download (the build that matches the installed Python version)
http://code.google.com/p/pygr/downloads/list
2. Unpack the egg file (e.g. with ALZip)
3. Place it under the Python library directory (somewhere it can be imported from)

License: Attribution, Non-commercial

[python] Installing Python (local account)

Bioinformatics/Biological data analysis 2011. 8. 8. 11:05

1. Download
2. Extract the archive
3. mv it into the install directory
4. configure
5. make
6. Add the following to ~/.bashrc

#python
export PATH=/.../install/Python-2.6.7/:$PATH
export PYTHONPATH=/.../kimps/lib:/.../install/Python-2.6.7/Lib:/.../install/Python-2.6.7/Lib/site-packages

[bam] MD tag and cigar

Bioinformatics/Biological data analysis 2011. 8. 5. 11:08

MD tag and cigar

10A5^AC6

REF: ATCGTAGCTAATTTGGACATCGGT

READ: ATCGTAGCTATTTTGG--ATCGGT

MD TAG: 10A5^AC6

CIGAR: 16M2D6M

READ: atcGTAGCTATTTTGGATA..GGT (ATCGTAGCTATTTTGGATAAAGGT)

MD TAG: 17C1TC3

CIGAR: 3S16M2N3M

READ: ATCGTAGCTAATTTGGACATCGGT (ATCGTGGAGCTAATTTGGACATCGGT)

CIGAR: 5M2I19M

MD TAG

The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string `10A5^AC6' means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.
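The reading of `10A5^AC6' described above can be reproduced with a short Python sketch. The function name parse_md and the tuple labels are hypothetical, chosen only for this illustration; the tokenizing follows the three kinds of element in the description (a run of matches, a mismatched reference base, and a `^'-prefixed deleted sequence).

```python
import re

# A number (matches), '^' plus letters (deletion), or one letter (mismatch)
_MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md):
    """Split an MD string into (kind, value) tuples."""
    ops = []
    for num, deleted, mismatch in _MD_TOKEN.findall(md):
        if num:
            ops.append(("match", int(num)))
        elif deleted:
            ops.append(("deletion", deleted))
        else:
            ops.append(("mismatch", mismatch))
    return ops


print(parse_md("10A5^AC6"))
# [('match', 10), ('mismatch', 'A'), ('match', 5), ('deletion', 'AC'), ('match', 6)]
```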

CIGAR

M alignment match (can be a sequence match or mismatch)

I insertion to the reference

D deletion from the reference

N skipped region from the reference

S soft clipping (clipped sequences present in SEQ)

H hard clipping (clipped sequences NOT present in SEQ)

P padding (silent deletion from padded reference)

= sequence match

X sequence mismatch

H can only be present as the first and/or last operation.

S may only have H operations between them and the ends of the CIGAR string.

For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.

Sum of lengths of the M/I/S/=/X operations ought to equal the length of SEQ.
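The length rule can be checked against the examples in this post with a small Python sketch. The function name cigar_lengths is hypothetical. The read length uses the M/I/S/=/X rule quoted above; the reference span (M/D/N/=/X) is added here as an extra, since those are the operations that consume reference bases.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume bases of SEQ
REF_OPS = set("MDN=X")    # operations that consume reference bases


def cigar_lengths(cigar):
    """Return (length of SEQ, length spanned on the reference)."""
    query_len = 0
    ref_len = 0
    for count, op in _CIGAR_OP.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            query_len += n
        if op in REF_OPS:
            ref_len += n
    return query_len, ref_len


print(cigar_lengths("16M2D6M"))  # (22, 24)
print(cigar_lengths("5M2I19M"))  # (26, 24)
```

For 16M2D6M the read has 22 bases and covers all 24 reference bases; for 5M2I19M the read has 26 bases over the same 24 reference bases.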

License: Attribution, Non-commercial

[python] Converting a relative path to an absolute path

Bioinformatics/Biological data analysis 2011. 8. 3. 13:12

import os

# Relative path, resolved against the current working directory
a = '../work/'
print(os.path.abspath(a))

[picard] "MAPQ should be 0 for unmapped read" in MarkDuplicates.jar

Bioinformatics/Biological data analysis 2011. 5. 25. 13:43

Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 62094965, Read name ILLUMINA-A16956_100211:4:14:19403:10471#0, MAPQ should be 0 for unmapped read

When this error occurs, add the following option:

VALIDATION_STRINGENCY=LENIENT

This is a common problem, and you'll run into it with all the Picard suite of tools.

There's a setting that goes something like VALIDATION_STRINGENCY, and if you set it to LENIENT, it will complain about those reads, but it won't stop on them.

This happens with bwa, because it concatenates reference sequence, which leads to slightly odd things happening when a read aligns over the overlap. So this might be the source of your problem. Regardless of what's causing it, you can examine the problem reads, and cut them out, or change the stringency to let them go through.
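To examine the problem reads, one option is to scan the alignments as SAM text and list the records that are flagged unmapped (FLAG bit 0x4) but carry a non-zero MAPQ. The sketch below is a hypothetical helper, not part of Picard, and assumes tab-separated SAM lines with FLAG in column 2 and MAPQ in column 5.

```python
def unmapped_with_mapq(sam_lines):
    """Yield (read name, FLAG, MAPQ) for unmapped reads with MAPQ != 0."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & 0x4 and mapq != 0:
            yield fields[0], flag, mapq
```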

[picard] Marking duplicates with Picard

Bioinformatics/Biological data analysis 2011. 5. 25. 13:05

java -jar MarkDuplicates.jar INPUT=test.bam OUTPUT=test.marked.bam METRICS_FILE=test.txt TMP_DIR=. ASSUME_SORTED=true

[python] How to get the file name of the source currently being run

Bioinformatics/Biological data analysis 2011. 5. 24. 16:11

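# Option 1: ask inspect for the file of the current frame
import inspect
print(inspect.getfile(inspect.currentframe()))

# Option 2: absolute path of this module's __file__
import os
print(os.path.abspath(__file__))

# Option 3: file name recorded in the current frame's code object
import sys
print(sys._getframe().f_code.co_filename)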
