NGS data output is growing three times faster than hard disk storage capacity

Bioinformatics/Useful sources 2013. 2. 4. 15:04

http://genomebiology.com/2010/11/5/207

Posted by 옥탑방람보

Generic Genome Browser - Example

Bioinformatics/Genome browser 2013. 2. 4. 15:01

[General]

description = XXX Browser

The name shown under Data Source. If no header is defined, this name is also used as the page header.

db_adaptor = Bio::DB::GFF

db_args = -adaptor memory -gff '/var/www/html/gbrowse/databases/XXX/'

Use this form when running from flat files.

db_args = -dsn dbi:mysql:database=DB_NAME;host=varigine.kobic.re.kr;user=root;passwd=XXX

Use this form when running from a database. (Specify only one of the two db_args lines: file-based or database-based.)

plugins = FastaDumper GFFDumper RestrictionAnnotator

List of plugins (these provide features such as data download).

aggregators = transcript processed_transcript coding chromosome{centromere,cytoband} match recomb_rate{recombrate:ox_recombrate}

Aggregators group several features into one set. For example, match draws features that share a name on a single line. This is what lets the parts of one gene appear on one line.

initial landmark = NM_015658

Used as the default search term when the browser starts.

# Web site configuration info

stylesheet = /gbrowse/gbrowse_wykim.css

Path to the CSS file. On disk this is /var/www/html/gbrowse/gbrowse_wykim.css.

buttons = /gbrowse/images/buttons

tmpimages = /gbrowse/tmp

# Default glyph settings

glyph = generic

generic draws each feature as a plain bar.

height = 8

bgcolor = black

fgcolor = black

label density = 50

label density sets how crowded the image may become before the IDs (labels) of features are no longer drawn.

grid = 1

gridcolor = darkgray

bump density = 100

low res = 200000

keystyle = between

empty_tracks = suppress

link = AUTO

With link = AUTO, clicking a feature in the browser shows its detailed region information.

# what image widths to offer

image widths = 450 640 800 1024 1152 1280

default width = 1024

low res = 200000

default features = CYT:overview CT:overview Lgene:region gtsh mRNA KSJgt JWgt YHgt CVgt

Tracks that are shown by default. Enter the section names of the tracks defined later in this file.

# max and default segment sizes for detailed view

max segment = 5000000

default segment = 250000

region segment = 2000000

# eight numbers for the zoom levels - should be more flexible, sorry

zoom levels = 100 500 1000 2000 5000 10000 20000 40000 100000 200000 500000 750000 1000000 2000000 5000000

# canonical features to show in overview

overview units = M

overview bgcolor = lightgrey

detailed bgcolor = blue

key bgcolor = beige

# examples to show in the introduction

examples = chr1:10247291..10267291 PLCH2 NM_015658 rs7542793 chrX ENST00000370434

Example search terms. They appear under Examples in the browser.

# "automatic" classes to try when an unqualified identifier is given

automatic classes = overview Genes Contig

cache_overview = 0

header = <div id=topImage><img src='/gbrowse/images/topimage.jpg'/></div>

Content shown in the page header. The image above is located at /var/www/html/gbrowse/images/topimage.jpg.

footer = <BR><BR><BR>Copyright 2008 <br><a href="http://www.gachon.ac.kr">Gachun University of Medicine and Science</a>, Korea <br><a href="http://www.kobic.re.kr">Korean Bioinfomation Center (KOBIC)</a>, <a href="http://www.kribb.re.kr">KRIBB</a>, Korea

[CYT:overview]

feature = chromosome

glyph = ideogram

Specifies the shape drawn in the browser.

fgcolor = black

bgcolor = gneg:white gpos25:silver gpos50:gray gpos:gray gpos75:darkgray gpos100:black acen:cen gvar:var

arcradius = 6

height = 25

bump = 0

label = 0

key = Ideogram

citation = Cytogenetic chromosome bands. Annotations from the <a href="http://genome.ucsc.edu/goldenPath/gbdDescriptions.html">UCSC Genome Browser Database</a> (cytoBand.txt.gz).

The text in citation is shown as the description when a track name is clicked in the browser's Tracks panel.

[CT:overview]

feature = contig

glyph = generic

fgcolor = black

bgcolor = blue

fillcolor = blue

bump = 1

label density = 10

height = 4

key = NT contigs

label = 0

citation = NT contigs created during the construction of the genome assembly. Annotations from the <a href="http://genome.ucsc.edu/goldenPath/gbdDescriptions.html">UCSC Genome Browser Database</a> (ctgPos.txt.gz).

[NT]

feature = contig

key = Contigs

background = black

link = http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=Nucleotide&dopt=GenBank&val=$name

citation = NT contigs created during the construction of the genome assembly. Annotations from the <a href="http://genome.ucsc.edu/goldenPath/gbdDescriptions.html">UCSC Genome Browser Database</a> (ctgPos.txt.gz).

category = DNA

category groups tracks within the Tracks panel. Tracks with the same category appear grouped on screen.

[RefGene]

feature = processed_transcript:UCSC_1

The name in the third column of the GFF file goes here. Normally the third-column name alone is enough, but if some data share the same third column and differ in the second column, write it as third:second, as above, to tell them apart. Genes drawn with processed_transcript here would otherwise collide with the Ensembl track, so this track uses processed_transcript:UCSC_1 and the Ensembl track uses processed_transcript:Ensembl.

glyph = processed_transcript

Setting the glyph to processed_transcript draws the feature as a spliced gene model.

stranded = 0

Whether to show strand direction.

bgcolor = yellow

fgcolor = black

font2color = red

height = 8

description = sub {

my $f = shift;

return $f->attributes('Alias').': '.$f->attributes('Note');

}

label density = 15

link = http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=Nucleotide&dopt=GenBank&val=$name

link_target = _blank

key = Entrez genes

decorated_introns= 1

citation = mRNA sequences from NCBI's <a href="http://www.ncbi.nlm.nih.gov/RefSeq/">RefSeq resource</a>. Annotations from the <a href="http://genome.ucsc.edu/goldenPath/gbdDescriptions.html">UCSC Genome Browser Database</a> (refGene.txt.gz, refLink.txt.gz, refSeqSummary.txt.gz). Both RefSeq short descriptions and longer summaries (for annotated genes) are searchable, but only short descriptions are displayed alongside features.

category = Genes
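The third-column:second-column rule described for the feature setting can be sketched in Python. This is only an illustration of the matching rule, not GBrowse's own code: it ignores aggregators (which is how processed_transcript is assembled from mRNA, CDS, and UTR rows), and the function names and the two sample rows are hypothetical.

```python
def parse_selector(selector):
    """Split a 'feature' value such as 'mRNA:UCSC_1' into (type, source)."""
    feature_type, _, source = selector.partition(":")
    return feature_type, source or None


def matches(gff_line, selector):
    """Return True if a tab-separated GFF row matches 'type' or 'type:source'."""
    cols = gff_line.rstrip("\n").split("\t")
    feature_type, source = parse_selector(selector)
    # Column 2 (index 1) is the source, column 3 (index 2) is the feature type.
    return cols[2] == feature_type and (source is None or cols[1] == source)


# Hypothetical rows: same third column, different second column.
rows = [
    "chr14\tUCSC_1\tmRNA\t91108541\t91111136\t.\t-\t.\tID=NM_001080113",
    "chr1\tEnsembl\tmRNA\t24416\t25944\t.\t-\t.\tID=ENST00000379481",
]

ucsc_only = [r for r in rows if matches(r, "mRNA:UCSC_1")]
any_source = [r for r in rows if matches(r, "mRNA")]
print(len(ucsc_only), len(any_source))
```

A selector without a source matches every row of that type, which is why two tracks that share a third-column name need the :source suffix to stay separate.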

[RefGene:300000]

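This section changes how RefGene is drawn when the displayed region is wider than 300,000 bp. In this file the settings are actually left the same. The section was originally added to switch the glyph to generic so that fine detail is not drawn in wide views.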

feature = processed_transcript:UCSC_1

glyph = processed_transcript

stranded = 1

bgcolor = yellow

fgcolor = black

height = 8

label density = 15

link = http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=Nucleotide&dopt=GenBank&val=$name

key = Entrez genes

decorated_introns= 1

citation = <a href="http://www.ncbi.nlm.nih.gov/RefSeq/">RefSeq</a> mRNAs mapped to the genome assembly

category = Genes
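The idea behind a section such as [RefGene:300000] is that the browser picks a different set of track options depending on how wide the displayed region is. A minimal Python sketch of that selection follows. The function name is hypothetical, and whether a region of exactly 300,000 bp counts as reaching the threshold is an assumption here, not something the configuration above states.

```python
def pick_section(track, thresholds, region_width):
    """Return the config section name to use for a region of the given width in bp."""
    # Keep only the thresholds the region width has reached (boundary handling is assumed).
    reached = [t for t in sorted(thresholds) if region_width >= t]
    # The largest reached threshold wins; otherwise fall back to the base section.
    return f"{track}:{reached[-1]}" if reached else track


print(pick_section("RefGene", [300000], 250000))
print(pick_section("RefGene", [300000], 2000000))
print(pick_section("KSJgt", [100000], 120000))
```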

[dbS]

feature = snp129:UCSC

glyph = triangle

point = 1

orient = N

height = 5

label = 1

label density = 150

bump density = 250

bgcolor = blue

fgcolor = black

font2color = gray

link = http://www.ncbi.nih.gov/SNP/snp_ref.cgi?rs=$name

link_target = _blank

key = dbSNP SNPs (ver. 129)

citation = Reference SNP clusters (rs#'s) available in NCBI's <a href="http://www.ncbi.nih.gov/SNP/">dbSNP database</a>.

category = Variation

[gtsh]

feature = snp:HapMap_gt

glyph = allele_pie_multi

The allele_pie_multi glyph is located in /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Bio/Graphics/Glyph/.

ref_allele = sub {

my $f = shift;

my $refallele = uc($f->dna);

$f->strand == -1 and $refallele =~ tr/ACTG/TGAC/;

return $refallele;

}

freq = sub { my $snp = shift;

my @pops = qw/CEU CHB JPT YRI/;

my @freqs = sort $snp->attributes('acounts');

#my $allele = $snp->attributes('Refallele');

my $allele = uc($snp->dna);

$snp->strand == -1 and $allele =~ tr/ACTG/TGAC/;

my %freqs;

foreach(@freqs){

my @items = split /:/;

if ($items[1] =~ /$allele\s([0-9.]+)/i){

$freqs{$items[0]} = $1;

}elsif($items[2] =~ /$allele\s([0-9.]+)/i){

$freqs{$items[0]} = $1;

}

}

return join ';', map { exists $freqs{$_} ? "$_:$freqs{$_}" : "$_:NO" } @pops;

}

alleles = sub {return shift->attributes('alleles')}

ref_strand = sub {shift->strand}

font2color = #0000FF

bgcolor = red

#stacked = 1

label density = 50

bump density = 250

key = HapMap genotyped SNPs

link = http://www.hapmap.org/cgi-perl/snp_details?name=$name&source=hapmap_B36

link_target = _blank

label = sub{ my $self = shift;

my $s = $self->strand;

my $n = $self->name;

return $n if $s == 0;

my $m = $s > 0 ? "+" : "-";

return $n . "(" . $m . ")";

}

description = 1

height = 21

citation = SNPs in dbSNP genotyped by the <a href="http://www.hapmap.org">HapMap Project</a> (Phase 2, B36).

category = Variation

[KSJgt]

feature = KSJgt:KSJ

glyph = allele_tower

alleles =sub{ my $f= shift; return $f->attributes('allele');}

ref_strand = sub{shift->strand}

minor_allele = sub {

my $f = shift;

my @alleles = split /\//,$f->attributes('allele');

my ($ref_allele) = $f->attributes('ref');

return $alleles[0] eq $ref_allele ? $alleles[1] : $alleles[0];

}

maf = sub{my $f = shift;

my ($ref_cnt) = $f->attributes('sup1');

my ($oth_cnt) = $f->attributes('sup2');

if( $oth_cnt eq "" ){

$oth_cnt = $ref_cnt;

$ref_cnt = "0";

}

return 1- $ref_cnt/($ref_cnt+$oth_cnt);

}

label = 1

label density = 100

bump density = 225

fgcolor = black

bgcolor = blue

bump = 1

font2color = blue

key = KSJ genotypes

category = Variation

citation = Genotypes of KSJ.

link = http://www.koreagenome.org/pgp/?chr=$ref&f=$start&$w=120

$ref is the reference (template) sequence name, $start is the start position, and $end is the end position.
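The maf callback in the [KSJgt] track computes 1 - ref/(ref + other) from the sup1 and sup2 attributes. The same arithmetic is sketched below in Python so it can be checked by hand. The function name is hypothetical, and the zero-count guard is an addition that the Perl callback does not have.

```python
def non_reference_fraction(ref_count, other_count=None):
    """Mirror the maf callback: 1 - ref / (ref + other)."""
    if other_count is None:
        # Only one count present: the callback treats it as non-reference support.
        other_count, ref_count = ref_count, 0
    total = ref_count + other_count
    if total == 0:
        # Added guard; the original callback would divide by zero here.
        raise ValueError("no supporting counts")
    return 1 - ref_count / total


# sup1/sup2 values taken from the KSJgt GFF3 rows used elsewhere in this guide.
print(round(non_reference_fraction(5, 1), 4))
print(round(non_reference_fraction(4, 8), 4))
```

With sup1=4 and sup2=8 the result is above 0.5, so the value is the non-reference allele fraction and not always a minor allele frequency in the strict sense.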

[KSJgt:100000]

feature = KSJgt:KSJ

glyph = triangle

point = 1

orient = N

bump density = 500

height = 5

label = 1

key = KSJ genotypes

category = Variation


Installing Generic Genome Browser

Bioinformatics/Genome browser 2013. 2. 4. 15:00

Tutorial for Generic Genome Browser (version 1.68)

GMOD wiki = http://www.gmod.org

Main page of GBrowse = http://gmod.org/wiki/GBrowse

Installation of GBrowse = http://gmod.org/wiki/GBrowse_Install_HOWTO

Download

http://www.gmod.org/wiki/index.php/Downloads

Note: version 1.69 requires BioPerl 1.6 or later.

Prerequisites

MySQL - http://www.mysql.com

Apache Web Server - http://www.apache.org

Perl 5.0 or later - http://www.perl.com

CPAN - http://www.cpan.org

Installing the modules, including BioPerl

>perl -MCPAN -e shell

>install CGI GD CGI::Session DBI Class::DBI::mysql Digest::MD5 Text::Shellwords

>install XML::Parser XML::Writer XML::Twig XML::DOM LWP MOBY Bio::Das GD::SVG

BioPerl download: http://search.cpan.org/~sendu/bioperl-1.5.2_102/

>gunzip bioperl-1.5.2_102.tar.gz

>tar xvf bioperl-1.5.2_102.tar

>cd bioperl-1.5.2_102

>perl Makefile.PL

>make

>make install

On RedHat-based systems, yum install bioperl can be used instead.

BioPerl for a version 1.69 install

Latest BioPerl download - http://www.bioperl.org/DIST/nightly_builds/bioperl-live.tar.gz

Latest Bio::Graphics - http://gmod.cvs.sourceforge.net/viewvc/gmod/Bio-Graphics/

Install

>cd Generic-Genome-Browser-1.68

>perl Makefile.PL

>make

>make test (optional)

>make install UNINST=1

Notes

Default install directories

CGI script: /usr/local/apache/cgi-bin/gbrowse

Static images: /usr/local/apache/htdocs/gbrowse

Config files: /usr/local/apache/conf/gbrowse.conf

The module: -standard site-specific Perl library location-

Makefile.PL options for when MySQL and Apache are installed in other directories

CONF Configuration file directory

HTDOCS Static files directory

CGIBIN CGI script directory

APACHE Base directory for Apache's conf, htdocs and cgibin directories

LIB Perl site-specific modules directory

BIN Perl executable scripts directory

NONROOT If set to a non-zero value (e.g. NONROOT=1) then install gbrowse in a way that does not require root access.

DO_XS Compile fast alignment algorithm (XS C extension)

Install examples using these options

>perl Makefile.PL HTDOCS=/var/www/html CONF=/etc/httpd/conf CGIBIN=/var/www/cgi-bin

>perl Makefile.PL APACHE=/home/www

Problems when installing on Fedora, Mac OS X, or Ubuntu

Read README.fedora, README.MacOSX, and README.Ubuntu.

On Fedora

Disable SELinux in system-config-selinux (or directly in /etc/sysconfig/selinux).

(Only if the step above does not help) >yum update selinux-policy-targeted

>setsebool -P httpd_disable_trans 1

>/etc/init.d/httpd restart

Add the --SELINUX=1 option when running perl Makefile.PL.

Testing in the browser after installation

http://localhost/gbrowse

http://localhost/cgi-bin/gbrowse/yeast_chr1

Using MySQL

Create the MySQL database and set privileges

mysql -uroot -ppassword -e 'create database yeast'

mysql -uroot -ppassword -e 'grant all privileges on yeast.* to me@localhost'

mysql -uroot -ppassword -e 'grant file on *.* to me@localhost'

mysql -uroot -ppassword -e 'grant select on yeast.* to nobody@localhost'

There must be no space between -p and the password. With a space, mysql prompts for the password and treats the next word as a database name.

Loading GFF files

Example 1) >bp_bulk_load_gff.pl --maxfeature 1000000000 -c -d DB_NAME -u USER -p PASSWORD --local --gff3_munge --fasta *.fa *.gff

Example 2) >bp_load_gff.pl --maxfeature 1000000000 -c -d DB_NAME -u USER -p PASSWORD --gff3_munge *.gff

For the initial load, bp_bulk_load_gff.pl is preferable because it is faster. It uses a lot of memory, however, so when a dataset has very many entries, such as dbSNP, load that data afterwards with bp_load_gff.pl.

(Note) To use allele_pie_multi.pm with HapMap data, first load the genotype data with the bulk_load_gff.pl script provided by HapMap.

bp_bulk_load_gff.pl deletes all existing data before loading. bp_load_gff.pl run without the -c option appends to whatever data already exists.

The maxfeature option is needed when loading features that span large regions. Apply it for features such as contigs, whose start and end positions are far apart.

(Note) bp_load_gff.pl - This will incrementally load a database, optionally initializing it if it does not already exist. This script will work correctly even if the MySQL server is located on another host.

(Note) bp_bulk_load_gff.pl - This Perl script will initialize a new Bio::DB::GFF database with a fresh schema, deleting anything that was there before. It will then load the file. Only suitable for use the very first time you create a database, or when you want to start from scratch! The bulk loader is as much as 10x faster than bp_load_gff.pl, but does not work in the situation in which the MySQL database is running on a remote host.

(Note) bp_fast_load_gff.pl - This will incrementally load a database. On UNIX systems, it will activate a fast loader that makes the speed almost the same as the bulk loader. Be careful, though, because this is an experimental piece of software.

Default paths

With a default setup, the GBrowse conf files are located in /etc/httpd/conf/gbrowse.conf/.

Each *.conf file in this directory defines one browser (data source) that can be displayed.

For file-based setups, create a folder per project under /var/www/html/gbrowse/databases/. The *.gff files in that folder are picked up automatically.

Configuring the conf file

The annotated conf file for this setup, from the [General] section through the [KSJgt:100000] track, is listed in full in the Generic Genome Browser example post above and is not repeated here.

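Creating GFF files

A GFF file has nine columns: reference sequence (template), source, feature, start, end, score, strand, phase, and a final column holding the group (family) information and description. The columns must be separated by TAB characters.

Example 1) GFF version 1 (general use)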

chr1 Solexa MPSS_Cluster 18976 18992 . + . chr1 chr1.2_0

chr1 Solexa MPSS_Cluster 72000 72027 . + . chr1 chr1.7_0

chr1 Solexa MPSS_Cluster 149895 149913 . + . chr1 chr1.13_0

The last value, chr1.2_0, is the name that can be shown in the viewer. chr1 and chr1.2_0 are separated by a single space.
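The nine-column layout can be checked with a few lines of Python. This is a minimal sketch, not part of GBrowse: the GffRecord name and the parse_gff_line helper are made up for illustration, and the row is the first line of Example 1 with its columns joined by real TAB characters.

```python
from collections import namedtuple

GffRecord = namedtuple(
    "GffRecord",
    "seqid source feature start end score strand phase group",
)


def parse_gff_line(line):
    """Split one tab-separated GFF row into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Start and end are integer coordinates; the other columns stay as text.
    cols[3], cols[4] = int(cols[3]), int(cols[4])
    return GffRecord(*cols)


rec = parse_gff_line("chr1\tSolexa\tMPSS_Cluster\t18976\t18992\t.\t+\t.\tchr1 chr1.2_0")
# In GFF version 1 the ninth column holds "class name" separated by one space.
group_class, group_name = rec.group.split(" ", 1)
print(rec.feature, rec.start, rec.end, group_name)
```

A row whose columns are separated by spaces fails the nine-column check, which is a quick way to catch files where TABs were lost during editing.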

Example 2-1) GFF version 3: drawing a spliced gene model

##gff-version 3

chr14 UCSC_1 mRNA 91108541 91111136 . - . ID=NM_001080113;Alias=PP8961;Note=hypothetical protein LOC650662

chr14 UCSC_1 three_prime_UTR 91108541 91110276 . - . Parent=NM_001080113

chr14 UCSC_1 CDS 91110277 91110709 . - . Parent=NM_001080113

chr14 UCSC_1 five_prime_UTR 91110710 91111136 . - . Parent=NM_001080113

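In the conf file, the RefGene track uses processed_transcript, which makes the rows above be recognized automatically as one set. They are treated as one feature because the ID in the first row matches the Parent value in the rows below it. The UTR and CDS keywords are recognized and drawn differently in the viewer.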
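The ID/Parent linkage can be illustrated with a short Python sketch that groups child rows under the row whose ID they reference. It shows only the grouping idea, not how Bio::DB::GFF implements it, and the helper names are hypothetical. The rows are the ones from Example 2-1, joined with TAB characters.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Alias=y' into a dict, skipping empty fields."""
    return dict(field.split("=", 1) for field in column9.split(";") if field)


def group_by_parent(lines):
    """Map each top-level ID to the feature types of the rows naming it as Parent."""
    groups = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        if "Parent" in attrs:
            groups.setdefault(attrs["Parent"], []).append(cols[2])
        elif "ID" in attrs:
            groups.setdefault(attrs["ID"], [])
    return groups


rows = [
    "chr14\tUCSC_1\tmRNA\t91108541\t91111136\t.\t-\t.\tID=NM_001080113;Alias=PP8961;Note=hypothetical protein LOC650662",
    "chr14\tUCSC_1\tthree_prime_UTR\t91108541\t91110276\t.\t-\t.\tParent=NM_001080113",
    "chr14\tUCSC_1\tCDS\t91110277\t91110709\t.\t-\t.\tParent=NM_001080113",
    "chr14\tUCSC_1\tfive_prime_UTR\t91110710\t91111136\t.\t-\t.\tParent=NM_001080113",
]

groups = group_by_parent(rows)
print(groups)
```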

Example 2-2) GFF version 3: drawing a spliced gene model

##gff-version 3

chr1 Ensembl mRNA 24416 25944 . - . ID=ENST00000379481

chr1 Ensembl UTR 24416 25000 . - . Parent=ENST00000379481

chr1 Ensembl CDS 25000 25037 . - . Parent=ENST00000379481

chr1 Ensembl CDS 25139 25344 . - . Parent=ENST00000379481

chr1 Ensembl CDS 25583 25599 . - . Parent=ENST00000379481

chr1 Ensembl UTR 25599 25944 . - . Parent=ENST00000379481

Because the conf file uses processed_transcript:UCSC_1 for RefGene and processed_transcript:Ensembl for Ensembl, the two are kept apart.

Example 3) GFF version 1: drawing spliced ESTs

##gff-version 1

chr1 UCSC_mRNA match 3283 4270 . + . Target BC070227

chr1 UCSC_mRNA HSP 3283 3820 . + . Target BC070227

chr1 UCSC_mRNA HSP 3821 4122 . + . Target BC070227

chr1 UCSC_mRNA HSP 4134 4270 . + . Target BC070227

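Use the form above for ESTs or ordinary mRNAs that, unlike processed_transcript features, have no CDS annotation and should be drawn with straight dashed lines between exons instead of angled connectors. Rows with the same Target value are treated as one feature.

Example 4) GFF version 3: when IDs collide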

##gff-version 3

chr10 KSJ KSJgt 62221 62221 . + . ID=ksj:rs12345;allele=T/A;ref=T;sup1=5;sup2=1;

chr10 KSJ KSJgt 68106 68106 . + . ID=ksj:rs22222;allele=C/A;ref=C;sup1=4;sup2=8;

##gff-version 3

chr2 CV CVgt 2994 2994 . + . ID=vent:rs12345;allele=C/G;ref=C;

chr2 CV CVgt 5491 5491 . + . ID=vent:rs22222;allele=C/G;ref=G;

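In the two files above, the keys rs12345 and rs22222 would each be loaded twice. Duplicate keys can confuse the MySQL tables and keep the database from working properly. Prefixing each ID with its source, as in ID=ksj:rs12345 and ID=vent:rs12345, keeps the two keys distinct. When loaded this way, the ksj and vent prefixes are dropped on screen and both features are shown as rs12345. A search for rs12345 also finds both.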
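The source-prefix convention can be sketched in a few lines of Python. The helper names are hypothetical, and the prefix stripping only imitates the display behaviour described above; it is not GBrowse's own code.

```python
def namespaced_id(source, rsid):
    """Build a database key that stays unique across data sources."""
    return f"{source}:{rsid}"


def display_name(feature_id):
    """Drop the 'source:' prefix, leaving the name shown on screen."""
    return feature_id.split(":", 1)[-1]


ids = [namespaced_id("ksj", "rs12345"), namespaced_id("vent", "rs12345")]
print(ids)
print([display_name(i) for i in ids])
```

The two stored keys differ, while both map back to the same display name, which matches the search behaviour described in the text.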

Other cautions and tips

Database-backed setups are tens of times faster than file-based ones.

The gbrowse files provided at http://hapmap.org make the setup easier.

For GFF3, ##gff-version 3 must be stated at the top of the file.

To use a new glyph file that you wrote or downloaded, it usually goes in the /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Bio/Graphics/Glyph/ directory.

To use allele_pie_multi.pm with HapMap data, you must use the bulk_load_gff.pl script provided by HapMap.

When the same ID has to be loaded more than once, prefix the ID with its source, e.g. ID=ksj:rs123456.

Reference sites built with GBrowse

Gevab - http://www.gevab.org

HapMap - http://www.hapmap.org

koreagenome - http://www.koreagenome.org

watson genome - http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/

yh genome - http://yh.genomics.org.cn/


Running IGV through a proxy server

Bioinformatics/Genome browser 2013. 2. 4. 14:58


Source: the target server located at the Suwon center

Destination: the PC used in the Medison building

1. Open ports 8080 and 60151 in the firewall, in both directions, between the Source IP and the Destination IP (see "Team Room - Useful Materials" for how to request a firewall change)

2. Install the FreeProxy software - http://www.handcraftedsoftware.org

3. Start - Programs - FreeProxy - run FreeProxy Control Centre

4. Double-click Proxy in the window

5. Check Use HTTP Authentication?

6. Enter an ID in the Realm box

7. Click Done, then Yes

8. Click Users

9. Add a user account and password

10. Add a user group and assign the new user to that group

11. Double-click Proxy - Permissions - Add Resource - Type: HTTP Proxy Service - for this user group - select the user group you created - check User must authenticate to gain access to this resource?

12. Done - Done - Done - click Yes

13. Click Start/Stop - click Start under Service mode

14. Then, in IGV launched on the Source server: View - Preferences - Proxy - check Use proxy - enter the host, port, and account details, then restart IGV


File system survey - comparison

TA/Common 2013. 2. 4. 14:53

Twelve major file systems compared

FAT16, FAT32, exFAT, ext2, ext3, ext4, NTFS, XFS, GFS2, HFS, HFS Plus, ZFS

	Creator	Max file size	Max volume size	WinXP	Win7	CentOS 5	Fedora16	Mac OS	Notes
FAT16	Microsoft (MS-DOS 3.0)	2 GB	2 GB or 4 GB	Yes	Yes	Yes	Yes	Yes	File size limit
FAT32	Microsoft (Win95)	4 GB	2 TB	Yes	Yes	Yes	Yes	Yes	File size limit
exFAT	Microsoft (Win Vista)	127 PB	64 ZB, 512 TB recommended	XP SP2	Yes	With third party driver	With third party driver	10.6.5 and later	MBR, GPT; Vista or later
Ext2	Remy Card (Linux)	2 TB	32 TB	No	Partial (Ext2Fsd)	Yes	Yes	Yes	File size limit
Ext3	Stephen Tweedie (Linux)	2 TB	32 TB	No	Partial (Ext2Fsd)	Yes	Yes	No	File size limit; hard to use on Windows
Ext4	Various (Linux)	16 TB	1 EB	No	Partial (Ext2Fsd)	Since kernel 2.6.28	Yes	10.6.5 and later	Hard to use on Windows
NTFS (3.0)	Microsoft (WinNT)	16 EB	16 EB; 512 TB (Win)	Yes	Yes	No (since kernel 2.2)	Yes	Read only	MBR, GPT; use GPT for volumes of 2 TB or more
XFS	SGI (Linux)	8 EB	8 EB	No	No	Yes	Yes	No	Not usable on Windows
GFS2	Sistina (Red Hat)	8 EB	8 EB	No	No	Yes	Yes	No	Not usable on Windows
HFS	Apple (MacOS)	2 GB	2 TB	With third party app	With third party app	Yes	Yes	Yes	Hard to use on Windows
HFS Plus	Apple (Mac OS 8.1)	8 EB	8 EB	With third party app	With third party app	Partial	Partial	9 and later	Hard to use on Windows and Linux
ZFS	Sun Microsystems (Solaris)	16 EB	16 EB	No	No	With third party	With third party kernel module	10.5 and later	Not usable on Windows; hard to use on Linux
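The "Max file size" column matters in practice when moving large sequencing files between disks. As a rough sketch, the check below uses a few limits copied from the table and interprets them as binary units (an assumption). The function name and the 30 GB example file are hypothetical.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Max file sizes as listed in the table above, read as binary units (assumed).
MAX_FILE_SIZE = {
    "FAT16": 2 * GIB,
    "FAT32": 4 * GIB,
    "Ext3": 2 * TIB,
    "Ext4": 16 * TIB,
}


def fits(filesystem, size_bytes):
    """Return True if a file of this size is below the listed per-file limit."""
    return size_bytes < MAX_FILE_SIZE[filesystem]


bam_size = 30 * GIB  # hypothetical alignment file
for fs in ("FAT32", "Ext3", "Ext4"):
    print(fs, fits(fs, bam_size))
```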


NFS mounts: server and client setup

TA/Common 2013. 2. 4. 14:46

<Server>

1. Settings in /etc/exports

/usr/local/apache/logs 182.192.71.20(rw,async)

/etc/init.d/nfs restart

2. Allow the client IP in /etc/hosts.allow

3. Check with rpcinfo -p that the required daemons are running

<Client>

1. Check with rpcinfo -p that the required daemons are running

2. /etc/init.d/nfs restart

<Ports that must be open>

TCP 111, 2049

UDP 111, 2049, 32789

<Reference material below>

http://how-to.linuxcareer.com/how-to-configure-nfs-on-linux

1 Server components

NFS requires a clear distinction between server and client, because their requirements are entirely different. The server consists of two daemons, rpc.mountd and rpc.nfsd. The configuration file is /etc/exports. Allowing clients to mount and use certain directory trees is called exporting. This is currently the only NFS configuration file.

2 Checks

NFS uses RPC (Remote Procedure Call), so a special server called the port mapper must already be running.

#rpcinfo -p
program version protocol port
100000 2 tcp 111 rpcbind
...........

Use rpcinfo to confirm that rpcbind is registered, and that mountd and nfs are registered as well.
On Red Hat-based systems, the script that starts and stops the port mapper is /etc/rc.d/init.d/portmap, and the NFS script is nfs in the same directory.
To see the port mappings of a particular host, add its host name or IP address after rpcinfo -p.

3 Configuration

As mentioned above, the configuration file is /etc/exports. Its format resembles the original SunOS format but differs in a few options.

<directory to export> <allowed client>(options...)

The basic format starts with the directory to export on the left, followed by the names of the clients allowed to mount it.
Options go inside the parentheses.
Lines starting with # are comments. When an entry spans several lines, end each continued line with a backslash, as is common in Unix configuration files.
A client can be given as a single host IP address or as a domain name. Wildcard characters such as * and ? can also be used, as in *.cs.foo.edu. That pattern refers to every host in the cs.foo.edu domain, so hosts named a.cs.foo.edu, b.cs.foo.edu, and so on can mount the directory. A host named a.b.cs.foo.edu is not covered, however: wildcard characters do not match a dot (.).
The 'address/netmask' notation can also be used. 192.168.1.0/24 denotes every host in the 192.168.1 class C network.
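The two client-matching rules above (wildcards that stop at dots, and address/netmask ranges) can be tried out with a small Python sketch. It follows the rule as stated in this text; the function names are hypothetical, and wildcard behaviour may differ between nfs-utils versions, so check exports(5) on the actual server.

```python
import ipaddress
import re


def host_matches(pattern, hostname):
    """Match an exports-style host pattern in which * and ? never match a dot."""
    regex = "".join(
        "[^.]*" if ch == "*" else "[^.]" if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, hostname) is not None


def ip_in_network(network, address):
    """Return True if the address falls inside an address/netmask range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


print(host_matches("*.cs.foo.edu", "a.cs.foo.edu"))
print(host_matches("*.cs.foo.edu", "a.b.cs.foo.edu"))
print(ip_in_network("192.168.1.0/24", "192.168.1.77"))
print(ip_in_network("192.168.1.0/24", "192.168.2.1"))
```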

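ro : Forces a read-only mount. The reference describes read/write as the default, but kernel-based NFS servers default to ro, so write rw explicitly when clients need to write.
noaccess : Use this to deny NFS mounts. It is typically used to exclude a particular subdirectory while allowing mounts of the tree above it.
root_squash, no_root_squash : Both the NFS server and the NFS client have a root user, but the two are not the same root, and the client's root should not get root privileges on the server. The default is therefore root_squash, which maps the client's root to a user such as nobody. To make the server and client root users equivalent, specify no_root_squash.
all_squash, no_all_squash : The default is no_all_squash: for ordinary user IDs other than root, a UID on the client is treated as the same user, with the same permissions, as that UID on the server. This is the opposite of the default handling of root. With all_squash, every UID and GID is mapped to the anonymous user ID.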
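The squash options amount to a small UID-mapping rule, sketched below in Python. This is an illustration of the rule as described, not NFS server code: the function name is hypothetical, and 65534 as the default anonymous UID is an assumption that an export can override with anonuid.

```python
ANON_UID = 65534  # assumed default UID for the anonymous (nobody) user


def effective_uid(client_uid, root_squash=True, all_squash=False, anonuid=ANON_UID):
    """Return the UID the server would act as for a request from client_uid."""
    if all_squash:
        # Every client user is mapped to the anonymous user.
        return anonuid
    if client_uid == 0 and root_squash:
        # Client root is mapped to the anonymous user by default.
        return anonuid
    return client_uid


print(effective_uid(0))
print(effective_uid(0, root_squash=False))
print(effective_uid(1000))
print(effective_uid(1000, all_squash=True, anonuid=150))
```

The last call corresponds to an export written with all_squash and anonuid=150, where every client user ends up as UID 150 on the server.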

Configuration example

/ master(rw) trusty(rw,no_root_squash)
/projects proj*.local.domain(rw)
/usr *.local.domain(ro) @trusted(rw)
/home/joe pc001(rw,all_squash,anonuid=150,anongid=100)
/pub (ro,insecure,all_squash)
/pub/private (noaccess)

To apply changed settings, restart the NFS daemons or send them a HUP signal.

#/etc/rc.d/init.d/nfs stop
#/etc/rc.d/init.d/nfs start


[ubuntu][locale] perl: warning: Please check that your locale settings:

TA/Common 2013. 2. 4. 14:46

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "ko_KR.eucKR",
        LC_ALL = "ko_KR.eurKR",
        LC_MESSAGES = "ko_KR.eucKR",
        LANG = "ko_KR.eucKR"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

$> locale-gen ko_KR ko_KR.UTF-8

$> dpkg-reconfigure locales
