Use liftOver, provided on the UCSC web site.
liftOver is designed to convert coordinates between different assemblies of the same organism. It does not perform multiple alignment; it simply lifts genomic coordinates within a single organism. To do this it requires a map.chain file that has the old genome as the target and the new genome as the query. For the human genome you can get one from the UCSC download page.

Usage: liftOver oldFile map.chain newFile unMapped
oldFile : BED format (http://genome.ucsc.edu/FAQ/FAQformat.html#format1); columns beyond the first three are optional
  ==> It works with an input file containing only the first 3 columns. All positions MUST be zero-based.
map.chain: (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/)
newFile : output file name
unMapped: output file name for un-lifted positions
ex) ./liftOver hg18.positions hg18ToHg19.over.chain ./output/hg19.positions ./output/unmapped.positions
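
As a minimal illustration, a hypothetical hg18.positions input (three zero-based BED columns; the coordinates are made up for this example) could look like this:

chr1    743267    743268
chr7    127471196    127472363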

Download liftOver: (http://hgdownload.cse.ucsc.edu/admin/exe/)

Posted by 옥탑방람보

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

Writing An Hadoop MapReduce Program In Python

by Michael G. Noll on September 21, 2007 (last updated: June 17, 2012)

In this tutorial, I will describe how to write a simple MapReduce program for Hadoop in the Python programming language.


Motivation

Even though the Hadoop framework is written in Java, programs for Hadoop need not be coded in Java but can also be developed in other languages like Python or C++ (the latter since version 0.14.1). However, the documentation and the most prominent Python example on the Hadoop home page could make you think that you must translate your Python code using Jython into a Java jar file. Obviously, this is not very convenient and can even be problematic if you depend on Python features not provided by Jython. Another issue of the Jython approach is the overhead of writing your Python program in such a way that it can interact with Hadoop – just have a look at the example in <HADOOP_INSTALL>/src/examples/python/WordCount.py and you will see what I mean. I still recommend having at least a look at the Jython approach and maybe even at the new C++ MapReduce API called Pipes; it’s really interesting.

That said, the ground is prepared for the purpose of this tutorial: writing a Hadoop MapReduce program in a more Pythonic way, i.e. in a way you should be familiar with.

What we want to do

We will write a simple MapReduce program (see also Wikipedia) for Hadoop in Python but without using Jython to translate our code to Java jar files.

Our program will mimic the WordCount example, i.e. it reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word and the count of how often it occurred, separated by a tab.

Note: You can also use programming languages other than Python such as Perl or Ruby with the “technique” described in this tutorial. I wrote some words about what happens behind the scenes. Feel free to correct me if I’m wrong.

Prerequisites

You should have an Hadoop cluster up and running because we will get our hands dirty. If you don’t have a cluster yet, my following tutorials might help you build one. The tutorials are tailored to Ubuntu Linux, but the information also applies to other Linux/Unix variants.

Python MapReduce Code

The “trick” behind the following Python code is that we will use Hadoop Streaming (see also the wiki entry) to pass data between our Map and Reduce code via STDIN (standard input) and STDOUT (standard output). We will simply use Python’s sys.stdin to read input data and print our own output to sys.stdout. That’s all we need to do because Hadoop Streaming will take care of everything else! Amazing, isn’t it? Well, at least I had a “wow” experience…

Map: mapper.py

Save the following code in the file /home/hduser/mapper.py. It will read data from STDIN (standard input), split it into words, and output a list of lines mapping words to their (intermediate) counts to STDOUT (standard output). The Map script will not compute an (intermediate) sum of a word’s occurrences. Instead, it will output “<word> 1” immediately – even though the <word> might occur multiple times in the input – and just let the subsequent Reduce step do the final sum count. Of course, you can change this behavior in your own scripts as you please, but we will keep it like that in this tutorial for didactic reasons :-)

Make sure the file has execution permission (chmod +x /home/hduser/mapper.py should do the trick) or you will run into problems.

#!/usr/bin/env python

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word, 1)

Reduce: reducer.py

Save the following code in the file /home/hduser/reducer.py. It will read the results of mapper.py from STDIN (standard input), sum the occurrences of each word to a final count, and output the results to STDOUT (standard output).

Make sure the file has execution permission (chmod +x /home/hduser/reducer.py should do the trick) or you will run into problems.

#!/usr/bin/env python

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)

    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print '%s\t%s' % (current_word, current_count)

Test your code (cat data | map | sort | reduce)

I recommend testing your mapper.py and reducer.py scripts locally before using them in a MapReduce job. Otherwise your jobs might complete successfully but produce no job result data at all, or not the results you expected. If that happens, most likely it was you (or me) who screwed up.

Here are some ideas on how to test the functionality of the Map and Reduce scripts.

 # very basic test
 hduser@ubuntu:~$ echo "foo foo quux labs foo bar quux" | /home/hduser/mapper.py
 foo     1
 foo     1
 quux    1
 labs    1
 foo     1
 bar     1
 quux    1
hduser@ubuntu:~$ echo "foo foo quux labs foo bar quux" | /home/hduser/mapper.py | sort -k1,1 | /home/hduser/reducer.py
 bar     1
 foo     3
 labs    1
 quux    2
 # using one of the ebooks as example input
 # (see below on where to get the ebooks)
 hduser@ubuntu:~$ cat /tmp/gutenberg/20417-8.txt | /home/hduser/mapper.py
 The     1
 Project 1
 Gutenberg       1
 EBook   1
 of      1
 [...]
 (you get the idea)

Running the Python Code on Hadoop

Download example input data

We will use three ebooks from Project Gutenberg for this example (the three plain-text files shown in the listing below).

Download each ebook as a text file in Plain Text UTF-8 encoding and store the files in a temporary directory of your choice, for example /tmp/gutenberg.

hduser@ubuntu:~$ ls -l /tmp/gutenberg/
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt
hduser@ubuntu:~$

Copy local example data to HDFS

Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop’s HDFS.

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:40 /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r--   3 hduser supergroup     674566 2011-03-10 11:38 /user/hduser/gutenberg/pg20417.txt
-rw-r--r--   3 hduser supergroup    1573112 2011-03-10 11:38 /user/hduser/gutenberg/pg4300.txt
-rw-r--r--   3 hduser supergroup    1423801 2011-03-10 11:38 /user/hduser/gutenberg/pg5000.txt
hduser@ubuntu:/usr/local/hadoop$

Run the MapReduce job

Now that everything is prepared, we can finally run our Python MapReduce job on the Hadoop cluster. As I said above, we use Hadoop Streaming to pass data between our Map and Reduce code via STDIN (standard input) and STDOUT (standard output).

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output

If you want to modify some Hadoop settings on the fly, such as increasing the number of Reduce tasks, you can use the -D option:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -D mapred.reduce.tasks=16 ...

An important note about mapred.map.tasks: Hadoop does not honor mapred.map.tasks beyond considering it a hint. But it does accept the user-specified mapred.reduce.tasks and does not manipulate that. You cannot force mapred.map.tasks, but you can specify mapred.reduce.tasks.

The job will read all the files in the HDFS directory /user/hduser/gutenberg, process them, and store the results in the HDFS directory /user/hduser/gutenberg-output. In general Hadoop will create one output file per reducer; in our case, however, it will only create a single file because the input files are very small.

Example output of the previous command in the console:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/mapper.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
 additionalConfSpec_:null
 null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
 packageJobJar: [/app/hadoop/tmp/hadoop-unjar54543/]
 [] /tmp/streamjob54544.jar tmpDir=null
 [...] INFO mapred.FileInputFormat: Total input paths to process : 7
 [...] INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
 [...] INFO streaming.StreamJob: Running job: job_200803031615_0021
 [...]
 [...] INFO streaming.StreamJob:  map 0%  reduce 0%
 [...] INFO streaming.StreamJob:  map 43%  reduce 0%
 [...] INFO streaming.StreamJob:  map 86%  reduce 0%
 [...] INFO streaming.StreamJob:  map 100%  reduce 0%
 [...] INFO streaming.StreamJob:  map 100%  reduce 33%
 [...] INFO streaming.StreamJob:  map 100%  reduce 70%
 [...] INFO streaming.StreamJob:  map 100%  reduce 77%
 [...] INFO streaming.StreamJob:  map 100%  reduce 100%
 [...] INFO streaming.StreamJob: Job complete: job_200803031615_0021
 [...] INFO streaming.StreamJob: Output: /user/hduser/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$

As you can see in the output above, Hadoop also provides a basic web interface for statistics and information. When the Hadoop cluster is running, go to http://localhost:50030/ and browse around. Here’s a screenshot of the Hadoop web interface for the job we just ran.

A screenshot of Hadoop's web interface, showing the details of the MapReduce job we just ran.

Check if the result is successfully stored in HDFS directory /user/hduser/gutenberg-output:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg-output
 Found 1 items
 /user/hduser/gutenberg-output/part-00000     <r 1>   903193  2007-09-21 13:00
 hduser@ubuntu:/usr/local/hadoop$

You can then inspect the contents of the file with the dfs -cat command:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-00000
 "(Lo)cra"       1
 "1490   1
 "1498," 1
 "35"    1
 "40,"   1
 "A      2
 "AS-IS".        2
 "A_     1
 "Absoluti       1
 [...]
 hduser@ubuntu:/usr/local/hadoop$

Note that in this specific output above, the quote signs (") enclosing the words have not been inserted by Hadoop. They are the result of how our Python code splits words, which in this case matched the opening quotes in the ebook texts. Just inspect the part-00000 file further to see it for yourself.

Improved Mapper and Reducer code: using Python iterators and generators

The Mapper and Reducer examples above should have given you an idea of how to create your first MapReduce application. The focus was code simplicity and ease of understanding, particularly for beginners of the Python programming language. In a real-world application however, you might want to optimize your code by using Python iterators and generators (an even better introduction in PDF) as some readers have pointed out.

Generally speaking, iterators and generators (functions that create iterators, for example with Python’s yield statement) have the advantage that an element of a sequence is not produced until you actually need it. This can help a lot in terms of computational expensiveness or memory consumption depending on the task at hand.

Note: The following Map and Reduce scripts will only work “correctly” when run in the Hadoop context, i.e. as Mapper and Reducer in a MapReduce job. This means that the naive local test “cat DATA | ./mapper.py | sort -k1,1 | ./reducer.py” will no longer work correctly, because some functionality is intentionally outsourced to Hadoop.

Precisely, we compute the sum of a word’s occurrences, e.g. (“foo”, 4), only if by chance the same word (“foo”) appears multiple times in succession. In the majority of cases, however, we let Hadoop group the (key, value) pairs between the Map and the Reduce step, because Hadoop is more efficient in this regard than our simple Python scripts.

mapper.py

#!/usr/bin/env python
"""A more advanced Mapper, using Python iterators and generators."""

import sys

def read_input(file):
    for line in file:
        # split the line into words
        yield line.split()

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_input(sys.stdin)
    for words in data:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        for word in words:
            print '%s%s%d' % (word, separator, 1)

if __name__ == "__main__":
    main()

reducer.py

#!/usr/bin/env python
"""A more advanced Reducer, using Python iterators and generators."""

from itertools import groupby
from operator import itemgetter
import sys

def read_mapper_output(file, separator='\t'):
    for line in file:
        yield line.rstrip().split(separator, 1)

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_mapper_output(sys.stdin, separator=separator)
    # groupby groups multiple word-count pairs by word,
    # and creates an iterator that returns consecutive keys and their group:
    #   current_word - string containing a word (the key)
    #   group - iterator yielding all ["<current_word>", "<count>"] items
    for current_word, group in groupby(data, itemgetter(0)):
        try:
            total_count = sum(int(count) for current_word, count in group)
            print "%s%s%d" % (current_word, separator, total_count)
        except ValueError:
            # count was not a number, so silently discard this item
            pass

if __name__ == "__main__":
    main()
Posted by 옥탑방람보

Both parents: WT hom
Child (affected): het

This is a variant that is not found in either parent at all but appears newly in the affected child.
Strictly speaking this counts as an MIE (Mendelian inheritance error), but at the same time it can be interpreted as a de novo variant that arose spontaneously in the child. If this variant is in fact the factor expressed as the disease, the site can be regarded as a newly arisen dominant mutation.
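
A minimal sketch of this trio filter, assuming genotypes are encoded as the strings "WT hom", "het", and "variant hom" (the encoding and function name are only illustrative):

def is_de_novo_candidate(father_gt, mother_gt, child_gt):
    # Both parents homozygous reference, affected child heterozygous:
    # formally a Mendelian inheritance error, but also a possible de novo variant.
    return father_gt == "WT hom" and mother_gt == "WT hom" and child_gt == "het"

print is_de_novo_candidate("WT hom", "WT hom", "het")  # True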
Posted by 옥탑방람보
For a single site:

Both parents: het
Child (affected): homozygous variant


Neither parent is affected, and we look for the site that appears as the disease-causing factor in the affected child.
That is, in the father and the mother the allele is present only as a het, so as a recessive allele it is not expressed as disease, but in the child, who received one recessive allele from each parent, it is expressed as disease.
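
A minimal sketch of this filter, this time assuming VCF-style GT strings such as "0/1" and "1/1" (the function name is only illustrative):

def is_recessive_hom_candidate(father_gt, mother_gt, child_gt):
    # Both parents are heterozygous carriers and the affected child is
    # homozygous for the alternate allele.
    return father_gt == "0/1" and mother_gt == "0/1" and child_gt == "1/1"

print is_recessive_hom_candidate("0/1", "0/1", "1/1")  # True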
Posted by 옥탑방람보

Compound heterozygosity in medical genetics is the condition of having two heterogeneous recessive alleles at a particular locus that can cause genetic disease in a heterozygous state.

Father, mother, child 1 (affected), child 2 (affected or unaffected)

Combination 1
Father (site1/site2): WT hom / het
Mother (site1/site2): het / WT hom
Child 1 (site1/site2): het / het

Combination 2
Father (site1/site2): het / WT hom
Mother (site1/site2): WT hom / het
Child 1 (site1/site2): het / het

An affected individual must be het at both of the two variants found in the same gene.
Each parent must be het at exactly one of the two sites.
However, neither parent may be homozygous for the variant at either site.

When the sibling (child 2) is also taken into account:
If the sibling is affected, they must be het/het.
If the sibling is unaffected, three combinations are possible: WT hom/WT hom, WT hom/het, or het/WT hom.


Compound heterozygosity refers to two het variants at a single locus (gene) together acting as the disease-causing factor. Therefore, in the unaffected parents, these two disease-predisposing alleles must not be present together. In other words, each parent carries one het while being WT hom at the other site, so that the variant at site1 comes from one parent and the variant at site2 from the other, and we look for cases where the two variants together are expressed as a single disease.
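
A minimal sketch of this two-site filter, assuming the genotype strings "WT hom", "het", and "variant hom" (illustrative encoding; function and variable names are hypothetical):

def is_compound_het_candidate(father, mother, child1):
    # father, mother, child1 are (site1_gt, site2_gt) tuples for two variants
    # found in the same gene.
    # The affected child must be het at both sites.
    if child1 != ("het", "het"):
        return False
    # Neither parent may be homozygous for the variant at either site.
    if "variant hom" in father or "variant hom" in mother:
        return False
    # Each parent contributes exactly one of the two hets, at opposite sites.
    combo1 = father == ("WT hom", "het") and mother == ("het", "WT hom")
    combo2 = father == ("het", "WT hom") and mother == ("WT hom", "het")
    return combo1 or combo2

print is_compound_het_candidate(("WT hom", "het"), ("het", "WT hom"), ("het", "het"))  # True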
Posted by 옥탑방람보

When running MarkDuplicates with Picard,

the process sometimes dies with the error message "Value was put into PairInfoMap more than once".

I think I have found a way to resolve this issue.

I confirmed that the error is caused by read information in the BAM file being inconsistent between the first read and the second read of a pair, in both the flags and the mapping positions.

This appears to have been caused by running BWA with the -A option earlier, which skips the correction of the mate information.

 

Therefore, after running BWA with -A, run

samtools fixmate input.bam output.bam

and then run MarkDuplicates on this output.bam.

In another case, if you work with merged BAM data and read IDs occur more than once within the merged BAM file, the same error also occurs. This happens when the merge was done with samtools merge.
If you use Picard's MergeSamFiles.jar instead, and the individual BAMs share the same RG ID, it automatically appends a numeric index such as ".1" to the RG ID. BAM files mapped with tools such as Bioscope usually have their RG ID set to "default".

To summarize, when the error message above occurs:

1. Check whether the read ID reported alongside "Value was put into PairInfoMap more than once" occurs more than once in the BAM (see the sketch below).
2. If it does, use Picard as the merge tool.
3. If it does not, run samtools fixmate and then redo the analysis.
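
For step 1, a minimal sketch using pysam (assuming pysam is available; the BAM path and read ID below are placeholders). A properly paired read normally contributes two records, so a count well above that, ignoring secondary alignments, suggests duplicated read IDs from a merge:

#!/usr/bin/env python
import sys
import pysam

def count_read_id(bam_path, read_id):
    # Count how many alignment records in the BAM carry the given read name.
    count = 0
    bam = pysam.AlignmentFile(bam_path, "rb")
    for read in bam:
        if read.query_name == read_id:
            count += 1
    bam.close()
    return count

if __name__ == "__main__":
    print count_read_id(sys.argv[1], sys.argv[2])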


That's all.

Posted by 옥탑방람보


The pairing step in bwa can take an unmanageably long time (I watched one job run for 4 to 5 days).

Looking into the behavior in detail,

the problem appears to occur when the two reads of a pair map very far apart from each other (insert size).

(This is probably a case where the raw data itself has a problem.)

 

Looking more closely,

bwa has a step that, when one end of a pair is unmapped, rescues it using the user-supplied maximum insert size.

This step improves the overall mapping rate and the properly-paired rate of the data.

As a quick test, with 250,000 reads each in read1 and read2, mapped against the chr22 reference:

1. Skipping the max-insert-size rescue step (the -A option)

==> 198999 mapped (3.98%), 18295 properly paired (0.37%)

2. Rescuing with a max insert size of 500

==> 207601 mapped (4.15%), 37214 properly paired (0.74%)

Skipping this step lowers the overall mapping statistics, but it gives a big win in speed.

In particular, for data like this, where the two reads of a pair map far apart, skipping the step degrades the results somewhat, but going through it takes a week(?) or more.

Posted by 옥탑방람보

1. Boost : www.boost.org
If you do not have Boost, it is fairly easy to install. Just download the source package and unpack it somewhere; no compilation is necessary. Set the environment variable BOOST to the path of the Boost base directory, i.e. all the Boost header files should be accessible at $BOOST/boost.

2. GSL : GNU Scientific Library
Version 1.0 or later is required (the latest version, currently 1.3, is recommended; it is free, so there is no reason not to upgrade). You should have the gsl-config script on your PATH. Try the command

> gsl-config --version
1.3

to make sure you have the right version of GSL installed.

<Installation>
> tar xvf hotspotter-1.2.1.tar.gz
> cd hotspotter-1.0
> env BOOST=/path/to/boost make     (/usr/include/boost)


------------------
If you just follow the steps above, errors occur.
The source ships with two basic errors:
1. There is a typo in hotspotter/mll/stat/cfunc.hpp: the endl on line 211 must be changed to end.
2. hotspotter/src/minim1D.hpp is missing a header: add #include <assert.h>.

Posted by 옥탑방람보
This error occurs when the reference used to build the BAM file differs from the reference you are running with now.


<Error message>
org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Invalid sequence number 24
 at org.broadinstitute.sting.gatk.datasources.reads.GATKBAMIndex.<init>(GATKBAMIndex.java:100)
 at org.broadinstitute.sting.gatk.datasources.reads.BAMSchedule.<init>(BAMSchedule.java:105)
 at org.broadinstitute.sting.gatk.datasources.reads.BAMScheduler.getNextOverlappingBAMScheduleEntry(BAMScheduler.java:205)
 at org.broadinstitute.sting.gatk.datasources.reads.BAMScheduler.advance(BAMScheduler.java:108)
 at org.broadinstitute.sting.gatk.datasources.reads.BAMScheduler.next(BAMScheduler.java:79)
 at org.broadinstitute.sting.gatk.datasources.reads.BAMScheduler.next(BAMScheduler.java:47)
 at net.sf.picard.util.PeekableIterator.advance(PeekableIterator.java:71)
 at net.sf.picard.util.PeekableIterator.next(PeekableIterator.java:57)
 at org.broadinstitute.sting.gatk.datasources.reads.LowMemoryIntervalSharder.next(LowMemoryIntervalSharder.java:61)
 at org.broadinstitute.sting.gatk.datasources.reads.LowMemoryIntervalSharder.next(LowMemoryIntervalSharder.java:36)
 at org.broadinstitute.sting.gatk.datasources.reads.LocusShardStrategy.next(LocusShardStrategy.java:142)
 at org.broadinstitute.sting.gatk.datasources.reads.LocusShardStrategy.next(LocusShardStrategy.java:42)
 at org.broadinstitute.sting.utils.threading.GenomeLocProcessingTracker$OwnershipIterator$1.doBody(GenomeLocProcessingTracker.java:258)
 at org.broadinstitute.sting.utils.threading.GenomeLocProcessingTracker$OwnershipIterator$1.doBody(GenomeLocProcessingTracker.java:250)
 at org.broadinstitute.sting.utils.threading.GenomeLocProcessingTracker$WithLock.run(GenomeLocProcessingTracker.java:423)
 at org.broadinstitute.sting.utils.threading.GenomeLocProcessingTracker$OwnershipIterator.next(GenomeLocProcessingTracker.java:250)
 at org.broadinstitute.sting.utils.threading.GenomeLocProcessingTracker$OwnershipIterator.next(GenomeLocProcessingTracker.java:213)
 at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:53)
 at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:235)
 at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:117)
 at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:221)
 at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:87)

==>

This is a terrible error message with what I suspect is a fairly simple cause: I think you're running with a different reference (or different version of the same reference) than the reference to which your BAMs were initially aligned. Can you check to make sure that the version of the reference you're using matches the BAMs exactly?
(http://getsatisfaction.com/gsa/topics/unifiedgenotyper_problem_with_nt_option)
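
To verify this, a minimal sketch that compares the sequence names and lengths in the BAM header against the reference's .fai index (assuming pysam is available; the file paths are placeholders):

import pysam

def reference_matches_bam(bam_path, fai_path):
    # Sequence names and lengths recorded in the BAM header (@SQ lines).
    bam = pysam.AlignmentFile(bam_path, "rb")
    bam_seqs = list(zip(bam.references, bam.lengths))
    bam.close()
    # Sequence names and lengths from the reference .fai index (name, length, ...).
    fai_seqs = []
    for line in open(fai_path):
        fields = line.rstrip('\n').split('\t')
        fai_seqs.append((fields[0], int(fields[1])))
    # GATK expects the same sequences in the same order as the BAM header.
    return bam_seqs == fai_seqs

print reference_matches_bam("sample.bam", "reference.fasta.fai")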
Posted by 옥탑방람보