Job arrays ( -t )
A job array is requested with the -t option to qsub in SGE.
It allows a user to submit a series of jobs at once over a range of values; each task receives its own value in the variable $SGE_TASK_ID.
Example
The most common application of this in genetics might be to submit the same job once per chromosome.
There are at least two ways to accomplish such a task:
- write a script taking parameter $1 to designate a chromosome, and submit as 22 individual jobs
$ for i in {1..22}; do qsub -N anal_chr${i} script.sh $i; done
- write a script using $SGE_TASK_ID to designate a chromosome, and submit as a Job Array
$ qsub -N analysis -t 1-22 script.sh
The two methods are equivalent in terms of efficiency; however, using Job Arrays provides several benefits in terms of job management and tracking.
- a job array has a single $JOB_ID, so the entire array of jobs can be referenced at once by SGE commands such as qalter and qdel.
- a job array has more options for job control than a single submit, allowing for dependencies to be established between groups of jobs.
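As a rough sketch, a single script.sh can serve both submission methods by taking the chromosome from $1 when it is given and from $SGE_TASK_ID otherwise. The function name pick_chrom is hypothetical, and the check for the value "undefined" assumes the usual SGE behavior of setting $SGE_TASK_ID to that string for non-array jobs.

```shell
# Sketch only: pick_chrom is a hypothetical helper, not part of SGE.
# Prints the chromosome number, taken from the first argument if present,
# otherwise from SGE_TASK_ID (set by SGE for each task of a job array).
pick_chrom() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
        echo "$SGE_TASK_ID"
    else
        echo "no chromosome given and SGE_TASK_ID is not set" >&2
        return 1
    fi
}

# Inside script.sh one might then write:
#   chrom=$(pick_chrom "$@") || exit 1
#   echo "processing chromosome $chrom"
```

With this in place, both "qsub script.sh 7" and "qsub -t 1-22 script.sh" would reach the same analysis code.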
A good description and tutorial on job arrays can be found on the SGE website
Job dependencies with arrays ( -hold_jid )
Job dependencies allow one to specify that one job should not be run until another job completes.
One can use job dependencies as follows :
- In a two-step process such as imputation, where the second step depends on the results of the first
- - Splitting one long job into two smaller jobs helps the queue scheduler be more efficient
- - One can allocate resources to each job separately. Often, one step requires more or less memory than the other.
- To avoid clogging the queue with a large number of jobs
- - job dependencies can effectively limit the number of running jobs independent of the number of jobs submitted.
Example (two-step process)
Let's suppose one has two scripts: step1.sh and step2.sh
One can make step2.sh dependent on step1.sh as follows :
$ qsub step1.sh
Your job 12357 ('step1.sh') has been submitted
$ qsub -hold_jid 12357 step2.sh
Your job 12358 ('step2.sh') has been submitted
One could also capture the step1_jid to be used in the step2 submit, as follows :
$ step1id=$(qsub -terse step1.sh) && qsub -hold_jid "$step1id" step2.sh
Job array dependencies are designed for the case where one wants to repeat such a dependency over a range of values (such as once per chromosome). These are discussed in more detail below.
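The capture-and-hold pattern above extends to any number of steps. The following is a minimal sketch, assuming plain (non-array) jobs; submit_chain and the QSUB override variable are hypothetical names introduced here so the function can be tried without a cluster, while -terse and -hold_jid are the qsub options already used above.

```shell
# Sketch only: submit_chain and QSUB are hypothetical, not part of SGE.
# Submits each script in the order given; every script after the first
# is held (-hold_jid) until the job submitted just before it completes.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            prev=$("${QSUB:-qsub}" -terse "$script") || return 1
        else
            prev=$("${QSUB:-qsub}" -terse -hold_jid "$prev" "$script") || return 1
        fi
        echo "$script submitted as job $prev"
    done
}

# Usage on a cluster:
#   submit_chain step1.sh step2.sh
```

Stopping at the first failed submission avoids submitting a later step with an empty -hold_jid argument.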
Example (avoid clogging queue)
Another useful application of -hold_jid is to avoid flooding the queue with a large number of jobs at once. This is particularly useful when working with job-arrays (each of which can hold a large number of jobs).
If, for example, one has 100 jobs to submit but a MaxUjobs of 40, one can submit these all at once using a combination of arrays and -hold_jid.
The process looks like this :
- (1) split the 100 jobs into 3 arrays : 1-33, 34-66, 67-100
- (2) submit each set as an array, making each array dependent on the previous array
# submit first array, capture JOB_ID for the array in $jid
jid=`qsub -terse -t 1-33 script.sh | sed -r "s/\.(.*)//"`
# submit second array, dependent on completion of the first
jid=`qsub -terse -t 34-66 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`
# submit third array, dependent on completion of the second
jid=`qsub -terse -t 67-100 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`
The behavior is that
- tasks 1-33 will submit and run as if they were 33 separate jobs.
- task 34 (and 35 through 66) will not start until after all tasks in the first array (1-33) complete.
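The three submissions above can be generalized to any number of tasks and any chunk size. This is only a sketch: chunk_ranges, submit_chunked and the QSUB override variable are hypothetical names, and the chunks are cut at a fixed size rather than split by hand as in the example (so 100 tasks with a size of 40 give 1-40, 41-80, 81-100). The parameter expansion ${out%%.*} does the same job as the sed call above, dropping the task range that -terse appends for array jobs.

```shell
# Sketch only: chunk_ranges, submit_chunked and QSUB are hypothetical names.

# Print task ranges "start-end", one per line, covering 1..total
# in chunks of at most `size` tasks.
chunk_ranges() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$((end + 1))
    done
}

# submit_chunked TOTAL SIZE SCRIPT
# Submit SCRIPT as a chain of arrays, each held until the previous array completes.
submit_chunked() {
    jid=""
    for range in $(chunk_ranges "$1" "$2"); do
        if [ -z "$jid" ]; then
            out=$("${QSUB:-qsub}" -terse -t "$range" "$3") || return 1
        else
            out=$("${QSUB:-qsub}" -terse -t "$range" -hold_jid "$jid" "$3") || return 1
        fi
        jid=${out%%.*}   # array jobs report "JOB_ID.start-end:step"; keep JOB_ID only
        echo "tasks $range submitted as job $jid"
    done
}

# Usage on a cluster:
#   submit_chunked 100 40 script.sh
```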
Job array dependencies ( -hold_jid_ad )
Job array dependencies are quite different from job dependencies.
An array dependency is designed for the scenario where one has a two-step process running for each of 22 chromosomes. For each chromosome, step 2 should not begin until step 1 completes.
The process for this is as follows :
- (1) submit step1 as an array (-t 1-22), where $SGE_TASK_ID denotes chromosome number.
- (2) submit step2 as an array (also -t 1-22), made dependent on step 1 with -hold_jid_ad
The behavior is that
- chrom 22 step 2 depends only on chrom 22 step 1
- chrom 11 step 2 depends only on chrom 11 step 1
This means that chrom 22 step 2 can start before chrom 11 step 1 ends. This is different from the behavior with -hold_jid: if one used -hold_jid, then chrom 22 step 2 couldn't start until all the step 1 tasks (in this case, all chromosomes) had completed.
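The two submissions described above might look like the following sketch. The function name submit_paired_arrays and the QSUB override variable are hypothetical; step1.sh and step2.sh are the script names from the earlier two-step example. Both arrays are given the same task range so that each step 2 task has a matching step 1 task to wait for.

```shell
# Sketch only: submit_paired_arrays and QSUB are hypothetical names.
# submit_paired_arrays RANGE FIRST SECOND
# Submit FIRST and SECOND as arrays over the same RANGE; task N of SECOND
# is held (-hold_jid_ad) only until task N of FIRST completes.
submit_paired_arrays() {
    range=$1
    first=$2
    second=$3
    out=$("${QSUB:-qsub}" -terse -t "$range" "$first") || return 1
    first_id=${out%%.*}   # drop the ".start-end:step" suffix of the array job
    "${QSUB:-qsub}" -terse -t "$range" -hold_jid_ad "$first_id" "$second"
}

# Usage on a cluster, once per analysis:
#   submit_paired_arrays 1-22 step1.sh step2.sh
```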
There are various types of array dependencies described on the SGE website, including batch arrays and other blocking arrangements.