SGE job dependency

TA/Cluster 2014. 7. 22. 13:41

Job arrays ( -t )

A job array is created with the -t option to qsub in SGE.

It allows a user to submit a series of jobs at once over a range of values; each task receives its own value from that range in the variable $SGE_TASK_ID.

Example

The most common application of this in genetics might be to submit the same job once per chromosome.

There are at least two ways to accomplish such a task:

(1) Write a script that takes the chromosome as parameter $1, and submit it as 22 individual jobs:

$ for i in {1..22}; do qsub -N anal_chr${i} script.sh $i; done

(2) Write a script that uses $SGE_TASK_ID to designate the chromosome, and submit it as a job array:
```
$ qsub -N analysis -t 1-22 script.sh
```
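For the job array form, script.sh reads the chromosome from $SGE_TASK_ID rather than from $1. The following is only a minimal sketch of what such a script might look like: the function name pick_chrom and the data/chrN.vcf file layout are assumptions made for illustration, not part of the original setup. It relies on SGE setting SGE_TASK_ID to the string "undefined" for non-array jobs, so the same script can fall back to $1 and serve both submission styles.

```shell
#!/bin/sh
#$ -N analysis
#$ -cwd

# Print the chromosome for this run (pick_chrom is a hypothetical helper).
# SGE sets SGE_TASK_ID to the task index for array tasks and to the string
# "undefined" for non-array jobs; in that case the first argument is used.
pick_chrom() {
    if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
        echo "$SGE_TASK_ID"
    elif [ -n "${1:-}" ]; then
        echo "$1"
    else
        return 1
    fi
}

if chrom=$(pick_chrom "$@"); then
    # Hypothetical per-chromosome input; replace with the real analysis.
    echo "analysing data/chr${chrom}.vcf"
else
    echo "no chromosome given (set SGE_TASK_ID or pass an argument)" >&2
fi
```

Submitted with -t 1-22, each task would pick up its own chromosome number; submitted from the for loop, the value passed as $i would be used instead.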

The two methods are equivalent in terms of efficiency; however, using Job Arrays provides several benefits in terms of job management and tracking.

- A job array has a single $JOB_ID, so the entire array can be referenced at once by SGE commands such as qalter and qdel.
- A job array has more options for job control than a single submission, allowing dependencies to be established between groups of jobs.

A good description and tutorial on job arrays can be found on the SGE website

http://wikis.sun.com/display/GridEngine/Submitting+Extended+Jobs+and+Advanced+Jobs#SubmittingExtendedJobsandAdvancedJobs-SubmittingArrayJobs

Job dependencies with arrays ( -hold_jid )

Job dependencies allow one to specify that one job should not be run until another job completes.

Job dependencies are useful in two situations :

(1) In a two-step process such as imputation, where the second step depends on the results of the first
- Splitting one long job into two smaller jobs helps the queue scheduler be more efficient.
- One can allocate resources to each job separately. Often, one step requires more or less memory than the other.
(2) To avoid clogging the queue with a large number of jobs
- Job dependencies can effectively limit the number of running jobs, independent of the number of jobs submitted.

Example (two-step process)

Let's suppose one has two scripts: step1.sh and step2.sh

One can make step2.sh dependent on step1.sh as follows :

$ qsub step1.sh
Your job 12357 ("step1.sh") has been submitted

$ qsub -hold_jid 12357 step2.sh
Your job 12358 ("step2.sh") has been submitted

One could also capture the job ID of step 1 and use it when submitting step 2, as follows :

$ step1id=$(qsub -terse step1.sh); qsub -hold_jid "$step1id" step2.sh
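The same capture-and-hold pattern extends to any number of steps. The sketch below is one possible way to wrap it in a function; submit_chain and the QSUB variable are hypothetical names introduced here (QSUB is not an SGE variable, it only lets the function be dry-run with a stand-in command). It assumes plain, non-array jobs, for which qsub -terse prints just the job ID.

```shell
#!/bin/sh
# Submit the given scripts in order, holding each one on the job that was
# submitted just before it. submit_chain and QSUB are hypothetical names;
# by default the real qsub is called.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First step: no dependency.
            prev=$("${QSUB:-qsub}" -terse "$script") || return 1
        else
            # Later steps wait for the previous job ID to finish.
            prev=$("${QSUB:-qsub}" -terse -hold_jid "$prev" "$script") || return 1
        fi
        echo "$script -> job $prev"
    done
}
```

For the two-step example, this would be called as: submit_chain step1.sh step2.sh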

Job array dependencies are designed for the case where one wants to repeat such a dependency over a range of values (such as once per chromosome). These are discussed in more detail below.

Example (avoid clogging queue)

Another useful application of -hold_jid is to avoid flooding the queue with a large number of jobs at once. This is particularly useful when working with job arrays (each of which can hold a large number of tasks).

If, for example, one has 100 jobs to submit but a MaxUjobs limit of 40, one can still submit them all at once using a combination of arrays and -hold_jid.

The process looks like this :

(1) split the 100 jobs into 3 arrays : 1-33, 34-66, 67-100
(2) submit each set as an array, making each array dependent on the previous array

# submit first array, capture JOB_ID for the array in $jid
# (-terse prints e.g. "12345.1-33:1"; cut keeps only the job ID before the dot)
jid=$(qsub -terse -t 1-33 script.sh | cut -d. -f1)

# submit second array, dependent on completion of the first
jid=$(qsub -terse -t 34-66 -hold_jid "$jid" script.sh | cut -d. -f1)

# submit third array, dependent on completion of the second
jid=$(qsub -terse -t 67-100 -hold_jid "$jid" script.sh | cut -d. -f1)

The behavior is that

- Tasks 1-33 will be submitted and run as if they were 33 separate jobs.
- Task 34 (and 35 through 66) will not start until all tasks in the first array (1-33) complete.
- Likewise, tasks 67-100 will not start until all tasks in the second array (34-66) complete.
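The manual three-array submission above can be generalized to any task count and chunk size. The following is a sketch only: chunk_ranges, submit_chunked and the QSUB override are hypothetical names introduced for illustration, and the chunks are cut at a fixed size (so 100 tasks with a chunk size of 40 become 1-40, 41-80, 81-100 rather than the three near-equal ranges used above). It assumes qsub -terse prints the array job as JOBID.range:step, from which only the part before the first dot is kept.

```shell
#!/bin/sh
# Print one "first-last" task range per line, covering 1..total in chunks
# of at most `size` tasks. (Hypothetical helper.)
chunk_ranges() {
    total=$1
    size=$2
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        echo "${first}-${last}"
        first=$((last + 1))
    done
}

# Submit each range as its own array, held on the previous array.
# Arguments: script, total number of tasks, chunk size.
# QSUB is a hypothetical override used only for dry runs.
submit_chunked() {
    script=$1
    jid=""
    for range in $(chunk_ranges "$2" "$3"); do
        if [ -z "$jid" ]; then
            out=$("${QSUB:-qsub}" -terse -t "$range" "$script") || return 1
        else
            out=$("${QSUB:-qsub}" -terse -t "$range" -hold_jid "$jid" "$script") || return 1
        fi
        jid=${out%%.*}   # strip ".first-last:step", keep the job ID
        echo "tasks $range -> job $jid"
    done
}
```

With a limit of 40, the call would be: submit_chunked script.sh 100 40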

Job array dependencies ( -hold_jid_ad )

Job array dependencies are quite different from job dependencies.

An array dependency is designed for the scenario where one has a two-step process running for each of 22 chromosomes. For each chromosome, step 2 should not begin until step 1 completes.

The process for this is as follows :

(1) submit step1 as an array (-t 1-22), where $SGE_TASK_ID denotes chromosome number.
(2) submit step2 as an array (also -t 1-22), dependent on step 1 via -hold_jid_ad

The behavior is that

- chrom 22 step 2 depends only on chrom 22 step 1
- chrom 11 step 2 depends only on chrom 11 step 1

This means that chrom 22 step 2 can start BEFORE chrom 11 step 1 ends. This is different from the behavior with -hold_jid : if one used -hold_jid, then chrom 22 step 2 could not start until all the step 1 tasks (in this case, all chromosomes) had completed.
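The two submissions for the per-chromosome case might be scripted roughly as follows. This is a sketch under assumptions: submit_paired_arrays and the QSUB override are hypothetical names introduced here, both arrays are given the same task range, and qsub -terse is assumed to print the array job as JOBID.range:step.

```shell
#!/bin/sh
# Submit step1 and step2 as arrays over the same task range, with each
# step2 task held only on the step1 task that has the same index.
# Arguments: task range (e.g. 1-22), step1 script, step2 script.
# QSUB is a hypothetical override used only for dry runs.
submit_paired_arrays() {
    range=$1
    step1=$2
    step2=$3
    out=$("${QSUB:-qsub}" -terse -t "$range" "$step1") || return 1
    jid1=${out%%.*}   # job ID of the step1 array
    out=$("${QSUB:-qsub}" -terse -t "$range" -hold_jid_ad "$jid1" "$step2") || return 1
    echo "step1 array: $jid1, step2 array: ${out%%.*}"
}
```

For the 22-chromosome example, the call would be: submit_paired_arrays 1-22 step1.sh step2.sh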

There are various types of array dependencies described on the SGE website, including batch arrays and other blocking arrangements.

http://wikis.sun.com/display/gridengine62u2/How+to+Configure+Array+Task+Dependencies+From+the+Command+Line

Posted by 옥탑방람보
