SGE job dependency

TA/Cluster 2014. 7. 22. 13:41

Job arrays ( -t )

A job array is an option to qsub in SGE.

It lets a user submit a series of jobs at once over a range of values; each task receives its value in the variable $SGE_TASK_ID.

Example

The most common application of this in genetics might be to submit the same job once per chromosome.

There are at least two ways to accomplish such a task:

  • write a script taking parameter $1 to designate a chromosome, and submit it as 22 individual jobs
    $ for i in {1..22}; do qsub -N anal_chr${i} script.sh $i; done
  • write a script using $SGE_TASK_ID to designate a chromosome, and submit it as a job array
    $ qsub -N analysis -t 1-22 script.sh
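
As a sketch of the second approach, script.sh could read the chromosome from $SGE_TASK_ID and fall back to $1, so the same file works with either submission style. The function name chrom_for_task and the echo placeholder are hypothetical; the comparison with "undefined" assumes the documented SGE behavior of setting $SGE_TASK_ID to that string for non-array jobs.

```shell
#!/bin/bash
# script.sh (illustrative): work out which chromosome this run should handle.

# Print the chromosome number: the array task ID when running as a job array,
# otherwise the first argument. Returns non-zero if neither is available.
chrom_for_task() {
    local task_id="${SGE_TASK_ID:-undefined}"
    if [ "$task_id" != "undefined" ]; then
        echo "$task_id"
    elif [ -n "${1:-}" ]; then
        echo "$1"
    else
        return 1
    fi
}

if chrom=$(chrom_for_task "$@"); then
    echo "analysing chromosome ${chrom}"   # placeholder for the real analysis
else
    echo "usage: script.sh CHROM (or submit with qsub -t 1-22)" >&2
fi
```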

The two methods are equivalent in terms of efficiency; however, using Job Arrays provides several benefits in terms of job management and tracking.

  • a job array has a single $JOB_ID, so the entire array of jobs can be referenced at once by SGE commands such as qalter and qdel.

  • a job array has more options for job control than a single submit, allowing for dependencies to be established between groups of jobs.

A good description and tutorial on job arrays can be found on the SGE website.

Job dependencies with arrays ( -hold_jid )

Job dependencies allow one to specify that one job should not be run until another job completes.

One can use job dependencies as follows:

  • In a two-step process such as imputation, where the second step depends on the results of the first
      - Splitting one long job into two smaller jobs helps the queue scheduler be more efficient.
      - One can allocate resources to each job separately. Often, one step requires more or less memory than the other.
  • To avoid clogging the queue with a large number of jobs
      - Job dependencies can effectively limit the number of running jobs, independent of the number of jobs submitted.

Example (two-step process)

Let's suppose one has two scripts: step1.sh and step2.sh

One can make step2.sh dependent on step1.sh as follows :

  • $ qsub step1.sh
    Your job 12357 ("step1.sh") has been submitted

    $ qsub -hold_jid 12357 step2.sh
    Your job 12358 ("step2.sh") has been submitted

One could also capture the step 1 job ID for use in the step 2 submission, as follows:

  • $ step1id=$(qsub -terse step1.sh); qsub -hold_jid "$step1id" step2.sh
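
-hold_jid also accepts job names, which avoids capturing the numeric ID at all. A minimal sketch, wrapped in a hypothetical function submit_two_steps; the job name prep_step1 is made up for illustration, and the approach assumes none of your other jobs uses the same name.

```shell
#!/bin/bash
# Chain step2.sh after step1.sh by job name instead of numeric job ID.
submit_two_steps() {
    local name="${1:-prep_step1}"   # hypothetical job name for step 1
    qsub -N "$name" step1.sh
    # step 2 stays on hold until your jobs named $name have finished
    qsub -hold_jid "$name" step2.sh
}
```

Calling submit_two_steps with no argument submits both steps under the default name.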

Job array dependencies are designed for the case where one wants to repeat such a dependency over a range of values (such as once per chromosome). These are discussed in more detail below.

Example (avoid clogging queue)

Another useful application of -hold_jid is to avoid flooding the queue with a large number of jobs at once. This is particularly useful when working with job arrays, each of which can hold a large number of tasks.

If, for example, one has 100 jobs to submit but a MaxUjobs of 40, one can submit these all at once using a combination of arrays and -hold_jid.

The process looks like this:

  • (1) split the 100 jobs into 3 arrays: 1-33, 34-66, 67-100
  • (2) submit each set as an array, making each array dependent on the previous array
    # submit first array, capture JOB_ID for the array in $jid
    jid=`qsub -terse -t 1-33 script.sh | sed -r "s/\.(.*)//"`
    
    # submit second array, dependent on completion of the first
    jid=`qsub -terse -t 34-66 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`
    
    # submit third array, dependent on completion of the second
    jid=`qsub -terse -t 67-100 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`

The behavior is that

  • tasks 1-33 will submit and run as if they were 33 separate jobs.
  • task 34 (and 35 through 66) will not start until all tasks in the first array (1-33) have completed; tasks 67-100 wait in the same way on the second array.
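
The same chaining can be written as a loop for any number of tasks. The sketch below is one possible generalization, not part of the original recipe: the function names chunk_ranges and submit_chained are hypothetical, and it strips the task suffix from the -terse output with shell parameter expansion instead of sed.

```shell
#!/bin/bash
# Print "first-last" task ranges covering 1..total, at most `size` tasks each.
chunk_ranges() {
    local total=$1 size=$2 first=1 last
    while [ "$first" -le "$total" ]; do
        last=$(( first + size - 1 ))
        if [ "$last" -gt "$total" ]; then last=$total; fi
        echo "${first}-${last}"
        first=$(( last + 1 ))
    done
}

# Submit `script` as a chain of arrays, each held on the one before it.
# Prints the JOB_ID of the last array submitted.
submit_chained() {
    local total=$1 size=$2 script=$3 jid="" range out
    for range in $(chunk_ranges "$total" "$size"); do
        if [ -z "$jid" ]; then
            out=$(qsub -terse -t "$range" "$script")
        else
            out=$(qsub -terse -t "$range" -hold_jid "$jid" "$script")
        fi
        jid=${out%%.*}   # "12345.1-40:1" -> "12345"
    done
    echo "$jid"
}
```

With a chunk size of 40, submit_chained 100 40 script.sh would submit arrays 1-40, 41-80 and 81-100, which is a different split from the 1-33 / 34-66 / 67-100 one used above.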

Job array dependencies ( -hold_jid_ad )

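Job array dependencies differ from plain job dependencies: -hold_jid holds the dependent job until every task of the earlier job has finished, whereas -hold_jid_ad links the two arrays task by task.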

An array dependency is designed for the scenario where one has a two-step process running for each of 22 chromosomes. For each chromosome, step 2 should not begin until step 1 completes.

The process for this is as follows:

  • (1) submit step 1 as an array (-t 1-22), where $SGE_TASK_ID denotes the chromosome number.
  • (2) submit step 2 as an array (also -t 1-22), made dependent on step 1 with -hold_jid_ad.

The behavior is that

  • chrom 22 step 2 depends only on chrom 22 step 1
  • chrom 11 step 2 depends only on chrom 11 step 1

This means that chrom 22 step 2 can start BEFORE chrom 11 step 1 ends. This differs from the behavior with -hold_jid: with -hold_jid, chrom 22 step 2 could not start until all the step 1 tasks (in this case, all chromosomes) had completed.
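
The two submissions might look like the sketch below. It assumes the step1.sh and step2.sh scripts from the earlier example and that both arrays use the same task range; the wrapper function submit_two_step_arrays is hypothetical.

```shell
#!/bin/bash
# Submit step1.sh and step2.sh as matching arrays; each step2 task is held
# only on the step1 task with the same index.
submit_two_step_arrays() {
    local range="${1:-1-22}" step1_jid
    # For arrays, -terse prints "JOB_ID.first-last:step"; keep only JOB_ID.
    step1_jid=$(qsub -terse -t "$range" step1.sh | cut -d. -f1)
    qsub -terse -t "$range" -hold_jid_ad "$step1_jid" step2.sh
}
```

Replacing -hold_jid_ad with -hold_jid in the second call would give the all-or-nothing behavior instead.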

There are various types of array dependencies described on the SGE website, including batch arrays and other blocking arrangements.

Posted by 옥탑방람보