SGE job dependency


Job arrays ( -t )

A job array is an option to qsub in SGE.

It allows a user to submit a series of jobs at once, over a range of values provided to the script in the variable $SGE_TASK_ID.

Example

The most common application of this in genetics might be to submit the same job once per chromosome.

There are at least two ways to accomplish such a task:

  • write a script taking parameter $1 to designate a chromosome, and submit it as 22 individual jobs:
    $ for i in {1..22}; do qsub -N anal_chr${i} script.sh $i; done
  • write a script using $SGE_TASK_ID to designate a chromosome, and submit it as a job array (a sketch of such a script follows below):
    $ qsub -N analysis -t 1-22 script.sh
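
A minimal sketch of what script.sh might look like in the job-array case (run_analysis and the file names are only placeholders):

    #!/bin/bash
    # SGE sets $SGE_TASK_ID for each task in the -t range; here it is used as the chromosome number
    CHR=${SGE_TASK_ID}
    ./run_analysis --chr ${CHR} --in data_chr${CHR}.txt --out results_chr${CHR}.txt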

The two methods are equivalent in terms of efficiency; however, using Job Arrays provides several benefits in terms of job management and tracking.

  • a job array has a single $JOB_ID, so the entire array of jobs can be referenced at once by SGE commands such as qalter and qdel.

  • a job array has more options for job control than a single submit, allowing for dependencies to be established between groups of jobs.

A good description and tutorial on job arrays can be found on the SGE website.

Job dependencies with arrays ( -hold_jid )

Job dependencies allow one to specify that one job should not be run until another job completes.

One can use job dependencies as follows:

  • In a two-step process such as imputation, where the second step depends on the results of the first
      - Splitting one long job into two smaller jobs helps the queue scheduler be more efficient
      - One can allocate resources to each job separately. Often, one step requires more or less memory than the other.
  • To avoid clogging the queue with a large number of jobs
      - Job dependencies can effectively limit the number of running jobs independent of the number of jobs submitted.

Example (two-step process)

Let's suppose one has two scripts: step1.sh and step2.sh.

One can make step2.sh dependent on step1.sh as follows:

  • $ qsub step1.sh
    Your job 12357 ('step1.sh') has been submitted
    
    $ qsub -hold_jid 12357 step2.sh
    Your job 12358 ('step2.sh') has been submitted

One could also capture the step1 job ID to be used in the step2 submit, as follows:

  • $ step1id=`qsub -terse step1.sh`; qsub -hold_jid $step1id step2.sh

Job array dependencies are designed for the case where one wants to repeat such a dependency over a range of values (such as once per chromosome). These are discussed in more detail below.

Example (avoid clogging queue)

Another useful application of -hold_jid is to avoid flooding the queue with a large number of jobs at once. This is particularly useful when working with job arrays (each of which can contain a large number of tasks).

If, for example, one has 100 jobs to submit but a per-user job limit (maxujobs) of 40, one can submit them all at once using a combination of arrays and -hold_jid.

The process looks like this:

  • (1) split the 100 jobs into 3 arrays: 1-33, 34-66, 67-100
  • (2) submit each set as an array, making each array dependent on the previous array
  • # submit first array, capture the array's JOB_ID in $jid
    # (for an array, qsub -terse prints JOBID.task_range, so sed strips everything after the dot)
    jid=`qsub -terse -t 1-33 script.sh | sed -r "s/\.(.*)//"`
    
    # submit second array, dependent on completion of the first
    jid=`qsub -terse -t 34-66 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`
    
    # submit third array, dependent on completion of the second
    jid=`qsub -terse -t 67-100 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`

The behavior is that

  • tasks 1-33 will submit and run as if they were 33 separate jobs.
  • task 34 (and 35 through 66) will not start until after all tasks in the first array (1-33) complete.

Job array dependencies ( -hold_jid_ad )

Job array dependencies are quite different from job dependencies.

An array dependency is designed for the scenario where one has a two-step process running for each of 22 chromosomes. For each chromosome, step 2 should not begin until step 1 completes.

The process for this is as follows:

  • (1) submit step1 as an array (-t 1-22), where $SGE_TASK_ID denotes chromosome number.
  • (2) submit step2 as an array (also -t 1-22), made dependent on step 1 with -hold_jid_ad (see the sketch below)
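
A minimal sketch of these two submits, reusing the -terse/sed pattern shown earlier (step1.sh and step2.sh are the per-chromosome scripts):

    $ jid=`qsub -terse -N step1 -t 1-22 step1.sh | sed -r "s/\.(.*)//"`
    $ qsub -N step2 -t 1-22 -hold_jid_ad $jid step2.sh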

The behavior is that

  • chrom 22 step 2 depends only on chrom 22 step 1
  • chrom 11 step 2 depends only on chrom 11 step 1

This means that chrom 22 step 2 is likely to start BEFORE chrom 11 step 1 ends. This is different from the behavior with -hold_jid; if one used -hold_jid, then chrom 22 step 2 could not start until all the step 1 tasks (in this case, all chromosomes) had completed.

There are various types of array dependencies described on the SGE website, including batch arrays and other blocking arrangements.

Parallel environment named "smp" in SGE


# add "smp" parallel enviroment and correct like below

> qconf -ap smp

pe_name smp
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min

# add a queue named "secondary" and add "smp" to its "pe_list" parameter

> qconf -aq secondary
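
Once the PE is attached to the queue, a job can request slots from it. A minimal example (script.sh and the slot count are placeholders):

> qsub -q secondary -pe smp 8 script.sh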

 

Queue management in SGE (qconf)

create queue
> qconf -aq [queue_name]

delete queue
> qconf -dq [queue_name]

manage queue
> qconf -mq [queue_name]

list queue
> qconf -sql

create parallel environment
> qconf -ap [pe-name]

delete parallel environment
> qconf -dp [pe-name]

manage parallel environment
> qconf -mp [pe-name]

list parallel environment
> qconf -spl

create host_list
> qconf -ahgrp [host_list_name]

* create host_list
The default hostgroup is called @allhosts. To create a new hostgroup, use the qconf -Ahgrp file_name command, where the configuration file file_name has the structure described in man hostgroup.

Example:
> cat merlin0809.hgrp
group_name @merlin0809
hostlist merlin08 merlin09

> qconf -Ahgrp merlin0809.hgrp
added "@merlin0809" to host group list

> qconf -shgrpl
@allhosts
@merlin0809

> qconf -shgrp @merlin0809
group_name @merlin0809
hostlist merlin08 merlin09

The host groups are stored in the directory $SGE_ROOT/$SGE_CELL/spool/qmaster/hostgroups, e.g.:
> cat /opt/gridengine/default/spool/qmaster/hostgroups/@merlin0809
# Version: GE 6.0u7
#
# DO NOT MODIFY THIS FILE MANUALLY!
#
group_name @merlin0809
hostlist merlin08 merlin09


delete host_list
> qconf -dhgrp [host_list_name]

manage host_list
> qconf -mhgrp [host_list_name]

list host_list
> qconf -shgrpl

list hosts in host_list
> qconf -shgrp [host_list_name]
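
A host group is typically referenced from a queue's hostlist, e.g. in the queue configuration opened by qconf -mq (all.q is just the default queue name, used here as an example):

> qconf -mq all.q
hostlist              @merlin0809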

 

SGE qsub and .bashrc / .bash_profile

SGE does not read your .bashrc or .bash_profile when starting jobs.
You need to include the -V or -v option.
ex) qsub -V -cwd -j y script.sh

-V
Specifies that all environment variables active within the qsub utility be exported to the context of the job.
 
-v variable[=value],...
Defines or redefines the environment variables to be exported to the execution context of the job. If the -v option is present Grid Engine will add the environment variables defined as arguments to the switch and, optionally, values of specified variables, to the execution context of the job.
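
For example, to pass only a couple of named variables instead of the whole environment (REF and THREADS are hypothetical variable names used by the job script, and script.sh is a placeholder):

ex) qsub -v REF=/data/ref/hg19.fa,THREADS=4 -cwd script.sh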

You can also set this as a default option:
> vi /opt/gridengine/default/common/sge_request
(ADD) -V

 


How to allocate slots when submitting with qsub

Create a parallel environment named "serial" and set its slot count (then add "serial" to a queue's pe_list, as in the smp example above, so jobs can use it):

$root> qconf -ap serial

    slots 9999

Then request a number of slots at submission time:

$root> qsub -cwd -pe serial 6 test.sh

 


How to set default options for qsub

$> vi /opt/gridengine/default/common/sge_request

Just add the options inside this file.

 

Example:

Add -V to this file; it is the option that carries the user's environment settings over to the job as-is.
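
The file might then look like this (the -cwd line is only an illustration of adding more than one default option):

$> cat /opt/gridengine/default/common/sge_request
-V
-cwd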


qsub, qstat, qdel

1. Submitting jobs

    $> qsub [jobname]

    $> qsub -cwd [jobname]  # write STDOUT and STDERR to the current directory

    $> qsub -j y [jobname]  # write STDOUT and STDERR to the same file

    $> qsub -q [queuename] [jobname]  # submit the job only to the specified queue

    $> qsub -l hostname=[hostname] [jobname]  # submit the job only to the specified host, e.g. qsub -l hostname=compute-0-1

    $> qsub -S [path] [jobname]  # run the job with the interpreter at [path], e.g. qsub -S /usr/bin/python sample.py

2. Deleting jobs

    $> qdel [jobid]  # delete the specified job

    $> qdel -u [userid]  # delete all jobs submitted by the specified user

    $> qdel -f [jobid]  # force-delete a job (for jobs stuck in a state such as Eqw, this only removes the SGE-side record; the process may still be running on the node)

3. Monitoring jobs

    $> qstat

    $> qstat -u "[userid]"  # monitor jobs submitted by the specified user, e.g. qstat -u "*" to monitor jobs from all users


SGE queue group management

1. Creating a queue group

    $> qconf -aq rloa.q  # create

    $> qconf -mq rloa.q  # manage

2. Creating a host group

    $> qconf -ahgrp @rloahosts  # create (the group name must start with @)

    $> qconf -mhgrp @rloahosts  # modify

      group_name @rloahosts

      hostlist  hostname-0-0.local hostname-0-1.local hostname-0-2.local

    $> qconf -shgrp @rloahosts  # show

    $> qconf -mq rloa.q             # apply to the queue

      hostlist            @rloahosts

3. Creating a user group

    $> qconf -au [userid] rloausers  # create (adds a user to the access list, creating it if it does not exist)

    $> qconf -mu rloausers  # modify

      name rloausers

      type ACL DEPT

      fshare 0

      oticket 0

      entries userid1,userid2,userid3

    $> qconf -mq rloa.q       # apply to the queue

       user_lists       rloausers
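
With this in place, only accounts listed in rloausers can submit to the queue (test.sh is a placeholder script):

    $> qsub -q rloa.q -cwd test.sh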


Submitting every *.sh script in a directory to SGE through the Python DRMAA API (kimps appears to be the author's own helper module, used here for OptionParser, glob, os and sys):

#!/usr/bin/env python
import kimps, drmaa

def argsOption():
    # -o gives the directory for the SGE output (.qout) files; the single positional argument is the input directory
    usage = "usage: %prog -o ./cluster_out/ ./input_dir/"
    parser = kimps.OptionParser( usage=usage )
    parser.add_option( "-o", "--cluster_out", dest="cluster_out", help="out stream of sge" )
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    return options, args

# main
options, args = argsOption()
input_dir = args[0]

inpath = input_dir
outpath = options.cluster_out
if not outpath: outpath = inpath

# one DRMAA session and one job template, reused for every submission
s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
joblist = []
#kimps_ctime.stime( kimps.sys._getframe().f_code.co_filename )
inlist = kimps.glob.glob( inpath + '*.sh' )
for onein in inlist:
    kimps.os.chmod( onein, 0766 )   # make the script executable
    inname = onein.split('/')[-1]
    innametag = '.'.join(inname.split('.')[:-1])

    # run the script under bash, join stdout/stderr, and write the output to <outpath><name>.qout
    jt.remoteCommand = onein
    jt.nativeSpecification = " -V -S /bin/bash -j y -cwd -o " + outpath + innametag + ".qout"
    jt.joinFiles = True
    joblist.append( s.runJob( jt ) )

# block until every submitted job has finished, then collect exit information
s.synchronize( joblist, drmaa.Session.TIMEOUT_WAIT_FOREVER, False )

for curjob in joblist:
#   print "Collecting job " + curjob
    retval = s.wait( curjob, drmaa.Session.TIMEOUT_WAIT_FOREVER )
#   print 'Job: ' + str(retval.jobId) + ' finished with status ' + str(retval.hasExited)

s.deleteJobTemplate( jt )
s.exit()


A variant of the same DRMAA submit script; the only differences are that each script is run with a user-local Python interpreter (-S ~/bin/python) and the SGE output files get a .qo suffix (as above, kimps appears to be the author's own helper module):

#!/usr/bin/env python
import kimps, drmaa

def argsOption():
    # -o gives the directory for the SGE output (.qo) files; the single positional argument is the input directory
    usage = "usage: %prog -o ./cluster_out/ ./input_dir/"
    parser = kimps.OptionParser( usage=usage )
    parser.add_option( "-o", "--cluster_out", dest="cluster_out", help="out stream of sge" )
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    return options, args

# main
options, args = argsOption()
input_dir = args[0]

inpath = input_dir
outpath = options.cluster_out
if not outpath: outpath = inpath

# one DRMAA session and one job template, reused for every submission
s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
joblist = []
#kimps_ctime.stime( kimps.sys._getframe().f_code.co_filename )
inlist = kimps.glob.glob( inpath + '*.sh' )
for onein in inlist:
    kimps.os.chmod( onein, 0766 )   # make the script executable
    inname = onein.split('/')[-1]
    innametag = '.'.join(inname.split('.')[:-1])

    # run each globbed script with the interpreter given in -S, join stdout/stderr, write output to <outpath><name>.qo
    jt.remoteCommand = onein
    jt.nativeSpecification = " -V -S ~/bin/python -j y -cwd -o " + outpath + innametag + ".qo"
    jt.joinFiles = True
    joblist.append( s.runJob( jt ) )

# block until every submitted job has finished, then collect exit information
s.synchronize( joblist, drmaa.Session.TIMEOUT_WAIT_FOREVER, False )

for curjob in joblist:
#   print "Collecting job " + curjob
    retval = s.wait( curjob, drmaa.Session.TIMEOUT_WAIT_FOREVER )
#   print 'Job: ' + str(retval.jobId) + ' finished with status ' + str(retval.hasExited)

s.deleteJobTemplate( jt )
s.exit()
