sge job dependency

TA/Cluster 2014. 7. 22. 13:41

Job arrays ( -t )

A job array is an option to qsub in SGE.

It allows a user to submit a series of jobs at once, over a range of values provided to the script in the variable $SGE_TASK_ID.

Example

The most common application of this in genetics might be to submit the same job once per chromosome.

There are at least two ways to accomplish such a task:

  • write a script taking parameter $1 to designate a chromosome, and submit as 22 individual jobs
  • $ for i in {1..22}; do qsub -N anal_chr${i} script.sh $i; done
  • write a script using $SGE_TASK_ID to designate a chromosome, and submit as a Job Array
  • $ qsub -N analysis -t 1-22 script.sh
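For the array form, script.sh reads the task index from the environment. A minimal sketch (the echo stands in for the real per-chromosome work; the default of 1 is only there so the script can be tried outside SGE):

```shell
#!/bin/bash
# SGE sets $SGE_TASK_ID to one value from the -t range (here, the chromosome).
# Outside an array job the variable is unset; default to 1 for hand-testing.
chrom=${SGE_TASK_ID:-1}
echo "analysing chromosome $chrom"
# ... per-chromosome work would go here ...
```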

The two methods are equivalent in terms of efficiency; however, using Job Arrays provides several benefits in terms of job management and tracking.

  • a job array has a single $JOB_ID, so the entire array of jobs can be referenced at once by SGE commands such as qalter and qdel.

  • a job array has more options for job control than a single submit, allowing for dependencies to be established between groups of jobs.

A good description and tutorial on job arrays can be found on the SGE website.

Job dependencies with arrays ( -hold_jid )

Job dependencies allow one to specify that one job should not be run until another job completes.

One can use job dependencies as follows:

  • In a two-step process such as imputation, where the second step depends on the results of the first
  • - Splitting one long job into two smaller jobs helps the queue scheduler be more efficient
  • - One can allocate resources to each job separately. Often, one step requires more or less memory than the other.
  • To avoid clogging the queue with a large number of jobs
  • - job dependencies can effectively limit the number of running jobs independent of the number of jobs submitted.

Example (two-step process)

Let's suppose one has two scripts: step1.sh and step2.sh

One can make step2.sh dependent on step1.sh as follows:

  • $ qsub step1.sh
    Your job 12357 ('step1.sh') has been submitted
    
    $ qsub -hold_jid 12357 step2.sh
    Your job 12358 ('step2.sh') has been submitted

One could also capture the step-1 job ID for use in the step-2 submit, as follows:

  • $ step1id=`qsub -terse step1.sh`; qsub -hold_jid $step1id step2.sh

Job array dependencies are designed for the case where one wants to repeat such a dependency over a range of values (such as once per chromosome). These are discussed in more detail, below.

Example (avoid clogging queue)

Another useful application of -hold_jid is to avoid flooding the queue with a large number of jobs at once. This is particularly useful when working with job-arrays (each of which can hold a large number of jobs).

If, for example, one has 100 jobs to submit but a per-user job limit (MaxUjobs) of 40, one can submit them all at once using a combination of arrays and -hold_jid.

The process looks like this:

  • (1) split the 100 jobs into 3 arrays: 1-33, 34-66, 67-100
  • (2) submit each set as an array, making each array dependent on the previous array
  • # submit first array, capture JOB_ID for the array in $jid
    jid=`qsub -terse -t 1-33 script.sh | sed -r "s/\.(.*)//"`
    
    # submit second array, dependent on completion of the first
    jid=`qsub -terse -t 34-66 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`
    
    # submit third array, dependent on completion of the second
    jid=`qsub -terse -t 67-100 -hold_jid $jid script.sh | sed -r "s/\.(.*)//"`

The behavior is that

  • tasks 1-33 will submit and run as if they were 33 separate jobs.
  • task 34 (and 35 through 66) will not start until after all tasks in the first array (1-33) complete.
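The three submissions above can be generalized. Below is a sketch (not taken from the original post) that chains any total in fixed-size chunks; it assumes script.sh exists and qsub is on PATH:

```shell
#!/bin/bash
# Submit tasks 1..TOTAL as chained arrays of CHUNK tasks each, so that
# each chunk's tasks cannot start until the previous chunk completes.
submit_in_chunks() {
    local total=$1 chunk=$2 start end jid hold=""
    for (( start = 1; start <= total; start += chunk )); do
        end=$(( start + chunk - 1 ))
        (( end > total )) && end=$total
        # -terse prints "JOBID.start-end:step" for an array; sed keeps only JOBID.
        # $hold is intentionally unquoted: it expands to nothing on the first pass.
        jid=$(qsub -terse -t ${start}-${end} $hold script.sh | sed -r 's/\..*//')
        hold="-hold_jid $jid"
    done
}
# usage: submit_in_chunks 100 34
```

For example, `submit_in_chunks 100 34` produces a three-chunk split (1-34, 35-68, 69-100), each chunk held behind the previous one.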

Job array dependencies ( -hold_jid_ad )

Job array dependencies are quite different from job dependencies.

An array dependency is designed for the scenario where one has a two-step process running for each of 22 chromosomes. For each chromosome, step 2 should not begin until step 1 completes.

The process for this is as follows:

  • (1) submit step1 as an array (-t 1-22), where $SGE_TASK_ID denotes chromosome number.
  • (2) submit step2 as an array (also -t 1-22), dependent -hold_jid_ad on step 1

The behavior is that

  • chrom 22 step 2 depends only on chrom 22 step 1
  • chrom 11 step 2 depends only on chrom 11 step 1

This means that chrom 22 step 2 is likely to start BEFORE chrom 11 step 1 ends. This is different from the behavior with -hold_jid: if one used -hold_jid, then chrom 22 step 2 could not start until all the step 1 tasks (in this case, all chromosomes) had completed.
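Concretely, the two submissions can be sketched as follows (step1.sh and step2.sh are assumed to exist and to read $SGE_TASK_ID):

```shell
#!/bin/bash
# Sketch: per-chromosome chaining with an array dependency.
# Task N of the step2 array waits only for task N of the step1 array.
submit_chained_arrays() {
    local jid
    # -terse prints "JOBID.start-end:step" for an array; keep only JOBID
    jid=$(qsub -terse -t 1-22 step1.sh | sed -r 's/\..*//')
    # -hold_jid_ad establishes the task-by-task (array) dependency
    qsub -t 1-22 -hold_jid_ad "$jid" step2.sh
}
```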

There are various types of array dependencies described on the SGE website, including batch arrays and other blocking arrangements.

Posted by 옥탑방람보

How to keep a running process alive - disown

> python run.py

(press Ctrl+Z to suspend the foreground job)

> bg

> jobs

[1]     run.py

> disown -h %1

disown -h marks the job so that it is not sent SIGHUP when the shell exits, so the process keeps running after logout.


Adding SGE nodes (adding server2 and server3 to server1's SGE)

 

[root@cluster1 & cluster2 & cluster3] cat /etc/hosts   # each server's IP must be registered in /etc/hosts on every machine

 

[root@cluster1] qconf -ae   # add an execution host (opens an editor)

  hostname   cluster2.local

[root@cluster1] qconf -ae

  hostname   cluster3.local

[root@cluster1] qhost   # check that the new hosts appear

[root@cluster1] qconf -mq all.q

  add cluster2.local and cluster3.local

[root@cluster1] qconf -mhgrp @allhosts

  add cluster2.local and cluster3.local

 

[root@cluster2 & cluster3] ps aux | grep sge

   XXX sge_execd

If it is running, stop the daemon:

[root@cluster2 & cluster3] /etc/init.d/sgeexecd.XXX stop

or

[root@cluster2 & cluster3] kill $(pidof sge_execd)

 

If /etc/init.d/sgeexecd.XXX does not exist, install the execution daemon:

[root@cluster2 & cluster3] cd /opt/gridengine/

[root@cluster2 & cluster3] ./install_execd

 

[root@cluster2 & cluster3] vi /opt/gridengine/default/common/act_qmaster   # must contain the qmaster's hostname

cluster1.local

 

[root@cluster2 & cluster3] qstat -f

[root@cluster1] qstat -f


<NFS server>

# chown 514.513 /nas/data/share   # set the owner and group by UID.GID

# chmod g+s /nas/data/share   # set the setgid bit (every file created under this directory inherits the directory's group)

# setfacl -m 'd:o:r-x' /nas/data/share   # set a default ACL (controls the permissions of all files created under this directory)
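The setgid behavior can be checked without root in a throwaway directory (a sketch, not part of the original setup):

```shell
# Files created under a g+s directory inherit the directory's group.
d=$(mktemp -d)
chmod g+s "$d"
touch "$d/newfile"
dir_group=$(stat -c %G "$d")
file_group=$(stat -c %G "$d/newfile")
echo "dir=$dir_group file=$file_group"   # the two groups match
rm -rf "$d"
```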

 

<NFS client>

# vi /etc/fstab

XXX.XXX.XXX.XXX:/nas/data/share   /sharedata  nfs defaults,suid   0 0

# mount -a

 


Changing the server date - date

date -s 20130625              # set the date (run as root)

date +%T --set="08:00:00"     # set the time; prints the new time in HH:MM:SS

 


du apparent-size

TA/Common 2013. 7. 29. 16:34

Apparent size is the number of bytes your applications think are in the file. It's the amount of data that would be transferred over the network (not counting protocol headers) if you decided to send the file over FTP or HTTP. It's also the result of cat theFile | wc -c, and the amount of address space that the file would take up if you loaded the whole thing using mmap.

Disk usage is the amount of space that can't be used for something else because your file is occupying that space.

In most cases, the apparent size is smaller than the disk usage because the disk usage counts the full size of the last (partial) block of the file, and apparent size only counts the data that's in that last block. However, apparent size is larger when you have a sparse file (sparse files are created when you seek somewhere past the end of the file, and then write something there -- the OS doesn't bother to create lots of blocks filled with zeros -- it only creates a block for the part of the file you decided to write to).

 

du --apparent-size : shows the actual size of the file, i.e., the amount of data that would be transferred over the network.

When checking file sizes with plain du (or ls -s), the blocks the file occupies are counted, so the result usually looks larger than with --apparent-size.
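The sparse-file case can be demonstrated directly (GNU coreutils assumed; truncate creates a file that is all holes):

```shell
# A 1 MiB sparse file: large apparent size, (almost) no disk usage.
f=$(mktemp)
truncate -s 1M "$f"                       # seek past EOF: no data blocks written
apparent=$(stat -c %s "$f")               # bytes the application sees
on_disk=$(( $(stat -c %b "$f") * 512 ))   # 512-byte blocks actually allocated
echo "apparent=$apparent disk=$on_disk"
rm -f "$f"
```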


How to fix the error message "bash: /dev/null: Permission denied"

The problem seems to be with the permissions of /dev/null, which appears to be read-only for you at the moment. Check this by logging in as root and listing it with:

ls -l /dev/null

You should see this if everything is correctly set:

crw-rw-rw- 1 root root 1, 3

If you get a different set of permissions, like this:

-rw-r--r-- 1 root root 1, 3

then you should (as root) delete /dev/null with:

rm /dev/null

and recreate it (as root) with:

mknod -m 0666 /dev/null c 1 3

(According to the kernel documentation in Documentation/devices.txt, the device numbers are supposed to be Major=1 and Minor=3.)

Now list /dev/null again and you should see the permissions as above.

Fix for pings from a slave to external hosts being routed through the master (rocks ping master redirect)
$> rocks list attr
$> netstat -nr
Checking with the two commands above on the slave shows that both the Private Gateway and the Public Gateway are set to the master's IP.
In that case, change the slave's gateway settings (register the public gateway that is configured on the master):
$> vi /etc/sysconfig/network
GATEWAY=XXX.XXX
$> chattr +i /etc/sysconfig/network   # make the file immutable so it is not overwritten
$> /etc/init.d/network restart   (reboot if this does not take effect)

 


Auto-mounting at boot
1) via /etc/fstab
$> vi /etc/fstab and add the line below:
XXX.XXX:/source/dir /mountpoint  nfs  defaults  0  0
2) via rc.local
$> vi /etc/rc.local
mount -t nfs XXX.XXX:/source/dir /mountpoint
Either method 1) or 2) works; however, for reasons I have not been able to identify, I have often seen method 1) fail.


Check which services start at boot (listed per run level)
$> chkconfig --list
To remove a service from startup:
$> chkconfig --del iptables
