Supercomputing with SLURM

 
April 12, 2016.

This page explains how to use the OIST supercomputer, Sango. An OIST ID and password are required to view some of the links on this page.

It is recommended to run analyses under the /work directory. Compiled programs can be saved under the home directory (~).

Documentation for Sango can be found at SCS top > Getting started. The Sango cluster is managed by the SLURM scheduler.


Job script

#!/bin/bash
#SBATCH --job-name=mp_raxml
#SBATCH --partition=compute
#SBATCH --mem=1G # total memory for the job, shared across the 2 CPUs of the task
#SBATCH --cpus-per-task=2 # max 24; should match the program's thread option (e.g. "-T 2")
#SBATCH --ntasks=1 # number of tasks; usually 1

[Write the command line for your program.]


MAFFT

Submit the following file, job.slurm, with sbatch:

sbatch job.slurm

The job script is as follows:

#!/bin/bash
#SBATCH --job-name=mafft
#SBATCH --partition=compute
#SBATCH --mem=2G
#SBATCH --cpus-per-task=2
#SBATCH --ntasks=1 # 1 task

mafft example2.txt > out_example2.txt

mafftDir.tar.gz



RAxML

Submit the following file, job.slurm, with sbatch:

sbatch job.slurm

The job script is as follows:

#!/bin/bash
#SBATCH --job-name=job_script
#SBATCH --partition=compute
#SBATCH --mem=2G # total memory for the job; equivalent to "--mem-per-cpu=1G" because "--cpus-per-task=2"
#SBATCH --cpus-per-task=2 # max 24; corresponds to "-T 2"
#SBATCH --ntasks=1 # 1 task

raxmlHPC-PTHREADS-SSE3 -f a -x 12345 -p 12345 -# 100 -m GTRGAMMA -s Coe5_500.PHYLIP -q part -o Scca -n outfile -T 2

slurmRAxML_WebExample5spp.tar.gz

Note: in this example, -T 2 ran faster than -T 4.
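Rather than hard-coding "-T 2", the thread count can be read inside the job from an environment variable that SLURM sets when --cpus-per-task is given. A minimal sketch (the fallback value and the echoed option are illustrative only):

```shell
# SLURM_CPUS_PER_TASK is set by SLURM inside a job that requested --cpus-per-task.
# Outside a job (e.g. on a login node) it is unset, so fall back to 2 here.
nthreads=${SLURM_CPUS_PER_TASK:-2}

# Build the RAxML thread option from it instead of hard-coding "-T 2".
echo "-T $nthreads"
```

This keeps the job script and the program's thread option in sync when --cpus-per-task is changed.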



RAxML via array job

Save the list of input file names and outgroup names as 010_param.txt.

ENSORLP00000001536.txt Chicken_ENSGALP00000015412_CDK6
ENSP00000013070.txt Drosophila_FBpp0080532_CG15141
ENSP00000013807.txt Drosophila_FBpp0086578_Ercc1
ENSP00000215882.txt SeaSquirt_ENSCINP00000025857_NONE
ENSP00000218516.txt Anole_ENSACAP00000005195_NONE
ENSP00000221114.txt Drosophila_FBpp0080701_l37Ce
ENSP00000222250.txt Drosophila_FBpp0291104_Vdup1
ENSP00000225609.txt Drosophila_FBpp0083251_CG4433
ENSP00000231668.txt Drosophila_FBpp0078315_CG2023
ENSP00000239461.txt SeaSquirt_ENSCINP00000019608_NON

....
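A file in this two-column format can be generated with a short shell loop. The sketch below uses made-up names (seq_demo/, geneA.txt, Outgroup_NONE are placeholders, not part of the pipeline):

```shell
# Create a throwaway directory with two dummy alignment files (names are examples).
mkdir -p seq_demo
touch seq_demo/geneA.txt seq_demo/geneB.txt

# Write one "filename outgroup" line per file, in the style of 010_param.txt.
for f in seq_demo/*.txt; do
  printf '%s %s\n' "$(basename "$f")" "Outgroup_NONE"
done > 010_param_demo.txt

cat 010_param_demo.txt
```

In a real run the outgroup name would of course differ per gene, e.g. looked up from another table.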

Then submit the following file, 015_raxArraySLURM.slurm, with sbatch.

#!/bin/bash
#SBATCH --job-name=array_raxml
#SBATCH --partition=compute
#SBATCH --mem=2G # total memory for the job; equivalent to "--mem-per-cpu=1G" because "--cpus-per-task=2"
#SBATCH --cpus-per-task=2 # max 24; corresponds to "-T 2"
#SBATCH --ntasks=1 # 1 task
#SBATCH --array=1-100%240 # run tasks 1-100, at most 240 at once (12*20, about 5% of all nodes) ## a list such as "--array=5,15,15200" is also acceptable

tid=$SLURM_ARRAY_TASK_ID
params="$(head -$tid 010_param.txt | tail -1)" # pick line number $tid of the parameter file
param1=${params% *} # first column: strip the last space and everything after it (shell parameter expansion)
param2=${params#* } # second column: strip everything up to the first space

./raxmlHPC-PTHREADS-SSE3 -f a -x 12345 -p 12345 -# 5 -m GTRGAMMA -s 010_sequenceFileDir/$param1 -q 010_partitionFileDir/$param1 -o $param2 -n $param1 -T 2
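The two parameter-expansion lines in the script split each line of the parameter file at its first space. A standalone sketch of that step (the file name and its contents are made up):

```shell
# A three-line stand-in for 010_param.txt (names are illustrative).
printf '%s\n' \
  'fileA.txt OutgroupA' \
  'fileB.txt OutgroupB' \
  'fileC.txt OutgroupC' > params_demo.txt

tid=2                                            # stands in for $SLURM_ARRAY_TASK_ID
params="$(head -$tid params_demo.txt | tail -1)" # line 2: "fileB.txt OutgroupB"
param1=${params% *}  # strip the last space and what follows -> "fileB.txt"
param2=${params#* }  # strip up to the first space           -> "OutgroupB"
echo "$param1 / $param2"
```

Note that this splits at the first space only, so outgroup names may not contain spaces.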

raxmlSLURMArray.tar.gz (April 2016)


Useful commands

sbatch

Submit a job script.

squeue

Report the state of jobs in the queue.

squeue -u jun-inoue

Report the jobs of a specific user.

scancel

Cancel a pending or running job.


R

How to sbatch your job script

0. R script

Make an R script, test.R.

hel <- "hello"
write(hel, file="out.txt")

Method 1. Rscript
Save the following lines in a file, job.slurm, and submit it with "sbatch job.slurm".

#!/bin/bash
#SBATCH --job-name=job_script
#SBATCH --partition=compute
#SBATCH --mem-per-cpu=1G
#SBATCH --ntasks=1

Rscript /home/j/jun-inoue/testSLURM_R/test.R

A job script requiring a longer time limit:

#!/bin/bash
#SBATCH --job-name=Sn2
#SBATCH --partition=compute
#SBATCH --time=16:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --ntasks=1

Rscript /work/SinclairU/inoue/R_cal/Ens76/Sn2Dir/Tancha2.R



Log file

We can identify the required memory from the log file. The default time limit depends on the partition; see "SLURM partitions in Sango" for partition information.
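For example, once the peak memory of a finished job is known, the next --mem request can be sized with some headroom. The numbers below are made-up illustrations, not Sango output:

```shell
# Suppose the log reports a peak memory usage (MaxRSS) of about 850000 KB.
maxrss_kb=850000

# Convert to MB and add roughly 30% headroom for the next --mem request.
request_mb=$(( (maxrss_kb / 1024) * 13 / 10 ))

echo "--mem=${request_mb}M"
```

Requesting close to the observed peak (plus a margin) keeps jobs from being killed for exceeding memory while not blocking resources other users could use.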