Tips for using cluster

Zihang Wang

2024-02-26 2024-03-01

My own reference for using Emory Center for Systems Imaging Core (CSIC) cluster.

Note

Disclaimer: This is a personal reference for using the Emory Center for Systems Imaging Core (CSIC) cluster. It is neither an official nor a universal guide. Please refer to the official documentation of your machine for the most accurate information.

1 Overview

The accessible computer name is csic.som.emory.edu (accessible from anywhere).
The default login server is master1. Do not run any computation on it.
I like to use node4 for most jobs. After logging in to master1, log in to node4 with ssh node4 and enter your password.
On Mac: Royal TSX for the terminal, FileZilla for file transfer.
On Windows: XShell for the terminal, WinSCP for file transfer. Windows is more convenient here: WinSCP lets you open and edit scripts directly on the server, without downloading and re-uploading them.
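From a plain terminal with OpenSSH, the two-step login can be wrapped in a small helper. This is only a sketch: USERNAME is a placeholder for your own account, the function name csic_node4 is made up for illustration, and it assumes node4 can be reached from master1 with ssh node4 as described above.

```shell
#!/bin/bash
# USERNAME is a placeholder for your own account name.
CSIC_USER=USERNAME
CSIC_HOST=csic.som.emory.edu

# Log in to the login server, then hop straight to node4.
# -t allocates a terminal so the inner ssh can prompt for the password.
csic_node4() {
    ssh -t "${CSIC_USER}@${CSIC_HOST}" ssh node4
}
```

After putting this in your shell startup file, typing csic_node4 should ask for the password twice (once per hop) and leave you on node4.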

Tip

On Windows XShell, use Ctrl + Insert to copy, and Shift + Insert to paste.

2 Basic

ls list all folders/files at the current location.
cd path/ open the specified path.
cd .. open the parent path.
top or w check the server load.
htop check the CPU and memory usage.
bash XXX.sh run bash scripts.
Rscript YYY.R run R scripts.
matlab -nodisplay -nosplash -nodesktop -r "run ZZZ.m" run Matlab scripts.
ps list your running processes; the output also shows which shell you are using.
pwd Displays the present working directory.
echo Prints a string of text, or value of a variable to the terminal.
mkdir Create a new directory.
touch Create a new file.
rm Remove a file or directory.
cp Copy a file or directory.
mv Move or rename a file or directory.
cat Concatenate and print the contents of a file.
grep Search for a pattern in a file.
sudo Run a command with administrative privileges.
df Display the amount of disk space available.
history Show a list of previously executed commands.
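A short practice run that strings several of these commands together. It works in a throwaway directory created by mktemp, so nothing on the cluster is touched; the file names and contents are made up for illustration.

```shell
#!/bin/bash
# Practice in a temporary directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                        # create a directory
cd project                           # enter it
pwd                                  # show where we are
touch notes.txt                      # create an empty file
echo "seed 1 finished" > notes.txt   # overwrite the file with one line
echo "seed 2 failed" >> notes.txt    # append a second line
cat notes.txt                        # print the whole file
grep "failed" notes.txt              # print only the lines containing "failed"
failed_lines=$(grep -c "failed" notes.txt)   # count those lines
cp notes.txt backup.txt              # copy the file
mv backup.txt old_notes.txt          # rename the copy
ls                                   # list the two files
cd ..                                # back to the parent directory
rm -r project                        # remove the directory and its contents
```

Note that rm -r deletes without asking and there is no recycle bin, so double-check the path before running it on real data.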

3 Screen

screen open a new screen session to run jobs on a server.
Ctrl+A then D (same keys on Windows and Mac) detach from the screen; the jobs inside keep running.
screen -ls list the existing screens.
screen -r XXX reopen the screen by specifying the screen number XXX.
screen -X -S YYY quit kill the screen by specifying the screen number YYY.
exit quit the screen or log out of the current server.
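The commands above can be wrapped into small helpers that use session names instead of numbers. This is a sketch: the function names are made up, and the session name and script path you pass in are your own. It relies only on the standard screen options -d -m (start detached), -S (name the session), -r (reattach) and -X quit (send the quit command).

```shell
#!/bin/bash
# Hypothetical helpers; session names and script paths are placeholders.

# Start a named session that is already detached and runs one script.
# Usage: screen_start NAME SCRIPT
screen_start() {
    screen -dmS "$1" bash "$2"
}

# Reattach to a named session. Usage: screen_attach NAME
screen_attach() {
    screen -r "$1"
}

# Kill a named session without attaching to it. Usage: screen_kill NAME
screen_kill() {
    screen -S "$1" -X quit
}
```

For example, screen_start sim1 run_all.sh starts the script in the background, screen -ls shows the session, and screen_attach sim1 brings it to the foreground. A session started this way ends by itself when the script finishes.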

4 Submit parallel jobs

qsub XXX.sh submit a job to the server.
qstat check the status of the submitted jobs.
qdel XXX delete the job by specifying the job number XXX.
qdel -u ZZZ delete all jobs submitted by the user ZZZ.
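The options given to qsub on the command line can also be written inside the job script itself. The sketch below assumes the scheduler is a Grid Engine variant (the -cwd, -pe smp and -l h_vmem options used later in this note suggest so), where lines starting with #$ are read as qsub options. The job name and log file name are hypothetical; JOB_ID and NSLOTS are environment variables the scheduler sets for a running job.

```shell
#!/bin/bash
# Run the job from the directory it was submitted from.
#$ -cwd
# Job name shown by qstat (hypothetical).
#$ -N demo_job
# Merge error messages into the normal output.
#$ -j y
# Write the merged output to this file (hypothetical name).
#$ -o demo_job.log

# Fall back to defaults so the script can also be run by hand for testing.
job_id=${JOB_ID:-local}
slots=${NSLOTS:-1}

echo "job ${job_id} running on $(uname -n) with ${slots} slot(s)"
```

Saved as demo_job.sh, it would be submitted with qsub demo_job.sh. To bash the #$ lines are ordinary comments, so bash demo_job.sh is a quick local check.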

4.1 Example: submit parallel jobs for running R scripts

Simple goal: run R scripts for 400 different seeds.

4.1.1 Step 1: prepare the R script

Remember to set the seed in the R script. The text seedID is a placeholder; the submit script in Step 3 replaces it with the actual seed number. One template could be this test.R:

rm(list = ls())

setwd('XXX')
# seedID is a placeholder; it is replaced with a number before the script runs
seed = seedID
set.seed(seed)

# Your R code goes here

4.1.2 Step 2: prepare the bash script

We need a bash script to execute the R script. One template could be this test.sh:

#!/bin/bash

#module load R/4.3

cd /your/working/path

#R --vanilla <test.R 
Rscript test_seedID.R 
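Both templates contain the literal text seedID, which has to be turned into a number for each job. A minimal sketch of that substitution with sed, using shortened stand-ins for test.R and test.sh in a temporary directory (the stand-in contents and the seed value 3 are only for illustration):

```shell
#!/bin/bash
# Work in a temporary directory with shortened stand-ins for the templates.
demo=$(mktemp -d)
cd "$demo"

printf 'seed = seedID\nset.seed(seed)\n' > test.R
printf '#!/bin/bash\nRscript test_seedID.R\n' > test.sh

seedID=3
# Replace every occurrence of the text "seedID" with the number 3.
# The ":" characters are the sed delimiter (used here instead of "/").
sed -e "s:seedID:${seedID}:g" < test.R  > "test_${seedID}.R"
sed -e "s:seedID:${seedID}:g" < test.sh > "test_${seedID}.sh"

cat "test_${seedID}.R"    # first line is now: seed = 3
cat "test_${seedID}.sh"   # second line is now: Rscript test_3.R
```

Because every occurrence is replaced, avoid using seedID as part of any other name in your R code.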

4.1.3 Step 3: submit the job

We need a bash script to submit the job. One template could be this submit.sh:

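#!/bin/bash

cd /your/working/path

for seedID in $(seq 1 400)
do

# make a per-seed copy of the R script, replacing the placeholder seedID with the number
sed -e "s:seedID:${seedID}:g" <test.R >/your/working/path/test_${seedID}.R

# do the same for the bash script, so that it calls the per-seed R script
sed -e "s:seedID:${seedID}:g" </your/working/path/test.sh >/your/working/path/test_${seedID}.sh

# submit the per-seed bash script as a job named seed1, seed2, ...
qsub  -cwd -N seed${seedID} /your/working/path/test_${seedID}.sh
#qsub  -cwd -q R4.q -N seed${seedID} /your/working/path/test_${seedID}.sh

done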
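If the scheduler is a Grid Engine variant with array jobs enabled (an assumption, not something confirmed for CSIC), the 400 generated copies of test.sh can be replaced by one script that is submitted once. This is a swapped-in alternative to the loop above, not the method used in this note. The file name array_job.sh and the job name are hypothetical; SGE_TASK_ID is the variable Grid Engine sets to the task number. The sketch only writes the script and checks its syntax; nothing is submitted.

```shell
#!/bin/bash
# Write a hypothetical array-job script into a temporary directory.
demo=$(mktemp -d)

cat > "$demo/array_job.sh" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N seed_array
#$ -t 1-400

# SGE_TASK_ID is 1, 2, ..., 400, one value per task.
# It falls back to 1 so the script can also be tried by hand.
seedID=${SGE_TASK_ID:-1}

# Build this task's copy of the R template, then run it.
sed -e "s:seedID:${seedID}:g" < test.R > "test_${seedID}.R"
Rscript "test_${seedID}.R"
EOF

# Check the syntax without running anything.
bash -n "$demo/array_job.sh" && echo "syntax ok: $demo/array_job.sh"
```

Copied next to test.R, it would be submitted with a single qsub array_job.sh. Some Grid Engine versions also accept -tc N to cap how many tasks run at once; check man qsub on the cluster before relying on it.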

Note that the maximum number of jobs that can be submitted to the cluster at the same time is around 600; beyond that, someone in another lab or the cluster manager will contact you to stop the jobs :) If more than 1000 jobs need to be submitted, consider adapting the following script to submit them in batches.

#!/bin/bash


mkdir -p ./LOGS # create directory to save results or output
mkdir -p ./result

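submit_mark=1   # counter: index of the node block to try next

# NOTE: the qsub lines below use $r and $p, which are not set anywhere in this template; define them here (or remove them) before running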

        for a in $(seq 1 100); do   # index for the submitted
        
                    # set the limit for the total number of jobs submitted by YOUR_NAME
                    num=`qstat -u YOUR_NAME | wc -l`
                    while [ $num -gt 50 ]; do   # keep at most about 50 jobs (wc -l also counts any header lines qstat prints)
                        sleep 60    # sleep 60 seconds before checking the number of jobs again
                        num=`qstat -u YOUR_NAME | wc -l`
                    done
                    
                    
                    
                    if [ ! -e ./result/rec_"$a"_M.mat ]; then    # check to see if there is already an output or result

                    while [ $submit_mark -gt 0 ]; do

                        job_num=`qstat -u YOUR_NAME | grep node4 | wc -l` #190G  # count the number of jobs on node 4
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 1 ]]; then # if fewer than 3 of your jobs are on node4, submit to it; otherwise move on to the next node
                            # "-pe smp 5" requests 5 computing threads
                            # "-l h=node4.cluster" requests node4
                            # I am not sure how to set the number of threads for R; make sure the threads your job actually uses match the threads you requested
                            # for now, focus on three limits: the job_num limit per node, the thread limit per node, and the total number of jobs
                            # if you don't want to submit to some node, change its name to another node you prefer, but do not delete the submission block
                            # e.g. to skip node5, change node5 to node6 or something else; it is OK if two blocks submit to the same node
                            # you can use "nohup ./run_job.sh &" to let it run in the background, but this is risky if your jobs are not arranged properly
                            # only use nohup if you are absolutely sure the job arrangement or setup is correct
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node4.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=2
                            # sleep 15 seconds so the submission can complete before the next job is submitted; when the cluster is busy this can take longer, so increase the waiting time accordingly
                            sleep 15
                            break
                        fi
                        # the following block handles the case where nothing was submitted to node4: submit_mark is then still 1 and is set to 2 here, so that the node5 block is tried next (after a successful submission, submit_mark was already set to 2 and the loop was left via break)
                        # pay attention to the last block: the first block and the last block must match each other to form a loop
                        if [ $submit_mark -eq 1 ]; then
                            submit_mark=2
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node5 | wc -l` #190G
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 2 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node5.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=3
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 2 ]; then
                            submit_mark=3
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node6 | wc -l` #190G
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 3 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node6.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=4
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 3 ]; then
                            submit_mark=4
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node8 | wc -l` #190G
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 4 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node8.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=5
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 4 ]; then
                            submit_mark=5
                        fi

                        job_num=`qstat -u YOUR_NAME | grep master2 | wc -l` #120G
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 5 ]]; then
                            qsub -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=master2.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=6
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 5 ]; then
                            submit_mark=6
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node15 | wc -l` #120G
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 6 ]]; then
                            qsub -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node15.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=7
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 6 ]; then
                            submit_mark=7
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node16 | wc -l`  #120G
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 7 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node16.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=8
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 7 ]; then
                            submit_mark=8
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node17 | wc -l`  #120G
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 8 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node17.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=9
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 8 ]; then
                            submit_mark=9
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node18 | wc -l`  #120G
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 9 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node18.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=10
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 9 ]; then
                            submit_mark=10
                        fi
                        
                        job_num=`qstat -u YOUR_NAME | grep node19 | wc -l`  #120G 40CPU
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 10 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node19.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=11
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 10 ]; then
                            submit_mark=11
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node20 | wc -l`  #120G 40CPU
                        if [[ $job_num -lt 3 ]] && [[ $submit_mark -eq 11 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node20.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=12
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 11 ]; then
                            submit_mark=12
                        fi

                        job_num=`qstat -u YOUR_NAME | grep node21 | wc -l`  #190G 56CPU
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 12 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node21.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=13
                            sleep 15
                            break
                        fi
                        if [ $submit_mark -eq 12 ]; then
                            submit_mark=13
                        fi

                        # in the last block, you need to reset submit_mark to 1, so that the submission process could restart from node4 at the beginning
                        job_num=`qstat -u YOUR_NAME | grep node20 | wc -l`  #120G 40CPU
                        if [[ $job_num -lt 2 ]] && [[ $submit_mark -eq 13 ]]; then
                            qsub  -R y -N rec_"$r"_"$p"_"$a" -w n -pe smp 5 -cwd -l h=node20.cluster -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$r"_"$p"_"$a" ./run_your_job.sh 5 ${r} ${p} ${a} ./result/rec_"$a"
                            submit_mark=1
                            sleep 15
                            break
                        fi
                        # this is the last block: if the submission to node20 succeeded, submit_mark has already been reset to 1; if it did not, submit_mark is still 13 and is reset to 1 here, so that the submission process restarts from node4
                        # you can add more blocks, as long as the first and the last blocks match each other so that they form a loop
                        if [ $submit_mark -eq 13 ]; then
                            submit_mark=1
                        fi

                        sleep 15

                    done

                    fi
        done
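The per-node blocks above all follow the same pattern, so the node list can be kept in an array instead. The following is a compact sketch of that idea, not a drop-in replacement: node names and per-node limits are copied from the script above, while USER_NAME, SUBMIT_PAUSE and the function names are made up for illustration. It also assumes that qstat prints two header lines before the job list.

```shell
#!/bin/bash
# Node names and per-node limits are copied from the script above.
# USER_NAME, SUBMIT_PAUSE and the function names are placeholders of this sketch.
USER_NAME=YOUR_NAME
NODES=(node4 node5 node6 node8 master2 node15 node16 node17 node18 node19 node20 node21)
LIMITS=(3 3 3 2 3 3 2 2 2 2 3 2)
MAX_TOTAL=50
next=0   # index of the node to try first for the next submission

# Total number of listed jobs, skipping two header lines
# (an assumption about the qstat output format).
count_total() {
    qstat -u "$USER_NAME" | awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Number of listed jobs whose line mentions the node name as a whole word.
count_jobs() {
    qstat -u "$USER_NAME" | grep -cw "$1"
}

# Print the index of the first node, searching from index $1 and wrapping
# around, that is below its limit. Fails if every node is full.
pick_node() {
    local start=$1 n=${#NODES[@]} k i
    for (( k = 0; k < n; k++ )); do
        i=$(( (start + k) % n ))
        if [ "$(count_jobs "${NODES[$i]}")" -lt "${LIMITS[$i]}" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Submit one job to the next node with room; all arguments go to qsub.
submit_throttled() {
    local i
    while [ "$(count_total)" -ge "$MAX_TOTAL" ]; do sleep 60; done
    until i=$(pick_node "$next"); do sleep 15; done
    qsub -l "h=${NODES[$i]}.cluster" "$@"
    next=$(( (i + 1) % ${#NODES[@]} ))
    sleep "${SUBMIT_PAUSE:-15}"   # give the scheduler time to register the job
}
```

Inside the for a in $(seq 1 100) loop, each qsub call would then become something like submit_throttled -R y -N rec_"$a" -w n -pe smp 5 -cwd -l mem_free=50G -l h_vmem=50G -j y -o ./LOGS/log_"$a" ./run_your_job.sh followed by the job's own arguments. Adding or skipping a node then means editing the two arrays rather than copying a block.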

To submit the jobs, first make the script executable by typing chmod u+x submit.sh (or chmod a+x submit.sh) in the terminal. Then run ./submit.sh.
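The batch submit script can run for hours because it sleeps between checks. The comments in the script mention nohup as a way to keep it running after you close the terminal (running it inside a screen from Section 3 is another option). A minimal sketch of the nohup pattern, using a one-line stand-in for submit.sh in a temporary directory so it can be tried anywhere; the log file name is made up:

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo"

# Stand-in for the real submit.sh (hypothetical content).
printf '#!/bin/bash\necho "submitting jobs..."\n' > submit.sh

chmod u+x submit.sh                     # make it executable for your user
nohup ./submit.sh > submit.log 2>&1 &   # run in the background, log to a file
submit_pid=$!                           # process ID, needed to stop it with kill
wait "$submit_pid"                      # only for this demo; normally you just log out
cat submit.log
```

As the script's comments warn, only do this once you are sure the job setup is correct, because a background loop keeps submitting jobs until it finishes or you stop it with kill and the process ID.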

5 Misc

. enable_fmriprep enter the environment for running fmriprep.
R enter the R environment; this is where you install the R packages you need.
#!/bin/bash bash script header.

6 References

CSIC official website: https://www.cores.emory.edu/csic/resources/computer/index.html.
Bash Scripting Tutorial – Linux Shell Script and Command Line for Beginners