Depending on your operating system:
- Linux
You can map the cluster directly into your Nautilus file manager.
Go to “File->Connect to Server” and then choose SSH in “Service type”
Server: marvin.st-andrews.ac.uk
Port: 22
User name: <your user name on marvin>
Password: <your password on marvin>
If everything worked, you will have a new entry in Nautilus that you can use to transfer files to and from the cluster.
Note: this may differ slightly depending on your Nautilus version. In some versions you only need to enter “ssh://marvin.st-andrews.ac.uk”.
- Mac
Install Fugu and run it, OR
install Cyberduck and run it.
- Microsoft Windows
Install SSH Secure Shell or Cyberduck.
In SSH Secure Shell, create a new connection to marvin.st-andrews.ac.uk (port 22) using your marvin user name and password.
Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).
You can open multiple screens and keep them open even if your connection drops.
- Creating a new screen to work
Just type on marvin:
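screen -S mysession    ## “mysession” is just an example session name; plain “screen” with no name also works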
Now you can run whatever commands you want. After that you can detach from the screen and re-attach to it later.
- Detaching From Screen
Press: “Ctrl-a” “d”
- Reattach to Screen
If your connection drops or you have detached from a screen, you can re-attach to it.
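With GNU screen, just type:
screen -r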
However, if you have multiple screens type:
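screen -r    ## with more than one detached session, screen will list them instead of attaching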
Hypothetical output
There are several suitable screens on:
31917.pts-5.office (Detached)
31844.pts-0.office (Detached)
If you get this, just specify the screen that you want, type:
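screen -r 31917.pts-5.office    ## using the first session ID from the example output above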
You can also save a record of each window in your screen session using the “Ctrl-a :hardcopy -h mysession0.hcpy” sequence of commands.
The modules system is a way to easily load software into your path. This approach has a number of advantages including allowing for multiple versions of the software to be installed at any given time.
- Listing Available Software
To list the available software run on terminal:
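module avail    ## lists every module (and version) installed on the cluster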
This should output something like:
------------------------- /usr/local/Modules/versions --------------------------
3.2.10
--------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
artemis/16.0.0(default) dot modules scripts
bedtools/2.17.0 EMBOSS/6.6.0(default) null seqtk/1.0-r57(default)
blastall/2.2.26(default) FASTQC/0.10.1(default) openmpi/1.6.5(default) stampy/1.0.23(default)
blastScripts/(default) gatk/3.2-2(default) paml/4.7a(default) tophat/2.0.10(default)
bowtie/1.0.0 general_script_tools/(default) picard-tools/1.118(default) trimmomatic/0.32
bowtie2/2.1.0(default) gwas python/2.7(default) use.own
bwa/0.7.7(default) HTSlib/0.0.1(default) python/3.4 vcftools/0.1.12a(default)
CEGMA/2.5(default) interproscan/5.4-47.0(default) R/2.15
cufflinks/2.1.1 mafft/7.147(default) R/3.0
cufflinks/2.2.0(default) module-info samtools/0.1.19(default)
- Using The Software
To load a module into your path, run:
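module load <software>[/<version>]    ## <software> is one of the names shown by “module avail”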
You only need to add the version if you want a version other than the default. So, if you wanted to load the default version of tophat, you would run:
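module load tophat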
If you wanted a specific version instead, for example cufflinks 2.1.1 from the listing above, you would run:
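module load cufflinks/2.1.1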
- Showing What Software is Loaded
To show what modules you have loaded at any time, you can run:
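module list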
Depending on what modules you have loaded, it will produce something like this:
Currently Loaded Modulefiles:
1) modules 3) R/3.0 5) blastScripts/(default)
2) python/2.7 4) blastall/2.2.26 6) cufflinks/2.1.1
- Unloading Software
Sometimes you no longer want a piece of software in your path. To do this, you unload the module by running:
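module unload <software>    ## for example, module unload cufflinks/2.1.1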
- Showing How to Use a Specific Software
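To see what a particular module sets up and how to use it, the module command has “show” and “help” sub-commands (cufflinks is used here only as an example):
module show cufflinks
module help cufflinks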
- Additional Features
There are additional features and operations that can be done with the module command.
Please run the following to get more information:
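module help
man module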
There are several versions of Python; it is “python2.7” that has Biopython installed, and it is loaded as a module by default, ready for you to use. Leaving out the “2.7” will give you the wrong Python version. So, to run scripts you need to type:
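python2.7 myscript.py    ## “myscript.py” is a placeholder for your own script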
Or, if you want your script to function as an executable, you add a shebang on the top line like so:
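#!/usr/bin/env python2.7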
And then make it executable with “chmod 755”.
To open an R console you need to type:
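R    ## opens the R console (R 3.2.1 by default)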
Note: by default, R 3.2.1 is loaded into your path.
R Libraries already installed in R 3.2.1
-cvTools
-biocLite core
-ggplot2
-PopGenome
-cn.mops
-MCMCglmm
-boot
-R2Cuba
-mvtnorm
-glmnet
-mgcv
-gsg
-numDeriv
-nlme
-qtl (RQTL)
-onemap
-limma
-edgeR
-diveRsity
R Libraries already installed in R 3.0.2
-biocLite core
-cvTools
-ggplot2
-PopGenome
-cn.mops
-MCMCglmm
-boot
-R2Cuba
-mvtnorm
-glmnet
-mgcv
-gsg
-numDeriv
-nlme
-methylkit
-qtl (RQTL)
-onemap
-cummeRbund
-limma
-edgeR
Note: other libraries can be installed at your request.
The cluster is a shared resource, analogous to a road network which by turns sees high and low traffic. Submitting and managing jobs via scripts is at the heart of using the cluster. Software is run on the nodes in the cluster by putting the command, with all its options and arguments, in a jobscript.
Note: Please do not run ANY computationally intensive tasks on the head node. If this is done, we will have to kill your jobs, because they will slow down all other users.
- Usage Guidelines
There are a number of different queues available to cluster users. Below is a list of the queues and the resource limitations associated with each:
all.q – This is the default queue. Use it if your job doesn’t have any special requirements.
lowmemory.q – This queue is for jobs that require less than 64GB of RAM.
highmemory.q – This queue is for jobs that require more than 64GB of RAM.
blast.q – This queue is for BLAST jobs.
marvin.q – This queue is for submitting jobs to marvin only.
- Submitting Jobs using a Script
A script is just a set of commands that we want to run once the job starts. Below is an example of a simple script; you can put whatever commands you need in it.
You need to create this script and save it under a convenient and memorable name, such as “hnamejobscript.sh” (this name already tells us it is a shell script for launching a very simple hostname job on the cluster).
#!/bin/bash
#$ -V ## pass all environment variables to the job, VERY IMPORTANT
#$ -N run_something ## job name
#$ -S /bin/bash ## shell where it will run this job
#$ -j y ## join error output to normal output
#$ -cwd ## Execute the job from the current working directory
#$ -q lowmemory.q ## queue name
uptime > myUptime.${JOB_ID}.txt
echo $HOSTNAME >> myUptime.${JOB_ID}.txt
To then proceed to have it run, we invoke the “qsub” command like so:
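qsub hnamejobscript.sh    ## using the script name from the example above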
- Checking the jobs that are running
type:
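qstat    ## add “-u <your user name>” to list only your own jobs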
- Deleting jobs
To remove only one job, type:
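qdel <job id>    ## the job id is the number shown in the first column of qstat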
To remove all of your jobs, type:
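qdel -u <your user name>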
More information: Open Grid Engine, aka Sun Grid Engine, aka Oracle Grid Engine.
Load the module first:
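blastSGE most likely comes from the blastScripts module shown in the module listing above (load blastall as well if required):
module load blastScripts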
Now you are able to run BLAST searches easily.
The databases available are: nr, nt, human_G38.fasta and human_genomic; otherwise you need to give the complete path to your database.
If you type only:
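blastSGE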
and you get this explanation:
blastSGE <file to process> <result.xml> <blast program> <database> <E-Value> <Max matches in a query range = 0> <limit hit number> <clean directories? Y|N> <translate table (optional)>
use (only aggregate the xml files into one): blastSGE <file to process> <result.xml>
use (only aggregate the xml files into one): blastSGE <path> <result.xml>
- <file to process> ## input file
- <result.xml> ## output file
- <blast program> ## blast program (blastn|blastp|blastx)
- <database> ## database name (nr|nt|human_G38.fasta|human_genomic); otherwise you need to give the complete path to your database
- <E-Value> ## E-value limit
- <Max matches in a query range = 0> ## maximum number of matches in a query range
- <limit hit number> ## maximum number of hits
- <clean directories? Y|N> ## in cluster mode the input file is divided into several files and each file is run on one node; these directories hold the results of that. If something goes wrong in the middle, you can restart the job from the point where it broke. You can remove the directories at the end when everything is done.
- <translate table (optional)> ## codon translation table, not compulsory