SGSSynteny
Revision as of 02:16, 18 August 2014
What does SGSSynteny depend on?
SGSSynteny depends on the following:
- Java 1.6 or higher
- R/3.1.0
- picard-tools
- ggplot2
Download
- Latest Version 0.1 (29/04/2014):
- SGSSynteny.v0.1.tar.gz should contain
- two main programs: SGSSynteny.v0.1.jar, graph_synteny.v0.1.R
- readme file
- folder with source code
From now on in this manual, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R are referred to as SGSSynteny.jar and graph_synteny.R.
To run the programs, you have to use the full names SGSSynteny.v0.1.jar and graph_synteny.v0.1.R.
How to install?
- SGSSynteny.tar.gz
- Unpack SGSSynteny.tar.gz and place SGSSynteny.jar and all the R scripts in a directory of your choice, for example ./my_synteny
- Move into ./my_synteny and create the SGSSynteny_lib directory (on Linux: cd ./my_synteny, then mkdir SGSSynteny_lib)
- The name of the lib directory is the name of the .jar file without the .jar extension, plus _lib, so if you are using SGSSynteny.v0.1.jar the lib directory is SGSSynteny.v0.1_lib
- The lib directory has to be in the same folder as the .jar file
- Download picard-tools (SGSSynteny.jar was tested with picard-tools 1.89)
- Place picard-1.89.jar and sam-1.89.jar in ./my_synteny/SGSSynteny_lib
- Now you are ready to run SGSSynteny
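The installation steps above can be sketched as a small shell helper. This is only an illustration: the function names lib_dir_for and setup_lib_dir and the example folder ./picard-tools-1.89 are assumptions, not part of SGSSynteny.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with SGSSynteny.

# Derive the lib directory name from a jar file name:
# strip the .jar extension and append _lib.
lib_dir_for() {
    local jar_name
    jar_name=$(basename "$1")
    printf '%s_lib\n' "${jar_name%.jar}"
}

# Create the lib directory in the same folder as the jar and copy the
# two picard-tools jars into it.
#   $1: path to the SGSSynteny jar
#   $2: folder holding picard-1.89.jar and sam-1.89.jar
setup_lib_dir() {
    local jar_path=$1 picard_dir=$2 target
    target="$(dirname "$jar_path")/$(lib_dir_for "$jar_path")"
    mkdir -p "$target"
    cp "$picard_dir/picard-1.89.jar" "$picard_dir/sam-1.89.jar" "$target/"
}

# Example call (both paths are placeholders):
# setup_lib_dir ./my_synteny/SGSSynteny.v0.1.jar ./picard-tools-1.89
```

Deriving the directory name from the jar name avoids a mismatch between, for example, SGSSynteny_lib and SGSSynteny.v0.1_lib.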
Input and output files for SGSSynteny.v0.1.jar
- Input files:
- Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
- Gff3 file with the reference genome annotation; it has to contain gene, mRNA and exon features
- Output files
- Result files for each chromosome separately - .cluster files
- File with overall stats - stats.csv
Command line options for SGSSynteny.jar
Required:
bamPath - path to bam file, only folder path, do not specify bam file names here, folder has to contain both .bam and .bai files; has to end with “/” or “\”
bamFileList - comma separated list of all the bam files to be used
gffFile - path to the .gff3 file, including the file name; the file has to contain at least gene and exon features
outDirPath - directory for the output files; has to end with “/” or “\”
Optional:
expectCov - expected coverage [null]
minFracHor - minimum horizontal coverage required to consider genes as syntenic [0.3]
minCovVer - minimum coverage depth required to consider genes as syntenic [2.0]
chromosomeList - comma separated list of chromosomes, use `all` for all the chromosomes in the .bam file [all]
DBepsilon - Eps value for DBSCAN (radius) [26]
DBmin - minPts value for DBSCAN (min cluster size) [24]
genesOrExons - use whole genes or exons for coverage calculations [exons]
mergeDistance - distance (number of genes) separating clusters for them to be merged [30]
esimateMinCovVer - estimate the minimum coverage depth used for clustering from the given fraction of points with the highest coverage depth, e.g. esimateMinCovVer=0.45 uses the 45% of points with the highest coverage [null]
To see help run: java -jar SGSSynteny.jar help
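Because bamPath takes only the folder and bamFileList takes only the file names, the list can be assembled from the folder contents. The sketch below is an assumption-laden helper (bam_file_list is a hypothetical name); it also warns when no index is found, accepting either x.bam.bai or x.bai since index naming differs between tools, and it does not check which form SGSSynteny itself expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SGSSynteny.
# Print a comma separated list of the .bam file names found in a folder
# and warn (on stderr) about any .bam file without an index next to it.
bam_file_list() {
    local dir=${1%/} list="" f name
    for f in "$dir"/*.bam; do
        [ -e "$f" ] || continue          # folder contains no .bam files
        name=$(basename "$f")
        # Accept either index naming: x.bam.bai or x.bai
        if [ ! -e "$f.bai" ] && [ ! -e "${f%.bam}.bai" ]; then
            echo "warning: no .bai index found for $name" >&2
        fi
        list="${list:+$list,}$name"      # join with commas, no leading comma
    done
    printf '%s\n' "$list"
}

# Example (placeholder path):
# java -jar SGSSynteny.v0.1.jar bamPath=/home/my_bams/ bamFileList="$(bam_file_list /home/my_bams/)" gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 outDirPath=/home/results/
```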
Sample command
- Please make sure that all your supplied paths end with / or \
java -Xmx16g -jar SGSSynteny.jar bamPath=/home/my_bams/ gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 outDirPath=/home/results/ chromosomeList=Bd1,Bd2,Bd3,Bd4,Bd5 bamFileList=my_bam.sorted.bam DBepsilon=30 DBmin=25 expectCov=500 minCovVer=2.0 minFracHor=0.4
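Since bamPath and outDirPath have to end with / or \, a wrapper script can normalise the paths before calling the program. The function name with_trailing_slash below is hypothetical, and the snippet is a minimal sketch rather than something provided by SGSSynteny.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SGSSynteny.
# Print the path unchanged if it already ends with / or \,
# otherwise print it with a / appended.
with_trailing_slash() {
    case "$1" in
        */|*\\) printf '%s\n' "$1" ;;
        *)      printf '%s/\n' "$1" ;;
    esac
}

# Example with the required options only (paths taken from the sample command):
# java -Xmx16g -jar SGSSynteny.v0.1.jar \
#     bamPath="$(with_trailing_slash /home/my_bams)" \
#     bamFileList=my_bam.sorted.bam \
#     gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 \
#     outDirPath="$(with_trailing_slash /home/results)"
```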
Output files format
All the output files are comma separated text files.
- .cluster files - files with results for each chromosome (files use chromosome names as in .bam files)
- stats.csv - file with summary information about all genes
Plotting results
Results are visualized using an R script.
Results per chromosome:
What you need:
- script graph_synteny.R
- .clusters files (either basic or extended) with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc.
- the directory (location) containing those result files
graph_synteny.R takes three arguments in this order:
1. location of the directory where the .clusters files are located
2. lower limit of the Y axis
3. output path ending with /
Rscript --vanilla graph_synteny.R /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
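Before plotting, it can help to check the first and third arguments. The sketch below uses hypothetical function names (has_clusters_files, check_plot_inputs) and assumes the result files carry the .clusters extension used in this section; creating the output directory up front is a precaution, not a documented requirement of graph_synteny.R.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of SGSSynteny.

# Succeed if the folder contains at least one .clusters file.
has_clusters_files() {
    local f
    for f in "${1%/}"/*.clusters; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Check the results folder and the output path before plotting.
#   $1: folder with .clusters files
#   $2: output path, has to end with /
check_plot_inputs() {
    local results_dir=$1 out_dir=$2
    if ! has_clusters_files "$results_dir"; then
        echo "no .clusters files found in $results_dir" >&2
        return 1
    fi
    case "$out_dir" in
        */) ;;
        *)  echo "output path has to end with /" >&2; return 1 ;;
    esac
    mkdir -p "$out_dir"   # precaution: make sure the output folder exists
}

# Example (paths taken from the command above):
# check_plot_inputs /home/uqagnieszka/results /home/uqagnieszka/graphs/ &&
#     Rscript --vanilla graph_synteny.v0.1.R /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
```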
FAQ
- If memory consumption is a problem, please consider increasing -Xmx (the maximum Java heap size) or splitting your .bam files