Difference between revisions of "SGSSynteny"
Latest revision as of 04:20, 18 August 2014
What does SGSSynteny depend on?
SGSSynteny depends on the following:
- Java 1.6 or higher
- R/3.1.0
- picard-tools
- ggplot2
 
Download
- Latest Version 0.1 (29/04/2014):
  - SGSSynteny.v0.1.tar.gz (http://appliedbioinformatics.com.au/download/SGSSynteny.v0.1.tar.gz) should contain
    - two main programs: SGSSynteny.v0.1.jar, graph_synteny.v0.1.R
    - readme file
    - folder with source code
 
From now on in this manual, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R are referred to as SGSSynteny.jar and graph_synteny.R.
To run the programs you have to use the full names, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R.
How to install?
- SGSSynteny.tar.gz
  - Unpack SGSSynteny.tar.gz and place SGSSynteny.jar and all the R scripts in a directory (or directories) of your choice, for example ./my_synteny
  - Move into ./my_synteny and create the SGSSynteny_lib directory (on Linux: cd ./my_synteny; mkdir SGSSynteny_lib)
    - The name of the lib directory is the name of the .jar file without the .jar extension, plus _lib, so if you are using SGSSynteny.v0.1.jar the lib directory is SGSSynteny.v0.1_lib
    - The lib directory has to be in the same folder as the .jar file
  - Download picard-tools (SGSSynteny.jar was tested with picard-tools 1.89)
  - Place picard-1.89.jar and sam-1.89.jar in ./my_synteny/SGSSynteny_lib
  - Now you are ready to run SGSSynteny
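The naming rule for the lib directory can be sketched in shell. This is only an illustration: `lib_dir_for` is a hypothetical helper that is not shipped with SGSSynteny, and the `./my_synteny` location is the example directory used above. The picard jars still have to be copied into the resulting directory by hand.

```shell
# Derive the lib directory from the path of the jar file:
# same folder as the jar, jar name without .jar, plus _lib.
# lib_dir_for is a hypothetical helper, not part of SGSSynteny.
lib_dir_for() {
    printf '%s/%s_lib\n' "$(dirname "$1")" "$(basename "$1" .jar)"
}

# Example: create the lib directory next to the jar (illustrative path).
jar="./my_synteny/SGSSynteny.v0.1.jar"
lib_dir=$(lib_dir_for "$jar")
mkdir -p "$lib_dir"
echo "$lib_dir"
```

Deriving the name from the jar file avoids a mismatch between the jar version in use and the lib directory it expects.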
 
Input and output files for SGSSynteny.v0.1.jar
- Input files:
  - Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
  - Gff3 file with reference genome annotation; has to contain gene, mRNA and exon features
- Output files:
  - Result files for each chromosome separately: .cluster files
  - File with overall stats: stats.txt
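Because every .bam file has to be indexed, it can help to check for missing index files before starting a run. The sketch below is an assumption-laden illustration: `missing_bam_indexes` is a hypothetical helper, and it assumes the index sits next to the .bam file and is named either `<name>.bam.bai` or `<name>.bai`, which this manual does not specify.

```shell
# List bam files from a comma separated list that are absent or lack an index.
# $1: folder with the bam files, ending with /
# $2: comma separated list of bam file names
# Assumption: the index is named <name>.bam.bai or <name>.bai.
# The function body runs in a subshell so the IFS change stays local.
missing_bam_indexes() (
    bam_dir=$1
    IFS=','
    for bam in $2; do
        if [ ! -f "${bam_dir}${bam}" ]; then
            echo "missing bam: ${bam}"
        elif [ ! -f "${bam_dir}${bam}.bai" ] && [ ! -f "${bam_dir}${bam%.bam}.bai" ]; then
            echo "missing index: ${bam}"
        fi
    done
)
```

For example, `missing_bam_indexes /home/my_bams/ my_bam.sorted.bam` prints nothing when the file and its index are both present.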
 
 
Command line options for SGSSynteny.jar
Required:
bamPath - path to the folder containing the bam files; give the folder only, not bam file names; the folder has to contain both .bam and .bai files; has to end with “/” or “\”
bamFileList - comma separated list of all the bam files to be used
gffFile - path to the .gff3 file, including the file name; has to contain at least gene and exon features
outDirPath - directory for the output files; has to end with “/” or “\”
Optional:
expectCov - expected coverage [null]
minFracHor - minimum horizontal coverage required to consider genes as syntenic [0.3]
minCovVer - minimum coverage depth required to consider genes as syntenic [2.0]
chromosomeList - comma separated list of chromosomes; use `all` for all the chromosomes in the .bam file [all]
DBepsilon - Eps value for DBSCAN (radius) [26]
DBmin - minPts value for DBSCAN (min cluster size) [24]
genesOrExons - use whole genes or exons for coverage calculations [exons]
mergeDistance - distance (number of genes) separating clusters for them to be merged [30]
esimateMinCovVer - estimate the minimum coverage depth used for clustering from the fraction x of points with the highest coverage depth; for example, esimateMinCovVer=0.45 uses the 45% of points with the highest coverage [null]
To see help run: java -jar SGSSynteny.jar help
Sample command
- Please make sure that all your supplied paths end with / or \
 
java -Xmx16g -jar SGSSynteny.jar bamPath=/home/my_bams/ gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 outDirPath=/home/results/ chromosomeList=Bd1,Bd2,Bd3,Bd4,Bd5 bamFileList=my_bam.sorted.bam DBepsilon=30 DBmin=25 expectCov=500 minCovVer=2.0 minFracHor=0.4
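Since a missing trailing separator is an easy mistake, the paths can be normalised before the command is assembled. The following is a minimal sketch: `with_trailing_slash` is a hypothetical helper, the paths are the ones from the sample command, and the command is only printed for review rather than executed.

```shell
# Append a trailing / to a path unless it already ends with / or \.
# with_trailing_slash is a hypothetical helper, not part of SGSSynteny.
with_trailing_slash() {
    case "$1" in
        */|*\\) printf '%s\n' "$1" ;;
        *)      printf '%s/\n' "$1" ;;
    esac
}

bam_path=$(with_trailing_slash "/home/my_bams")
out_path=$(with_trailing_slash "/home/results/")

# Print the assembled command instead of running it.
echo java -Xmx16g -jar SGSSynteny.jar \
    "bamPath=${bam_path}" \
    "bamFileList=my_bam.sorted.bam" \
    "gffFile=/home/references/Bdistachyon_192_gene_exons.gff3" \
    "outDirPath=${out_path}"
```

Remove the `echo` to run the command once the printed paths look right.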
Output files format
All the output files are comma separated text files.
- .cluster files - files with results for each chromosome (files use chromosome names as in .bam files)
 - stats.txt - file with summary information about all genes
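Because the output is plain comma separated text, it can be inspected with standard tools. The sketch below prints, for each result file, its name, its number of lines and the number of fields on its first line. `summarise_results` is a hypothetical helper, the file extension is passed as an argument, and no column is interpreted because the column layout is not described here.

```shell
# Summarise comma separated result files in a directory.
# $1: results directory, ending with /
# $2: file extension without the dot (for example: cluster)
# Output per file: name,number_of_lines,fields_on_first_line
# summarise_results is a hypothetical helper, not part of SGSSynteny.
summarise_results() (
    for f in "$1"*."$2"; do
        # Skip the unexpanded pattern when no file matches.
        [ -f "$f" ] || continue
        awk -F',' -v name="$(basename "$f")" '
            NR == 1 { fields = NF }
            END     { printf "%s,%d,%d\n", name, NR, fields }
        ' "$f"
    done
)
```

A call such as `summarise_results /home/results/ cluster` gives a quick check that every chromosome produced a non-empty file.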
 
Plotting results
Results are visualized using an R script.
Results per chromosome:
What you need:
- script graph_synteny.R
 - .clusters files (either basic or extended) with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc.
 - directory (location) where files with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc. can be found
 
graph_synteny.R takes three arguments in this order:
1. location of the directory where the .clusters files are located
2. lower limit of the Y axis
3. output path ending with /
Rscript --vanilla graph_synteny.R /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
FAQ
- If memory consumption is a problem, please consider increasing -Xmx (the Java maximum heap size) or splitting your .bam files
 