Difference between revisions of "SGSSynteny"

Latest revision as of 04:20, 18 August 2014

What does SGSSynteny depend on?

SGSSynteny depends on the following:

Download

Latest Version 0.1 (29/04/2014):
- SGSSynteny.v0.1.tar.gz (http://appliedbioinformatics.com.au/download/SGSSynteny.v0.1.tar.gz) should contain
  - two main programs: SGSSynteny.v0.1.jar, graph_synteny.v0.1.R
  - readme file
  - folder with source code

From now on in this manual, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R are referred to as SGSSynteny.jar and graph_synteny.R.

To run the programs you have to use the full names, SGSSynteny.v0.1.jar and graph_synteny.v0.1.R.

How to install?

SGSSynteny.tar.gz
Unpack SGSSynteny.tar.gz and place SGSSynteny.jar and all the R scripts in a directory (or directories) of your choice, for example ./my_synteny
Move into ./my_synteny and create the SGSSynteny_lib directory (on Linux: cd ./my_synteny, then mkdir SGSSynteny_lib)
- The name of the lib directory is the name of the .jar file without the .jar extension plus _lib, so if you are using SGSSynteny.v0.1.jar the lib directory is SGSSynteny.v0.1_lib
- The lib directory has to be in the same folder as the .jar file
Download picard-tools (SGSSynteny.jar was tested with picard-tools 1.89)
Place picard-1.89.jar and sam-1.89.jar in ./my_synteny/SGSSynteny_lib
Now you are ready to run SGSSynteny
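
The lib directory naming rule above can be sketched as a small shell helper. This is only an illustration, not part of the SGSSynteny distribution: the function name lib_dir_for_jar is hypothetical, and ./my_synteny is the example install directory used above.

```shell
#!/bin/sh
# Sketch only: derive the lib directory that belongs to a given .jar file.
# Rule from the install steps: <jar name without .jar> + "_lib",
# located in the same folder as the .jar file.
# lib_dir_for_jar is a hypothetical helper name, not part of SGSSynteny.
lib_dir_for_jar() {
    jar_path=$1
    jar_dir=$(dirname "$jar_path")
    jar_name=$(basename "$jar_path" .jar)
    printf '%s/%s_lib\n' "$jar_dir" "$jar_name"
}

# Create the lib directory next to the jar in the example install directory.
install_dir=./my_synteny
mkdir -p "$(lib_dir_for_jar "$install_dir/SGSSynteny.v0.1.jar")"

# Afterwards copy the picard-tools jars into it, for example:
# cp picard-1.89.jar sam-1.89.jar "$install_dir/SGSSynteny.v0.1_lib/"
```

Deriving the directory name from the .jar name avoids a mismatch between the two when the full versioned file name is used.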

Input and output files for SGSSynteny.v0.1.jar

Input files:
- Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence, multiple .bam files can be provided as comma separated list
- Gff3 file with reference genome annotation, has to contain gene, mRNA and exon fields
Output files:
- Result files for each chromosome separately - .cluster files
- File with overall stats - stats.txt
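
Because the .bam files must be indexed and can be passed as a comma separated list, it may help to build that list from a folder and check for index files first. The sketch below is an assumption-laden illustration: bam_list is a hypothetical helper, and it assumes the index is named either <name>.bam.bai or <name>.bai, which this manual does not specify.

```shell
#!/bin/sh
# Sketch only: print the .bam file names found in a folder as a comma
# separated list, warning on stderr about any file without an index.
# Assumption: the index is named <name>.bam.bai or <name>.bai.
# bam_list is a hypothetical helper name, not part of SGSSynteny.
bam_list() {
    dir=$1
    list=""
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue    # the folder holds no .bam files
        if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
            echo "warning: no index found for $bam" >&2
        fi
        name=$(basename "$bam")
        list="${list:+$list,}$name"  # append, adding a comma after the first
    done
    printf '%s\n' "$list"
}

# Usage (the folder is the example from this manual):
# bam_list /home/my_bams/
```

The function prints file names only, without the folder, since the folder and the file names are supplied to SGSSynteny.jar separately.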

Command line options for SGSSynteny.jar

Required:

bamPath - path to bam file, only folder path, do not specify bam file names here, folder has to contain both .bam and .bai files; has to end with “/” or “\”

bamFileList - comma separated list of all the bam files to be used

gffFile - path to .gff3 file, including file name; has to contain at least genes and exons features

outDirPath - directory for the output files; has to end with “/” or “\”

Optional:

expectCov - expected coverage [null]

minFracHor - minimum horizontal coverage required to consider genes as syntenic [0.3]

minCovVer - minimum coverage depth required to consider genes as syntenic [2.0]

chromosomeList - comma separated list of chromosomes, use `all` for all the chromosomes in the .bam file [all]

DBepsilon - Eps value for DBSCAN (radius) [26]

DBmin - minPts value for DBSCAN (min cluster size) [24]

genesOrExons - use whole genes or exons for coverage calculations [exons]

mergeDistance - distance (no of genes) separating clusters for them to be merged [30]

esimateMinCovVer - estimate the minimum coverage depth used for clustering from the given fraction of points with the highest coverage depth; for example, esimateMinCovVer=0.45 uses the 45% of points with the highest coverage [null]

To see help run: java -jar SGSSynteny.jar help

Sample command

Please make sure that all the directory paths you supply (bamPath, outDirPath) end with / or \

java -Xmx16g -jar SGSSynteny.jar bamPath=/home/my_bams/ gffFile=/home/references/Bdistachyon_192_gene_exons.gff3 outDirPath=/home/results/ chromosomeList=Bd1,Bd2,Bd3,Bd4,Bd5  bamFileList=my_bam.sorted.bam  DBepsilon=30 DBmin=25 expectCov=500 minCovVer=2.0 minFracHor=0.4
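
Since a missing trailing slash is an easy mistake, the directory paths can be normalized before the command is assembled. This is a minimal sketch, not part of SGSSynteny: with_trailing_slash is a hypothetical helper, it only handles "/" (not the Windows "\"), and the command is printed with echo rather than executed.

```shell
#!/bin/sh
# Sketch only: make sure a directory path ends with "/".
# with_trailing_slash is a hypothetical helper name, not part of SGSSynteny.
with_trailing_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

bam_path=$(with_trailing_slash /home/my_bams)
out_path=$(with_trailing_slash /home/results/)

# Print the command instead of running it, so it can be inspected first.
echo java -Xmx16g -jar SGSSynteny.v0.1.jar \
    "bamPath=$bam_path" \
    "gffFile=/home/references/Bdistachyon_192_gene_exons.gff3" \
    "outDirPath=$out_path" \
    "bamFileList=my_bam.sorted.bam"
```

Removing the echo would run the command, provided the .jar file and the input files exist at those locations.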

Output files format

All the output files are comma separated text files.

.cluster files - files with results for each chromosome (files use chromosome names as in .bam files)
stats.txt - file with summary information about all genes

Plotting results

Results are visualized using an R script, graph_synteny.R.

Results per chromosome:

What you need:

script graph_synteny.R
.clusters files (either basic or extended) with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc.
directory (location) where files with results from SGSSynteny.jar: Chr1.clusters, Chr2.clusters etc. can be found

graph_synteny.R takes three arguments in this order:

1. location of the directory where the .clusters files are located

2. lower limit of the Y axis

3. output path ending with /

Rscript --vanilla graph_synteny.R /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
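
The call above can be wrapped so that it refuses to run when the results directory holds no result files. This is a sketch under stated assumptions: has_clusters and plot_synteny are hypothetical helper names, the result files are assumed to carry the .clusters extension used in this section, and the script is assumed to be in the current directory under its full name graph_synteny.v0.1.R.

```shell
#!/bin/sh
# Sketch only: check for result files before calling the plotting script.
# has_clusters and plot_synteny are hypothetical names, not part of SGSSynteny.
# Assumption: result files end in .clusters, as in the list above.
has_clusters() {
    for f in "$1"/*.clusters; do
        [ -e "$f" ] && return 0
    done
    return 1
}

plot_synteny() {
    results_dir=$1
    y_min=$2
    out_dir=$3
    if ! has_clusters "$results_dir"; then
        echo "no .clusters files in $results_dir" >&2
        return 1
    fi
    # The output path has to end with "/".
    case $out_dir in
        */) ;;
        *)  out_dir="$out_dir/" ;;
    esac
    Rscript --vanilla graph_synteny.v0.1.R "$results_dir" "$y_min" "$out_dir"
}

# Usage:
# plot_synteny /home/uqagnieszka/results 0.4 /home/uqagnieszka/graphs/
```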

FAQ

If memory consumption is a problem please consider increasing -Xmx or splitting your .bam files
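
One possible reading of "splitting your .bam files" is one file per chromosome. The sketch below only prints the commands that would do this; it assumes samtools is installed and that the input .bam file is sorted and indexed. split_commands is a hypothetical helper, and whether per-chromosome files actually reduce memory use with SGSSynteny.jar has not been verified here.

```shell
#!/bin/sh
# Sketch only: print (not run) samtools commands that write one .bam file
# per chromosome and index each of them.
# Assumptions: samtools is installed; "splitting" means one file per chromosome.
# split_commands is a hypothetical helper name, not part of SGSSynteny.
split_commands() {
    bam=$1
    shift
    for chr in "$@"; do
        out="${bam%.bam}.$chr.bam"
        echo "samtools view -b -o $out $bam $chr"
        echo "samtools index $out"
    done
}

split_commands my_bam.sorted.bam Bd1 Bd2
```

Each resulting file could then be passed in bamFileList together with the matching chromosomeList value, for example chromosomeList=Bd1.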


