SGSGeneLoss
Latest revision as of 10:51, 14 February 2020
What does SGSGeneLoss depend on?
SGSGeneLoss depends on the following:
- Java 1.6 or higher
- R/3.1.0
- picard-tools v1.89 (or directly from https://sourceforge.net/projects/picard/files/picard-tools/1.89/ )
- ggplot2
- ggbio
Download
- Latest Version 0.1 (29/04/2014):
- SGSGeneLoss.v0.1.tar.gz should contain
- three main programs: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R
- readme file
- folder with source code
From now on in this manual, SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R and graph_circles.v0.1.R are referred to as SGSGeneLoss.jar, graph_chromosomes.R and graph_circles.R.
To run the programs you have to use the full names: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R
How to install?
- SGSGeneLoss.tar.gz
- Unpack SGSGeneLoss.tar.gz and place SGSGeneLoss.jar and all the R scripts in a directory (or directories) of your choice, for example ./my_geneloss
- Move into ./my_geneloss and create the SGSGeneLoss_lib directory (on Linux: cd ./my_geneloss; mkdir SGSGeneLoss_lib)
- The name of the lib directory is the name of the .jar file without .jar extension + _lib, so if you are using SGSGeneLoss.v0.1.jar the lib directory is: SGSGeneLoss.v0.1_lib
- The lib directory has to be in the same folder as the .jar file
- Download picard-tools (SGSGeneLoss was tested with picard-tools 1.89)
- Place picard-1.89.jar and sam-1.89.jar in ./my_geneloss/SGSGeneLoss_lib
- Now you are ready to run SGSGeneLoss
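The naming rule for the lib directory can be checked with a small shell helper. This is only a sketch: the function names lib_dir_for_jar and check_lib_dir are made up for this example and are not part of SGSGeneLoss. It assumes a POSIX shell and the two picard-tools 1.89 jar names given above.

```shell
# Derive the lib directory that belongs to a given SGSGeneLoss jar:
# same folder as the jar, jar file name without ".jar", plus "_lib".
# (Hypothetical helper, not shipped with SGSGeneLoss.)
lib_dir_for_jar() {
  jar_path=$1
  printf '%s/%s_lib\n' "$(dirname "$jar_path")" "$(basename "$jar_path" .jar)"
}

# Report whether both picard-tools jars are present in that lib directory.
# Returns non-zero and names the first missing file otherwise.
check_lib_dir() {
  lib_dir=$(lib_dir_for_jar "$1")
  for jar in picard-1.89.jar sam-1.89.jar; do
    [ -f "$lib_dir/$jar" ] || { echo "missing: $lib_dir/$jar" >&2; return 1; }
  done
  echo "ok: $lib_dir"
}
```

For example, check_lib_dir ./my_geneloss/SGSGeneLoss.v0.1.jar looks for the two jars in ./my_geneloss/SGSGeneLoss.v0.1_lib.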
Input and output files for SGSGeneLoss.jar
- Input files:
- Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
- Gff3 file with reference genome annotation; it has to contain gene, mRNA and exon features
- Output files
- Result files for each chromosome separately - .excov
- File with overall stats - stats.txt
- File with summary for all the chromosomes used - chrs.csv (this file is used by one of the R scripts)
- File with list of genes lost for all the chromosomes - graph.csv (this file is used by one of the R scripts)
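Before a run it can be useful to confirm that the gff3 file really contains gene, mRNA and exon entries. The sketch below relies on the feature type being in the third tab-separated column of a GFF3 file; the function name missing_gff3_features is invented for this example and the check is not something SGSGeneLoss performs itself.

```shell
# Print each required feature type (gene, mRNA, exon) that never appears
# in column 3 of a GFF3 file. Prints nothing if all three are present.
# (Hypothetical helper, not part of SGSGeneLoss.)
missing_gff3_features() {
  awk -F '\t' '
    # Skip comment and directive lines, remember every feature type seen.
    !/^#/ && NF >= 3 { seen[$3] = 1 }
    END {
      n = split("gene mRNA exon", required, " ")
      for (i = 1; i <= n; i++)
        if (!(required[i] in seen)) print required[i]
    }
  ' "$1"
}
```

Usage: missing_gff3_features /home/my_gffs/annot.gff3 prints the missing types, one per line.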
Command line options for SGSGeneLoss.jar
Required:
bamPath - path to the directory with your .bam file(s), has to end with / or \, for example bamPath=/home/my_bams/
bamFileList - a single .bam file or a comma separated list, file names only; the .bam files and the corresponding .bai files have to be in the directory provided in bamPath, for example bamFileList=bam1.bam,bam2.bam
gffFile - location of the gff3 file, for example gffFile=/home/my_gffs/annot.gff3
outDirPath - location of the output directory, has to end with / or \, for example outDirPath=/home/my_results/
Optional:
minCov - minimal coverage threshold to consider position covered [minCov=1]
chromosomeList - comma separated list of chromosomes to be used for analysis, use all for all chromosomes [chromosomeList=all]
lostCutoff - coverage cutoff to consider a gene as lost when calculating stats [lostCutoff=0.0]
covCats - coverage categories for visualization [covCats=0,10,20,30,40,70]
extendedFmt - use extended format, additional info is included in output files [default: regular format]
To see help run: java -jar SGSGeneLoss.jar help
Sample command
- Move into the directory where SGSGeneLoss.jar is located
- Please make sure that all your supplied paths end with / or \
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=all
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam,arabidopsis2.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=Chr1,Chr2 minCov=2 lostCutoff=0.05 covCats=0,2,5,10,20 extendedFmt
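Because a missing trailing / or \ is an easy mistake, the command line can be assembled by a small wrapper. This is a sketch only: with_trailing_slash and sgs_command are hypothetical names, the wrapper prints the command instead of running it, and -Xmx4g is simply copied from the sample commands above.

```shell
# Return the path unchanged if it already ends with / or \,
# otherwise append a forward slash.
with_trailing_slash() {
  case $1 in
    */|*\\) printf '%s\n' "$1" ;;
    *)      printf '%s/\n' "$1" ;;
  esac
}

# Print (not run) an SGSGeneLoss command line from the four required values.
# Arguments: bam directory, bam file list, gff3 file, output directory.
sgs_command() {
  printf 'java -Xmx4g -jar SGSGeneLoss.jar bamPath=%s bamFileList=%s gffFile=%s outDirPath=%s\n' \
    "$(with_trailing_slash "$1")" "$2" "$3" "$(with_trailing_slash "$4")"
}
```

For example, sgs_command /home/my_bams bam1.bam,bam2.bam /home/my_gffs/annot.gff3 /home/my_results prints a command in which both directory paths end with /. Optional parameters such as minCov or chromosomeList can be appended to the printed line by hand.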
Output files format
All the output files are comma separated text files.
- .excov files - files with results for each chromosome (files use chromosome names as in the .bam files); they come in two formats: basic (default) or extended (extendedFmt)
- basic format: chromosome,ID,is_lost,start_position,end_postion,frac_exons_covered,frac_gene_covered,ave_cov_depth_exons,cov_cat,ave_cove_depth_gene
- extended format: contains additional columns with information about each of the exons
- stats.csv - file with summary information about all genes
- chrs.csv - file with summary information about chromosomes
- chr,start,end,len
- graph.csv - file with list of genes lost as determined by lostCutoff
- chr,id,start,end
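Since the output files are plain comma separated text, they can be summarized with standard tools. The sketch below counts the genes listed in graph.csv per chromosome. It assumes the column order chr,id,start,end given above; whether the file starts with a header row is not stated here, so the code skips a first line only if its first field is literally chr. The function name count_lost_per_chr is invented for this example.

```shell
# Count the rows of a graph.csv file per chromosome and print
# "chromosome,count" lines sorted by chromosome name.
# (Hypothetical helper, not part of SGSGeneLoss.)
count_lost_per_chr() {
  awk -F ',' '
    # Assumption: skip a header row if the file happens to have one.
    NR == 1 && $1 == "chr" { next }
    NF >= 4 { count[$1]++ }
    END { for (c in count) print c "," count[c] }
  ' "$1" | LC_ALL=C sort
}
```

Usage: count_lost_per_chr /home/uqagnieszka/results/graph.csv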
Plotting results
Results are visualized using R scripts.
Two ways of visualization are possible:
- results per chromosome
- results for all chromosomes as a circular graph
Results per chromosome:
What you need:
- script graph_chromosomes.R
- .excov files (either basic or extended) with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc.
- directory (location) where the result files from SGSGeneLoss.jar (Chr1.excov, Chr2.excov etc.) can be found
graph_chromosomes.R takes three arguments in this order:
1. location of the directory where the .excov files are located
2. gene loss cutoff
3. output path ending with /
Rscript --vanilla graph_chromosomes.R /home/uqagnieszka/results 0.0 /home/uqagnieszka/graphs/
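The three arguments can be checked before R is started. The following is a sketch with a hypothetical function name, chromosomes_command; it only prints the Rscript call shown above after confirming that the results directory holds at least one .excov file and that the output path ends with /.

```shell
# Print (not run) the Rscript call for graph_chromosomes.R after basic checks.
# Arguments: directory with .excov files, gene loss cutoff, output path.
chromosomes_command() {
  excov_dir=$1; cutoff=$2; out_path=$3
  # Expand the glob; if nothing matches, $1 stays the literal pattern.
  set -- "$excov_dir"/*.excov
  [ -e "$1" ] || { echo "no .excov files in $excov_dir" >&2; return 1; }
  case $out_path in
    */) ;;
    *) echo "output path must end with /" >&2; return 1 ;;
  esac
  printf 'Rscript --vanilla graph_chromosomes.R %s %s %s\n' \
    "$excov_dir" "$cutoff" "$out_path"
}
```

Usage: chromosomes_command /home/uqagnieszka/results 0.0 /home/uqagnieszka/graphs/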
Summary results for all chromosomes, possibly multiple samples:
What you need:
- script graph_circles.R
- graph.csv from SGSGeneLoss.jar run
- chrs.csv from SGSGeneLoss.jar run
- file assigning numeric order to chromosomes (this is done because some chromosomes have complicated names and sorting in ASCII order does not always work) - file should look like this, chromosome names will be replaced by corresponding numbers
chr,no
chr1,1
chr2,2
chr10,10
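A chromosome order file can also be written by a short shell function instead of by hand. This sketch numbers the chromosomes by their position in the argument list, which is an assumption made for the example: the numbers in the file are free to choose, as chr10,10 in the example above shows. The function name make_chr_order is hypothetical.

```shell
# Write a chromosome order file to standard output: a "chr,no" header,
# then one "name,number" line per argument, numbered 1, 2, 3, ...
# in the order the names are given. (Hypothetical helper.)
make_chr_order() {
  echo "chr,no"
  n=1
  for chr in "$@"; do
    printf '%s,%d\n' "$chr" "$n"
    n=$((n + 1))
  done
}
```

Usage: make_chr_order chr1 chr2 chr10 > chrs_order.csv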
graph_circles.R takes five arguments in this order:
1. file with chromosome info - chrs.csv from SGSGeneLoss.jar run
2. file with chromosome order
3. file with genes lost - graph.csv from SGSGeneLoss.jar run; it can be a comma separated list of multiple files (for example multiple samples). Circles will be drawn in the following order:
first file in the list is the innermost circle, so if you have graph1.csv,graph2.csv,graph3.csv, order of circles will reflect order of files, starting from the inside
4. output path ending with /
5. output file name
Rscript --vanilla graph_circles.R chrs.csv chrs_order.csv graph1.csv,graph2.csv,graph3.csv /home/results/graphs/ out.png
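With several samples, the comma separated list in the third argument can be built from separate file names. The sketch below uses two hypothetical helpers, join_by_comma and circles_command, and prints the Rscript call rather than running it. The graph files are given last, innermost circle first.

```shell
# Join all arguments with commas, keeping their order.
join_by_comma() {
  (IFS=,; printf '%s\n' "$*")
}

# Print (not run) the Rscript call for graph_circles.R.
# Arguments: chrs.csv, chromosome order file, output path, output file,
# then one or more graph.csv files (first one is the innermost circle).
circles_command() {
  chrs_file=$1; order_file=$2; out_path=$3; out_file=$4
  shift 4
  printf 'Rscript --vanilla graph_circles.R %s %s %s %s %s\n' \
    "$chrs_file" "$order_file" "$(join_by_comma "$@")" "$out_path" "$out_file"
}
```

Usage: circles_command chrs.csv chrs_order.csv /home/results/graphs/ out.png graph1.csv graph2.csv graph3.csv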
FAQ
- If memory consumption is a problem please consider increasing -Xmx or splitting your .bam files
- Please cite Golicz, A.A., Martinez, P.A., Zander, M., Patel, D.A., Van De Wouw, A.P., Visendi, P., Fitzgerald, T.L. et al. (2015) Gene loss in the fungal canola pathogen Leptosphaeria maculans. Funct. Integr. Genomics, 15, 189–196.