SGSGeneLoss
What does SGSGeneLoss depend on?
SGSGeneLoss depends on the following:
- Java 1.6 or higher
- R/3.1.0
- picard-tools
- ggplot2
- ggbio
Download
- Latest Version 0.1 (29/04/2014):
- SGSGeneLoss.v0.1.tar.gz should contain
- three main programs: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R
- readme file
- folder with source code
From now on, this manual refers to SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R and graph_circles.v0.1.R as SGSGeneLoss.jar, graph_chromosomes.R and graph_circles.R.
To run the programs you have to use the full names: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R.
How to install?
- SGSGeneLoss.tar.gz
- Unpack SGSGeneLoss.tar.gz and place SGSGeneLoss.jar and all the R scripts in chosen directory/directories, for example ./my_geneloss
- Move into ./my_geneloss and create the SGSGeneLoss_lib directory (on Linux: cd ./my_geneloss, then mkdir SGSGeneLoss_lib)
- The name of the lib directory is the name of the .jar file without .jar extension + _lib, so if you are using SGSGeneLoss.v0.1.jar the lib directory is: SGSGeneLoss.v0.1_lib
- Download picard-tools (SGSGeneLoss was tested with picard-tools 1.89)
- Place picard-1.89.jar and sam-1.89.jar in ./my_geneloss/SGSGeneLoss_lib
- Now you are ready to run SGSGeneLoss
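The naming rule for the lib directory can be sketched in shell as follows. This is only an illustration under the assumptions stated in the steps above (install directory ./my_geneloss, jar named SGSGeneLoss.v0.1.jar); it creates the directory and prints a reminder, and does not download or copy the picard-tools jars.

```shell
# Sketch: derive the lib directory name from the .jar file name.
install_dir=./my_geneloss        # example directory used in the steps above
jar=SGSGeneLoss.v0.1.jar         # full name of the jar from the download
lib_dir="${jar%.jar}_lib"        # strip the .jar extension, append _lib

mkdir -p "$install_dir/$lib_dir"
echo "Place picard-1.89.jar and sam-1.89.jar in $install_dir/$lib_dir"
```

With the names above, lib_dir becomes SGSGeneLoss.v0.1_lib, matching the rule for versioned jar names.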
Input and output files for SGSGeneLoss.jar
- Input files:
- Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
- Gff3 file with the reference genome annotation; it has to contain gene, mRNA and exon features
- Output files
- Result files for each chromosome separately
- File with overall stats - stats.txt
- File with summary for all the chromosomes used - chrs.txt (this file is used by one of the R scripts)
- File with list of genes lost for all the chromosomes - graph.txt (this file is used by one of the R scripts)
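The gff3 requirement listed under the input files can be checked before a run. The sketch below assumes a standard tab-separated GFF3 file in which column 3 holds the feature type; check_gff3 is a hypothetical helper written for this manual, not part of SGSGeneLoss, and the three records are made-up sample data.

```shell
# Sketch: check that a GFF3 file contains gene, mRNA and exon features.
# check_gff3 is a hypothetical helper, not part of SGSGeneLoss.
check_gff3() {
    gff=$1
    status=0
    for feature in gene mRNA exon; do
        # Column 3 of a GFF3 record is the feature type; skip comment lines.
        if ! awk -F'\t' -v f="$feature" \
            '!/^#/ && $3 == f { found = 1; exit } END { exit !found }' "$gff"; then
            echo "missing feature type: $feature"
            status=1
        fi
    done
    return $status
}

# Made-up three-line annotation used only to demonstrate the helper.
tmp_gff=$(mktemp)
printf 'Chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n' > "$tmp_gff"
printf 'Chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n' >> "$tmp_gff"
printf 'Chr1\tsrc\texon\t1\t50\t.\t+\t.\tID=e1;Parent=m1\n' >> "$tmp_gff"
check_gff3 "$tmp_gff" && echo "annotation looks usable"
```

The check only confirms that the three feature types occur; it does not validate parent/child relationships between them.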
Command line options for SGSGeneLoss.jar
Required:
bamPath - path to the directory with your .bam file/files, has to end with / or \ (example: bamPath=/home/my_bams/)
bamFileList - a single .bam file or a comma separated list of file names only; the .bam files and their corresponding .bai files have to be in the directory given in bamPath (example: bamFileList=bam1.bam,bam2.bam)
gffFile - location of the gff3 file (example: gffFile=/home/my_gffs/annot.gff3)
outDirPath - location of the output directory, has to end with / or \ (example: outDirPath=/home/my_results/)
Optional:
minCov - minimal coverage threshold to consider position covered [minCov=1]
chromosomeList - comma separated list of chromosomes to be used for analysis, use all, for all chromosomes [chromosomeList=all]
lostCutoff - coverage cutoff to consider gene as lost for calculating stats [lostCutoff=0.0]
covCats - coverage categories for visualization [covCats=0,10,20,30,40,70]
extendedFmt - use extended format, additional info included in output files [regular format]
To see help run: java -jar SGSGeneLoss.jar help
Sample command
- Move into directory where SGSGeneLoss.jar is
- Please make sure that all your supplied paths end with / or \
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=all
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam,arabidopsis2.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=Chr1,Chr2 minCov=2 lostCutoff=0.05 covCats=0,2,5,10,20 extendedFmt
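Two details of the sample commands are easy to get wrong: the trailing / on directory paths and a comma separated list without spaces. A minimal shell sketch for preparing the arguments is shown below. with_slash and join_commas are hypothetical helpers, not part of SGSGeneLoss, only the / separator is handled, and the command is printed rather than executed so that it can be inspected first.

```shell
# Sketch: normalise arguments before calling SGSGeneLoss.jar.
# with_slash and join_commas are hypothetical helpers, not part of SGSGeneLoss.
with_slash() {
    # Append a trailing / unless the path already ends with one.
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

join_commas() {
    # Join all arguments with commas and no spaces.
    (IFS=,; printf '%s\n' "$*")
}

bam_path=$(with_slash /home/uqagnieszka/bams)
out_dir=$(with_slash /home/uqagnieszka/results/)
bam_list=$(join_commas arabidopsis.sorted.bam arabidopsis2.sorted.bam)

# Print the command instead of running it.
echo java -Xmx4g -jar SGSGeneLoss.jar \
    "bamPath=$bam_path" "bamFileList=$bam_list" \
    gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 \
    "outDirPath=$out_dir" chromosomeList=Chr1,Chr2 minCov=2
```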
Output files format
All the output files are comma separated text files.
- .excov files - results for each chromosome (file names use the chromosome names from the .bam files); they come in two formats, basic (default) or extended (extendedFmt)
- basic format: chromosome,ID,is_lost,start_position,end_postion,frac_exons_covered,frac_gene_covered,ave_cov_depth_exons,cov_cat,ave_cove_depth_gene
- extended format: contains additional columns with information about each of the exons
- stats.txt - file with summary information about all genes
- chrs.txt - file with summary information about chromosomes
- chr,start,end,len
- graph.txt - file with list of genes lost as determined by lostCutoff
- chr,id,start,end
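Because the output files are plain comma separated text, they can be summarised with standard tools. The sketch below counts lost genes per chromosome from a file in the graph.txt layout (chr,id,start,end). The rows are made-up sample data, and the absence of a header line is an assumption; check your own graph.txt before relying on it.

```shell
# Sketch: count lost genes per chromosome from graph.txt (chr,id,start,end).
# The rows below are made-up sample data; no header line is assumed.
graph_file=$(mktemp)
cat > "$graph_file" <<'EOF'
Chr1,geneA,100,900
Chr1,geneB,1500,2100
Chr2,geneC,300,800
EOF

# Tally rows by the first column, then sort for a stable order.
lost_counts=$(awk -F, '{ n[$1]++ } END { for (c in n) print c "," n[c] }' "$graph_file" | sort)
echo "$lost_counts"
```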
Plotting results
Results are visualized using R scripts.
Two ways of visualization are possible:
- results per chromosome
- results for all chromosomes as a circular graph
Results per chromosome:
What you need:
- scripts graph_chromosomes.R, graph_main.R in the same directory
- .excov files (either basic or extended) with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc.
- directory (location) where files with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc. can be found
- file listing all the result files for which you want graphs drawn, one per line - for example graph_list.txt file which looks like this:
Chr1.excov
Chr2.excov
Chr3.excov
graph_chromosomes.R takes three arguments in this order:
1. location of the directory where the .excov files are located
2. file listing all the result files for which you want graphs drawn
3. gene loss cutoff
Rscript --vanilla graph_chromosomes.R /home/uqagnieszka/results /home/uqagnieszka/results/graph_list.txt 0.0
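The list file can be generated rather than typed by hand. The following sketch writes the name of every .excov file in a results directory to graph_list.txt, one per line. A temporary directory with empty placeholder files stands in for real SGSGeneLoss.jar output, so the Rscript call is left as a comment.

```shell
# Sketch: list every .excov file in a results directory, one name per line.
# A temporary directory with empty placeholder files stands in for real output.
results_dir=$(mktemp -d)
touch "$results_dir/Chr1.excov" "$results_dir/Chr2.excov" "$results_dir/Chr3.excov"

for f in "$results_dir"/*.excov; do
    basename "$f"          # file name only, without the directory part
done > "$results_dir/graph_list.txt"

cat "$results_dir/graph_list.txt"
# With real results, the plotting call would then be:
# Rscript --vanilla graph_chromosomes.R "$results_dir" "$results_dir/graph_list.txt" 0.0
```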
Summary results for all chromosomes, possibly multiple samples:
What you need:
- script graph_circles.R
- graph.txt from SGSGeneLoss.jar run
- chrs.txt from SGSGeneLoss.jar run
- file assigning numeric order to chromosomes (this is done because some chromosomes have complicated names and sorting in ASCII order does not always work) - file should look like this, chromosome names will be replaced by corresponding numbers
chrs,no
chr1,1
chr2,2
chr10,10
graph_circles.R takes four arguments in this order:
1. file with chromosome info - chrs.txt from SGSGeneLoss.jar run
2. file with chromosome order
3. file with genes lost - graph.txt from SGSGeneLoss.jar run; it can be a comma separated list of multiple files (for example multiple samples). Circles are drawn in the order of the files, starting from the inside: with graph1.txt,graph2.txt,graph3.txt, graph1.txt is the innermost circle and graph3.txt the outermost
4. output file
Rscript --vanilla graph_circles.R chrs.txt chrs_order.csv graph1.txt,graph2.txt,graph3.txt out.png
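A starting point for the chromosome order file can be derived from the chromosome summary file. The sketch below numbers chromosomes in the order in which they appear, which is an assumption about the desired order; edit the numbers afterwards if a different order is needed. The input rows are made-up sample data in the chr,start,end,len layout, and no header line is assumed.

```shell
# Sketch: number chromosomes in their order of appearance (chr,start,end,len input).
# Sample rows are made up; no header line is assumed in the input file.
chrs_file=$(mktemp)
cat > "$chrs_file" <<'EOF'
chr1,1,1000,1000
chr2,1,2000,2000
chr10,1,500,500
EOF

order_file=$(mktemp)
# Write the chrs,no header, then one "name,number" line per chromosome.
awk -F, 'BEGIN { print "chrs,no" } { print $1 "," NR }' "$chrs_file" > "$order_file"
cat "$order_file"
```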
FAQ
- If memory consumption is a problem, consider increasing the Java heap size with -Xmx (for example -Xmx8g instead of -Xmx4g) or splitting your .bam files