Difference between revisions of "SGSGeneLoss"
Revision as of 01:45, 16 June 2014
What does SGSGeneLoss depend on?
SGSGeneLoss depends on the following:
- Java 1.6 or higher
- R/3.1.0
- picard-tools
- ggplot2
- ggbio
Download
- Latest Version 0.1 (29/04/2014):
  - SGSGeneLoss.v0.1.tar.gz should contain
    - three main programs: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R
    - readme file
    - folder with source code
In the rest of this manual SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R and graph_circles.v0.1.R are referred to as SGSGeneLoss.jar, graph_chromosomes.R and graph_circles.R.
To run the programs you have to use the full file names: SGSGeneLoss.v0.1.jar, graph_chromosomes.v0.1.R, graph_circles.v0.1.R
How to install?
- Download SGSGeneLoss.v0.1.tar.gz
- Unpack SGSGeneLoss.v0.1.tar.gz and place SGSGeneLoss.jar and all the R scripts in a directory (or directories) of your choice, for example ./my_geneloss
- Move into ./my_geneloss and create the SGSGeneLoss_lib directory (on Linux: cd ./my_geneloss, then mkdir SGSGeneLoss_lib)
  - The name of the lib directory is the name of the .jar file without the .jar extension, followed by _lib, so if you are using SGSGeneLoss.v0.1.jar the lib directory is SGSGeneLoss.v0.1_lib
- Download picard-tools (SGSGeneLoss was tested with picard-tools 1.89)
- Place picard-1.89.jar and sam-1.89.jar in ./my_geneloss/SGSGeneLoss_lib
- Now you are ready to run SGSGeneLoss
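The lib directory naming rule above can be sketched in shell. This is only an illustration under stated assumptions: the helper name lib_dir_for_jar is hypothetical, and the location of the unpacked picard-tools jars is not known here, so the copy step is left as a comment.

```shell
# Hypothetical helper: derive the lib directory name from the .jar name
# by stripping the ".jar" extension and appending "_lib".
lib_dir_for_jar() {
    jar_name=$(basename "$1")
    printf '%s_lib\n' "${jar_name%.jar}"
}

install_dir=./my_geneloss          # example directory used in this manual
jar=SGSGeneLoss.v0.1.jar           # full name of the released .jar

lib_dir="$install_dir/$(lib_dir_for_jar "$jar")"
mkdir -p "$lib_dir"

# Assumption: picard-tools 1.89 was unpacked somewhere on your system.
# Copy its two jars into the lib directory, for example:
# cp /path/to/picard-tools-1.89/picard-1.89.jar /path/to/picard-tools-1.89/sam-1.89.jar "$lib_dir"/

echo "lib directory: $lib_dir"
```

Deriving the name from the .jar file avoids a mismatch between the versioned .jar name and the lib directory name.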
Input and output files for SGSGeneLoss.jar
- Input files:
  - Sorted, indexed .bam file with sequencing reads mapped to the reference genome sequence; multiple .bam files can be provided as a comma separated list
  - Gff3 file with the reference genome annotation; it has to contain gene, mRNA and exon fields
- Output files:
  - Result files for each chromosome separately
  - File with overall stats - stats.txt
  - File with summary for all the chromosomes used - chrs.txt (this file is used by one of the R scripts)
  - File with list of genes lost for all the chromosomes - graph.txt (this file is used by one of the R scripts)
Command line options for SGSGeneLoss.jar
Required:
bamPath - path to the directory with your .bam file/files, has to end with / or \ (example: bamPath=/home/my_bams/)
bamFileList - a single .bam file or a comma separated list, file names only; the .bam files and the corresponding .bai files have to be in the directory provided in bamPath (example: bamFileList=bam1.bam,bam2.bam)
gffFile - location of the gff3 file (example: gffFile=/home/my_gffs/annot.gff3)
outDirPath - location of the output directory, has to end with / or \ (example: outDirPath=/home/my_results/)
Optional:
minCov - minimal coverage threshold to consider a position covered [minCov=1]
chromosomeList - comma separated list of chromosomes to be used for analysis; use all for all chromosomes [chromosomeList=all]
lostCutoff - coverage cutoff to consider a gene as lost when calculating stats [lostCutoff=0.0]
covCats - coverage categories for visualization [covCats=0,10,20,30,40,70]
extendedFmt - use the extended format, with additional info included in the output files [default: regular format]
To see help run: java -jar SGSGeneLoss.jar help
Sample command
- Move into the directory where SGSGeneLoss.jar is
- Please make sure that all your supplied paths end with / or \
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=all
java -Xmx4g -jar SGSGeneLoss.jar bamPath=/home/uqagnieszka/bams/ bamFileList=arabidopsis.sorted.bam,arabidopsis2.sorted.bam gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3 outDirPath=/home/uqagnieszka/results/ chromosomeList=Chr1,Chr2 minCov=2 lostCutoff=0.05 covCats=0,2,5,10,20 extendedFmt
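The two rules that are easiest to get wrong (paths must end with / or \, and the .bam list must be comma separated with no spaces) can be handled by a small wrapper. The following is a minimal sketch; the helper names ensure_trailing_slash and join_with_commas are hypothetical and not part of SGSGeneLoss, and the command is only printed so that it can be inspected before it is run.

```shell
# Hypothetical helper: append "/" unless the path already ends with / or \.
ensure_trailing_slash() {
    case "$1" in
        */|*\\) printf '%s\n' "$1" ;;
        *)      printf '%s/\n' "$1" ;;
    esac
}

# Hypothetical helper: join all arguments with commas (no spaces).
# The body runs in a subshell so the IFS change does not leak out.
join_with_commas() (
    IFS=,
    printf '%s\n' "$*"
)

bam_path=$(ensure_trailing_slash /home/uqagnieszka/bams)
out_dir=$(ensure_trailing_slash /home/uqagnieszka/results/)
bam_list=$(join_with_commas arabidopsis.sorted.bam arabidopsis2.sorted.bam)

# Print the command; remove "echo" to actually run it.
echo java -Xmx4g -jar SGSGeneLoss.v0.1.jar \
    "bamPath=$bam_path" "bamFileList=$bam_list" \
    "gffFile=/home/gff_files/Athaliana_167_gene_exons.gff3" \
    "outDirPath=$out_dir" chromosomeList=Chr1,Chr2
```

Quoting each name=value argument keeps it as a single word even if a path contains spaces.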
Output files format
All the output files are comma separated text files.
- .excov files - results for each chromosome (file names use the chromosome names from the .bam files); the files come in two formats: basic (default) or extended (extendedFmt)
  - basic format: chromosome,ID,is_lost,start_position,end_postion,frac_exons_covered,frac_gene_covered,ave_cov_depth_exons,cov_cat,ave_cove_depth_gene
  - extended format: contains additional columns with information about each of the exons
- stats.txt - file with summary information about all genes
- chrs.txt - file with summary information about chromosomes
  - columns: chr,start,end,len
- graph.txt - file with the list of genes lost as determined by lostCutoff
  - columns: chr,id,start,end
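Because the output files are plain comma separated text, they can be summarized with standard tools. The sketch below counts lost genes per chromosome from a graph.txt style file with columns chr,id,start,end. The function name count_lost_per_chr, the sample rows and the handling of a possible header line are all assumptions made for illustration, not part of SGSGeneLoss.

```shell
# Hypothetical helper: count rows per chromosome in a chr,id,start,end file.
# A first field equal to "chr" is treated as a header and skipped (assumption).
count_lost_per_chr() {
    awk -F, 'NF >= 4 && $1 != "chr" { n[$1]++ }
             END { for (c in n) print c "," n[c] }' "$1" | sort
}

# Made-up sample rows, only to show the expected input layout.
printf '%s\n' \
    'Chr1,geneA,100,900' \
    'Chr1,geneB,1500,2300' \
    'Chr2,geneC,50,700' > sample_graph.txt

count_lost_per_chr sample_graph.txt
```

For the made-up sample this prints one chromosome,count line per chromosome.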
Plotting results
Results are visualized using R scripts.
Two ways of visualization are possible:
- results per chromosome
- results for all chromosomes as a circular graph
Results per chromosome:
What you need:
- scripts graph_chromosomes.R, graph_main.R in the same directory
- .excov files (either basic or extended) with results from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc.
- directory (location) containing the result files from SGSGeneLoss.jar: Chr1.excov, Chr2.excov etc.
- file listing all the result files for which you want graphs drawn, one per line - for example a graph_list.txt file which looks like this:
Chr1.excov
Chr2.excov
Chr3.excov
graph_chromosomes.R takes three arguments in this order:
1. location of the directory where the .excov files are located
2. file listing all the result files for which you want graphs drawn
3. gene loss cutoff
Rscript --vanilla graph_chromosomes.R /home/uqagnieszka/results /home/uqagnieszka/results/graph_list.txt 0.0
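The list file can be written by hand or generated. The following sketch, which assumes that all .excov files in the results directory should be plotted, writes their names one per line; the function name make_graph_list is hypothetical.

```shell
# Hypothetical helper: write the names of all .excov files in a directory
# to graph_list.txt in that same directory, one file name per line.
make_graph_list() {
    results_dir=$1
    for f in "$results_dir"/*.excov; do
        [ -e "$f" ] || continue     # skip the unexpanded pattern if no files match
        basename "$f"
    done > "$results_dir/graph_list.txt"
}

# Example with a temporary directory and empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/Chr1.excov" "$demo_dir/Chr2.excov"
make_graph_list "$demo_dir"
cat "$demo_dir/graph_list.txt"
```

The resulting file can then be passed as the second argument to graph_chromosomes.R, with the results directory as the first.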
Summary results for all chromosomes, possibly multiple samples:
What you need:
- script graph_circles.R
- graph.txt from SGSGeneLoss.jar run
- chrs.txt from SGSGeneLoss.jar run
- file assigning numeric order to chromosomes (this is done because some chromosomes have complicated names and sorting in ASCII order does not always work); the file should look like this (chromosome names will be replaced by the corresponding numbers):
chrs,no
chr1,1
chr2,2
chr10,10
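A chromosome order file can be drafted from the chromosome summary file and then edited by hand. The sketch below assumes the summary file has the columns chr,start,end,len and simply numbers the chromosomes in the order in which they appear, which may not be the order you want. The function name make_chr_order and the sample rows are made up for illustration.

```shell
# Hypothetical helper: number chromosomes in file order and print a
# "chrs,no" table. A first field equal to "chr" is treated as a header.
make_chr_order() {
    awk -F, 'BEGIN { print "chrs,no" }
             $1 != "" && $1 != "chr" { print $1 "," ++n }' "$1"
}

# Made-up sample rows in chr,start,end,len layout.
printf '%s\n' \
    'chr1,1,1000,1000' \
    'chr2,1,2000,2000' \
    'chr3,1,1500,1500' > sample_chrs.txt

make_chr_order sample_chrs.txt > sample_chrs_order.csv
cat sample_chrs_order.csv
```

Review the generated numbers before plotting, since they define the order in which chromosomes are drawn.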
graph_circles.R takes four arguments in this order:
1. file with chromosome info - chrs.csv from SGSGeneLoss.jar run
2. file with chromosome order
3. file with genes lost - graph.csv from SGSGeneLoss.jar run; it can be a comma separated list of multiple files (for example multiple samples). Circles are drawn in the order of the list, starting from the inside: the first file is the innermost circle, so for graph1.csv,graph2.csv,graph3.csv the innermost circle is graph1.csv and the outermost is graph3.csv
4. output file
Rscript --vanilla graph_circles.R chrs.csv chrs_order.csv graph1.csv,graph2.csv,graph3.csv out.png
FAQ
- If memory consumption is a problem, consider increasing -Xmx (the Java maximum heap size) or splitting your .bam files