DiffKAP
Latest revision as of 08:26, 14 August 2017
Next-generation DNA sequencing technologies such as RNA-Seq currently dominate genome-wide gene expression studies. A standard approach to analysing these data is to map sequence reads to a reference and count the number of reads that map to each gene. However, for many transcriptome studies a suitable reference genome is unavailable, especially for meta-transcriptome studies, which assay gene expression from mixed populations of organisms. Where a reference is unavailable, one can be generated by de novo assembly of the sequence reads. However, accurate assembly of such data is challenging, especially for meta-transcriptome data, and the resulting assemblies frequently suffer from collapsed regions or chimeric sequences.
With the lack of reference assemblies currently limiting meta-transcriptome studies, we have established a Differential k-mer Analysis Pipeline (DiffKAP) for gene expression analysis, which does not require the generation of a reference for read mapping. By reducing each read to component k-mers and comparing the relative abundance of these sub-sequences, we overcome statistical limitations of whole read comparative analysis.
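The k-mer idea described above can be illustrated with a minimal sketch. This is not DiffKAP's own code (DiffKAP relies on Jellyfish for counting); the function names, the toy reads and the value of k are assumptions chosen purely for illustration.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping k-mer (sub-sequence of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads of one treatment."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Toy reads for two treatments (hypothetical data).
treatment1 = ["ACGTAC", "CGTACG"]
treatment2 = ["ACGTAC"]
k = 4

counts_t1 = count_kmers(treatment1, k)
counts_t2 = count_kmers(treatment2, k)

# Print each k-mer seen in Treatment 1 with its occurrence in both treatments.
for kmer in sorted(counts_t1):
    print(kmer, counts_t1[kmer], counts_t2[kmer])
```

Comparing the two count tables k-mer by k-mer is what allows differences in abundance to be detected without mapping whole reads to a reference.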
The DiffKAP application consists of a series of Perl and Linux shell scripts. It requires Jellyfish [Marcais 2011] and BLASTx, as well as access to a copy of a BLAST-formatted protein database. The scripts are freely available for non-commercial use.
What does DiffKAP depend on?
DiffKAP depends on the following things:
- Jellyfish for fast k-mer counting
- blastx for sequence alignment
- Some non-standard Perl modules:
  - bioperl
    - Bio::SeqIO
    - Bio::SearchIO
  - Parallel::ForkManager
  - Statistics::Descriptive
  - Config::IniFiles
  - GD::Graph::linespoints (for the script identifyKmerSize)
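Before installing, it can help to check that these dependencies are present. The following is a minimal sketch and not part of the DiffKAP package; it assumes the executables are named jellyfish and blastx and are expected on $PATH, and it tests each Perl module by asking perl to load it. The function names are hypothetical.

```python
import shutil
import subprocess


def find_missing_tools(tools=("jellyfish", "blastx")):
    """Return the executables from `tools` that are not found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def find_missing_perl_modules(modules):
    """Return the Perl modules that `perl -MModule -e 1` fails to load."""
    perl = shutil.which("perl")
    if perl is None:
        # Without perl itself, none of the modules can be used.
        return list(modules)
    missing = []
    for module in modules:
        result = subprocess.run(
            [perl, f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing


# Module names taken from the dependency list above.
PERL_MODULES = [
    "Bio::SeqIO",
    "Bio::SearchIO",
    "Parallel::ForkManager",
    "Statistics::Descriptive",
    "Config::IniFiles",
    "GD::Graph::linespoints",
]

print("Missing tools:", find_missing_tools())
print("Missing Perl modules:", find_missing_perl_modules(PERL_MODULES))
```

An empty list in both lines of output suggests that the dependencies are in place; it does not check versions or the BLAST-formatted protein database.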
Download
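- Latest Version 0.9 (23/09/2013):
  - DiffKAP package: http://appliedbioinformatics.com.au/download/DiffKAP/DiffKAP_0.9.zip
  - Test Data: http://appliedbioinformatics.com.au/download/DiffKAP/DiffKAP_sampleProj_testData.tar.gz
  - Results of the sample data: http://appliedbioinformatics.com.au/download/DiffKAP/sampleProj_results.tar.gz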
- Archived Versions:
How to install?
- Download the DiffKAP package.
- Uncompress it. The package contains:
  - a DiffKAP setup file
  - a README file
  - a VERSION file
  - an example data folder containing a small subset of a metatranscriptomic dataset
- Read the README.
- Install DiffKAP by executing the setup script: DiffKAP_setup
- Optionally, add the DiffKAP path to $PATH; otherwise, use an absolute path when running DiffKAP.
How to run?
- Create your project configuration file by using the example config file in the sample data directory as a template.
- Run the pipeline: Run DiffKAP with your config file as an input argument, for example: DiffKAP ~/sampleProj/sampleProj.cfg
- Results are written to [OUT_DIR]/results, where [OUT_DIR] is defined in the config file.
- The processing log is stored in /tmp/DiffKAP.log by default.
How to interpret the results?
- The results of the sample data can be downloaded from the Download section above.
- The script "DiffKAP" generates 4 types of files in the folder [OUT_DIR]/results:
  - 5 DER files with the word 'AllDER' in the filenames. Explanation of some columns:
    - Median-T1: The median k-mer occurrence in Treatment 1 (corresponding to T1_ID in the config file) over all k-mers in the read.
    - Median-T2: Similar to Median-T1 but for Treatment 2.
    - Ratio of Median: The ratio of Median-T1 to Median-T2.
    - CV-T1: The coefficient of variation of the k-mer occurrences in Treatment 1 over all k-mers in the read. It indicates how well Median-T1 represents all k-mers in the read.
    - CV-T2: Similar to CV-T1 but for Treatment 2.
  - 5 annotated DER files with the word 'AnnotatedDER' in the filenames. These files are similar to the 5 DER files above but contain only the annotated DER.
  - A gene-centric summary with the word 'DEG' in the filename:
    - It is a table showing, for each gene, the number of DER in each of the specific files (columns 3-7) annotated to that gene.
    - Each gene is listed only once.
    - The 'Total' column is the total number of DER annotated to that gene.
    - It is sorted by the 'Total' column in reverse order.
  - A result summary file with 'summary.log' in the filename.
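To make the column definitions above concrete, the sketch below computes them for a single read from two k-mer count tables. It is an illustration only, not DiffKAP's implementation: the function names and toy counts are hypothetical, the coefficient of variation is assumed to be the population standard deviation divided by the mean, and returning None when Median-T2 is zero is also an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        return None  # CV is undefined when the mean is zero
    return statistics.pstdev(values) / mean


def summarise_read(read, k, counts_t1, counts_t2):
    """Compute the per-read columns from k-mer occurrence tables."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    # Occurrence of each of the read's k-mers in each treatment (0 if unseen).
    occ_t1 = [counts_t1.get(kmer, 0) for kmer in read_kmers]
    occ_t2 = [counts_t2.get(kmer, 0) for kmer in read_kmers]
    median_t1 = statistics.median(occ_t1)
    median_t2 = statistics.median(occ_t2)
    return {
        "Median-T1": median_t1,
        "Median-T2": median_t2,
        # Assumption: report None rather than dividing by zero.
        "Ratio of Median": median_t1 / median_t2 if median_t2 else None,
        "CV-T1": coefficient_of_variation(occ_t1),
        "CV-T2": coefficient_of_variation(occ_t2),
    }


# Toy k-mer counts for the two treatments (hypothetical data).
counts_t1 = {"ACGT": 4, "CGTA": 6, "GTAC": 8}
counts_t2 = {"ACGT": 2, "CGTA": 3, "GTAC": 4}

summary = summarise_read("ACGTAC", 4, counts_t1, counts_t2)
for column, value in summary.items():
    print(column, value)
```

In this toy case the read's k-mers occur twice as often in Treatment 1 as in Treatment 2, so the ratio of medians is 2.0, while the low CV values indicate that the k-mer occurrences within the read are fairly consistent.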
FAQ
Reference
- Marçais, G. and Kingsford, C. (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, 27, 764-770.
Citation
- Rosic N, Kaniewska P, Chan C-K, Ling E, Edwards D, Dove S, Hoegh-Guldberg O: Early transcriptional changes in the reef-building coral Acropora aspera in response to thermal and nutrient stress. BMC Genomics 2014, 15(1):1052