Difference between revisions of "PAKAP"
From Applied Bioinformatics Group
Edited by Philippbayer
Revision as of 05:03, 28 April 2014
Introduction
We have established a Present/Absent Kmer Analysis Pipeline (PAKAP). By reducing each read to component k-mers and comparing the relative abundance of these sub-sequences, we overcome statistical limitations of whole read comparative analysis.
PAKAP consists of a series of scripts written in Perl, Python and Bash. It requires Jellyfish [Marcais 2011] and, optionally, SOAPaligner. The scripts are freely available for non-commercial use.
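The k-mer idea described in the introduction can be illustrated with a small Python sketch. This is not PAKAP's implementation (PAKAP delegates counting to Jellyfish); the function names, the toy reads and the choice of k = 3 are assumptions made purely for illustration.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of the sample's total k-mer count."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def present_absent(counts_a, counts_b):
    """Return (k-mers only in sample A, k-mers only in sample B)."""
    only_a = set(counts_a) - set(counts_b)
    only_b = set(counts_b) - set(counts_a)
    return only_a, only_b


# Toy reads, hypothetical data for illustration only.
sample1 = ["ACGTAC", "CGTACG"]
sample2 = ["ACGTTT"]

c1 = count_kmers(sample1, 3)
c2 = count_kmers(sample2, 3)
only_1, only_2 = present_absent(c1, c2)

print(sorted(only_1))  # k-mers present in sample 1 but absent from sample 2
print(sorted(only_2))  # k-mers present in sample 2 but absent from sample 1
```

Working on k-mers rather than whole reads means that two reads differing by a single base still share most of their k-mers, so the comparison is made on many small, countable units instead of on rarely repeated full-length reads.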
What does PAKAP depend on?
- Jellyfish for fast k-mer counting
- Some non-standard Perl modules:
- bioperl
- Bio::SeqIO
- Bio::SearchIO
- Parallel::ForkManager
- Statistics::Descriptive
- Config::IniFiles
- GD::Graph::linespoints (for the script identifyKmerSize)
- Optional: SOAPaligner
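Before running the pipeline it can help to confirm that the dependencies listed above are installed. The following is a minimal sketch, not part of PAKAP: it assumes the Jellyfish executable is on the PATH under the name jellyfish, and it tests each Perl module by asking perl to load it. SOAPaligner is left out because it is optional.

```python
import shutil
import subprocess

# Perl modules named in the dependency list above.
PERL_MODULES = [
    "Bio::SeqIO",
    "Bio::SearchIO",
    "Parallel::ForkManager",
    "Statistics::Descriptive",
    "Config::IniFiles",
    "GD::Graph::linespoints",
]


def perl_module_available(module, perl="perl"):
    """Return True if `perl -M<module> -e 1` exits successfully."""
    if shutil.which(perl) is None:
        return False  # no Perl interpreter found, so the module cannot load
    result = subprocess.run(
        [perl, "-M" + module, "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_dependencies():
    """Map each dependency name to True (found) or False (missing)."""
    # Assumption: the Jellyfish binary is installed as `jellyfish`.
    report = {"jellyfish": shutil.which("jellyfish") is not None}
    for module in PERL_MODULES:
        report[module] = perl_module_available(module)
    return report


report = check_dependencies()
for name, found in report.items():
    print(f"{name:28s} {'ok' if found else 'MISSING'}")
```

Any entry reported as MISSING needs to be installed before the pipeline is started.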
Download
- Latest Version 1.0:
- INSERT LINK
How to install?
- Download the [link_here PAKAP package].
How to run?
- Create your project configuration file, using the example config file as a template.
The command is:
python pipeline.py --s1 ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R1.fasta ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R2.fasta --s2 ./whole/whole_2.150.1sd/whole_2.150.1sd.R1.fasta ./whole/whole_2.150.1sd/whole_2.150.1sd.R2.fasta -c default.config -o output_folder/
or, easier:
python pipeline.py --s1 ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R?.fasta --s2 ./whole/whole_2.150.1sd/whole_2.150.1sd.R?.fasta -c default.config -o output_folder/
This will read the configuration from default.config and generate all output files in output_folder.
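The shorter form works because the shell expands the `?` wildcard, which matches exactly one character, into the matching file names before pipeline.py starts, so the script receives the R1 and R2 paths just as in the long form. The sketch below reproduces that expansion with Python's glob module; the sample.* file names are hypothetical and are created in a temporary directory only for the demonstration.

```python
import glob
import os
import tempfile


def expand(pattern):
    """Return the paths matching a shell-style pattern, in sorted order."""
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical read files; R10 is included to show what `?` does NOT match.
    for name in ("sample.R1.fasta", "sample.R2.fasta", "sample.R10.fasta"):
        open(os.path.join(tmp, name), "w").close()

    pattern = os.path.join(tmp, "sample.R?.fasta")
    matches = [os.path.basename(path) for path in expand(pattern)]

print(matches)  # ['sample.R1.fasta', 'sample.R2.fasta']
```

Because `?` matches a single character, any other file in the same folder whose name differs from the pattern in just that one position (for example an R3 file) would also be passed to the pipeline, so it is worth checking the folder contents before relying on the wildcard.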
How to interpret the results?
- You can download the results of the sample data here.
FAQ
Reference
- Marçais, G. and Kingsford, C. (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, 27, 764-770.