Difference between revisions of "PAKAP"

From Applied Bioinformatics Group

Revision as of 05:03, 28 April 2014

Introduction


We have established a Present/Absent Kmer Analysis Pipeline (PAKAP). By reducing each read to its component k-mers and comparing the relative abundance of these sub-sequences between samples, we overcome statistical limitations of whole-read comparative analysis.
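PAKAP itself counts k-mers with Jellyfish. The pure-Python sketch below only illustrates the underlying idea: decompose each read into overlapping k-mers, count them per sample, and classify each k-mer as present in one sample, the other, or both. The function names, the `min_count` presence threshold and the skipping of k-mers containing `N` are illustrative assumptions, not PAKAP's actual code.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of `seq`, skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # assumption: ambiguous bases are simply ignored
            yield kmer


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mer occurrences over all reads of one sample (strands not merged)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def present_absent(sample1: Iterable[str], sample2: Iterable[str], k: int,
                   min_count: int = 1) -> Dict[str, Set[str]]:
    """Split k-mers into those present only in sample 1, only in sample 2, or in both.

    A k-mer counts as present in a sample when it is seen at least `min_count` times
    (a hypothetical threshold, e.g. to discard k-mers caused by sequencing errors).
    """
    c1, c2 = count_kmers(sample1, k), count_kmers(sample2, k)
    p1 = {km for km, n in c1.items() if n >= min_count}
    p2 = {km for km, n in c2.items() if n >= min_count}
    return {"only_s1": p1 - p2, "only_s2": p2 - p1, "shared": p1 & p2}
```

For example, `present_absent(["ACGTAC"], ["ACGTTT"], k=3)` places `GTA` and `TAC` in `only_s1`, `GTT` and `TTT` in `only_s2`, and `ACG` and `CGT` in `shared`. A dictionary-based counter like this is fine for toy inputs; real read sets are why a dedicated counter such as Jellyfish is needed.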

PAKAP consists of a series of Perl, Python and Bash scripts. It requires Jellyfish [Marçais 2011] and, optionally, SOAPaligner. The scripts are freely available for non-commercial use.


What does PAKAP depend on?

  • Jellyfish for fast k-mer counting
  • Some non-standard Perl modules:
    • BioPerl
      • Bio::SeqIO
      • Bio::SearchIO
    • Parallel::ForkManager
    • Statistics::Descriptive
    • Config::IniFiles
    • GD::Graph::linespoints (for the script identifyKmerSize)
  • Optional: SOAPaligner
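Before a first run it can help to confirm that the required dependencies are installed. The sketch below is a hypothetical helper, not part of PAKAP: it assumes the Jellyfish executable is named `jellyfish` and is on `PATH`, and tests each Perl module with `perl -M<Module> -e 1`, which exits with a non-zero status when the module cannot be loaded. The optional SOAPaligner is not checked.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Perl modules listed above; GD::Graph::linespoints is only needed by identifyKmerSize.
PERL_MODULES = [
    "Bio::SeqIO",
    "Bio::SearchIO",
    "Parallel::ForkManager",
    "Statistics::Descriptive",
    "Config::IniFiles",
    "GD::Graph::linespoints",
]


def perl_module_available(module: str) -> bool:
    """Return True if `perl -M<module> -e 1` loads the module without error."""
    try:
        result = subprocess.run(["perl", f"-M{module}", "-e", "1"], capture_output=True)
    except FileNotFoundError:  # no perl executable on PATH
        return False
    return result.returncode == 0


def missing_dependencies(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_module: Callable[[str], bool] = perl_module_available,
) -> List[str]:
    """List required executables and Perl modules that could not be found."""
    missing: List[str] = []
    if which("jellyfish") is None:  # assumed executable name
        missing.append("jellyfish")
    if which("perl") is None:
        missing.append("perl")  # Perl modules cannot be checked without perl
    else:
        missing.extend(m for m in PERL_MODULES if not has_module(m))
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("all required dependencies found" if not missing
          else "missing: " + ", ".join(missing))
```

The `which` and `has_module` parameters default to the real checks but can be swapped out, which keeps the helper easy to test on machines without Perl or Jellyfish.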

Download

  • Latest Version 1.0:
    • INSERT LINK

How to install?

  • Download the [link_here DiffKAP package].


How to run?

  • Create your project configuration file by using the example config file. Here it is:


The command is:

   python pipeline.py \
        --s1 ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R1.fasta ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R2.fasta \
        --s2 ./whole/whole_2.150.1sd/whole_2.150.1sd.R1.fasta ./whole/whole_2.150.1sd/whole_2.150.1sd.R2.fasta \
        -c default.config \
        -o output_folder/

or, easier:

   python pipeline.py \
        --s1 ./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R?.fasta \
        --s2 ./whole/whole_2.150.1sd/whole_2.150.1sd.R?.fasta \
        -c default.config \
        -o output_folder/

This will read the configuration from default.config and generate all output files in output_folder.
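When PAKAP is run from a larger Python workflow, the same command can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the PAKAP distribution: it expands the `R?.fasta` pattern the way the shell does (`?` matches exactly one character, so `R1` and `R2` match but `R10` does not), fails early if nothing matches, and builds the argument list using only the `--s1`, `--s2`, `-c` and `-o` options shown in the commands above.

```python
import glob
import subprocess
from typing import List, Sequence


def expand_reads(pattern: str) -> List[str]:
    """Expand a shell-style pattern such as '...R?.fasta' into a sorted list of files."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        # bash without nullglob would pass the unmatched pattern through literally;
        # raising here gives a clearer error than a missing-file failure later on.
        raise FileNotFoundError(f"no files match {pattern!r}")
    return matches


def build_pakap_command(s1: Sequence[str], s2: Sequence[str], config: str,
                        outdir: str, pipeline: str = "pipeline.py") -> List[str]:
    """Assemble the same argument list as the command-line examples above."""
    if not s1 or not s2:
        raise ValueError("--s1 and --s2 each need at least one FASTA file")
    return ["python", pipeline, "--s1", *s1, "--s2", *s2, "-c", config, "-o", outdir]


if __name__ == "__main__":
    cmd = build_pakap_command(
        expand_reads("./truncated/truncated_2.150.1sd/truncated_2.150.1sd.R?.fasta"),
        expand_reads("./whole/whole_2.150.1sd/whole_2.150.1sd.R?.fasta"),
        "default.config",
        "output_folder/",
    )
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pipeline.py fails
```

Passing the argument list directly to `subprocess.run` (without `shell=True`) avoids any further shell interpretation of the file names.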

How to interpret the results?

  • You can download the results for the sample data here.


FAQ


Reference

  • Marçais, G. and Kingsford, C. (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, 27, 764-770.

