Difference between revisions of "Darmor Tapidor"

From Applied Bioinformatics Group
(Annotation)
 
(40 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This page collects the files for Bayer et al. <i>B. napus</i> Darmor/Tapidor genome paper
 
 +
 +
== Software ==
 +
 +
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip]
 +
 +
LASTZSorter.py - sorts contigs based on LASTZ alignment with reference [http://appliedbioinformatics.com.au/download/DarmorTapidor/LASTZSorter.py LASTZSorter.py]
 +
 +
contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer]
 +
 +
R-scripts used for plotting - Venn-diagrams, boxplots [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip]
 +
 +
The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS
 +
 +
== Results ==
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap (txt)] - [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Input.zip MSTMap_Input.zip] Input file for MSTMap
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on Darmor v8.1 reference (gff3)]
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)]
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Genes.zip Repetitive_Collapsed_Genes.zip] List of genes in repetitive and collapsed regions
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Regions.zip Repetitive_Collapsed_Regions.zip] Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed)
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip] Pfam and Swissprot results for repetitive and collapsed genes
  
 
== Annotation ==
 
  
Genes and proteins were renamed from the MAKER names to the [http://www.brassica.info/info/genome_annotation.php Brassica standard]:
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of Transposase related PFAM IDs used for filtering]
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/GO_Arabidopsis_Terms.zip GO_Arabidopsis_Terms.zip] Swiss-Prot/''Arabidopsis'' based GO terms for Darmor and Tapidor annotation
 +
 
 +
=== Tapidor v6.3 ===
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudo-molecules
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v6.3_contig_order.zip Tapidor_v6.3_contig_order.zip] - contig positions in assembly as gff3
 +
 
 +
 
 +
==== Unfiltered annotation ====
 +
 
 +
Straight from AUGUSTUS, with MAKER's AED scores
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format
  
<GENUS 1 LETTER> [<species 2 letters>]<GENOME 1 LETTER>|<X>.<Chromosome number (leading zero)>g<5 digit gene model number>.g<version number>g<1 LETTER designating Genotype/line/cultivar>
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - predicted proteins
  
So "maker-chrC04_contigs_placed_v81-snap-gene-0.93-mRNA-1 protein" becomes BnaC04g31331.2D for the Darmor "new version" (of this paper), or BnaC04g31332.1T for Tapidor (first version); "maker-chrC04_contigs_placed_v81-snap-gene-0.93-mRNA-2" becomes BnaC04g31332.2D, etc.
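As a rough illustration of how the renamed identifiers decompose, the sketch below splits an ID such as BnaC04g31331.2D into its parts. The regular expression is inferred only from the examples on this page and is an assumption, not the official grammar of the Brassica standard; the function name parse_gene_id is hypothetical.

```python
import re

# Pattern inferred from the example IDs on this page (e.g. BnaC04g31331.2D).
# ASSUMPTION: this is an illustration, not the official grammar of the standard.
GENE_ID = re.compile(
    r"^(?P<genus>[A-Z])"        # genus, 1 letter (B)
    r"(?P<species>[a-z]{2})"    # species, 2 letters (na)
    r"(?P<genome>[A-Z])"        # genome, 1 letter (A or C)
    r"(?P<chromosome>\d{2})"    # chromosome number with leading zero
    r"g(?P<model>\d{5})"        # 5-digit gene model number
    r"\.(?P<version>\d+)"       # annotation version number
    r"(?P<cultivar>[A-Z])$"     # genotype/line/cultivar letter (D, T)
)


def parse_gene_id(gene_id):
    """Split a renamed gene ID into its named components (hypothetical helper)."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not a recognised gene ID: {gene_id!r}")
    return match.groupdict()
```

For example, parse_gene_id("BnaC04g31331.2D") gives chromosome "04", gene model "31331", version "2" and cultivar "D".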
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - predicted transcripts
  
  
=== Tapidor ===
+
==== Filtered annotation ====
  
 +
Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models with transposase domains
  
All files are here [http://appliedbioinformatics.com.au/download/Tapidor_Assembly_with_Annotations.tar.gz CLICK]
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format
  
Contained are:
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts
  
Tapidor_v63_assembly.fasta.gz - assembly as pseudo-molecules
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins
  
Tapidor_v63_assembly.all_renamed.gff.gz - this contains all unfiltered MAKER gene models as reported by MAKER's <nowiki>gff3_merge -n -g</nowiki>
 
  
Tapidor_v63_assembly.all.maker.proteins_renamed.fasta.gz - all unfiltered proteins as reported by <nowiki>fasta_merge</nowiki>
+
=== Darmor v8.1 ===
  
Tapidor_v63_assembly.all.maker.transcripts_renamed.fasta.gz - all unfiltered transcripts
+
WARNING - the Brassica community annotation standard numbers genes by their order on the pseudomolecules, and this annotation follows that rule. Because we placed as many contigs as possible, the gene order shifted substantially, so you cannot simply look for the same gene numbers when comparing with the v4.1 annotation. Use BLAST or a similar tool to search for your candidate genes.
  
Tapidor_v63_assembly.all.maker.proteins_250bp_repeats_filtered_renamed.fasta.gz - all proteins as reported by MAKER, excluding those more than 50% covered by a RepBase repeat or shorter than 250 bp
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz Darmor_v81_assembly.fasta.gz] - assembly as pseudo-molecules
  
Tapidor_v63_assembly.all.maker.transcripts_250bp_repeats_filtered_renamed.fasta.gz - same as above but transcripts
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v8.1_contig_order.zip Darmor_v8.1_contig_order.zip] - order of contigs as gff3 files
  
Tapidor_v63_assembly.all_250bp_repeats_filtered_renamed.gff.gz - same as above but gff3 file
+
==== Unfiltered annotation ====
  
=== Darmor ===
+
Straight from AUGUSTUS, with MAKER's AED scores
  
Files are here: [http://appliedbioinformatics.com.au/download/Darmor_Assembly_with_Annotations.tar.gz CLICK]
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format
  
Same files as above:
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - proteins, fasta
  
Darmor_v81_assembly.fasta.gz - assembly as pseudo-molecules
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - transcripts, fasta
  
Darmor_v81_assembly.all_renamed.gff.gz - unfiltered MAKER gene models
+
==== Filtered annotation ====
  
Darmor_v81_assembly.all.maker.proteins_renamed.fasta.gz
+
Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models with transposase domains
  
Darmor_v81_assembly.all.maker.transcripts_renamed.fasta.gz
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format
  
Darmor_v81_assembly.all_250bp_repeats_filtered_renamed.gff.gz                    
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts
  
Darmor_v81_assembly.all.maker.proteins_250bp_repeats_filtered_renamed.fasta.gz    
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins
                           
 
Darmor_v81_assembly.all.maker.transcripts_250bp_repeats_filtered_renamed.fasta.gz
 

Latest revision as of 15:07, 15 June 2018

This page collects the files for Bayer et al. B. napus Darmor/Tapidor genome paper

Software

Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions Collinearity_scripts.zip

LASTZSorter.py - sorts contigs based on LASTZ alignment with reference LASTZSorter.py

contigPlacer - places contigs based on recombination patterns contigPlacer

R-scripts used for plotting - Venn-diagrams, boxplots R_plotting_scripts.zip

The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS

Results

Tapidor genetic map from MSTMap (txt) - MSTMap_Input.zip Input file for MSTMap

Darmor SNPs anchored on Darmor v8.1 reference (gff3)

Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)

Repetitive_Collapsed_Genes.zip List of genes in repetitive and collapsed regions

Repetitive_Collapsed_Regions.zip Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed)

SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip Pfam and Swissprot results for repetitive and collapsed genes

Annotation

List of Transposase related PFAM IDs used for filtering

GO_Arabidopsis_Terms.zip Swiss-Prot/Arabidopsis based GO terms for Darmor and Tapidor annotation

Tapidor v6.3

Tapidor_v63_assembly.fasta.gz - assembly as pseudo-molecules

Tapidor_v6.3_contig_order.zip - contig positions in assembly as gff3


Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts


Filtered annotation

Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models with transposase domains

Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins
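
The three criteria used for the filtered annotation (AED score, transcript length, transposase domains) could be applied to a GFF file roughly as sketched below. This is not the script used for the paper. It assumes that each mRNA line carries MAKER-style "_AED=" and "ID=" attributes, it uses the genomic span of the mRNA as a stand-in for the spliced transcript length, and the set of transposase-containing transcript IDs (transposase_ids) is assumed to come from a separate Pfam search. The function name keep_transcript is hypothetical.

```python
import re

# ASSUMPTION: mRNA lines carry MAKER-style attributes such as "ID=...;_AED=0.25".
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")


def keep_transcript(gff_line, transposase_ids=frozenset(), min_length=100):
    """Return True if an mRNA line of a GFF3 file passes all three filters."""
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "mRNA":
        return False
    start, end, attributes = int(fields[3]), int(fields[4]), fields[8]
    aed = AED_RE.search(attributes)
    ident = ID_RE.search(attributes)
    if aed is None or ident is None:
        return False
    if float(aed.group(1)) >= 1.0:
        return False  # drop models with AED = 1
    if end - start + 1 <= min_length:
        return False  # simplification: genomic span, not spliced length
    return ident.group(1) not in transposase_ids  # drop transposase hits
```

A stricter version would sum the exon lengths of each transcript instead of using the mRNA span.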


Darmor v8.1

WARNING - the Brassica community annotation standard numbers genes by their order on the pseudomolecules, and this annotation follows that rule. Because we placed as many contigs as possible, the gene order shifted substantially, so you cannot simply look for the same gene numbers when comparing with the v4.1 annotation. Use BLAST or a similar tool to search for your candidate genes.
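
One possible way to carry candidate genes over from the v4.1 annotation is to BLAST the old sequences against the v8.1 transcripts or proteins and keep the best hit per query. The sketch below assumes BLAST tabular output (-outfmt 6 with the default 12 columns, bit score last); the function name best_hits and the IDs in the usage note are hypothetical, and a best hit is not proof of orthology.

```python
def best_hits(blast_lines):
    """Map each query ID to (subject ID, bit score) of its highest-scoring hit.

    ASSUMPTION: input is BLAST tabular output (-outfmt 6, default columns),
    i.e. query in column 1, subject in column 2, bit score in column 12.
    """
    best = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = line.split("\t")
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

The function accepts any iterable of lines, so an open file handle on the BLAST output can be passed directly, for example best_hits(open("old_vs_v81.tsv")) with a hypothetical file name.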

Darmor_v81_assembly.fasta.gz - assembly as pseudo-molecules

Darmor_v8.1_contig_order.zip - order of contigs as gff3 files

Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - proteins, fasta

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - transcripts, fasta

Filtered annotation

Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models with transposase domains

Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins