Darmor Tapidor

From Applied Bioinformatics Group
This page collects the files for the Bayer et al. <i>B. napus</i> Darmor/Tapidor genome paper.

== Software ==
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip]
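The contents of Collinearity_scripts.zip are not reproduced on this page. As a rough illustration of the idea only, the hypothetical Python sketch below parses an MCScanX .collinearity file and, for each collinear block, lists the genes on the first chromosome that lie between the block's outermost anchors but have no collinear partner. It assumes the usual MCScanX layouts ("## Alignment N:" headers followed by tab-separated anchor lines, and a four-column chromosome/gene/start/end input GFF); all function names are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical helpers; the file layouts follow common MCScanX usage (assumption).
ALIGN_RE = re.compile(r"^## Alignment (\d+):")


def read_gene_order(gff_lines):
    """Rank genes per chromosome from an MCScanX input .gff (chr, gene, start, end)."""
    by_chrom = defaultdict(list)
    for line in gff_lines:
        if not line.strip():
            continue
        chrom, gene, start, _end = line.rstrip("\n").split("\t")[:4]
        by_chrom[chrom].append((int(start), gene))
    order = {}
    for chrom, genes in by_chrom.items():
        for rank, (_start, gene) in enumerate(sorted(genes)):
            order[gene] = (chrom, rank)
    return order


def read_blocks(collinearity_lines):
    """Return {alignment_id: [(gene_a, gene_b), ...]} from a .collinearity file."""
    blocks, current = {}, None
    for line in collinearity_lines:
        match = ALIGN_RE.match(line)
        if match:
            current = int(match.group(1))
            blocks[current] = []
        elif current is not None and line.strip() and not line.startswith("#"):
            # Anchor lines look like "  0-  3:\tgeneA\tgeneB\t  1e-50".
            fields = line.rstrip("\n").split("\t")
            blocks[current].append((fields[1].strip(), fields[2].strip()))
    return blocks


def unanchored_genes(block, order):
    """Genes on side A between the outermost anchors that have no partner."""
    ranks = [order[gene_a] for gene_a, _gene_b in block]
    chrom = ranks[0][0]
    positions = {rank for _chrom, rank in ranks}
    by_rank = {rank: gene for gene, (c, rank) in order.items() if c == chrom}
    return [by_rank[r] for r in range(min(positions), max(positions) + 1)
            if r not in positions]
```

A gene flagged this way is only a candidate "missing" gene; it may simply have diverged too far to be anchored, so hits would still need checking by alignment.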
LASTZSorter.py - sorts contigs based on LASTZ alignment with reference [http://appliedbioinformatics.com.au/download/DarmorTapidor/LASTZSorter.py LASTZSorter.py]
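LASTZSorter.py itself is linked above; the hypothetical sketch below is not its code, only a generic illustration of ordering contigs by their alignments to a reference. It assumes the LASTZ alignments have already been parsed into (contig, reference chromosome, reference start, aligned length) records, places each contig at its longest alignment, and then sorts the contigs by reference position.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One contig-to-reference alignment (record layout assumed, not LASTZ's own)."""
    contig: str
    ref_chrom: str
    ref_start: int
    aligned_len: int


def order_contigs(hits):
    """Place each contig at its longest alignment, then sort by reference position."""
    best = {}
    for hit in hits:
        current = best.get(hit.contig)
        if current is None or hit.aligned_len > current.aligned_len:
            best[hit.contig] = hit
    placed = sorted(best.values(), key=lambda h: (h.ref_chrom, h.ref_start))
    return [(h.ref_chrom, h.contig) for h in placed]


hits = [
    Hit("ctg7", "chrA01", 50_000, 8_000),
    Hit("ctg2", "chrA01", 10_000, 12_000),
    Hit("ctg7", "chrC03", 900, 1_500),  # shorter secondary hit, ignored
    Hit("ctg5", "chrA02", 3_000, 5_000),
]
print(order_contigs(hits))
# [('chrA01', 'ctg2'), ('chrA01', 'ctg7'), ('chrA02', 'ctg5')]
```

Chromosome names are compared as strings here, so the sketch relies on zero-padded names such as chrA02; a real tool would also have to handle contig orientation and contigs whose hits conflict.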
contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer]

R scripts used for plotting (Venn diagrams, boxplots) [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip]

The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS

== Results ==

[http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap (txt)] and the corresponding MSTMap input file [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Input.zip MSTMap_Input.zip]

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on the Darmor v8.1 reference (gff3)]

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on the Tapidor v6.3 reference (gff3)]

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Genes.zip Repetitive_Collapsed_Genes.zip] - list of genes in repetitive and collapsed regions

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Regions.zip Repetitive_Collapsed_Regions.zip] - coordinates of repetitive and collapsed regions in Darmor and Tapidor (BED)
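The region coordinates are distributed as BED, which is 0-based and half-open, whereas GFF3 gene coordinates are 1-based and closed, so one of the two has to be converted before they are compared. The Python sketch below (hypothetical function names) converts the BED intervals and reports which GFF3 gene features overlap a repetitive or collapsed region; it uses a simple linear scan rather than an interval tree.

```python
from collections import defaultdict


def read_bed(lines):
    """BED: chrom, start (0-based), end (exclusive) -> 1-based closed intervals."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start) + 1, int(end)))
    return regions


def genes_in_regions(gff_lines, regions, feature="gene"):
    """Return IDs of GFF3 features of the given type that overlap any region."""
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        chrom, start, end = cols[0], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Two closed intervals overlap when each starts before the other ends.
        if any(s <= end and start <= e for s, e in regions.get(chrom, [])):
            hits.append(attrs.get("ID", f"{chrom}:{start}-{end}"))
    return hits
```

For whole-genome files a dedicated interval tool would be faster, but the coordinate conversion is the part that is easy to get wrong.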
[http://appliedbioinformatics.com.au/download/DarmorTapidor/SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip] - Pfam and Swiss-Prot results for the genes in repetitive and collapsed regions

== Annotation ==
[http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of transposase-related Pfam IDs used for filtering]

[http://appliedbioinformatics.com.au/download/DarmorTapidor/GO_Arabidopsis_Terms.zip GO_Arabidopsis_Terms.zip] - Swiss-Prot/''Arabidopsis''-based GO terms for the Darmor and Tapidor annotations

=== Tapidor v6.3 ===

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudomolecules

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v6.3_contig_order.zip Tapidor_v6.3_contig_order.zip] - contig positions in the assembly as GFF3
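One possible use of the contig-order GFF3 files is to find which contig a given pseudomolecule position came from. The sketch below assumes (this page does not confirm it) that each contig is a single GFF3 feature on its pseudomolecule, with the contig name in the Name or ID attribute; it sorts the placements per pseudomolecule and uses a binary search to look up a position. The function names are hypothetical.

```python
import bisect
from collections import defaultdict


def load_placements(gff_lines):
    """Group contig features by pseudomolecule, sorted by start (1-based, closed)."""
    by_chrom = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID")
        by_chrom[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], name))
    for placements in by_chrom.values():
        placements.sort()
    return by_chrom


def contig_at(by_chrom, chrom, pos):
    """Return (name, strand) of the contig covering pos, or None (e.g. in a gap)."""
    placements = by_chrom.get(chrom, [])
    starts = [start for start, _end, _strand, _name in placements]
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and placements[i][1] >= pos:
        return placements[i][3], placements[i][2]
    return None
```

The binary search assumes contigs do not overlap on a pseudomolecule, which should hold for an assembly built by placing contigs end to end.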
  
==== Unfiltered annotation ====

Gene models straight from AUGUSTUS, with MAKER's AED scores

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - predicted proteins

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - predicted transcripts

==== Filtered annotation ====

No gene models with AED = 1, only transcripts longer than 100 bp, and no transposase domains
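The three filters can be expressed roughly as follows. This hypothetical Python sketch is not the pipeline used for the paper: it assumes MAKER's convention of storing the AED score in an _AED attribute on mRNA features, measures transcript length as the summed length of its exons, and takes per-transcript Pfam hits (for example from a separate domain search) as an input dictionary. The transposase Pfam IDs would come from the list linked above, read here as one ID per line, which is also an assumption.

```python
from collections import defaultdict


def parse_attrs(field):
    """Split a GFF3 column-9 string into a dict."""
    return dict(kv.split("=", 1) for kv in field.split(";") if "=" in kv)


def read_bad_pfam(lines):
    """One Pfam accession per line (assumed format of the transposase ID list)."""
    return {line.split()[0] for line in lines
            if line.strip() and not line.startswith("#")}


def filter_mrnas(gff_lines, pfam_hits, bad_pfam, min_len=100):
    """Return IDs of mRNAs that pass the AED, length and transposase filters."""
    aed, exon_len = {}, defaultdict(int)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "mRNA":
            # MAKER-style "_AED" attribute; a missing value is treated as 0 (assumption).
            aed[attrs["ID"]] = float(attrs.get("_AED", 0))
        elif cols[2] == "exon":
            for parent in attrs.get("Parent", "").split(","):
                exon_len[parent] += int(cols[4]) - int(cols[3]) + 1
    kept = []
    for mrna, score in aed.items():
        if score >= 1.0:                            # drop models with AED = 1
            continue
        if exon_len[mrna] <= min_len:               # keep transcripts longer than 100 bp
            continue
        if pfam_hits.get(mrna, set()) & bad_pfam:   # drop transposase-domain models
            continue
        kept.append(mrna)
    return kept
```

An AED of 1 means a model has no support from the aligned evidence, which is why those models are discarded first.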
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins

=== Darmor v8.1 ===

WARNING: the Brassica community annotation standard numbers genes by their order on the pseudomolecules, and I have followed it here. Because we placed as many contigs as possible, the gene order shifted considerably, so you ''cannot'' simply look up the same gene numbers when comparing with the v4.1 annotation; use BLAST or a similar tool to search for your candidate genes.
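For the BLAST search suggested above, one option is to run BLAST+ with tabular output, for example blastp -query candidates.fasta -subject Darmor_v81_proteins.fasta -outfmt 6 after decompressing the protein file (both file names are placeholders), and then keep the best-scoring v8.1 protein for each candidate. The sketch below covers that second step; the function name is hypothetical and the 90% identity cut-off is an arbitrary illustrative value.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, min_pident=90.0):
    """Map each query to its highest-bitscore subject above an identity cut-off."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        pident, bits = float(rec["pident"]), float(rec["bitscore"])
        if pident < min_pident:
            continue
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return {query: subject for query, (subject, _bits) in best.items()}
```

Because <i>B. napus</i> carries both A and C subgenomes, a candidate can hit homeologous copies with similar scores, so it is worth inspecting the top few hits rather than trusting a single best hit.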
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz Darmor_v81_assembly.fasta.gz] - assembly as pseudomolecules

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v8.1_contig_order.zip Darmor_v8.1_contig_order.zip] - contig order as GFF3 files

==== Unfiltered annotation ====

Gene models straight from AUGUSTUS, with MAKER's AED scores

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - predicted proteins

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - predicted transcripts

==== Filtered annotation ====

No gene models with AED = 1, only transcripts longer than 100 bp, and no transposase domains

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts

[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins

Latest revision as of 15:07, 15 June 2018
