Difference between revisions of "Darmor Tapidor"

From Applied Bioinformatics Group
(Results)
 
(15 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip]
 
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip]
  
LASTZSorter.py - sorts contigs based on LASTZ output [http://appliedbioinformatics.com.au/download/DarmorTapidor/LASTZSorter.py LASTZSorter.py]
+
LASTZSorter.py - sorts contigs based on LASTZ alignment with reference [http://appliedbioinformatics.com.au/download/DarmorTapidor/LASTZSorter.py LASTZSorter.py]
  
 
contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer]
 
contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer]
  
 
R-scripts used for plotting - Venn-diagrams, boxplots [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip]
 
R-scripts used for plotting - Venn-diagrams, boxplots [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip]
 +
 +
The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS
  
 
== Results ==
 
== Results ==
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap]
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap (txt)] - [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Input.zip MSTMap_Input.zip] Input file for MSTMap
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on Darmor v8.1 reference]
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on Darmor v8.1 reference (gff3)]
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on Tapidor v6.3 reference]
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)]
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Genes.zip Repetitive_Collapsed_Genes.zip] List of genes in repetitive and collapsed regions
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Regions.zip Repetitive_Collapsed_Regions.zip] Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed)
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip] Pfam and Swissprot results for repetitive and collapsed genes
  
 
== Annotation ==
 
== Annotation ==
Line 23: Line 31:
 
[http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of Transposase related PFAM IDs used for filtering]
 
[http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of Transposase related PFAM IDs used for filtering]
  
=== Tapidor ===
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/GO_Arabidopsis_Terms.zip GO_Arabidopsis_Terms.zip] Swiss-Prot/''Arabidopsis'' based GO terms for Darmor and Tapidor annotation
 +
 
 +
=== Tapidor v6.3 ===
  
 
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudo-molecules
 
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudo-molecules
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v6.3_contig_order.zip Tapidor_v6.3_contig_order.zip] - contig positions in assembly as gff3
 +
 +
 +
==== Unfiltered annotation ====
 +
 +
Straight from AUGUSTUS, with MAKER's AED scores
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - predicted proteins
 +
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - predicted transcripts
 +
  
 
==== Filtered annotation ====
 
==== Filtered annotation ====
Line 31: Line 55:
 
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains
 
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked_filtered.gff.gz Tapidor_v63_assembly.augustus_masked_filtered.gff.gz] - filtered predicted annotation in GFF format
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins
 +
 
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts_filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts_filtered.fasta.gz] - filtered predicted transcripts
+
=== Darmor v8.1 ===
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins_filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins_filtered.fasta.gz] - filtered predicted proteins
+
WARNING - the Brassica community annotation standard says to number the genes by their order on the pseudomolecules. I've done this here as well. Since we tried to place as many contigs as possible that means that the order shifted a lot, so you 'cannot' just look for the same gene numbers when you compare with the v4.1 annotation, you have to use BLAST or similar to search for your candidate genes.
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins_Pfam_results.gff.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins_Pfam_results.gff.gz] - PFam results for filtered predicted proteins
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz Darmor_v81_assembly_fasta.gz] - assembly as pseudo-molecules
  
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v8.1_contig_order.zip Darmor_v8.1_contig_order.zip] - order of contigs as gff3 files
  
=== Darmor ===
+
==== Unfiltered annotation ====
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz Darmlr_v81_assembly_fasta.gz] - assembly as pseudomolecules
+
Straight from AUGUSTUS, with MAKER's AED scores
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - proteins, fasta
 +
 
 +
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - transcripts, fasta
  
 
==== Filtered annotation ====
 
==== Filtered annotation ====
Line 48: Line 84:
 
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains  
 
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains  
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked_filtered.gff.gz  Darmor_v81_assembly.augustus_masked_filtered.gff.gz] - filtered predicted annotation in GFF format
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format
 
 
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts_filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts_filtered.fasta.gz] - filtered predicted transcripts
 
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins_filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins_filtered.fasta.gz] - filtered predicted proteins
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts
  
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins_Pfam_results.gff.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins_Pfam_results.gff.gz] - PFam results for filtered predicted proteins
+
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins

Latest revision as of 15:07, 15 June 2018

This page collects the files for the Bayer et al. B. napus Darmor/Tapidor genome paper.

Software

Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions Collinearity_scripts.zip

LASTZSorter.py - sorts contigs based on LASTZ alignment with reference LASTZSorter.py

contigPlacer - places contigs based on recombination patterns contigPlacer

R-scripts used for plotting - Venn-diagrams, boxplots R_plotting_scripts.zip

The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS

Results

Tapidor genetic map from MSTMap (txt) - MSTMap_Input.zip Input file for MSTMap

Darmor SNPs anchored on Darmor v8.1 reference (gff3)

Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)

Repetitive_Collapsed_Genes.zip List of genes in repetitive and collapsed regions

Repetitive_Collapsed_Regions.zip Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed)
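
Since the regions are distributed as BED while the annotations are GFF/GFF3, the two coordinate systems need to be reconciled before checking whether a gene falls into a repetitive or collapsed region. The sketch below assumes the standard conventions (BED is 0-based and half-open, GFF3 is 1-based and inclusive); the function names and coordinates are made up for illustration and are not taken from the files.

```python
def gff_to_half_open(start, end):
    """Convert 1-based inclusive GFF3 coordinates to 0-based half-open."""
    return start - 1, end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# Hypothetical values, not taken from the actual files.
region = (1000, 2000)                # BED columns chromStart, chromEnd
gene = gff_to_half_open(2000, 2500)  # GFF3 gene spanning 2000..2500
print(overlaps(*gene, *region))
```

A real comparison would also have to match the sequence name (first column in both formats) before comparing coordinates.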

SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip Pfam and Swiss-Prot results for repetitive and collapsed genes

Annotation

List of transposase-related Pfam IDs used for filtering

GO_Arabidopsis_Terms.zip Swiss-Prot/Arabidopsis based GO terms for Darmor and Tapidor annotation

Tapidor v6.3

Tapidor_v63_assembly.fasta.gz - assembly as pseudo-molecules

Tapidor_v6.3_contig_order.zip - contig positions in assembly as gff3
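
As a rough illustration of how contig positions in gff3 can be turned into an ordered list per pseudomolecule, here is a minimal Python sketch. It assumes one tab-separated GFF3 line per contig and that the contig name is stored in the ID attribute; both are assumptions about the file layout, and the example lines are invented.

```python
from collections import defaultdict


def contig_order(gff_lines):
    """Return {pseudomolecule: [contig IDs ordered by start position]}."""
    placed = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip GFF3 directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        seqid, start, attributes = cols[0], int(cols[3]), cols[8]
        attrs = dict(f.split("=", 1) for f in attributes.split(";") if "=" in f)
        placed[seqid].append((start, attrs.get("ID", "unknown")))
    return {seq: [cid for _, cid in sorted(hits)] for seq, hits in placed.items()}


# Invented example lines; the real files may use different names and types.
example = [
    "##gff-version 3\n",
    "chrA01\t.\tcontig\t5001\t9000\t.\t+\t.\tID=contig2\n",
    "chrA01\t.\tcontig\t1\t5000\t.\t+\t.\tID=contig1\n",
]
print(contig_order(example))
```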


Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts


Filtered annotation

Gene models with an AED score of 1 removed, only transcripts longer than 100 bp kept, no transposase domains
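
The three filter criteria above can be sketched in Python as follows. This is not the script used for the paper. It assumes the AED score is stored as an `_AED=` attribute on the mRNA lines (as MAKER normally writes it) and that transcript lengths and Pfam hits have already been collected per gene model; the function names and the Pfam IDs in the example are hypothetical.

```python
def parse_aed(attributes):
    """Return the _AED value from a GFF3 attribute column, or None if absent."""
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        if key == "_AED":
            return float(value)
    return None


def keep_model(aed, transcript_length, pfam_ids, bad_pfam_ids):
    """Apply the three filters to one gene model."""
    if aed >= 1.0:                # AED=1 means no supporting evidence
        return False
    if transcript_length <= 100:  # keep only transcripts longer than 100 bp
        return False
    # drop models carrying any domain from the transposase list
    return not (set(pfam_ids) & set(bad_pfam_ids))


# Hypothetical attribute string and Pfam IDs, for illustration only.
attrs = "ID=mRNA1;Parent=gene1;_AED=0.25;_eAED=0.25"
print(keep_model(parse_aed(attrs), 350, ["PF00001"], {"PF00002"}))
```

In practice `bad_pfam_ids` would be read from the list of transposase-related Pfam IDs linked above.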

Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins


Darmor v8.1

WARNING - the Brassica community annotation standard says to number the genes by their order on the pseudomolecules, and I've done this here as well. Because we tried to place as many contigs as possible, the gene order shifted a lot, so you cannot just look for the same gene numbers when you compare with the v4.1 annotation. Use BLAST or similar to search for your candidate genes instead.
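
One possible way to follow this advice is to BLAST the v4.1 candidate gene sequences against the v8.1 transcripts with tabular output (`-outfmt 6`) and keep the best hit per query. The Python sketch below only parses such a table and picks the highest bit score; the function name and the gene IDs in the sample are invented.

```python
import csv
import io


def best_hits(tabular):
    """Map each query ID to its highest-bitscore subject in BLAST outfmt 6."""
    best = {}
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        # outfmt 6 columns: qseqid, sseqid, ..., evalue, bitscore (12th)
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


# Invented sample rows, for illustration only.
sample = io.StringIO(
    "oldGene1\tnewGeneA\t99.1\t900\t8\t0\t1\t900\t1\t900\t0.0\t1600\n"
    "oldGene1\tnewGeneB\t91.0\t850\t70\t6\t1\t850\t5\t850\t0.0\t1100\n"
)
print(best_hits(sample))
```

Because B. napus carries homeologous gene copies on its A and C subgenomes, the second-best hit is often worth inspecting as well before settling on a candidate.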

Darmor_v81_assembly.fasta.gz - assembly as pseudo-molecules

Darmor_v8.1_contig_order.zip - order of contigs as gff3 files

Unfiltered annotation

Straight from AUGUSTUS, with MAKER's AED scores

Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts

Filtered annotation

Gene models with an AED score of 1 removed, only transcripts longer than 100 bp kept, no transposase domains

Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format

Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts

Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins