Difference between revisions of "Darmor Tapidor"
Line 5: | Line 5: | ||
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip] | Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions [http://appliedbioinformatics.com.au/download/DarmorTapidor/Collinearity_scripts.zip Collinearity_scripts.zip] | ||
− | LASTZSorter.py - sorts contigs based on LASTZ | + | LASTZSorter.py - sorts contigs based on LASTZ alignment with reference [http://appliedbioinformatics.com.au/download/DarmorTapidor/LASTZSorter.py LASTZSorter.py] |
contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer] | contigPlacer - places contigs based on recombination patterns [https://github.com/philippbayer/contigplacer contigPlacer] | ||
R-scripts used for plotting - Venn-diagrams, boxplots [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip] | R-scripts used for plotting - Venn-diagrams, boxplots [http://appliedbioinformatics.com.au/download/DarmorTapidor/R_plotting_scripts.zip R_plotting_scripts.zip] | ||
+ | |||
+ | The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS | ||
== Results == | == Results == | ||
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap] | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Results.txt.gz Tapidor genetic map from MSTMap (txt)] - [http://appliedbioinformatics.com.au/download/DarmorTapidor/MSTMap_Input.zip MSTMap_Input.zip] Input file for MSTMap |
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on Darmor v8.1 reference] | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_Tapidor_Ningyou_SNPs.zip Darmor SNPs anchored on Darmor v8.1 reference (gff3)] |
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on Tapidor v6.3 reference] | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_Tapidor_Ningyou_SNPs.zip Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)] |
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Genes.zip Repetitive_Collapsed_Genes.zip] List of genes in repetitive and collapsed regions | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Repetitive_Collapsed_Regions.zip Repetitive_Collapsed_Regions.zip] Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed) | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip] Pfam and Swissprot results for repetitive and collapsed genes | ||
== Annotation == | == Annotation == | ||
Line 23: | Line 31: | ||
[http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of Transposase related PFAM IDs used for filtering] | [http://appliedbioinformatics.com.au/download/DarmorTapidor/ListOfBadPfamDomains.txt List of Transposase related PFAM IDs used for filtering] | ||
− | === Tapidor === | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/GO_Arabidopsis_Terms.zip GO_Arabidopsis_Terms.zip] Swiss-Prot/''Arabidopsis'' based GO terms for Darmor and Tapidor annotation |
+ | |||
+ | === Tapidor v6.3 === | ||
[http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudo-molecules | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.fasta.gz Tapidor_v63_assembly.fasta.gz] - assembly as pseudo-molecules | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v6.3_contig_order.zip Tapidor_v6.3_contig_order.zip] - contig positions in assembly as gff3 | ||
+ | |||
+ | |||
+ | ==== Unfiltered annotation ==== | ||
+ | |||
+ | Straight from AUGUSTUS, with MAKER's AED scores | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - predicted proteins | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - predicted transcripts | ||
+ | |||
==== Filtered annotation ==== | ==== Filtered annotation ==== | ||
Line 31: | Line 55: | ||
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains | No AED=1 scores, transcripts longer than 100 bp, no Transposase domains | ||
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly. | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format |
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins | ||
+ | |||
− | + | === Darmor v8.1 === | |
− | + | WARNING - the Brassica community annotation standard says to number the genes by their order on the pseudomolecules. I've done this here as well. Since we tried to place as many contigs as possible that means that the order shifted a lot, so you 'cannot' just look for the same gene numbers when you compare with the v4.1 annotation, you have to use BLAST or similar to search for your candidate genes. | |
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/ | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz Darmor_v81_assembly_fasta.gz] - assembly as pseudo-molecules |
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v8.1_contig_order.zip Darmor_v8.1_contig_order.zip] - order of contigs as gff3 files | ||
− | === | + | ==== Unfiltered annotation ==== |
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.fasta.gz | + | Straight from AUGUSTUS, with MAKER's AED scores |
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz] - annotation in GFF format | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz] - proteins, fasta | ||
+ | |||
+ | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz] - transcripts, fasta | ||
==== Filtered annotation ==== | ==== Filtered annotation ==== | ||
Line 48: | Line 84: | ||
No AED=1 scores, transcripts longer than 100 bp, no Transposase domains | No AED=1 scores, transcripts longer than 100 bp, no Transposase domains | ||
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly. | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz] - filtered predicted annotation in GFF format |
− | |||
− | |||
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked. | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz] - filtered predicted transcripts |
− | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked. | + | [http://appliedbioinformatics.com.au/download/DarmorTapidor/Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz] - filtered predicted proteins |
Latest revision as of 15:07, 15 June 2018
This page collects the files for the Bayer et al. B. napus Darmor/Tapidor genome paper.
Software
Collinearity analysis - parses MCScanX results and checks for missing genes in expected regions Collinearity_scripts.zip
LASTZSorter.py - sorts contigs based on LASTZ alignment with reference LASTZSorter.py
contigPlacer - places contigs based on recombination patterns contigPlacer
R-scripts used for plotting - Venn-diagrams, boxplots R_plotting_scripts.zip
The SkimGBS pipeline is available here: http://appliedbioinformatics.com.au/index.php/SkimGBS
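As a rough illustration of what "sorting contigs based on alignment with a reference" can look like, here is a minimal Python sketch. It is not the code in LASTZSorter.py. It assumes each contig is represented by its longest alignment and that contigs are then ordered by reference sequence and start position. The `Alignment` record, its field names and the example contig names are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one contig-to-reference alignment.
Alignment = namedtuple("Alignment", "contig ref_seq ref_start aligned_bp")


def order_contigs(alignments):
    """Order contigs by the reference position of their longest alignment."""
    best = {}
    for aln in alignments:
        current = best.get(aln.contig)
        # Keep only the longest alignment seen so far for each contig.
        if current is None or aln.aligned_bp > current.aligned_bp:
            best[aln.contig] = aln
    # Sort by reference sequence name, then by start coordinate.
    ordered = sorted(best.values(), key=lambda a: (a.ref_seq, a.ref_start))
    return [(a.ref_seq, a.contig) for a in ordered]


example = [
    Alignment("contig_b", "ref_1", 5000, 1200),
    Alignment("contig_a", "ref_1", 100, 900),
    Alignment("contig_b", "ref_2", 10, 300),   # shorter hit, ignored
    Alignment("contig_c", "ref_2", 40, 2000),
]
for ref_seq, contig in order_contigs(example):
    print(ref_seq, contig)
```

A real run would first have to parse the LASTZ output into such records. Which alignment should represent a contig (longest, best score, or a chain of hits) is a design choice this sketch does not settle.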
Results
Tapidor genetic map from MSTMap (txt) - MSTMap_Input.zip Input file for MSTMap
Darmor SNPs anchored on Darmor v8.1 reference (gff3)
Tapidor SNPs anchored on Tapidor v6.3 reference (gff3)
Repetitive_Collapsed_Genes.zip List of genes in repetitive and collapsed regions
Repetitive_Collapsed_Regions.zip Coordinates of repetitive and collapsed regions in Darmor and Tapidor (bed)
SwissProt_Pfam_hits_Repetitive_Collapsed_Genes.zip Pfam and Swissprot results for repetitive and collapsed genes
Annotation
List of Transposase related PFAM IDs used for filtering
GO_Arabidopsis_Terms.zip Swiss-Prot/Arabidopsis based GO terms for Darmor and Tapidor annotation
Tapidor v6.3
Tapidor_v63_assembly.fasta.gz - assembly as pseudo-molecules
Tapidor_v6.3_contig_order.zip - contig positions in assembly as gff3
Unfiltered annotation
Straight from AUGUSTUS, with MAKER's AED scores
Tapidor_v63_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format
Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins
Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts
Filtered annotation
Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models containing transposase domains
Tapidor_v63_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format
Tapidor_v63_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts
Tapidor_v63_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins
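The filtering criteria for this section (AED score, transcript length, transposase domains) can be sketched in Python as below. This is an illustration, not the script used to produce the files. It assumes the AED score is stored in an `_AED` attribute on `mRNA` lines of the GFF, that transcript lengths come from a separate lookup (for example, computed from the transcripts FASTA), and that Pfam hits per transcript are already available as sets. The function names, sequence names and the `PF_EXAMPLE` identifier are hypothetical.

```python
def parse_attributes(column9):
    """Turn a GFF attribute column (key=value;key=value) into a dict."""
    pairs = (field.split("=", 1) for field in column9.strip().split(";") if "=" in field)
    return {key: value for key, value in pairs}


def filter_transcripts(gff_lines, transcript_lengths, pfam_hits, bad_pfam_ids):
    """Return IDs of mRNA features that pass all three filters."""
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        transcript_id = attrs.get("ID", "")
        # Assumption: AED is stored as "_AED"; models without it are dropped here.
        aed = float(attrs.get("_AED", "1"))
        if aed >= 1.0:
            continue  # no AED = 1 models
        if transcript_lengths.get(transcript_id, 0) <= 100:
            continue  # keep only transcripts longer than 100 bp
        if pfam_hits.get(transcript_id, set()) & bad_pfam_ids:
            continue  # no transposase-related domains
        kept.append(transcript_id)
    return kept


gff = [
    "seq1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "seq1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1;_AED=0.10",
    "seq1\tmaker\tmRNA\t2000\t2500\t.\t+\t.\tID=t2;Parent=g2;_AED=1.00",
    "seq1\tmaker\tmRNA\t3000\t3089\t.\t-\t.\tID=t3;Parent=g3;_AED=0.20",
    "seq1\tmaker\tmRNA\t4000\t5200\t.\t-\t.\tID=t4;Parent=g4;_AED=0.05",
]
lengths = {"t1": 450, "t2": 300, "t3": 90, "t4": 800}
pfam = {"t4": {"PF_EXAMPLE"}}
print(filter_transcripts(gff, lengths, pfam, {"PF_EXAMPLE"}))
```

In practice the set of unwanted Pfam IDs would be read from the list of transposase-related PFAM IDs linked under Annotation.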
Darmor v8.1
WARNING - the Brassica community annotation standard says to number the genes by their order on the pseudomolecules. I've done this here as well. Since we tried to place as many contigs as possible, the gene order shifted a lot, so you cannot simply look for the same gene numbers when comparing with the v4.1 annotation. Use BLAST or a similar tool to search for your candidate genes.
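Because gene numbers do not carry over between annotation versions, one way to find a candidate gene is to BLAST its older sequence against the v8.1 transcripts or proteins and keep the best hit per query. The Python sketch below parses standard BLAST tabular output (`-outfmt 6`, whose twelve default columns start with query ID, subject ID and percent identity and end with bitscore) and reports the top-scoring subject for each query. The gene identifiers in the example are made up, and picking the single highest bitscore is a simplification: in a polyploid genome like B. napus, close homeologues may score almost identically and should be checked by hand.

```python
import csv


def best_hits(blast_tabular):
    """Map each query ID to (subject ID, percent identity, bitscore) of its top hit."""
    best = {}
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity, bitscore = float(row[2]), float(row[11])
        # Keep the hit with the highest bitscore for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


# Hypothetical BLAST -outfmt 6 lines (IDs are invented for illustration).
example = [
    "old_gene_1\tnew_gene_7\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t730",
    "old_gene_1\tnew_gene_9\t91.0\t390\t35\t1\t5\t394\t3\t392\t1e-150\t520",
    "old_gene_2\tnew_gene_3\t100.0\t250\t0\t0\t1\t250\t1\t250\t1e-130\t460",
]
for query, (subject, identity, bitscore) in best_hits(example).items():
    print(query, subject, identity, bitscore)
```

The function accepts any iterable of lines, so an open file handle on a BLAST result file works as well as the in-memory list used here.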
Darmor_v81_assembly.fasta.gz - assembly as pseudo-molecules
Darmor_v8.1_contig_order.zip - order of contigs as gff3 files
Unfiltered annotation
Straight from AUGUSTUS, with MAKER's AED scores
Darmor_v81_assembly.augustus_masked.propermodels.sorted.renamed.gff.gz - annotation in GFF format
Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.fasta.gz - predicted proteins
Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.fasta.gz - predicted transcripts
Filtered annotation
Filtered to remove gene models with an AED score of 1, transcripts of 100 bp or shorter, and models containing transposase domains
Darmor_v81_assembly.augustus_masked.propermodels.sorted_filtered.renamed.gff.gz - filtered predicted annotation in GFF format
Darmor_v81_assembly.all.maker.augustus_masked.transcripts.renamed.filtered.fasta.gz - filtered predicted transcripts
Darmor_v81_assembly.all.maker.augustus_masked.proteins.renamed.filtered.fasta.gz - filtered predicted proteins