SNP discovery

From Applied Bioinformatics Group

Launch autoSNPdb

Molecular genetic markers describe genetic variation and provide a link between observed phenotypes and the underlying genotype. Single Nucleotide Polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence, are generally abundant in populations and have a low mutation rate. However, SNP markers can be costly to develop, especially where resequencing from multiple individuals is required. Mining readily available sequence data significantly reduces the costs associated with SNP discovery, and several methods have been developed for SNP discovery from sequence data.

Where sequence trace files are available, polymorphisms in traces of dubious quality can be filtered out, and software such as PolyBayes and PolyPhred is the most efficient means of differentiating between true SNPs and sequence errors. Where trace files are unavailable, sequence errors can be identified using two further measures of SNP confidence: redundancy of the polymorphism in an alignment, and co-segregation of SNPs with haplotype.

The frequency of occurrence of a polymorphism at a particular locus provides a measure of confidence that the SNP represents a true polymorphism, and is referred to as the SNP redundancy score. In addition, true SNPs that represent divergence between homologous genes co-segregate to define a conserved haplotype. A co-segregation score, based on whether a SNP position contributes to defining a haplotype, is a further independent measure of SNP confidence. Together, the SNP redundancy score and the co-segregation score provide a valuable means of estimating confidence in the validity of SNPs within aligned sequences, independent of sequence trace files. Two methods currently apply a combination of redundancy and haplotype co-segregation: autoSNP (Barker et al, 2003; Batley et al, 2003) and SNPServer (Savage et al, 2005).
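The two confidence measures described above can be illustrated with a small Python sketch. This is not the autoSNP implementation: the function names, the `min_redundancy` threshold and the toy alignment are assumptions made for illustration only. Here the redundancy score is taken as the number of reads supporting the least frequent allele, and the co-segregation score as the number of other candidate SNPs that split the reads into the same groups.

```python
from collections import Counter

BASES = "ACGT"

def candidate_snps(alignment, min_redundancy=2):
    """Return {column: {base: count}} for columns where at least two
    different bases are each seen in min_redundancy or more reads."""
    snps = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in BASES)
        supported = {b: n for b, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            snps[col] = supported
    return snps

def redundancy_score(allele_counts):
    """Assumed definition: read count of the least frequent supported allele."""
    return min(allele_counts.values())

def same_partition(alignment, a, b):
    """True if columns a and b split the reads into the same groups,
    i.e. the alleles at a and b map one-to-one onto each other."""
    fwd, rev = {}, {}
    for seq in alignment:
        x, y = seq[a], seq[b]
        if x not in BASES or y not in BASES:
            continue  # skip reads with a gap or ambiguous base at either site
        if fwd.setdefault(x, y) != y or rev.setdefault(y, x) != x:
            return False
    return len(fwd) >= 2

def cosegregation_scores(alignment, snp_columns):
    """Assumed definition: for each SNP, the number of other SNPs in the
    alignment that define the same grouping of reads (haplotype)."""
    return {
        a: sum(same_partition(alignment, a, b) for b in snp_columns if b != a)
        for a in snp_columns
    }

# Toy alignment (hypothetical data): five aligned reads of equal length.
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ATGTACGAAC",
    "ATGTACGAAC",
    "ACGTACGTAG",
]

snps = candidate_snps(reads)
redundancy = {col: redundancy_score(counts) for col, counts in snps.items()}
coseg = cosegregation_scores(reads, sorted(snps))
```

In this toy alignment the differences at columns 1 and 7 are each supported by two reads and divide the reads identically, so both are retained and support each other. The single G at column 9 is seen in only one read and is discarded as a likely sequencing error.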

We have implemented the SNP discovery software autoSNP within a relational database to enable the efficient mining of the identified polymorphisms and the detailed interrogation of the data. AutoSNP was selected because it does not require sequence trace files and is thus applicable to a broader range of species and datasets. The results from autoSNP have previously been integrated with additional data such as gene annotation (Love et al. 2004) and the wheat SNP database cerealsdb. However, this is the first development of an integrated system for SNP discovery, analysis and interrogation.

The implementation of autoSNPdb allows researchers to query the results of SNP analysis to characterise SNPs between specific groups of individuals or within genes with predicted function. The system is flexible: researchers may add further levels of annotation and perform novel queries specific to their area of interest.
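As an illustration of the kind of query described above, the following sketch uses Python's built-in `sqlite3` module with a deliberately simplified schema. The table names, column names, cultivar names and data are all hypothetical and do not reflect the actual autoSNPdb schema. The query returns SNPs that differ between two named cultivars within contigs whose annotation matches a keyword.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contig (id INTEGER PRIMARY KEY, name TEXT, annotation TEXT);
        CREATE TABLE snp (id INTEGER PRIMARY KEY,
                          contig_id INTEGER REFERENCES contig(id),
                          position INTEGER, redundancy INTEGER);
        CREATE TABLE allele (snp_id INTEGER REFERENCES snp(id),
                             cultivar TEXT, base TEXT);
    """)
    conn.executemany("INSERT INTO contig VALUES (?, ?, ?)", [
        (1, "contig_001", "protein kinase"),
        (2, "contig_002", "unknown protein"),
    ])
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?)", [
        (1, 1, 120, 3),
        (2, 1, 340, 2),
        (3, 2, 55, 4),
    ])
    conn.executemany("INSERT INTO allele VALUES (?, ?, ?)", [
        (1, "CultivarA", "A"), (1, "CultivarB", "G"),
        (2, "CultivarA", "C"), (2, "CultivarB", "C"),
        (3, "CultivarA", "T"), (3, "CultivarB", "C"),
    ])
    return conn

def snps_between(conn, cultivar_a, cultivar_b, keyword, min_redundancy=2):
    """SNPs that differ between two cultivars in contigs annotated with keyword."""
    query = """
        SELECT c.name, s.position, a1.base, a2.base
        FROM snp s
        JOIN contig c ON c.id = s.contig_id
        JOIN allele a1 ON a1.snp_id = s.id AND a1.cultivar = ?
        JOIN allele a2 ON a2.snp_id = s.id AND a2.cultivar = ?
        WHERE c.annotation LIKE ?
          AND a1.base <> a2.base
          AND s.redundancy >= ?
        ORDER BY c.name, s.position
    """
    params = (cultivar_a, cultivar_b, f"%{keyword}%", min_redundancy)
    return conn.execute(query, params).fetchall()

conn = build_demo_db()
hits = snps_between(conn, "CultivarA", "CultivarB", "kinase")
```

Storing alleles per cultivar in their own table is one possible design that keeps comparisons between groups of individuals down to a self-join. Further annotation could be added as extra columns or tables without changing the SNP records themselves.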


  • Hayward A, Dalton-Morgan J, Mason A, Zander M, Edwards D and Batley J. SNP discovery and applications in Brassica napus. Journal of Plant Biotechnology. (Accepted March 2012)
  • Raman R, Taylor B, Marcroft S, Stiller J, Eckermann P, Coombes N, Rehman A, Lindbeck K, Luckett D, Wratten N, Batley J, Edwards D, Wang X, Raman H. Molecular mapping of qualitative and quantitative loci for resistance to blackleg disease in canola (Brassica napus L). Theoretical and Applied Genetics. (Accepted February 2012)
  • Lai K, Lorenc MT and Edwards D. (2012) Genomic databases for crop improvement. Agronomy 2: 67-73
  • Azam S, Thakur V, Pradeep R, Shah T, Jayashree B, BhanuPrakash A, Farmer AD, Studholme DJ, May GD, Edwards D, Jones JDG and Varshney R. (2012) Coverage based consensus calling (CBCC) of short sequence reads and comparison of CBCC-results for the identification of SNPs in chickpea, a crop species without the reference genome. American Journal of Botany 99 (2): 186-192
  • Lee H, Lai K, Lorenc MT, Imelfort M, Duran C and Edwards D. (2012) Bioinformatics tools and databases for analysis of next generation sequence data. Briefings in Functional Genomics 11 (1): 12-24
  • Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman P, Clark T, McKenzie M, Appleby N, Batley J, Basford K, and Edwards D. (2010) Future tools for association mapping in crop plants. Genome 53: 1017-1023
  • Duran C, Appleby N, Edwards D and Batley J. (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Current Bioinformatics 4: 16-27
  • Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J and Edwards D. (2009) AutoSNPdb: An Annotated Single Nucleotide Polymorphism Database for Crop Plants. Nucleic Acids Research 37: 951-953
  • Imelfort M, Duran C, Batley J and Edwards D. (2009) Discovering genetic polymorphisms in next generation sequencing data. Plant Biotechnology Journal 7 (4): 312-317
  • Duran C, Appleby N, Vardy M, Imelfort M, Edwards D and Batley J. (2009) Single Nucleotide Polymorphism Discovery in Barley using AutoSNPdb. Plant Biotechnology Journal 7 (4): 326-333
  • Savage D, Batley J, Erwin T, Logan E, Love CG, Lim GAC, Mongin E, Barker GLA, Spangenberg GC and Edwards D. (2005) SNPServer: A Realtime SNP Discovery tool. Nucleic Acids Research 33: D656-D659
  • Batley J, Barker G, O'Sullivan H, Edwards KJ and Edwards D. (2003) Mining for Single Nucleotide Polymorphisms and Insertions/Deletions in Maize Expressed Sequence Tag Data. Plant Physiology 132: 84-91
  • Barker G, Batley J, O'Sullivan H, Edwards KJ and Edwards D. (2003) Redundancy Based Detection of Sequence Polymorphisms in Expressed Sequence Tag Data using AutoSNP. Bioinformatics 19: 421-422
