SNP discovery

From Applied Bioinformatics Group
Revision as of 07:00, 10 September 2010 by Appbio.


Molecular genetic markers describe genetic variation and provide a link between observed phenotypes and the underlying genotype. Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic marker, as they represent the finest resolution of a DNA sequence, are generally abundant in populations and have a low mutation rate (Syvanen, 2001). However, SNP markers can be costly to develop, especially where resequencing of multiple individuals is required. Mining readily available sequence data significantly reduces the costs associated with SNP discovery (Taillon-Miller et al., 1998), and several methods have been developed for discovering SNPs in such data.

Where sequence trace files are available, they can be used to filter out polymorphisms arising from traces of dubious quality, and software such as PolyBayes and PolyPhred is the most efficient means of differentiating true SNPs from sequencing errors (Marth et al., 1999). Where trace files are unavailable, sequencing errors can be identified using two further measures of SNP confidence: the redundancy of the polymorphism in an alignment, and the co-segregation of SNPs with a haplotype.
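As a hedged illustration of quality-based filtering, the Python sketch below assumes that base calls and Phred quality scores have already been extracted from the trace files for one alignment column. A Phred score Q corresponds to an estimated error probability P through Q = -10 log10 P, so Q = 20 means roughly one wrong call in 100. The function names, the reference-base convention and the thresholds (`min_q=20`, `min_support=2`) are illustrative assumptions; PolyBayes and PolyPhred use their own, more sophisticated models, and this is not either tool's algorithm.

```python
# Hypothetical quality filter for one alignment column, assuming base calls
# and Phred scores have already been read from trace files. This is NOT the
# PolyBayes or PolyPhred algorithm; the thresholds are illustrative defaults.


def phred_to_error(q):
    """Estimated base-call error probability for Phred quality q (Q = -10 log10 P)."""
    return 10 ** (-q / 10)


def supported_variant(calls, ref_base, min_q=20, min_support=2):
    """Return True if at least `min_support` reads carry a non-reference base
    called with Phred quality >= `min_q`.

    calls: list of (base, phred_quality) tuples, one per read covering the column.
    """
    support = sum(1 for base, q in calls if base != ref_base and q >= min_q)
    return support >= min_support


column = [("C", 38), ("C", 41), ("T", 35), ("T", 12)]
print(phred_to_error(20))              # error probability of 1 in 100
print(supported_variant(column, "C"))  # False: only one high-quality "T" call
```

Requiring support from more than one high-quality read combines trace quality with a simple form of redundancy; raising `min_q` trades sensitivity for fewer false SNPs.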

The frequency with which a polymorphism occurs at a particular locus provides a measure of confidence that the SNP represents a true polymorphism; this is referred to as the SNP redundancy score. Because sequencing errors occur largely at random, the same erroneous base is unlikely to recur at one position across independent reads. In addition, true SNPs that represent divergence between homologous genes co-segregate to define a conserved haplotype. A co-segregation score, based on whether a SNP position contributes to defining a haplotype, is a further independent measure of SNP confidence. Together, the redundancy score and the co-segregation score provide a valuable means of estimating confidence in the validity of SNPs within aligned sequences, independent of sequence trace files. Two methods currently apply a combination of redundancy and haplotype co-segregation: autoSNP (Barker et al., 2003; Batley et al., 2003) and SNPServer (Savage et al., 2005).
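The two confidence measures can be sketched in Python for a toy, gap-free alignment. The scoring rules here are simplified assumptions rather than autoSNP's published definitions: the redundancy score is taken to be the number of reads carrying the second most frequent base at a column, and a column's co-segregation score counts how many other candidate SNPs split the reads into exactly the same groups. All function names are hypothetical.

```python
from collections import Counter

# A minimal sketch of redundancy and co-segregation scoring for a gap-free
# multiple alignment (one string per read). The exact scoring rules used by
# autoSNP may differ; these definitions are simplified assumptions.


def redundancy_scores(alignment, min_redundancy=2):
    """Map column index -> redundancy score for candidate SNP columns.

    Assumed rule: the score is the number of reads carrying the second most
    frequent base; columns scoring below `min_redundancy` are discarded as
    probable sequencing errors.
    """
    scores = {}
    for col in range(len(alignment[0])):
        ranked = Counter(read[col] for read in alignment).most_common()
        if len(ranked) >= 2 and ranked[1][1] >= min_redundancy:
            scores[col] = ranked[1][1]
    return scores


def read_partition(alignment, col):
    """Group read indices by the base they carry at `col` (order-independent)."""
    groups = {}
    for i, read in enumerate(alignment):
        groups.setdefault(read[col], set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def cosegregation_scores(alignment, snp_cols):
    """Map each SNP column -> number of other SNP columns that split the reads
    into exactly the same groups, i.e. that support the same haplotype."""
    parts = {col: read_partition(alignment, col) for col in snp_cols}
    return {
        col: sum(1 for other in snp_cols if other != col and parts[other] == parts[col])
        for col in snp_cols
    }


reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ATGTACCT",
    "ATGTACCT",
    "ACGAACGT",  # the lone "A" at column 3 looks like a sequencing error
]
snps = redundancy_scores(reads)
print(snps)                               # {1: 2, 6: 2}
print(cosegregation_scores(reads, snps))  # {1: 1, 6: 1}
```

In the toy alignment, columns 1 and 6 both separate reads 2 and 3 from the rest, so each supports the other as part of one haplotype, while the singleton at column 3 fails the redundancy threshold. Real alignments contain gaps and partially overlapping reads, which a practical implementation would need to handle when comparing partitions.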

We have implemented the SNP discovery software autoSNP within a relational database to enable efficient mining of the identified polymorphisms and detailed interrogation of the data. AutoSNP was selected because it does not require sequence trace files and is therefore applicable to a broader range of species and datasets. Results from autoSNP have previously been integrated with additional data such as gene annotation (Love et al., 2005; Lazzari et al., 2005) and in the wheat SNP database CerealsDB. However, this is the first integrated system for SNP discovery, analysis and interrogation.
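To show the kind of interrogation a relational store makes straightforward, the following sketch uses Python's built-in `sqlite3` module. The schema (tables `contig`, `snp`, `est_read`, `genotype`), the variety names and the annotation strings are all hypothetical and are not taken from autoSNPdb. The query returns SNPs in contigs whose annotation matches a search term, that pass a minimum redundancy score, and at which two groups of varieties share no base.

```python
import sqlite3

# Hypothetical minimal schema for storing autoSNP-style output; table and
# column names are illustrative assumptions, not the actual autoSNPdb schema.
SCHEMA = """
CREATE TABLE contig   (contig_id INTEGER PRIMARY KEY, annotation TEXT);
CREATE TABLE snp      (snp_id INTEGER PRIMARY KEY,
                       contig_id INTEGER REFERENCES contig(contig_id),
                       position INTEGER, redundancy INTEGER, coseg INTEGER);
CREATE TABLE est_read (read_id INTEGER PRIMARY KEY, variety TEXT);
CREATE TABLE genotype (snp_id INTEGER REFERENCES snp(snp_id),
                       read_id INTEGER REFERENCES est_read(read_id),
                       base TEXT);
"""


def snps_between(conn, group_a, group_b, term, min_redundancy=2):
    """Return IDs of SNPs in contigs whose annotation contains `term` at which
    both groups of varieties are observed and share no base."""
    rows = conn.execute(
        """SELECT s.snp_id, r.variety, g.base
           FROM snp s
           JOIN contig c   ON c.contig_id = s.contig_id
           JOIN genotype g ON g.snp_id = s.snp_id
           JOIN est_read r ON r.read_id = g.read_id
           WHERE c.annotation LIKE ? AND s.redundancy >= ?""",
        (f"%{term}%", min_redundancy),
    ).fetchall()
    bases = {}  # snp_id -> (bases seen in group_a, bases seen in group_b)
    for snp_id, variety, base in rows:
        in_a, in_b = bases.setdefault(snp_id, (set(), set()))
        if variety in group_a:
            in_a.add(base)
        elif variety in group_b:
            in_b.add(base)
    return sorted(s for s, (a, b) in bases.items() if a and b and a.isdisjoint(b))


def build_demo_db():
    """Populate an in-memory database with a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO contig VALUES (?, ?)",
                     [(1, "putative protein kinase"), (2, "unknown protein")])
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)",
                     [(10, 1, 57, 3, 1), (11, 1, 212, 2, 0), (12, 2, 40, 4, 1)])
    conn.executemany("INSERT INTO est_read VALUES (?, ?)",
                     [(1, "VarA"), (2, "VarA"), (3, "VarB"), (4, "VarB")])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)", [
        (10, 1, "C"), (10, 2, "C"), (10, 3, "T"), (10, 4, "T"),  # separates VarA/VarB
        (11, 1, "G"), (11, 2, "A"), (11, 3, "G"), (11, 4, "A"),  # shared alleles
        (12, 1, "A"), (12, 3, "G"),                              # contig 2, no "kinase"
    ])
    return conn


print(snps_between(build_demo_db(), {"VarA"}, {"VarB"}, "kinase"))  # [10]
```

In a design like this, the grouping and annotation filters are ordinary query parameters, so new groupings or extra annotation tables can be added without rerunning SNP discovery.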

The implementation of autoSNPdb allows researchers to query the results of SNP analysis to characterise SNPs between specific groups of individuals or within genes with predicted function. The system is flexible: researchers may add further levels of annotation and perform novel queries specific to their area of interest.

References

   * Barker, G., Batley, J., O'Sullivan, H., Edwards, K.J. and Edwards, D. (2003) Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP, Bioinformatics, 19, 421-422.
   * Batley, J., Barker, G., O'Sullivan, H., Edwards, K.J. and Edwards, D. (2003) Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data, Plant Physiology, 132, 84-91.
   * Lazzari, B., Caprera, A., Vecchietti, A., Stella, A., Milanesi, L. and Pozzi, C. (2005) ESTree db: a tool for peach functional genomics, BMC Bioinformatics, 6.
   * Love, C.G., Robinson, A.J., Lim, G.A.C., Hopkins, C.J., Batley, J., Barker, G., Spangenberg, G.C. and Edwards, D. (2005) Brassica ASTRA: an integrated database for Brassica genomic research, Nucleic Acids Research, 33, D656-D659.
   * Marth, G.T., Korf, I., Yandell, M.D., Yeh, R.T., Gu, Z.J., Zakeri, H., Stitziel, N.O., Hillier, L., Kwok, P.Y. and Gish, W.R. (1999) A general approach to single nucleotide polymorphism discovery, Nature Genetics, 23, 452-456.
   * Savage, D., Batley, J., Erwin, T., Logan, E., Love, C.G., Lim, G.A.C., Mongin, E., Barker, G., Spangenberg, G.C. and Edwards, D. (2005) SNPServer: a real-time SNP discovery tool, Nucleic Acids Research, 33, W493-W495.
   * Syvanen, A.C. (2001) Accessing genetic variation: Genotyping single nucleotide polymorphisms, Nature Reviews Genetics, 2, 930-942.
   * Taillon-Miller, P., Gu, Z.J., Li, Q., Hillier, L. and Kwok, P.Y. (1998) Overlapping genomic sequences: A treasure trove of single-nucleotide polymorphisms, Genome Research, 8, 748-754.