
Data from: Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context

When using this data, please cite the original article:

Bérard J, Guéguen L (2012) Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context. Systematic Biology, online in advance of print. doi:10.1093/sysbio/sys024

Additionally, please cite the Dryad data package:

Bérard J, Guéguen L (2012) Data from: Accurate estimation of substitution rates with neighbour-dependent models in a phylogenetic context. Dryad Digital Repository. doi:10.5061/dryad.5vp21b10
Dryad Package Identifier doi:10.5061/dryad.5vp21b10
Abstract Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability, but is in disagreement with observed data in many situations -- one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95+YpR substitution models, which allows neighbour-dependent effects -- including CpG hypermutability -- to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95+YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of ten DNA sequences from primate species. Model comparisons within the RN95+YpR class show the importance of taking into account neighbour-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.
Keywords neighbour-dependent substitution, CpG hypermutability, CpG islands, maximum likelihood phylogeny
Date Deposited 2012-01-27T19:18:53Z

appendix
Download: appendix.pdf ( 191.5Kb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  



ENm001
Sequence of the ENm001 region of the ENCODE Pilot Project (position 115,810,521 to position 117,687,946 on chromosome 7) from the hg19 version of the human genome, together with aligned sequences from nine other primate species (Chimpanzee, Gorilla, Orang-utan, Macaque, Baboon, Marmoset, Tarsier, Gray mouse lemur, Galago) as available from the Galaxy web tool (http://galaxy.psu.edu/).
Download: ENm001.fa ( 20.49Mb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  
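
As a rough illustration of how the alignment file might be inspected, the sketch below reads FASTA-formatted text and computes a simple observed/expected CpG ratio per sequence. It assumes that ENm001.fa is a standard FASTA alignment (inferred only from the .fa extension) with "-" as the gap character. The function names `read_fasta` and `cpg_obs_exp` are hypothetical helpers introduced here. The ratio is a common descriptive statistic and is not the RN95+YpR estimation method of the article.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines.

    Assumes standard FASTA: a '>' header line followed by sequence lines.
    """
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Gap characters ('-') are dropped first, so bases separated only by
    alignment gaps count as neighbours. N counts A, C, G and T only.
    """
    ungapped = seq.upper().replace("-", "")
    n = sum(ungapped.count(base) for base in "ACGT")
    c = ungapped.count("C")
    g = ungapped.count("G")
    if c == 0 or g == 0:
        return 0.0
    return ungapped.count("CG") * n / (c * g)


# Hypothetical usage, once ENm001.fa has been downloaded locally:
#     with open("ENm001.fa") as handle:
#         alignment = read_fasta(handle)
#     for species, sequence in alignment.items():
#         print(species, cpg_obs_exp(sequence))
```

A ratio well below 1 would be in line with the CpG depletion expected under CpG hypermutability, but it is no substitute for the model-based rate estimates described in the abstract.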



ENm001_AR
Portion of ENm001 made from "Ancestral" repeated elements, according to the RepeatMasker annotations on the human sequence from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). Pieces of the alignment corresponding to simple repeats, low-complexity regions, members of the Alu family, and RNA elements that diverged less than 25% and L1 elements that diverged less than 20% from the reference RepBase sequence were removed.
Download: ENm001_AR.fa ( 4.661Mb )
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.  

