Jump to ContentJump to Main Navigation
Show Summary Details

Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

6 Issues per year

IMPACT FACTOR increased in 2015: 1.265
5-year IMPACT FACTOR: 1.423
Rank 42 out of 123 in category Statistics & Probability in the 2015 Thomson Reuters Journal Citation Report/Science Edition

SCImago Journal Rank (SJR) 2015: 0.954
Source Normalized Impact per Paper (SNIP) 2015: 0.554
Impact per Publication (IPP) 2015: 1.061

Mathematical Citation Quotient (MCQ) 2015: 0.06

See all formats and pricing
Volume 10, Issue 1 (Nov 2011)

Modeling Read Counts for CNV Detection in Exome Sequencing Data

Michael I. Love
  • Max Planck Institute for Molecular Genetics
/ Alena Myšičková
  • Max Planck Institute for Molecular Genetics
/ Ruping Sun
  • Max Planck Institute for Molecular Genetics
/ Vera Kalscheuer
  • Max Planck Institute for Molecular Genetics
/ Martin Vingron
  • Max Planck Institute for Molecular Genetics
/ Stefan A. Haas
  • Max Planck Institute for Molecular Genetics
Published Online: 2011-11-08 | DOI: https://doi.org/10.2202/1544-6115.1732

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.

Keywords: exorne sequencing; targeted sequencing; CNV; copy number variant; HMM; hidden Markov model

We thank our collaborators on the XLID project, Prof. Dr. H.-Hilger Ropers, Wei Chen, Hao Hu, Reinhard Ullmann and the EUROMRX consortium for providing the XLID data, validation of CNVs and for helpful discussion. We also thank Ho-Ryun Chung for suggestions. Part of this work was financed by the European Union’s Seventh Framework Program under grant agreement number 241995, project GENCODYS.

  • 1000 Genomes Project Consortium (2010): “A map of human genome variation from population-scale sequencing,” Nature, 467, 1061–1073.

  • Alkan, C., J. M. Kidd, T. Marques-Bonet, G. Aksay, F. Antonacci, F. Hormozdiari, J. O. Kitzman, C. Baker, M. Malig, O. Mutlu, S. C. Sahinalp, R. A. Gibbs, and E. E. Eichler (2009): “Personalized copy number and segmental duplication maps using next-generation sequencing,” Nature Genetics, 41, 1061–1067. [PubMed] [Crossref] [Web of Science]

  • Anders, S. and W. Huber (2010): “Differential expression analysis for sequence count data.” Genome biology, 11, R106+. [Crossref]

  • Benjamini, Y. and T. P. Speed (2011): “Estimation and correction for GC-content bias in high throughput sequencing,” Technical report, University of California at Berkeley.

  • Bliss, C. I. and R. A. Fisher (1953): “Fitting the Negative Binomial Distribution to Biological Data,” Biometrics, 9.

  • Boeva, V., A. Zinovyev, K. Bleakley, J.-P. Vert, I. Janoueix-Lerosey, O. Delattre, and E. Barillot (2011): “Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization,” Bioinformatics, 27, 268–269. [PubMed] [Web of Science] [Crossref]

  • Campbell, P. J., P. J. Stephens, E. D. Pleasance, S. O’Meara, H. Li, T. Santarius, L. A. Stebbings, C. Leroy, S. Edkins, C. Hardy, J. W. Teague, A. Menzies, I. Goodhead, D. J. Turner, C. M. Clee, M. A. Quail, A. Cox, C. Brown, R. Durbin, M. E. Hurles, P. A. W. Edwards, G. R. Bignell, M. R. Stratton, and P. A. Futreal (2008): “Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing,” Nature Genetics, 40, 722–729. [Crossref] [Web of Science] [PubMed]

  • Chiang, D. Y., G. Getz, D. B. Jaffe, M. J. T. O’Kelly, X. Zhao, S. L. Carter, C. Russ, C. Nusbaum, M. Meyerson, and E. S. Lander (2008): “High-resolution mapping of copy-number alterations with massively parallel sequencing,” Nature Methods, 6, 99–103. [Crossref] [PubMed] [Web of Science]

  • Conrad, D. F., D. Pinto, R. Redon, L. Feuk, O. Gokcumen, Y. Zhang, J. Aerts, T. D. Andrews, C. Barnes, P. Campbell, T. Fitzgerald, M. Hu, C. H. Ihm, K. Kristiansson, D. G. MacArthur, J. R. MacDonald, I. Onyiah, A. W. Pang, S. Robson, K. Stirrups, A. Valsesia, K. Walter, J. Wei, C. Tyler-Smith, N. P. Carter, C. Lee, S. W. Scherer, and M. E. Hurles (2010): “Origins and functional impact of copy number variation in the human genome,” Nature, 464, 704–712. [Web of Science]

  • Fridlyand, J. (2004): “Hidden Markov models approach to the analysis of array CGH data,” Journal of Multivariate Analysis, 90, 132–153.

  • Gentleman, R., V. Carey, D. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Yang, and J. Zhang (2004): “Bioconductor: open software development for computational biology and bioinformatics,” Genome Biology, 5, R80+. [Crossref]

  • Glessner, J. T., K. Wang, G. Cai, O. Korvatska, C. E. Kim, S. Wood, H. Zhang, A. Estes, C. W. Brune, J. P. Bradfield, M. Imielinski, E. C. Frackelton, J. Reichert, E. L. Crawford, J. Munson, P. M. A. Sleiman, R. Chiavacci, K. Annaiah, K. Thomas, C. Hou, W. Glaberson, J. Flory, F. Otieno, M. Garris, L. Soorya, L. Klei, J. Piven, K. J. Meyer, E. Anagnostou, T. Sakurai, R. M. Game, D. S. Rudd, D. Zurawiecki, C. J. McDougle, L. K. Davis, J. Miller, D. J. Posey, S. Michaels, A. Kolevzon, J. M. Silverman, R. Bernier, S. E. Levy, R. T. Schultz, G. Dawson, T. Owley, W. M. McMahon, T. H. Wassink, J. A. Sweeney, J. I. Nurnberger, H. Coon, J. S. Sutcliffe, N. J. Minshew, S. F. A. Grant, M. Bucan, E. H. Cook, J. D. Buxbaum, B. Devlin, G. D. Schellenberg, and H. Hakonarson (2009): “Autism genome-wide copy number variation reveals ubiquitin and neuronal genes,” Nature, 459, 569–573. [Web of Science]

  • Gonzalez, E., H. Kulkarni, H. Bolivar, A. Mangano, R. Sanchez, G. Catano, R. J. Nibbs, B. I. Freedman, M. P. Quinones, M. J. Bamshad, K. K. Murthy, B. H. Rovin, W. Bradley, R. A. Clark, S. A. Anderson, R. J. O’Connell, B. K. Agan, S. S. Ahuja, R. Bologna, L. Sen, M. J. Dolan, and S. K. Ahuja (2005): “The Influence of CCL3L1 Gene-Containing Segmental Duplications on HIV-1/AIDS Susceptibility,” Science, 307, 1434–1440.

  • Harismendy, O., P. Ng, R. Strausberg, X. Wang, T. Stockwell, K. Beeson, N. Schork, S. Murray, E. Topol, S. Levy, and K. Frazer (2009): “Evaluation of next generation sequencing platforms for population targeted sequencing studies,” Genome Biology, 10, R32+. [Web of Science] [Crossref]

  • Hedges, D. J., T. Guettouche, S. Yang, G. Bademci, A. Diaz, A. Andersen, W. F. Hulme, S. Linker, A. Mehta, Y. J. K. Edwards, G. W. Beecham, E. R. Martin, M. A. Pericak-Vance, S. Zuchner, J. M. Vance, and J. R. Gilbert (2011): “Comparison of Three Targeted Enrichment Strategies on the SOLiD Sequencing Platform,” PLoS ONE, 6, e18595+. [Crossref] [Web of Science]

  • Herman, D. S., G. K. Hovingh, O. Iartchouk, H. L. Rehm, R. Kucherlapati, J. G. Seidman, and C. E. Seidman (2009): “Filter-based hybridization capture of subgenomes enables resequencing and copy-number detection.” Nature methods, 6, 507–510. [Crossref] [PubMed] [Web of Science]

  • Ivakhno, S., T. Royce, A. J. Cox, D. J. Evers, R. K. Cheetham, and S. Tavaré (2010): “CNAsega novel framework for identification of copy number changes in cancer from second-generation sequencing data,” Bioinformatics, 26, 3051–3058. [Crossref] [PubMed]

  • Kleinjan, D.-J. and V. van Heyningen (1998): “Position Effect in Human Genetic Disease,” Human Molecular Genetics, 7, 1611–1618. [PubMed] [Crossref]

  • Li, Y., N. Vinckenbosch, G. Tian, E. Huerta-Sanchez, T. Jiang, H. Jiang, A. Albrechtsen, G. Andersen, H. Cao, T. Korneliussen, N. Grarup, Y. Guo, I. Hellman, X. Jin, Q. Li, J. Liu, X. Liu, T. Sparso, M. Tang, H. Wu, R. Wu, C. Yu, H. Zheng, A. Astrup, L. Bolund, J. Holmkvist, T. Jorgensen, K. Kristiansen, O. Schmitz, T. W. Schwartz, X. Zhang, R. Li, H. Yang, J. Wang, T. Hansen, O. Pedersen, R. Nielsen, and J. Wang (2010): “Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants,” Nature Genetics, 42, 969–972. [Web of Science] [Crossref]

  • Madrigal, I., L. Rodríguez-Revenga, L. Armengol, E. González, B. Rodriguez, C. Badenas, A. Sánchez, F. Martínez, M. Guitart, I. Fernández, J. A. Arranz, M. Tejada, L. A. Pérez-Jurado, X. Estivill, and M. Milà (2007): “X-chromosome tiling path array detection of copy number variants in patients with chromosome X-linked mental retardation.” BMC genomics, 8, 443+. [Web of Science] [PubMed] [Crossref]

  • Marioni, J. C., N. P. Thorne, and S. Tavaré (2006): “BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data.” Bioinformatics, 22, 1144–1146. [PubMed] [Crossref]

  • Medvedev, P., M. Stanciu, and M. Brudno (2009): “Computational methods for discovering structural variation with next-generation sequencing,” Nature Methods, 6, S13–S20. [Crossref] [PubMed] [Web of Science]

  • Miller, C. A., O. Hampton, C. Coarfa, and A. Milosavljevic (2011): “ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads,” PLoS ONE, 6, e16327+. [Web of Science] [Crossref]

  • Nord, A., M. Lee, M. C. King, and T. Walsh (2011): “Accurate and exact CNV identification from targeted high-throughput sequence data,” BMC Genomics, 12, 184+. [Crossref] [Web of Science] [PubMed]

  • O’Roak, B. J., P. Deriziotis, C. Lee, L. Vives, J. J. Schwartz, S. Girirajan, E. Karakoc, A. P. MacKenzie, S. B. Ng, C. Baker, M. J. Rieder, D. A. Nickerson, R. Bernier, S. E. Fisher, J. Shendure, and E. E. Eichler (2011): “Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations,” Nature Genetics, 43, 585–589. [Crossref]

  • Pang, A., J. MacDonald, D. Pinto, J. Wei, M. Rafiq, D. Conrad, H. Park, M. Hurles, C. Lee, J. C. Venter, E. Kirkness, S. Levy, L. Feuk, and S. Scherer (2010): “Towards a comprehensive structural variation map of an individual human genome,” Genome Biology, 11, R52+. [Web of Science] [Crossref]

  • Pruitt, K. D., J. Harrow, R. A. Harte, C. Wallin, M. Diekhans, D. R. Maglott, S. Searle, C. M. Farrell, J. E. Loveland, B. J. Ruef, E. Hart, M.-M. M. Suner, M. J. Landrum, B. Aken, S. Ayling, R. Baertsch, J. Fernandez-Banet, J. L. Cherry, V. Curwen, M. Dicuccio, M. Kellis, J. Lee, M. F. Lin, M. Schuster, A. Shkeda, C. Amid, G. Brown, O. Dukhanina, A. Frankish, J. Hart, B. L. Maidak, J. Mudge, M. R. Murphy, T. Murphy, J. Rajan, B. Rajput, L. D. Riddick, C. Snow, C. Steward, D. Webb, J. A. Weber, L. Wilming, W. Wu, E. Birney, D. Haussler, T. Hubbard, J. Ostell, R. Durbin, and D. Lipman (2009): “The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.” Genome research, 19, 1316–1323. [Web of Science] [PubMed] [Crossref]

  • R Development Core Team (2011): R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.

  • Rabiner, L. R. (1989): “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, 77, 257–286. [Crossref]

  • Robinson, M. D., D. J. McCarthy, and G. K. Smyth (2010): “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics (Oxford, England), 26, 139–140. [Crossref]

  • Sathirapongsasuti, J. F., H. Lee, B. A. Horst, G. Brunner, A. J. Cochran, S. Binder, J. Quackenbush, and S. F. Nelson (2011): “Exome Sequencing-Based Copy-Number Variation and Loss of Heterozygosity Detection: ExomeCNV.” Bioinformatics (Oxford, England). [Crossref] [Web of Science]

  • Sebat, J., B. Lakshmi, D. Malhotra, J. Troge, C. Lese-Martin, T. Walsh, B. Yamrom, S. Yoon, A. Krasnitz, J. Kendall, A. Leotta, D. Pai, R. Zhang, Y.-H. H. Lee, J. Hicks, S. J. Spence, A. T. Lee, K. Puura, T. Lehtimäki, D. Ledbetter, P. K. Gregersen, J. Bregman, J. S. Sutcliffe, V. Jobanputra, W. Chung, D. Warburton, M.-C. C. King, D. Skuse, D. H. Geschwind, T. C. Gilliam, K. Ye, and M. Wigler (2007): “Strong association of de novo copy number mutations with autism.” Science (New York, N.Y.), 316, 445–449. [Web of Science]

  • Shen, J. J. and N. R. Zhang (2011): “Change-Point Model on Non-Homogeneous Poisson Processes with Application in Copy Number Profiling by Next-Generation DNA Sequencing,” Technical report, Division of Biostatistics, Stanford University.

  • St Clair, D. (2009): “Copy number variation and schizophrenia.” Schizophrenia bulletin, 35, 9–12. [Crossref] [Web of Science]

  • Venkatraman, E. S. and A. B. Olshen (2007): “A faster circular binary segmentation algorithm for the analysis of array CGH data,” Bioinformatics, 23, 657–663. [PubMed] [Web of Science] [Crossref]

  • Weese, D., A.-K. Emde, T. Rausch, A. Döring, and K. Reinert (2009): “RazerSfast read mapping with sensitivity control,” Genome Research, 19, 1646–1654. [Web of Science] [Crossref] [PubMed]

  • Xie, C. and M. Tammi (2009): “CNV-seq, a new method to detect copy number variation using high-throughput sequencing,” BMC Bioinformatics, 10, 80+. [PubMed] [Crossref] [Web of Science]

  • Yoon, S., Z. Xuan, V. Makarov, K. Ye, and J. Sebat (2009): “Sensitive and accurate detection of copy number variants using read depth of coverage,” Genome Research, 19, 1586–1592. [Crossref] [PubMed] [Web of Science]

  • Zhang, J., L. Feuk, G. E. Duggan, R. Khaja, and S. W. Scherer (2006): “Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome,” Cytogenetic and Genome Research, 115, 205–214. [Crossref]

About the article

Published Online: 2011-11-08

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.2202/1544-6115.1732. Export Citation

Citing Articles

Here you can find all Crossref-listed publications in which this article is cited. If you would like to receive automatic email messages as soon as this article is cited in other publications, simply activate the “Citation Alert” on the top of this page.

Jae-Yong Nam, Nayoung K. D. Kim, Sang Cheol Kim, Je-Gun Joung, Ruibin Xi, Semin Lee, Peter J. Park, and Woong-Yang Park
Briefings in Bioinformatics, 2016, Volume 17, Number 2, Page 185
Pubudu Saneth Samarakoon, Hanne Sørmo Sorte, Asbjørg Stray-Pedersen, Olaug Kristin Rødningen, Torbjørn Rognes, and Robert Lyle
BMC Genomics, 2016, Volume 17, Number 1
Vera M. Kalscheuer, Victoria M. James, Miranda L. Himelright, Philip Long, Renske Oegema, Corinna Jensen, Melanie Bienek, Hao Hu, Stefan A. Haas, Maya Topf, A. Jeannette M. Hoogeboom, Kirsten Harvey, Randall Walikonis, and Robert J. Harvey
Frontiers in Molecular Neuroscience, 2016, Volume 8
Daniel Backenroth, Jason Homsy, Laura R. Murillo, Joe Glessner, Edwin Lin, Martina Brueckner, Richard Lifton, Elizabeth Goldmuntz, Wendy K. Chung, and Yufeng Shen
Nucleic Acids Research, 2014, Volume 42, Number 12, Page e97
Yan Guo, Quanghu Sheng, David C. Samuels, Brian Lehmann, Joshua A. Bauer, Jennifer Pietenpol, and Yu Shyr
BioMed Research International, 2013, Volume 2013, Page 1

Comments (0)

Please log in or register to comment.
Log in