Jump to ContentJump to Main Navigation
Show Summary Details
More options …

Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


IMPACT FACTOR 2018: 0.536
5-year IMPACT FACTOR: 0.764

CiteScore 2018: 0.49

SCImago Journal Rank (SJR) 2018: 0.316
Source Normalized Impact per Paper (SNIP) 2018: 0.342

Mathematical Citation Quotient (MCQ) 2018: 0.02

Online
ISSN
1544-6115
See all formats and pricing
More options …
Volume 17, Issue 6

Issues

Volume 10 (2011)

Volume 9 (2010)

Volume 6 (2007)

Volume 5 (2006)

Volume 4 (2005)

Volume 2 (2003)

Volume 1 (2002)

A novel method to accurately calculate statistical significance of local similarity analysis for high-throughput time series

Fang Zhang / Ang Shan / Yihui Luan
Published Online: 2018-11-17 | DOI: https://doi.org/10.1515/sagmb-2018-0019

Abstract

In recent years, a large number of time series microbial community data has been produced in molecular biological studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that cannot be distinguished by traditional correlation analysis. Initially, the permutation test is popularly applied to obtain the statistical significance of local similarity analysis. More recently, a theoretical method has also been developed to achieve this aim. However, all these methods require the assumption that the time series are independent and identically distributed. In this paper, we propose a new approach based on moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method can control the type I error rate reasonably, while theoretical approximation and the permutation test perform less well. Finally, our method is applied to human and marine microbial community datasets, indicating that it can identify potential relationship among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.

This article offers supplementary material which is provided at the end of the article.

Keywords: local similarity analysis; moving block bootstrap; statistical significance

References

  • Andersson, M. G. I., M. Berga, E. S. Lindström and S. Langenheder (2014): “The spatial structure of bacterial communities is influenced by historical environmental conditions,” Ecology, 95, 1134–1140.Web of SciencePubMedCrossrefGoogle Scholar

  • Balasubramaniyan, R., E. Hüllermeier, N. Weskamp and J. Kämper (2005): “Clustering of gene expression data using a local shape-based similarity measure,” Bioinformatics, 21, 1069–1077.CrossrefPubMedGoogle Scholar

  • Barberán, A., S. T. Bates, E. O. Casamayor and N. Fierer (2011): “Using network analysis to explore co-occurrence patterns in soil microbial communities,” ISME J., 6, 343–351.Web of SciencePubMedGoogle Scholar

  • Beman, J. M., J. A. Steele and J. A. Fuhrman (2011): “Co-occurrence patterns for abundant marine archaeal and bacterial lineages in the deep chlorophyll maximum of coastal california,” ISME J., 5, 1077–1085.CrossrefWeb of SciencePubMedGoogle Scholar

  • Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” J. R. Stat. Soc. B, 57, 289–300.Google Scholar

  • Berkowitz, J. and L. Kilian (2000): “Recent developments in bootstrapping time series,” Economet. Rev., 19, 1–48.CrossrefGoogle Scholar

  • Caporaso, J. G., C. L. Lauber, E. K. Costello, D. Berg-Lyons, A. Gonzalez, J. Stombaugh, D. Knights, P. Gajer, J. Ravel, N. Fierer, J. I. Gordon and R. Knight (2011): “Moving pictures of the human microbiome,” Genome Biol., 12, R50.Web of ScienceCrossrefPubMedGoogle Scholar

  • Carlstein, E. (1986): “The use of subseries values for estimating the variance of a general statistic from a stationary sequence,” Ann. Stat., 14, 1171–1179.CrossrefGoogle Scholar

  • Chaffron, S., H. Rehrauer, J. Pernthaler and C. von Mering (2010): “A global network of coexisting microbes from environmental and whole-genome sequence data,” Genome Res., 20, 947–959.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Cram, J. A., L. C. Xia, D. M. Needham, R. Sachdeva, F. Sun and J. A. Fuhrman (2015): “Cross-depth analysis of marine bacterial networks suggests downward propagation of temporal changes,” ISME J., 9, 2573–2586.Web of SciencePubMedCrossrefGoogle Scholar

  • Durno, W. E., Hanson, N. W., Konwar, K. M & Hallam, S. J. 2013, ‘Expanding the boundaries of local similarity analysis’, BMC Genomics, vol. 14, pp. S3–.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Faust, K., J. F. Sathirapongsasuti, J. Izard, N. Segata, D. Gevers, J. Raes and C. Huttenhower (2012): “Microbial co-occurrence relationships in the human microbiome,” PLOS Comput. Biol., 8, 1–17.Web of ScienceGoogle Scholar

  • Faust, K., L. Lahti, D. Gonze, W. M. de Vos and J. Raes (2015): “Metagenomics meets time series analysis: unraveling microbial community dynamics,” Curr. Opin. Microbiol., 25, 56–66.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Fierer, N., D. Nemergut, R. Knight and J. M. Craine (2010): “Changes through time: integrating microorganisms into the study of succession,” Res. Microbiol., 161, 635–642.Web of ScienceCrossrefGoogle Scholar

  • Fuhrman, J. A., I. Hewson, M. S. Schwalbach, J. A. Steele, M. V. Brown and S. Naeem (2006): “Annually reoccurring bacterial communities are predictable from ocean conditions,” Proc. Natl. Acad. Sci. USA, 103, 13104–13109.CrossrefGoogle Scholar

  • Gilbert, J. A., J. A. Steele, J. G. Caporaso, L. Steinbrück, J. Reeder, B. Temperton, S. Huse, A. C. McHardy, R. Knight, I. Joint, P. Somerfield, J. A. Fuhrman and D. Field (2012): “Defining seasonal marine microbial community dynamics,” ISME J., 6, 298–308.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Giovannoni, S. J. and K. L. Vergin (2012): “Seasonality in ocean microbial communities,” Science, 335, 671–676.PubMedCrossrefWeb of ScienceGoogle Scholar

  • Gonçalves, J. and S. Madeira (2014): “Latebiclustering: Efficient heuristic algorithm for time-lagged bicluster identification,” IEEE/ACM T. Comput. Bi, 11, 801–813.CrossrefWeb of ScienceGoogle Scholar

  • Ji, L. and K.-L. Tan (2004): “Mining gene expression data for positive and negative co-regulated gene clusters,” Bioinformatics, 20, 2711–2718.CrossrefPubMedGoogle Scholar

  • Künsch, H. R. (1989): “The jackknife and the bootstrap for general stationary observations,” Ann. Stat., 17, 1217–1241.CrossrefGoogle Scholar

  • Liu, R. Y. and K. Singh (1992): Moving blocks jackknife and bootstrap capture weak dependence, New York: John Wiley, pp. 225–248.Google Scholar

  • Lagnoux, A., S. Mercier, P. Vallois (2017): “Statistical significance based on length and position of the local score in a model of i.i.d. sequences,” Bioinformatics, 33, 654–660.Web of ScienceGoogle Scholar

  • Ljung, G. M. and G. E. P. Box (1978): “On a measure of lack of fit in time series models,” Biometrika, 65, 297–303.CrossrefGoogle Scholar

  • Madeira, S. C., M. C. Teixeira, I. Sa-Correia and A. L. Oliveira (2010): “Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm,” IEEE/ACM T. Comput. Bi, 7, 153–165.CrossrefWeb of ScienceGoogle Scholar

  • Mudelsee, M. (2010): Climate Time Series Analysis: Classical Statistical and Bootstrap Methods, Dordrecht: Atmospheric and Oceanographic Sciences Library, Springer.Google Scholar

  • Palmer, C., E. M. Bik, D. B. DiGiulio, D. A. Relman and P. O. Brown (2007): “Development of the human infant intestinal microbiota,” PLOS Biol., 5, 1–18.Web of ScienceGoogle Scholar

  • Pei, Y., Q. Gao, J. Li and X. Zhao (2014): “Identifying local co-regulation relationships in gene expression data,” J. Theor. Biol., 360, 200–207.PubMedCrossrefWeb of ScienceGoogle Scholar

  • Qian, J., M. Dolled-Filhart, J. Lin, H. Yu and M. Gerstein (2001): “Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions11edited by f. cohen,” J. Mol. Biol., 314, 1053–1066.CrossrefGoogle Scholar

  • Qin, J., R. Li, J. Raes, M. Arumugam, K. S. Burgdorf, C. Manichanh, T. Nielsen, N. Pons, F. Levenez, T. Yamada, D. R. Mende, J. Li, J. Xu, S. Li, D. Li, J. Cao, B. Wang, H. Liang, H. Zheng, Y. Xie, J. Tap, P. Lepage, M. Bertalan, J.-M. Batto, T. Hansen, D. Le Paslier, A. Linneberg, H. B. Nielsen, E. Pelletier, P. Renault, T. Sicheritz-Ponten, K. Turner, H. Zhu, C. Yu, S. Li, M. Jian, Y. Zhou, Y. Li, X. Zhang, S. Li, N. Qin, H. Yang, J. Wang, S. Brunak, J. Doré, F. Guarner, K. Kristiansen, O. Pedersen, J. Parkhill, J. Weissenbach, M. Consortium, P. Bork, S. D. Ehrlich and J. Wang (2010): “A human gut microbial gene catalogue established by metagenomic sequencing,” Nature, 464, 59–65.Web of ScienceCrossrefPubMedGoogle Scholar

  • Ruan, Q., D. Dutta, M. S. Schwalbach, J. A. Steele, J. A. Fuhrman and F. Sun (2006): “Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors,” Bioinformatics, 22, 2532–2538.PubMedCrossrefGoogle Scholar

  • Shade, A., J. S. Read, N. D. Youngblut, N. Fierer, R. Knight, T. K. Kratz, N. R. Lottig, E. E. Roden, E. H. Stanley, J. Stombaugh, R. J. Whitaker, C. H. Wu and K. D. McMahon (2012): “Lake microbial communities are resilient after a whole-ecosystem disturbance,” ISME J., 6, 2153–2167.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Shade, A., J. Gregory Caporaso, J. Handelsman, R. Knight and N. Fierer (2013): “A meta-analysis of changes in bacterial and archaeal communities with time,” ISME J., 7, 1493–1506.PubMedCrossrefWeb of ScienceGoogle Scholar

  • Sherman, M., F. M. Speed Jr and F. M. Speed (1998): “Analysis of tidal data via the blockwise bootstrap,” J. Appl. Stat., 25, 333–340.CrossrefGoogle Scholar

  • Steele, J. A., P. D. Countway, L. Xia, P. D. Vigil, J. M. Beman, D. Y. Kim, C.-E. T. Chow, R. Sachdeva, A. C. Jones, M. S. Schwalbach, J. M. Rose, I. Hewson, A. Patel, F. Sun, D. A. Caron and J. A. Fuhrman (2011): “Marine bacterial, archaeal and protistan association networks reveal ecological linkages,” ISME J., 5, 1414–1425.CrossrefPubMedWeb of ScienceGoogle Scholar

  • Storey, J. D. (2002): “A direct approach to false discovery rates,” J. R. Stat. Soc. B, 64, 479–498.CrossrefGoogle Scholar

  • Storey, J. D., A. J. Bass, A. Dabney and D. Robinson (2015): qvalue: Q-value estimation for false discovery rate control. R package version 2.6.0.

  • The Human Microbiome Project Consortium. (2012): “Structure, function and diversity of the healthy human microbiome,” Nature, 486, 207–214.PubMedWeb of ScienceGoogle Scholar

  • Trosvik, P., N. C. Stenseth and K. Rudi (2010): “Convergent temporal dynamics of the human infant gut microbiota,” ISME J., 4, 151–158.Web of ScienceCrossrefPubMedGoogle Scholar

  • Weiss, S., W. V. Treuren, C. Lozupone, K. Faust, J. Friedman, D. Ye, L. C. Xia, Z. Z. Xu, L. Ursell, E. J. Alm, A. Birmingham, J. A. Cram, J. A. Fuhrman, J. Raes, F. Sun, J. Zhou and R. Knight (2016): “Correlation detection strategies in microbial data sets vary widely in sensitivityand precision.” ISME J., 10, 1669–1681.CrossrefGoogle Scholar

  • Waterman, M. S. (1995): Introduction to Computational Biology: Maps, Sequences and Genomes, NY, USA: Chapman and Hall/CRC.Google Scholar

  • Xia, L. C., J. A. Steele, J. A. Cram, Z. G. Cardon, S. L. Simmons, J. J. Vallino, J. A. Fuhrman and F. Sun (2011): “Extended local similarity analysis (elsa) of microbial community and other time series data with replicates,” BMC Syst. Biol., 5, S15.PubMedWeb of ScienceCrossrefGoogle Scholar

  • Xia, L. C., D. Ai, J. Cram, J. A. Fuhrman and F. Sun (2013): “Efficient statistical significance approximation for local similarity analysis of high-throughput time series data,” Bioinformatics, 29, 230–237.Web of ScienceCrossrefPubMedGoogle Scholar

  • Xia, L. C., D. Ai, J. A. Cram, X. Liang, J. A. Fuhrman and F. Sun (2015): “Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of markov chains,” BMC Bioinformatics, 16, 301.PubMedCrossrefWeb of ScienceGoogle Scholar

  • Zhou, J., Y. Deng, P. Zhang, K. Xue, Y. Liang, J. D. Van Nostrand, Y. Yang, Z. He, L. Wu, D. A. Stahl, T. C. Hazen, J. M. Tiedje and A. P. Arkin (2014): “Stochasticity, succession, and environmental perturbations in a fluidic ecosystem,” Proc. Natl. Acad. Sci. USA, 111, 836–845.CrossrefGoogle Scholar

About the article

Published Online: 2018-11-17


Funding Source: Natural Science Foundation of China

Award identifier / Grant number: 11371227, 61432010, 11626247

The research was supported by the Natural Science Foundation of China Grants (Funder Id: 10.13039/501100001809, 11371227, 61432010, 11626247).


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 6, 20180019, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2018-0019.

Export Citation

©2018 Walter de Gruyter GmbH, Berlin/Boston.Get Permission

Supplementary Article Materials

Comments (0)

Please log in or register to comment.
Log in