Published by De Gruyter, February 29, 2016

A Markov random field-based approach for joint estimation of differentially expressed genes in mouse transcriptome data

Zhixiang Lin, Mingfeng Li, Nenad Sestan and Hongyu Zhao

Abstract

The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, in which gene expression levels were measured in multiple layers of the somatosensory cortex across time in both female and male samples. We aim to identify differentially expressed genes between adjacent time points, which may provide insights into the dynamics of brain development. Because the sample size is extremely small (one male and one female at each time point), simple marginal analysis may be underpowered. We propose a Markov random field (MRF)-based approach that capitalizes on the similarity between layers, the temporal dependency and the similarity between sexes. The model parameters are estimated by an efficient EM algorithm with a mean field-like approximation. Simulation results and real data analysis suggest that the proposed model has greater power to detect differentially expressed genes than simple marginal analysis. Our method also reveals biologically interesting results in the mouse brain RNA-Seq data set.
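To make the idea of borrowing strength across neighbouring layers, time points and sexes concrete, the following is a minimal sketch of a generic mean-field update for a binary (auto-logistic) MRF. It is not the model or code from the paper. The function name `mean_field_posteriors`, the parameters `gamma` (baseline log-odds of differential expression) and `beta` (coupling strength), and the toy inputs are all assumptions introduced for illustration. In a full EM procedure, an update of this kind would sit in the E-step, with the parameters re-estimated in the M-step.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean_field_posteriors(llr, neighbors, gamma=-1.0, beta=0.5,
                          n_sweeps=100, tol=1e-10):
    """Approximate P(z_i = 1 | data) for a binary MRF by mean-field sweeps.

    llr[i]       : log-likelihood ratio of the data at node i
                   (differentially expressed vs. not); hypothetical input.
    neighbors[i] : indices of nodes adjacent to node i, e.g. the same gene
                   in an adjacent layer, adjacent time interval or other sex.
    gamma, beta  : assumed baseline log-odds and coupling strength.
    """
    # Start from the marginal analysis, which ignores all neighbours.
    q = [sigmoid(gamma + l) for l in llr]
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(len(q)):
            # Replace each neighbour's unknown state by its current mean.
            field = gamma + beta * sum(q[j] for j in neighbors[i])
            new_q = sigmoid(field + llr[i])
            max_change = max(max_change, abs(new_q - q[i]))
            q[i] = new_q
        if max_change < tol:
            break
    return q


# Toy example: one gene in three adjacent layers at a single time interval.
# The middle layer has weak evidence; its two neighbours have strong evidence.
llr = [3.0, 0.2, 3.0]
neighbors = [[1], [0, 2], [1]]

marginal = mean_field_posteriors(llr, neighbors, beta=0.0)  # no coupling
coupled = mean_field_posteriors(llr, neighbors, beta=2.0)   # with coupling
```

With `beta=0.0` the update reduces to the marginal analysis. With a positive `beta`, the weakly supported middle node receives a higher posterior probability because its neighbours are strongly supported, which is the intuition behind the power gain described above.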


Corresponding author: Hongyu Zhao, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA; and Department of Genetics, Yale School of Medicine, New Haven, Connecticut 06520, USA, e-mail:

Acknowledgments

We thank Matthew W. State for the financial support of the first author. All computations except those for Figure 4 were performed at the Yale University Biomedical High Performance Computing Center. This study was supported by the National Institutes of Health [GM59507, CA154295, MH106934 and NS051869 (N.S.)] and the National Science Foundation [DMS 1106738].

Conflict of interest statement: None declared.

Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2015-0070) offers supplementary material, available to authorized users.

Published Online: 2016-2-29
Published in Print: 2016-4-1

©2016 by De Gruyter