## 1 Introduction

Gene-regulatory networks (GRNs) are typically formulated as directed graphs in which nodes represent target genes, transcription factors (TFs), and microRNAs (miRNAs), and directed edges represent activating or repressing regulatory interactions. TFs either activate or repress the transcription of their target genes, whereas miRNAs typically induce the degradation of the messenger RNAs of their target genes. Hence, modern GRNs address the regulation of messenger RNA levels at both the transcriptional and the post-transcriptional level [1], [2]. Our group recently introduced a web server termed TFmiR [1] that enables users to construct and analyze disease-specific TF and miRNA co-regulatory networks. The Methods section gives more details on TFmiR.

Shen-Orr et al. were the first to identify regulatory motifs in a GRN of *E. coli* that consisted only of TFs and target genes [3]. They discovered that feed-forward loops (FFLs), in which TF1 regulates TF2 and both TFs jointly regulate a target gene, are statistically significantly enriched in real GRNs relative to randomized GRNs. They also found single-input modules and densely overlapping regions to be enriched, but we focus on FFL-type motifs here. Recently, several authors have extended the concept of FFL motifs to GRNs containing TFs, miRNAs, and target genes [1], [2], [4]. In this context, proper randomization of GRNs becomes even more important for determining which FFL motifs are enriched in the real GRN. In our original TFmiR paper [1], we did not distinguish between the three possible types of regulatory links (TF → target gene, TF → miRNA, and miRNA → target gene) during randomization. However, Ohler and co-workers recently pointed out that an edge-type preserving randomization strategy may be beneficial, in which edge end-points are switched only between two edges that belong to the same one of the three groups of regulatory links [4].
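
The difference between the two randomization strategies can be illustrated with a minimal Python sketch. It is not the TFmiR implementation: the function names, the `kind` dictionary that assigns each node a role, the toy network, and the rule of rejecting switches that would create self-loops or duplicate edges are all assumptions made for illustration. The parameter `q` is assumed to be the multiplier in q × |E| attempted switches.

```python
import random
from collections import Counter


def edge_type(edge, kind):
    """Return the (source kind, target kind) pair, e.g. ("TF", "miRNA")."""
    src, dst = edge
    return kind[src], kind[dst]


def switch_edges(edges, kind, q, conserve_types, seed=0):
    """Attempt q * |E| end-point switches (a->b, c->d) => (a->d, c->b).

    In- and out-degrees of all nodes are preserved. If conserve_types is
    True, two edges are only switched when they have the same edge type.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(q * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if conserve_types and edge_type((a, b), kind) != edge_type((c, d), kind):
            continue  # edges belong to different groups of regulatory links
        new1, new2 = (a, d), (c, b)
        # Assumed rejection rule: no self-loops and no duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([new1, new2])
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


# Hypothetical toy network with the three link types named in the text.
toy_kind = {"TF1": "TF", "TF2": "TF", "miR1": "miRNA", "miR2": "miRNA",
            "g1": "gene", "g2": "gene", "g3": "gene"}
toy_edges = [("TF1", "g1"), ("TF2", "g2"), ("TF1", "miR1"), ("TF2", "miR2"),
             ("miR1", "g2"), ("miR2", "g3"), ("TF2", "g3")]

conserved = switch_edges(toy_edges, toy_kind, q=100, conserve_types=True)
mixed = switch_edges(toy_edges, toy_kind, q=100, conserve_types=False)
```

In this sketch, the conserving variant keeps the number of edges of each type fixed, whereas the non-conserving variant can produce edge types that are absent from the original network (for example miRNA → miRNA). How such edges are handled in a real analysis is a design decision that this sketch does not settle.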

Another important technical question is how to quantify proper randomization. In our original TFmiR paper, we performed 2 × |E| switches, where |E| is the number of links in the GRN. By contrast, it has been argued that 100 × |E| switches of edge end-points are needed to ensure proper randomization [5]. Based on two GRNs with different link densities, we present here a thorough analysis of which motifs are statistically enriched in these GRNs under the edge-type conserving and non-conserving randomization strategies, and of how proper randomization can be quantified. For comparison, we also used the established motif-discovery tool FANMOD [6].
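
As an illustration of what "statistically enriched" can mean in practice, the following sketch counts FFLs in a directed edge list and compares the count with counts obtained from randomized networks. The z-score and the empirical p-value with a pseudo-count are common choices in motif analysis and are assumptions here; this section does not state which statistic was actually used. All function names are hypothetical.

```python
from statistics import mean, stdev


def count_ffls(edges):
    """Count feed-forward loops: x -> y, x -> z and y -> z (x, y, z distinct)."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    count = 0
    for x, targets in successors.items():
        for y in targets:
            if y == x:
                continue
            # z must be regulated by both x and y.
            for z in successors.get(y, ()):
                if z != x and z != y and z in targets:
                    count += 1
    return count


def enrichment(real_count, random_counts):
    """Return (z-score, empirical p-value) of real_count against random_counts.

    The z-score is None when all randomized counts are identical.
    """
    mu = mean(random_counts)
    sigma = stdev(random_counts)
    z = (real_count - mu) / sigma if sigma > 0 else None
    at_least = sum(c >= real_count for c in random_counts)
    p = (at_least + 1) / (len(random_counts) + 1)
    return z, p


# Hypothetical example: one TF -> miRNA -> gene FFL plus an unrelated edge.
example_edges = [("TF1", "miR1"), ("TF1", "g1"), ("miR1", "g1"), ("TF1", "g2")]
n_ffl = count_ffls(example_edges)
```

The list passed as `random_counts` would come from applying `count_ffls` to each randomized network; the number of randomized networks bounds the smallest p-value this estimate can report.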

## 5 Discussion

If the network of interest contains more than one node or edge type, different randomization strategies can be applied for motif discovery. In this study, the different strategies led to markedly different sets of enriched 3-node motif types.

That FFLs were statistically significantly enriched only in the GBM network may stem from differences in how the GBM and BC networks were constructed: only significant TF-miRNA co-occurring pairs were considered in the regulatory network of GBM. This means that the TF → gene ← miRNA triad is enriched *a priori* in this network. Our study thus suggests that both the way a network is constructed and its density may affect the results of motif finding. For the BC networks considered here, only subgraph types other than FFLs were found to be significantly enriched. With the non-conserving method, our motif-finding tool identified the composite-miRNA-mediated and cascade-miRNA-mediated motifs as statistically significant. Although the results for the two BC networks are similar, the conserving method identified the co-regulation motif type as significant in the filtered BC-disease network but not in the BC-complete network. We therefore speculate that motif searches in filtered (i.e. more specific) networks may identify biologically more meaningful motifs.

We suggest the variance of motif counts and the similarity of original and randomized networks as suitable auxiliary measures for judging whether randomization generates properly mixed networks. Our study suggests that network density does not affect the minimum Q required to obtain properly mixed randomized networks.
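
The two auxiliary measures could be computed along the following lines. This is a sketch under stated assumptions: similarity is taken here to be the fraction of original edges that survive in a randomized network, and the population variance is used for the motif counts. The exact definitions used in the study may differ, and the function names are hypothetical.

```python
from statistics import pvariance


def edge_similarity(original_edges, randomized_edges):
    """Fraction of original edges that are still present after randomization."""
    original = set(original_edges)
    return len(original & set(randomized_edges)) / len(original)


def summarize_mixing(original_edges, randomized_networks, motif_counts):
    """Summarize how well a set of randomized networks is mixed.

    randomized_networks: list of edge lists, one per randomized network.
    motif_counts: count of one motif type in each randomized network.
    """
    similarities = [edge_similarity(original_edges, net)
                    for net in randomized_networks]
    return {
        "mean_similarity": sum(similarities) / len(similarities),
        "count_variance": pvariance(motif_counts),
    }
```

Evaluating such a summary for increasing numbers of switches and checking where both values level off is one possible way to judge when further switching no longer changes the randomized ensemble.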

In conclusion, the non-conserving method detects more subgraph types as statistically significant than the conserving method. For the 2.5 networks studied here, we noticed that (a) the conserving randomization method identified significant motifs containing a larger fraction of the most central nodes (Figure 5) than the non-conserving method, and (b) both methods gave the same number of significant Gene Ontology terms, although the conserving method considered far fewer genes for this analysis than the non-conserving method. Certainly, the same analysis should be extended to a representative number of comparable GRNs. So far, the conserving method appears to give biologically more meaningful results.

[1]

Hamed M, Spaniol C, Nazarieh M, Helms V. TFmiR: a web server for constructing and analyzing disease-specific transcription factor and miRNA co-regulatory networks. Nucleic Acids Res. 2015;43:W283–288.

[2]

Zhang H, Kuang S, Xiong X, Gao T, Liu C, Guo A. Transcription factor and microRNA co-regulatory loops: important regulatory motifs in biological processes and diseases. Brief Bioinform. 2015;16:45–58.

[3]

Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31:64–68.

[4]

Megraw M, Mukherjee S, Ohler U. Sustained-input switches for transcription factors and microRNAs are central building blocks of eukaryotic gene circuits. Genome Biol. 2013;14:R85.

[5]

Milo R, Kashtan N, Itzkovitz S, Newman ME, Alon U. On the uniform generation of random graphs with prescribed degree sequences. 2004. Available from: https://arxiv.org/abs/cond-mat/0312028v2.

[6]

Wernicke S, Rasche F. FANMOD: a tool for fast network motif detection. Bioinformatics. 2006;22:1152–1153.

[7]

Kashtan N, Itzkovitz S, Milo R, Alon U. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics. 2004;20:1746–1758.

[8]

Wernicke S. A faster algorithm for detecting network motifs. In: Casadio R, Myers G, editors. Algorithms in bioinformatics. Springer, 2005:165–177.

[9]

Sun J, Gong X, Purow B, Zhao Z. Uncovering microRNA and transcription factor mediated regulatory networks in glioblastoma. PLoS Comput Biol. 2012;8:e1002488.

[10]

Hamed M, Spaniol C, Zapp A, Helms V. Integrative network-based approach identifies key genetic elements in breast invasive carcinoma. BMC Genomics. 2015;16:S2.

[11]

Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W. An analysis of human microRNA and disease associations. PLoS ONE. 2008;3(10):e3420.

[12]

Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene-disease networks. Bioinformatics. 2010;26:2924–2926.

[13]

Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34:D108–110.

[14]

Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504.

[15]

Yeger-Lotem E, Sattath S, Kashtan N, Itzkovitz S, Milo R, Pinter RY, et al. Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction. Proc Natl Acad Sci USA. 2004;101:5934–5939.

[16]

Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science. 2002;298:824–827.

[17]

Liang C, Li Y, Luo J, Zhang Z. A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human. Bioinformatics. 2015;31:2348–2355.

[18]

Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Systems:1695.

[19]

Nazarieh M, Wiese A, Will T, Hamed M, Helms V. Identification of key player genes in gene regulatory networks. BMC Sys Biol. 2016;10:88.

[20]

Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4:44–57.

[21]

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological). 1995;57:289–300.
