Abstract
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham’s model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham’s approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham’s approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.
This work has been supported by the Research Training School on Statistical Modelling of the German Research Foundation (DFG). Katja Ickstadt has also been supported by the DFG within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis,” project C4. Holger Schwender has been supported by the Deutsche Forschungsgemeinschaft for grant SCHW 1508/3-1. Magdalena Malina was supported by the Budget of Ministry of Science and Higher Education, Republic of Poland as a research project No. N N201 412539, and by the Vienna Science and Technology Fund (WWTF) through project MA09-007a.
Appendix A
Proof, that the non-centrality parameter λCF is equal to the non-centrality parameter λLR
Notation: Consider a model
where X=Xn×t is the design matrix of rank t and ζ=(ζ1, ζ2, …, ζt)T is a vector of model’s parameters. Let μ=Xζ, V denote a subspace of


where


calculated in the model (A.1). The non-centrality parameter can be expressed as (see Theorem 7.4, Arnold, 1981)
The desired equality of noncentrality parameters can be proved in a more general case. Assume that the relationship between the dependent variable Y∈Rn and explanatory variables x1, x2, …, xt, where xj=(x1j, x2j, …, xnj)T, j=1, 2, …, t, can be described by the following model
where ε=(ε1,…, εn)T, ε~N(0, σ2In×n), Xc is a full rank n×(t+1) -matrix of the form Xc=(1, x1, x2, …, xt) and ζc is a (t+1) – dimensional vector of model’s parameters. Let

Now let us consider an n×2- matrix Xn×2=(1, X1), X1=(X11, X21, …, Xn1)T, such that μ=Xcζc∈lin{1, X1}, so as the model (3) can be also written in the form
where ζ=(β0, β1)T. Let V denote a subspace of Rn spanned by the columns of X. Consider the testing problem
where W is a linear subspace of Rn spanned by 1. Then the non-centrality parameters of F test for testing (A.5) in (A.3) and in (A.4) are equal.
Indeed, let λc denote the non-centrality parameter for testing problem (A.5) in the model (A.3) and let λ denote the non-centrality parameter in testing problem (A.5) in the model (A.4). Then
and
Since

and
Moreover, by the definition of μ we have that in (A.3)
and hence

μ=Xζ∈lin[1, X1]=V,
and hence PVμ=μ. Then clearly

Appendix B
Derivation of the formula for logic regression model under Cockerham’s coding
We start from the model (1), where n is the number of individuals selected from an intercross F2 population and Dj denotes the binary vector of the “dominant” values for all individuals at jth marker. To rewrite the model in a Cockerham’s coding, let v1=(1, 1, 1)T, v2=(0, 1, 1)T, v3=(0, 0, 1)T and v=(v1, v2, v3). Similarly, let w1=(1, 1, 1)T, w2=(–1, 0, 1)T, w3=(–1/2, 1/2, –1/2)T and w=(w1, w2, w3). We may then define

f(w)=(f1(w), f2(w), f3(w))=v,
where f1(w)=w1,

Then ∀j∈{1, 2, …, m}
Then the considered model (1) can be rewritten as:
which is exactly equal to (5).
Power to detect an individual interaction in the Cockerham’s model
Let

and let

Assume that we are performing an F-test for testing H0:η(k, l)=0 against H1:η(k, l)≠0 within a class of models






and
For each of the remaining three interaction terms Ik, l∈{UkZt, ZkUl, ZkZl} the corresponding F test for testing H0:η(k, l)=0 against H1:η(k, l)≠0 is performed within the class of models
respectively, where k, l∈{1, 2, …, m}, k<l. Then, in the above procedure the design matrix


Similarly as for UkUl, for each Ik, l∈{UkZl, ZkUl, ZkZl} the power of the coresponding F-test was calculated empirically, based on 10,000 realizations of the ratios of two independent noncentral χ2 random variables. This power for each Ik, l was compared with the power obtained in (4). It can be clearly seen that the interaction UkUl analyzed in Section (3.2), as compared with the other three possible interactions, is detected with the largest power for each value of β1 (Figure 3). All the four powers as functions of β1 are substantially smaller than the power to detect a single interaction within the properly defined logic regression model. Obviously, the average power to detect a single interaction within the Cockerham’s genetic model, calculated as a mean of the four powers for each β1 is also substantially smaller than the power to detect a single interaction within the logic regression model (Figure 4).

Figure 3
Power to detect a logic interaction by logic regression model and four powers to detect a single interaction under the Cockerham’s coding as functions of β1. Result obtained for n=1000, m=200, α=0.05, σ2=1.0.

Figure 4
Power to detect a logic interaction by logic regression model and an average power of detection of a single interaction under the Cockerham’s coding as functions of β1. Results for n=1000, m=200, α=0.05, σ2=1.0.
Appendix C: Two step testing procedures in the Cockerham’s model
Classical approach to test for epistasis
Classical methods of localizing genes are aimed at locating genes with significant main effects and often do not allow the location of QTL if there are no main effects but there are epistatic interactions with other QTL. The common practice is to estimate epistatic effects only at those positions for which main effects have been found. Such an approach can be described by the two step testing procedure.
Assume that the model (1) holds. In the first step of the procedure one would test m hypotheses
H0,j:αj=δj=0
against
H1,j:αj or δj or both are different from 0
within the class of models
Mj:Y=μj+αjUj+δjZj+ε(j),
where j∈{1, 2, …m}. Adjusted significance level for a single test in the first step is then equal to

Let


against
H1, (k, l): not all γ(k, l)’s are equal to 0
within the class
In order to control the FWER at the fixed level α, in the second step one needs to apply a significance level of

For the first step of the procedure one tests within the misspecified model, hence the test statistic for H0, j has the distribution of (n–2) times the ratio of two independent noncentral χ2 distributions. The power PI of the first step test is calculated empirically based on 10,000 independent realizations of noncentral χ2 distributions defined by (B.1) and (B.2) respectively (Appendix B).
For the second step, the F statistic for testing H0, (j, k) has the


For both steps of the procedure resulting powers as functions of β1 are compared with the power obtained by logic regression for m=200, n=1000, α=0.05 and σ=1.0 (Figure 5). The power for the first step exceeds very slightly the power for logic regression if the effect size β1 is very small, and is lower than the power for logic regression for larger values of β1. The power for the second step of the procedure is substantially smaller than the power for logic regression at the whole range of β1. Hence, the resulting power for the two step procedure will also be substantially smaller than the power to detect an interaction within the logic regression model. The relationship between these three powers remains the same for different values of m and n (Table 10).

Figure 5
Powers to detect a logic interaction by logic regression model and by the two-step testing procedure in the Cockerham’s model (from Appendix C) as functions of effect size β1. Results for n=1000, m=200, α=0.05, σ2=1.0. LR is the power of the logic regression model, calculated according to (4).
Table 10
Powers for logic regression and the two-step procedure for different values of β1 and different n and m, with α=0.05, σ2=1.0.
β1=0.5 | β1=1.0 | β1=1.5 | |||
---|---|---|---|---|---|
m=50 | n=500 | PLR | 0.756 | 1.000 | 1.000 |
PI | 0.012 | 0.333 | 0.893 | ||
PII | 0.000 | 0.007 | 0.110 | ||
m=100 | n=200 | PLR | 0.051 | 0.960 | 0.999 |
PI | 0.012 | 0.333 | 0.893 | ||
PII | 0.000 | 0.007 | 0.110 | ||
n=500 | PLR | 0.658 | 1.000 | 1.000 | |
PI | 0.410 | 0.999 | 1.000 | ||
PII | 0.002 | 0.166 | 0.848 | ||
n=1000 | PLR | 0.996 | 1.000 | 1.000 | |
PI | 0.843 | 1.000 | 1.000 | ||
PII | 0.017 | 0.757 | 0.999 | ||
n=2000 | PLR | 1.000 | 1.000 | 1.000 | |
PI | 0.996 | 1.000 | 1.000 | ||
PII | 0.159 | 0.999 | 1.000 | ||
m=200 | n=500 | PLR | 0.555 | 1.000 | 1.000 |
PI | 0.843 | 1.000 | 1.000 | ||
PII | 0.017 | 0.757 | 0.999 |
PI denotes the power of the first step of the procedure from Appendix C, calculated empirically based on 10,000 independent realizations of noncentral χ2 distributions defined by (B.1) and (B.2) respectively. PII denotes the power of the second step, calculated as in (C.1).
References
Arnold, S. F. (1981): The theory of linear models and multivariate analysis, John Wiley & Sons: New York, pp. 79–82.Search in Google Scholar
Baierl, A., M. Bogdan, F. Frommlet and A. Futschik (2006): “On locating multiple interacting quantitative trait loci in intercross designs,” Genetics, 173, 1693–1703.10.1534/genetics.105.048108Search in Google Scholar PubMed PubMed Central
Ball, R. D. (2001): “Bayesian methods for quantitative trait loci mapping based on model selection: Approximate analysis using the Bayesian information criterion,” Genetics, 159, 1351–1364.10.1093/genetics/159.3.1351Search in Google Scholar PubMed PubMed Central
Bateson, W. and G. Mendel (1909): Mendel’s principles of heredity, Cambridge University Press: New York, G.P. Putnam’s Sons.10.5962/bhl.title.44575Search in Google Scholar
Bogdan, M., F. Frommlet, P. Biecek, R. Cheng, J. K. Ghosh and R. Doerge (2008): “Extending the modified Bayesian information criterion (mbic) to dense markers and multiple interval mapping,” Biometrics, 64, 1162–1169, URL http://dx.doi.org/10.1111/j.1541-0420.2008.00989.x10.1111/j.1541-0420.2008.00989.xSearch in Google Scholar PubMed
Boulesteix, A. L., A. L. Strobl, S. Weidinger and W. Wichmann, H. E. (2007): “Multiple testing for snp-snp interactions,” Statist. Appl. Gen. Mol. Biol., 6, 1544–6115.Search in Google Scholar
Breiman, L. (1996): “Bagging predictors,” Mach. Learn., 26, 123–140.Search in Google Scholar
Breiman, L. (2001): “Random forests,” Mach. Learn., 45, 5–32.Search in Google Scholar
Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984): Classification and regression trees, Belmont, CA: Wadsworth.Search in Google Scholar
Broman, K. W. and S. C. G. Wu, H. Sen (2003): “R/qtl: Qtl mapping in experimental crosses,” Bioinformatics, 19, 889–890.10.1093/bioinformatics/btg112Search in Google Scholar PubMed
Carlborg, O. and C. S. Haley (2004): “Epistasis: too often neglected in complex trait studies?” Nat. Rev. Genet., 5, 618–625.Search in Google Scholar
Chen, C., H. Schwender, J. Keith, R. Nunkesser, K. Mengersen and P. Macrossan (2011): “Methods for identifying snp interactions: a review on variations of logic regression, random forest and bayesian logistic regression,” Comput. Biol. and Bioinf., 8, 1580–1591.Search in Google Scholar
Chen, Z. and J. Liu (2009): “Mixture generalized linear models for multiple interval mapping of quantitative trait loci in experimental crosses,” Biometrics, 65, 470–477.10.1111/j.1541-0420.2008.01100.xSearch in Google Scholar PubMed
Clayton, D. G. (2009): “Prediction and interaction in complex disease genetics: Experience in type 1 diabetes,” PLoS Genet, 5, e1000540, URL http://dx.doi.org/10.1371%2Fjournal.pgen.1000540.Search in Google Scholar
Cockerham, C. C. (1954): “An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present,” Genetics, 39, 859–882.10.1093/genetics/39.6.859Search in Google Scholar PubMed PubMed Central
Coffman, C., R. W. Doerge, K. Simonsen, K. Nichols, C. Duarte, R. Wolfinger, and L. M. McIntyre (2005): “An effective model selection strategy for detecting multiple qtl,” Genetics, 170, 1281–1297.10.1534/genetics.104.033910Search in Google Scholar PubMed PubMed Central
Cordell, H. J. (2002): “Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans,” Hum. Mole. Genet., 11, 2463–2468, URL http://hmg.oxfordjournals.org/content/11/20/2463.abstract.10.1093/hmg/11.20.2463Search in Google Scholar PubMed
Cordell, H. J. (2009): “Detecting gene-gene interactions that underlie human diseases,” Nat. Rev. Genet., 10, 392–404.Search in Google Scholar
Doerge, R. W. (2002): “Mapping and analysis of quantitative trait loci in experimental populations,” Nat. Rev. Gene., 43–52, URL http://www.nature.com/nrg/journal/v3/n1/full/nrg703.html.10.1038/nrg703Search in Google Scholar PubMed
Dupuis, J. and D. Siegmund (1999): “Statistical methods for mapping quantitative trait loci from a dense set of markers,” Genetics, 151, 373–386.10.1093/genetics/151.1.373Search in Google Scholar PubMed PubMed Central
Erhardt, V., M. Bogdan and C. Czado (2010): “Locating multiple interacting quantitative trait loci with the zero-inated generalized Poisson regression,” Statist. Appl. Gen. Mol. Biol, 9, 1554–6115.Search in Google Scholar
Fisher, R. A. (1919): “The correlation between relatives on the supposition of Mendelian inheritance,” T. Roy. Soc. Edin., 52, 399–433.Search in Google Scholar
Fritsch, A. and K. Ickstadt (2007): “Comparing logic regression based methods for identifying snp interactions,” Berlin, Heidelberg: Springer, Lecture Notes in Computer Science, 4414, 90–103.10.1007/978-3-540-71233-6_8Search in Google Scholar
Haley, C. and S. Knott (1992): “A simple regression method for mapping quantitative trait loci in line crosses using anking markers,” Heredity, 69, 315–324.10.1038/hdy.1992.131Search in Google Scholar PubMed
Jansen, R. and P. Stam (1994): “High resolution of quantitative traits into multiple loci via interval mapping,” Genetics, 136, 1447–1455.10.1093/genetics/136.4.1447Search in Google Scholar PubMed PubMed Central
Kao, C. and Z. Zeng (2002): “Modeling epistasis of quantitative trait loci using Cockerham′s model,” Genetics, 160, 1243–1261.10.1093/genetics/160.3.1243Search in Google Scholar PubMed PubMed Central
Kao, C., Z. Zeng and R. Teasdale (1999): “Multiple interval mapping for quantitative trait loci,” Genetics, 152, 1203–1216.10.1093/genetics/152.3.1203Search in Google Scholar PubMed PubMed Central
Kirkpatrick, S., C. D. Gelatt and M. Vecchi (1983): “Optimization by simulated annealing,” Science, 220, 671–680.10.1126/science.220.4598.671Search in Google Scholar PubMed
Kooperberg, C. and I. Ruczinski (2005): “Identifying interacting snps using Monte Carlo logic regression,” Genet. Epidemiol., 28, 157–170.Search in Google Scholar
Lander, E. S. and D. Botstein (1989): “Mapping Mendelian factors underlying quantitative traits using rp linkage maps.” Genetics, 121, 185–199, URL http://www.genetics.org/content/121/1/185.abstract.10.1093/genetics/121.1.185Search in Google Scholar PubMed PubMed Central
Li, W. and Z. Chen (2009): “Multiple interval mapping for quantitative trait loci with a spike in the trait distribution,” Genetics, 182, 337–342.10.1534/genetics.108.099028Search in Google Scholar PubMed PubMed Central
Liu, J., Y. Liu, X. Liu and H. -W. Deng (2007): “Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components,” Am. J. Hum. Genet., 81, 304–320.Search in Google Scholar
Lucek, P. R. and J. Ott (1997): “Neural network analysis of complex traits,” Genet. Epidemiol., 14, 1101–1106.Search in Google Scholar
Lyons, M. A., H. Wittenburg, R. Li, K. A. Walsh, M. R. Leonard, G. A. Churchill, M. C. Carey and B. Paigen (2003): “New quantitative trait loci that contribute to cholesterol gallstone formation detected in an intercross of cast/ei and 129s1/svimj inbred mice,” Physiol. Genomics, 14, 225–239.Search in Google Scholar
McIntyre, L., C. Coffman and R. Doerge (2001): “Detection and location of single binary trait loci in experimental populations,” Genet. Res., 78, 79–92.Search in Google Scholar
Ruczinski, I., C. Kooperberg and M. LeBlanc (2003): “Logic regression,” J. Comput. Graphical Statist., 12, 474–511.Search in Google Scholar
Ruczinski, I., C. Kooperberg and M. LeBlanc (2004): “Exploring interactions in high-dimensional genomic data: an overview of logic regression, with applications,” J. Multivariate Anal., 90, 178–195.Search in Google Scholar
Schwender, H., K. Bowers, M. D. Fallin and I. Ruczinski (2011): “Importance measures for epistatic interactions in case-parent trios,” Ann. Hum. Genet., 75, 122–132.Search in Google Scholar
Schwender, H. and K. Ickstadt (2008): “Identification of snp interactions using logic regression,” Biostatistics, 9, 187–198.10.1093/biostatistics/kxm024Search in Google Scholar PubMed
Sen, S. and G. A. Churchill (2001): “A statistical framework for quantitative trait mapping,” Genetics, 159, 371–387.10.1093/genetics/159.1.371Search in Google Scholar PubMed PubMed Central
Xu, S. and W. R. Atchley (1996): “Mapping quantitative trait loci for complex binary diseases using line crosses,” Genetics, 143, 1417–1424.10.1093/genetics/143.3.1417Search in Google Scholar PubMed PubMed Central
Yandell, B. S., T. Mehta, S. Banerjee, R. M. J. Y. Shriner, D. Venkataraman, W. W. Neely, H. Wu, R. von Smith and N. Yi (2007): “Qtl with Bayesian interval mapping in experimental crosses,” Bioinformatics, 23, 641–643.10.1093/bioinformatics/btm011Search in Google Scholar PubMed PubMed Central
Yi, N. and S. Xu (2000): “Bayesian mapping of quantitative trait loci for complex binary traits,” Genetics, 155, 1391–1403.10.1093/genetics/155.3.1391Search in Google Scholar PubMed PubMed Central
Zeng, Z. B. (1994): “Precision mapping of quantitative trait loci,” Genetics, 136, 1457–1468.10.1093/genetics/136.4.1457Search in Google Scholar PubMed PubMed Central
Zhang, M., K. L. Montooth, M. T. Wells, A. G. Clark and D. Zhang (2005): “Mapping multiple quantitative trait loci by Bayesian classification,” Genetics, 169, 2305–2318, URL http://www.genetics.org/content/169/4/2305.abstract.10.1534/genetics.104.034181Search in Google Scholar PubMed PubMed Central
©2014 by Walter de Gruyter Berlin Boston