Any statistical experiment may be perceived as an information channel transforming a deterministic quantity (a parameter) into a random one (an observation) according to a design chosen by the experimenter. The primary aim of the statistician is to recover information about the parameter from the observation. However, the efficiency of this process depends not only on the statistical rule but also on the experimental design. Such a design, which may be identified with the experiment itself, is represented by a probabilistic structure.
When observations have a normal distribution, the entire statistical analysis is based on their linear and quadratic forms. Thus the properties of such forms should be taken into account in any reasonable choice of a statistical experiment.
The comparison of linear experiments by linear forms has been studied intensively in the statistical literature. It is well known (see, for instance, [1, 2, 3, 4, 5, 6]) that almost all criteria used for comparing two linear experiments with respect to linear estimation reduce to the Loewner order between their information matrices, say M1 and M2. However, the comparison of normal linear experiments with respect to quadratic estimation is still at an initial stage, and appropriate tools are needed.
It was revealed in Stępniak that the relation “to be at least as good with respect to quadratic estimation” requires some knowledge of the matrix M2M1+, where the symbol + denotes the Moore-Penrose generalized inverse. We shall refer to this matrix as the quotient of M2 by M1. Properties of such a quotient may be of interest in themselves. It appears that the Loewner order may be expressed in terms of the quotient, but not vice versa.
In this note we use the quotient of positive semidefinite matrices as the main tool in ordering normal linear experiments with respect to quadratic estimation. The orderings of linear experiments with respect to linear estimation and with respect to quadratic estimation are extended here to experiments involving nuisance parameters. Typical experiments of this kind are induced by allocations of treatments in blocks.
It is well known (see ) that any orthogonal allocation of treatments in blocks is optimal with respect to linear estimation of all treatment contrasts. We show that this allocation is, however, not optimal for quadratic estimation.
2 Definitions and known results
In this paper the standard vector-matrix notation is used. All vectors and matrices considered here have real entries. The space of all n × 1 vectors is denoted by Rn. For any matrix M the symbols MT, R(M), N(M) and r(M) denote, respectively, its transpose, range (column space), kernel (null space) and rank. The symbol PM stands for the orthogonal projector onto R(M), i.e. the square matrix P satisfying the conditions Px = x for x ∈ R(M) and Px = 0 for x ∈ N(MT). Moreover, if M is square then tr(M) denotes its trace, and the symbol M ≥ 0 means that M is symmetric and positive semidefinite (psd, for short).
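The projector notation can be illustrated numerically. The following sketch (with arbitrary illustrative data, not taken from the paper) builds PM = MM+ via the Moore-Penrose inverse and checks the defining conditions:

```python
import numpy as np

# A numerical sketch of the projector notation: P_M = M M^+ is the
# orthogonal projector onto R(M). Data are arbitrary; M is made
# rank-deficient on purpose.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank 3

P = M @ np.linalg.pinv(M)

x = M @ rng.standard_normal(4)        # any x in R(M)
assert np.allclose(P @ x, x)          # Px = x on R(M)
assert np.allclose(P @ P, P) and np.allclose(P, P.T)  # symmetric idempotent
print(int(round(np.trace(P))))        # tr(P_M) = r(M)
```

The trace of the projector recovers the rank r(M), a fact used implicitly when degrees of freedom are counted later in the paper.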
Let x be a random vector with expectation E(x) = Aα and variance-covariance matrix σ V, where A and V are known matrices, while α = (α1, …, αp)T and σ > 0 are unknown parameters. In this situation we shall say that x is subject to the linear experiment 𝓛(A α, σ V). If V = I, we say that the experiment is standard. If x is normally distributed, then instead of 𝓛(A α, σ V) we shall use the symbol 𝓝(A α, σ V).
Now let us consider two experiments 𝓛1 = 𝓛(A1 α, σ V) and 𝓛2 = 𝓛(A2 α, σ W) with the same parameters and with observation vectors x ∈ Rm and y ∈ Rn, respectively.
(). Experiment 𝓛1 is said to be at least as good as 𝓛2 with respect to linear estimation [notation: 𝓛1 ⊵ 𝓛2] if for any parametric function ψ = cT α and for any estimator bT y there exists an estimator aT x with uniformly no greater squared risk. If 𝓛1 ⊵ 𝓛2 and 𝓛2 ⊵ 𝓛1 then we say that the experiments are equivalent for linear estimation.
E(aT x − ψ)2 ≤ E(bT y − ψ)2 for all α and σ. (1) It is worth noting that the relation 𝓛(A1 α, σ V) ⊵ 𝓛(A2 α, σ W) does not depend on whether σ is known or not. Thus 𝓛1 ⊵ 𝓛2 if and only if 𝓛(A1 α, V) ⊵ 𝓛(A2 α, W).
Moreover, under the normality assumption, the condition (1) may be expressed in the form:
For any parametric function ψ and for any b ∈ Rn there exists a ∈ Rm such that |aT x − ψ| is stochastically not greater than |bT y − ψ| for all α and σ.
Now consider two normal linear experiments 𝓝1 = 𝓝(A α, σ V) and 𝓝2 = 𝓝(B α, σ W) with observation vectors x ∈ Rm and y ∈ Rn. It is well known (cf. [12, 13]) that such experiments are not comparable with respect to all possible statistical problems. Therefore we shall restrict our attention to quadratic estimation only.
(). Experiment 𝓝1 is said to be at least as good as 𝓝2 with respect to quadratic estimation [notation: 𝓝1 ⪰ 𝓝2] if for any quadratic form yT Gy there exists a quadratic form xT Hx such that E(xT Hx) = E(yT Gy) and var(xT Hx) ≤ var(yT Gy) for all α and σ. If 𝓝1 ⪰ 𝓝2 and 𝓝2 ⪰ 𝓝1 then we say that the experiments are equivalent for quadratic estimation.
In the last definition the quadratic forms xT Hx and yT Gy play the role of potential unbiased estimators for parametric functions of the type ϕ(α, σ) = c σ + αT C α. It is known that the mean squared error of any linearly estimable parametric function ψ = ψ(α) in the experiment 𝓝1 (or in 𝓝2) has such a form (Stępniak ). The orderings ⊵ and ⪰ are invariant with respect to nonsingular linear transformations of both the parameter α and the observation vectors x and y (, Lemmas 2.1 and 2.2).
The main tool in the comparison of standard linear experiments is the information matrix M, defined as the Fisher information matrix AT A corresponding to the experiment 𝓝(A α, I).
The relation ⊵ may be characterized by the following theorem.
(, Theorem 1). For standard linear experiments 𝓛1 = 𝓛(A1 α, σ Im) and 𝓛2 = 𝓛(A2 α, σ In) with information matrices M1 and M2 the following are equivalent:
𝓛1 ⊵ 𝓛2,
M1 − M2 is psd,
R(M2) ⊆ R(M1) and the maximal eigenvalue of the matrix M2M1+ is not greater than 1.
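The equivalence between the Loewner condition and the spectral condition on the quotient is easy to check numerically. The following sketch (illustrative matrices, not from the paper) constructs a weaker experiment by shrinking part of the design of 𝓛1 and verifies that both criteria agree:

```python
import numpy as np

# Numerical check: M1 - M2 psd iff R(M2) ⊆ R(M1) and
# lambda_max(M2 M1^+) <= 1. All data below are illustrative.
rng = np.random.default_rng(1)
A1 = rng.standard_normal((6, 3))
A2 = 0.5 * A1[:4]            # a "weaker" experiment: M1 - M2 is psd here
M1, M2 = A1.T @ A1, A2.T @ A2

# Loewner condition: M1 - M2 is symmetric, so check its smallest eigenvalue
psd = np.linalg.eigvalsh(M1 - M2).min() >= -1e-10

# Range inclusion plus the spectral condition on the quotient M2 M1^+
range_incl = np.allclose(M1 @ np.linalg.pinv(M1) @ M2, M2)
lam_max = np.linalg.eigvals(M2 @ np.linalg.pinv(M1)).real.max()

print(psd, range_incl, lam_max <= 1 + 1e-10)
```

On this example all three flags are true; perturbing A2 so that M1 − M2 loses positive semidefiniteness makes the spectral condition fail as well.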
A corresponding result for the relation ⪰ is due to Stępniak (, Theorem 5.1) in the form
For standard normal linear experiments 𝓝1 = 𝓝(A1 α, σ Im) and 𝓝2 = 𝓝(A2 α, σ In) with information matrices M1 and M2 the following are equivalent:
𝓝1 ⪰ 𝓝2,
M1 − M2 is psd and the inequality (2) holds,
where λi, i = 1, …, q, are the positive eigenvalues of the quotient of M2 by M1, counted with their multiplicities.
R(M2) ⊆ R(M1), the maximal eigenvalue of the matrix M2M1+ is not greater than 1, and the inequality (2) holds.
It is interesting that both orderings ⊵ and ⪰ may be expressed in terms of the matrix M2M1+, where Mj, j = 1, 2, are the information matrices corresponding to the experiments 𝓝(A1 α, σ Im) and 𝓝(A2 α, σ In). A matrix of this kind will be called the quotient of M2 by M1.
3 Quotient of matrices in comparison of experiments
For given psd matrices T and U of the same order we shall refer to the expressions Q1 = TU+, Q2 = U+ T, Q3 = (U+)1/2 T(U+)1/2 and Q4 = T1/2 U+ T1/2 as versions of the quotient of T by U. We note that only Q3 and Q4 are always symmetric.
We shall start from basic properties of the quotients.
For arbitrary positive semidefinite matrices T and U of the same order
All versions Q1 = TU+, Q2 = U+ T, Q3 = (U+)1/2 T(U+)1/2 and Q4 = T1/2 U+ T1/2 of the quotient of T by U have the same eigenvalues.
All eigenvalues of an arbitrary quotient are nonnegative.
U − T is psd if and only if R(T) ⊆ R(U) and all eigenvalues of an arbitrary quotient Qi are not greater than 1.
If Q1w = λ w then U+ Q1 w = Q2 U+ w = λ U+ w and λ is an eigenvalue of Q2. Conversely, if Q2 w = λ w then Q1 Tw = λTw. Thus Q1 and Q2 have the same eigenvalues. To prove the same for Q3 and Q4 we note that Q3 = FFT and Q4 = FT F for F = (U+)1/2 T1/2, and the desired correspondence follows by the implications Q3 w = λ w ⟹ Q4 FT w = λ FT w and Q4 w = λ w ⟹ Q3 Fw = λ Fw. Thus it remains to show a similar correspondence for Q2 and Q3.
The equality Q2 w = λ w implies Q4 T1/2 w = λ T1/2 w. Thus λ is an eigenvalue of Q4 and, in consequence, of Q3. Similarly, if Q3 w = λ w then Q2(U+)1/2 w = λ(U+)1/2 w. This implies the desired condition and completes the proof of the part (a).
It follows immediately from (a).
By (a) we only need to show the desired equivalence for i = 3. The implication U − T ≥ 0 ⟹ R(T) ⊆ R(U) is evident. For the remainder we note that, under the assumption R(T) ⊆ R(U), λmax(Q3) ≤ 1 if and only if (U+)1/2 U(U+)1/2 ≥ (U+)1/2 T(U+)1/2. This implies (c) and completes the proof of Theorem 3.1. □
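Parts (a) and (b) of the theorem can be illustrated numerically. In the sketch below T and U are arbitrary psd matrices (U deliberately singular, so that the Moore-Penrose inverse is essential); the four versions of the quotient are built explicitly and their spectra compared:

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric psd square root via the spectral decomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

# Verify that the four versions Q1..Q4 of the quotient of T by U share
# the same nonnegative eigenvalues. T, U are random psd matrices.
rng = np.random.default_rng(2)
F, G = rng.standard_normal((4, 4)), rng.standard_normal((4, 2))
T, U = F @ F.T, G @ G.T            # U is rank-deficient on purpose

Up = np.linalg.pinv(U)
Uph, Th = psd_sqrt(Up), psd_sqrt(T)
Qs = [T @ Up, Up @ T, Uph @ T @ Uph, Th @ Up @ Th]

spectra = [np.sort(np.linalg.eigvals(Q).real) for Q in Qs]
for s in spectra[1:]:
    assert np.allclose(s, spectra[0])
assert spectra[0].min() >= -1e-8   # all eigenvalues nonnegative
print("all four spectra agree")
```

The agreement of the spectra rests on the familiar fact that AB and BA of the same order have identical characteristic polynomials, which is exactly the mechanism of the proof above.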
Now we shall use Theorem 3.1 for the comparison of the normal linear experiments 𝓝1 = 𝓝(A1 α, σ Im) and 𝓝2 = 𝓝(A2 α, σ In) w.r.t. quadratic estimation.
We note that r(A1) = r(M1), while m − r(A1) is the number of degrees of freedom in the experiment 𝓝1. By Theorem 3.1 we get the following result.
If the numbers of degrees of freedom in the experiments 𝓝1 = 𝓝(A1 α, σ Im) and 𝓝2 = 𝓝(A2 α, σ In) are equal then 𝓝1 ⪰ 𝓝2 if and only if M1 − M2 is psd and any quotient Qi, i = 3, 4, of the information matrix M2 by M1 is idempotent, i.e. Qi2 = Qi.
Under our assumption the right-hand side of the inequality (2) is 0, and hence each eigenvalue of any quotient Qi is either 0 or 1. Since the quotients Qi, i = 3, 4, are symmetric, this is equivalent to their idempotency. □
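This lemma admits a direct numerical illustration. In the following sketch (illustrative data) the second design is an orthogonal rotation of the first, so the two experiments are equivalent and the symmetric quotient Q3 = (M1+)1/2 M2 (M1+)1/2 is indeed idempotent:

```python
import numpy as np

# Sketch of the lemma: with equal degrees of freedom, N1 ⪰ N2 forces each
# eigenvalue of the quotient to be 0 or 1, i.e. the symmetric quotient
# Q3 = (M1^+)^{1/2} M2 (M1^+)^{1/2} is idempotent. Here A2 = Qo A1 for an
# orthogonal Qo, so M2 = M1 and Q3 is the projector onto R(M1).
rng = np.random.default_rng(3)
A1 = rng.standard_normal((5, 3))
Qo, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A2 = Qo @ A1                          # same information: A2^T A2 = A1^T A1

M1, M2 = A1.T @ A1, A2.T @ A2
w, V = np.linalg.eigh(np.linalg.pinv(M1))
M1p_half = (V * np.sqrt(np.clip(w, 0, None))) @ V.T   # (M1^+)^{1/2}
Q3 = M1p_half @ M2 @ M1p_half

assert np.allclose(M1, M2)
assert np.allclose(Q3 @ Q3, Q3)       # idempotent, as the lemma predicts
print("quotient is idempotent")
```
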
The case when the numbers of observations in both experiments are equal, i.e. m = n, is the most interesting. In this case, by Theorem 3.1, we get
For standard normal experiments 𝓝1 and 𝓝2 with the same number of observations, the relation 𝓝1 ⪰ 𝓝2 holds if and only if M1 = M2, i.e. when the experiments are equivalent.
Assume that M1 − M2 is psd, the inequality (2) is true and m = n. Then R(M1) = R(M2) and, by Lemma 3.2, Q3 is idempotent. Thus Q3 is the orthogonal projector onto R(M1) = R(M2) and, in consequence, M1 = M2. This implies the desired result. □
Now let us consider a linear experiment where observation vector x depends on several parameters but only some of them are of interest. More precisely, assume that
with unknown parameters α ∈ Rp, β ∈ Rk and σ > 0 such that α (or α and σ) is of interest, while β is treated as a nuisance parameter. Such an experiment will be denoted by 𝓛(A α + B β, σ I) or (under the normality assumption) by 𝓝(A α + B β, σ I).
We shall say that a statistic t = t(x) is invariant (with respect to β) if its first two moments exist and do not depend on β. It is evident that a linear form aT x is invariant in the experiment 𝓛(A α + B β, σ I) if and only if it depends on x only through (I − PB) x. The same condition for the invariance of a quadratic form xT Hx follows from the well-known formula
Now let us consider two linear experiments 𝓛1 = 𝓛(A1 α + B1 β, σ Im) and 𝓛2 = 𝓛(A2 α + B2 β, σ In) (or 𝓝1 = 𝓝(A1 α + B1 β, σ Im) and 𝓝2 = 𝓝(A2 α + B2 β, σ In)) with observation vectors x ∈ Rm and y ∈ Rn.
We shall say that 𝓛1 is at least as good as 𝓛2 w.r.t. invariant linear estimation if for any invariant statistic bT y there exists an invariant aT x such that E(aT x) = E(bT y) and var(aT x) ≤ var(bT y) for all α and σ . Similarly, we shall say that 𝓝1 is at least as good as 𝓝2 w.r.t. invariant quadratic estimation if for any invariant statistic yT Gy there exists an invariant xT Hx such that E(xT Hx) = E(yT Gy) and var(xT Hx) ≤ var(yT Gy) for all α and σ.
First we shall reduce the comparison of linear experiments with a nuisance parameter β to the same problem for the usual linear experiments. To this aim we need the invariance condition in a more explicit form.
Let x be the observation vector in a linear experiment 𝓛(A α + B β, σ I) or 𝓝(A α + B β, σ I) and let b1, …, bn−r, where r = r(B), be an orthonormal basis of N(BT). Then I − PB may be presented in the form I − PB = KKT, where K = [b1, …, bn−r].
In this way, 𝓛(A1 α + B1 β, σ Im) is at least as good as 𝓛(A2 α + B2 β, σ In) w.r.t. invariant linear estimation if and only if 𝓛(A͠1 α, σ Im−r1) ⊵ 𝓛(A͠2 α, σ In−r2), where A͠i is defined by (3) and ri = r(Bi). Similarly, 𝓝(A1 α + B1 β, σ Im) is at least as good as 𝓝(A2 α + B2 β, σ In) w.r.t. invariant quadratic estimation if and only if 𝓝(A͠1 α, σ Im−r1) ⪰ 𝓝(A͠2 α, σ In−r2).
For convenience, the matrices A͠iT A͠i will be called the reduced information matrices and will be denoted by M͠i. We note that M͠i = AiT(I − PBi)Ai. (4)
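The reduction can be sketched numerically. The following example (illustrative dimensions, numpy) computes M͠ = AT(I − PB)A as in formula (4) and checks that it coincides with the information matrix of the reduced design KT A, where the columns of K form an orthonormal basis of N(BT):

```python
import numpy as np

# Sketch of formula (4): the reduced information matrix of an experiment
# with nuisance design B is M~ = A^T (I - P_B) A, and it coincides with
# the information matrix of the reduced design K^T A, where the columns
# of K form an orthonormal basis of N(B^T). Dimensions are illustrative.
rng = np.random.default_rng(4)
n, p, k = 8, 3, 2
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))          # nuisance design, full column rank

P_B = B @ np.linalg.pinv(B)              # orthogonal projector onto R(B)
M_tilde = A.T @ (np.eye(n) - P_B) @ A    # formula (4)

U, _, _ = np.linalg.svd(B)
K = U[:, k:]                             # orthonormal basis of N(B^T)
A_red = K.T @ A                          # reduced design matrix
assert np.allclose(A_red.T @ A_red, M_tilde)

assert np.linalg.eigvalsh(M_tilde).min() >= -1e-10   # M~ is psd
print("formula (4) verified on the example")
```
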
For arbitrary linear experiments 𝓛1 = 𝓛(A1 α + B1 β, σ Im) and 𝓛2 = 𝓛(A2 α + B2 β, σ In), 𝓛1 is at least as good as 𝓛2 w.r.t. invariant linear estimation if and only if M͠1 − M͠2 ≥ 0.
For arbitrary normal linear experiments 𝓝1 = 𝓝(A1 α + B1 β, σ Im) and 𝓝2 = 𝓝(A2 α + B2 β, σ In), 𝓝1 is at least as good as 𝓝2 w.r.t. invariant quadratic estimation if and only if M͠1 − M͠2 ≥ 0 and
where λi, i = 1, …, q, are the positive eigenvalues of an arbitrary version of the quotient of M͠2 by M͠1, counted with their multiplicities.
In particular, if m − r(B1) = n − r(B2) then by Lemma 3.2 we get
If m − r(B1) = n − r(B2) then 𝓝1 is at least as good as 𝓝2 w.r.t. invariant quadratic estimation if and only if the quotient of M͠2 by M͠1 is idempotent.
If m = n and B1 = B2 then 𝓝1 is at least as good as 𝓝2 w.r.t. invariant quadratic estimation if and only if M͠1 = M͠2.
4 Problem of optimal allocation of treatments in blocks
Consider an allocation of v treatments with replications t1, …, tv in k blocks of sizes b1, …, bk, where ∑i ti = ∑j bj = n. Let us introduce the matrices B = diag(1b1, …, 1bk) and D = (dij), where dij = 1 if the ith observation receives the jth treatment and dij = 0 otherwise.
These matrices indicate the allocation of treatments in blocks. For this reason D is sometimes identified with the block design.
To each pair (B, D) corresponds a linear experiment 𝓝 = 𝓝(D α + [1n, B] β, σ In), where α = (α1, …, αv)T refers to the treatment effects, while β = (μ, β1, …, βk)T refers to the general mean and the block effects. In this case the reduced information matrix (4), also called the C-matrix (see [18, 19, 20]), may be presented in the form C = diag(t1, …, tv) − N diag(b1, …, bk)−1 NT,
where N = (nij) is the incidence matrix defined as N = DT B. It is clear that N1k = t and NT1v = b, where t = (t1, …, tv)T and b = (b1, …, bk)T. A design D is said to be orthogonal if nij = ti bj/n for all i and j.
One can verify that
In particular, for the orthogonal design, C = diag(t1, …, tv) − (1/n) ttT.
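The orthogonal case can be checked directly, assuming the standard form C = diag(t) − N diag(b)−1 NT of the C-matrix and the orthogonality condition nij = ti bj/n (a minimal sketch; the 2 × 2 data below are illustrative):

```python
import numpy as np

# Sketch of the orthogonal case, assuming the standard C-matrix form
# C = diag(t) - N diag(b)^{-1} N^T and the orthogonality condition
# n_ij = t_i b_j / n. The 2x2 data below are illustrative.
t = np.array([2.0, 2.0])              # treatment replications
b = np.array([2.0, 2.0])              # block sizes
n = t.sum()

N = np.outer(t, b) / n                # orthogonal incidence matrix
C = np.diag(t) - N @ np.diag(1.0 / b) @ N.T

# For an orthogonal design the C-matrix reduces to diag(t) - t t^T / n.
assert np.allclose(C, np.diag(t) - np.outer(t, t) / n)
assert np.allclose(C @ np.ones(2), 0) # rows of C sum to zero
print(C.tolist())
```

The zero row sums of C reflect the fact that only treatment contrasts are estimable in the block model.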
Denote by 𝓓 = 𝓓(t; b) the class of all possible allocations of v treatments with replications t1, …, tv in k blocks of sizes b1, …, bk, for v, k ≥ 2. Such a class may or may not contain an orthogonal design. If it does, then by Stępniak this design is optimal in 𝓓 w.r.t. invariant linear estimation, i.e. it is at least as good as any other design in the class.
It is natural to ask whether the orthogonal design is also optimal w.r.t. invariant quadratic estimation. In the light of the results presented in Section 3 we strongly suspect that the answer is negative; nevertheless, we provide a rigorous proof of this fact. By Corollary 3.8 we only need to show that for any incidence matrix N = (nij) corresponding to the orthogonal design there exists an incidence matrix M = (mij) in the same class whose C-matrix differs from that of N. Define
We note that M1k = N1k and MT1v = NT1v. Therefore, the designs represented by M and N belong to the same class. To show the desired inequality we only need, for instance, to compare the upper-left entries, say u11 and ũ11, of the matrices
Since we have
This leads to the following
No orthogonal block design is optimal w.r.t. invariant quadratic estimation. Moreover, for any t = (t1, …, tv)T and b = (b1, …, bk)T there is no optimal design in the class 𝓓 = 𝓓(t; b).
Incidentally, we have also demonstrated that, with reference to the orthogonal block design, the optimality w.r.t. linear estimation may be strengthened in the sense that the words “at least as good” may be replaced by “better”.
This work was partially supported by the Centre for Innovation and Transfer of Natural Sciences and Engineering Knowledge.
Ehrenfeld S., Complete class theorem in experimental design, Proc. Third Berkeley Symp. on Math. Statist. Probab., 1955, Vol. 1, 69-75.
Kiefer J., Optimum experimental designs, J. Roy. Statist. Soc. Ser. B, 1959, 21, 272-319.
Goel P.K., Ginebra J., When one experiment is ‘always better than’ another?, Statistician, 2003, 52, 515-537.
Torgersen E., Comparison of Statistical Experiments, Cambridge University Press, Cambridge, England, 1991.
Stępniak C., Optimal allocation of treatments in block designs, Studia Sc. Math. Hung., 1987, 22, 341-345.
Hansen O.H., Torgersen E., Comparison of linear experiments, Ann. Statist., 1974, 2, 367-373.
Stępniak C., On admissibility in estimating the mean squared error of a linear estimator, Probab. Math. Statist., 1998, 18, 33-38.
Mathai A.M., Provost S.B., Quadratic Forms in Random Variables, Marcel Dekker, New York, 1992.
Searle S.R., Linear Models, Wiley, New York, 1971.
Chakrabarti M.C., On the C-matrix in design of experiments, J. Indian Statist. Assoc., 1963, 1, 8-23.
Dey A., Theory of Block Designs, Wiley, New York, 1986.
Raghavarao D., Padgett L.V., Block Designs: Analysis, Combinatorics and Applications, World Scientific Publishers, Singapore, 2005.
Published Online: 2017-12-29