# The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

Online ISSN: 1557-4679
Volume 14, Issue 2

# A Spatio-Temporal Model and Inference Tools for Longitudinal Count Data on Multicolor Cell Growth

PuXue Qiao / Christina Mølck / Davide Ferrari / Frédéric Hollande
Published Online: 2018-07-07 | DOI: https://doi.org/10.1515/ijb-2018-0008

## Abstract

Multicolor cell spatio-temporal image data have become important for investigating organ development and regeneration, malignant growth or immune responses by tracking different cell types both in vivo and in vitro. Statistical modeling of image data from common longitudinal cell experiments poses significant challenges due to the presence of complex spatio-temporal interactions between different cell types and difficulties related to measurement of single cell trajectories. Current analysis methods focus mainly on univariate cases, often not considering the spatio-temporal effects affecting cell growth between different cell populations. In this paper, we propose a conditional spatial autoregressive model to describe multivariate count cell data on the lattice, and develop inference tools. The proposed methodology is computationally tractable and enables researchers to estimate a complete statistical model of multicolor cell growth. We apply our methodology to real experimental data to investigate how interactions between cancer cells and fibroblasts, which are normally present in the tumor microenvironment, affect their growth. We also compare the performance of our methodology to the multivariate conditional autoregressive (MCAR) model in both simulations and real data applications.

## 1 Introduction

Longitudinal image data based on fluorescent proteins play a crucial role for both in vivo and in vitro analysis of various biological processes such as gene expression and cell lineage fate. Assessing the growth patterns of different cell types within a heterogeneous population and monitoring their interactions enables biomedical researchers to determine the role of different cell types in important biological processes such as organ development and regeneration, malignant growth or immune responses under various experimental conditions. For example, tumor progression has been shown to be affected by bidirectional interactions among cancer cells or between cancer cells and cells from the microenvironment, including tumor-infiltrating immune cells [1]. Being able to study these interactions in a laboratory setting is therefore highly relevant, but is complicated by the difficulty of dissecting the effect of the different cell types as soon as the number of cell types exceeds two. In the present study we used longitudinal image data collected from multicolor live-cell imaging growth experiments of co-cultures of cancer cells and fibroblasts (a key cell type in the tumor microenvironment) as well as behaviourally distinct (cloned) cancer cells. Using a high-content imaging system, we were able to acquire characteristics for each individual cell at subsequent times, including fluorescent properties, spatial coordinates, and morphological features. The motivation of this work was to design a model allowing the determination of spatio-temporal growth interactions between these multiple cell populations.

In longitudinal growth experiments, the two important goals are to determine growth rates for different cell populations and to assess how interactions between cell types may affect their growth. Whilst a wide range of descriptive data analysis approaches have been used in applications, inference based on a comprehensive model of multicolor cell data is an open research area. The main challenges are related to the presence of complicated spatio-temporal interactions amongst cells and difficulties related to tracking individual cells across time from image data. Typical longitudinal experiments consist of a relatively small number of measurements (e.g. 5 to 20 images taken every few hours), which is adequate for monitoring cell growth. Tracking individual cells would typically require more frequent measurements, complicating the practicality of the experiments in terms of the storage cost of very large image files and the cytotoxicity induced by the imaging process.

Although tracking individual cell trajectories is difficult due to cell migration, overlapping cells, changes in cell morphology, image artifacts, cell death and division, obtaining cell counts by cell type (represented by a certain color) is straightforward and can be easily automated. To describe the spatial distribution of different cell types, we propose to divide an image into a number of contiguous regions (tiles) to form a regular lattice structure, as shown in Figure 1(a). We then record the frequency of cells of each color in each tile at subsequent time points and, based on these counts, model the spatial and temporal dependencies of the cell growth.
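The tiling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tile_counts`, the argument names, and the use of cell centroid coordinates with integer color labels are hypothetical choices, not part of the original analysis pipeline.

```python
import numpy as np

def tile_counts(x, y, color, n, width, height, n_colors):
    """Count cells of each color in each tile of an n x n regular grid.

    x, y   : cell centroid coordinates (same units as width and height)
    color  : integer color label per cell, in 0..n_colors-1
    returns: array of shape (n_colors, n, n); entry [c, r, k] is the number
             of cells of color c in tile row r, tile column k
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    color = np.asarray(color, dtype=int)
    # Map coordinates to tile indices; clip so that points on the far edge
    # fall into the last tile instead of out of range.
    col = np.clip((x / width * n).astype(int), 0, n - 1)
    row = np.clip((y / height * n).astype(int), 0, n - 1)
    counts = np.zeros((n_colors, n, n), dtype=int)
    # np.add.at accumulates correctly when several cells share a tile.
    np.add.at(counts, (color, row, col), 1)
    return counts

# Toy example: five cells of two colors on a 100 x 100 image, 2 x 2 tiling.
counts = tile_counts(
    x=[10, 20, 60, 90, 100], y=[10, 80, 60, 90, 100],
    color=[0, 1, 0, 1, 1], n=2, width=100, height=100, n_colors=2,
)
```

Repeating this for each image in the time series gives one count array per time point, which is the data structure the model operates on.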

Figure 1:

(a) Microscope images for the cancer cell growth data obtained from a high-content imager (Operetta, Perkin Elmer) at the initial and final time points of the experiment. In each image, colors for non-fluorescent fibroblasts, as well as red and green fluorescent cancer cells are merged. (b) Illustration of the local structure for the model in (1). The two planes correspond to $3×3$ tiles at times $t$ and $t+1$. The average number of cells of color $c$ in a given tile at time $t+1$ is assumed to depend on the number of cells of other colors in contiguous neighboring tiles at time $t$.

To model spatio-temporal data, one could choose to approximate the spatio-temporal process by a spatial process of time series, that is, to view the process as a multivariate spatial process where the multivariate dependencies are inherited from temporal dependencies. In other words, it can be seen as a temporal extension of spatial processes.

The most popular way of developing a spatial process is through the conditionally auto-regressive (CAR) model proposed by Besag [2]. Waller, Carlin, Xia, and Gelfand [3] extend the CAR model into a spatio-temporal setting by allowing spatial effects to vary across time. However, the model lacks a specification of temporal dependency, as also noted by Knorr-Held [4]. More recently, Quick, Waller, and Casper [5] proposed a multivariate space-time CAR (MSTCAR) model, which is essentially a multivariate CAR model, where both temporal and between group dependencies are modelled as multivariate dependencies. Other works related to spatial process of time series include Sans, Schmidt, Nobre, et al. [6] and Quick, Waller, and Casper [7].

Alternatively, one can also think of the process as a time series of spatial processes, or a spatial extension of time series. This is the approach we take in our spatio-temporal modelling. The underlying notion is that “the temporal dependence is more natural to model than the spatial dependence” [8].

Following Cox et al. [9], it is useful to distinguish two modelling approaches for the analysis of time series data commonly seen in the spatio-temporal modelling literature: parameter-driven and observation-driven models. In a parameter-driven model, the dependence between subsequent observations is modelled by a latent stochastic process, which evolves independently of the past history of the observation process. In contrast, in an observation-driven model, time dependence arises because the conditional expectation of the outcome given the past depends explicitly on the past values.

For multivariate count data, the advantage of parameter-driven models is that one can easily assume that the conditional expectation of the observed process (on log-scale), as a latent process, is (multivariate) normal. There are extensive works related to latent spatio-temporal models under the Bayesian framework, including models with Gaussian data modelled by (multivariate) Gaussian process with an additive error [10, 11, 12, 13], Poisson data with conditional expectation modelled by Gaussian latent process ([14, 15] and Chapter 7 of [8]) and Poisson data with multivariate log-gamma latent process [16]. However, estimation of parameters in parameter-driven models requires considerable computational effort, as does prediction of the latent process.

On the other hand, in observation-driven models, inference is possible in a (penalized) maximum likelihood framework, so even quite complex regression models can be fitted easily [17]. Schrödle, Held, and Rue [18] proposed a parameter-driven spatio-temporal model and compared it with a similar observation-driven model proposed by Paul, Held, and Toschke [19]. They conclude that the parameter-driven model performs slightly better in terms of prediction in some cases. However, while the computation time for the observation-driven model is mostly less than a second, fitting the parameter-driven model takes several hours, if it converges at all, because of the complexity of the latent autoregressive process. Moreover, their model contains only five parameters, whereas in our application the number of parameters of interest grows quadratically with the number of cell populations, which would make parameter-driven models intractable even with a moderate number of cell populations.

Therefore, we choose to work with a spatial extension of observation-driven time series. Zeger and Qaqish [20] review various observation-driven time series models with a quasi-likelihood estimation. Fokianos and Tjøstheim [21] develop and study the probabilistic properties of a log-linear autoregressive time series model for Poisson data, as an extension of the model considered by Fokianos, Rahbek, and Tjøstheim [22]. See Scott, et al. and Kedem and Fokianos [23, 24] for a complete review.

The literature on observation-driven spatio-temporal models, however, is relatively sparse. Held, Höhle, and Hofmann [25] propose a multivariate time series model where parameters are allowed to vary across space. Paul et al. [19] extended the model such that spatial dependences are captured by additional parameters that quantify the “directed influence” of neighboring areas at previous time points on the observation of interest. Paul and Held [26] further extend the model by introducing random effects. Note that these approaches model the conditional expectation of the count data directly, meaning that they use an identity link function instead of the canonical log-link. Thus, the parameters are required to be positive to ensure that the resulting conditional expectation is positive. Knorr-Held and Richardson [27] propose a space-time model for surveillance data which, apart from separate seasonal and spatial components, includes an autoregressive term with a latent indicator.

In this paper, we develop a conditional spatio-temporal model for multivariate count data on tiled images, and apply it in the context of longitudinal cancer cell monitoring experiments. Our model enables us to measure the growth rate of each cell population and the changes in growth due to local cross-population interactions. Specifically, we consider a multivariate Poisson model with intensity modeled in a log-linear form similar to those in [27] and [21], and we quantify spatio-temporal impacts of different cell populations in neighboring tiles through model parameters, as illustrated in Figure 1(b). Impacts are allowed to be positive or negative and, unlike in models that describe between-group dependence through a covariance matrix, influences do not have to be symmetrical in our model. Another main advantage of the proposed framework is that it enables one to accommodate spatio-temporal cell interactions for heterogeneous cell populations within a relatively parsimonious statistical model.

Since the model complexity can be potentially very large in the presence of many cell types, it is also important to address the question of how to select an appropriate model by retaining only the meaningful spatio-temporal interactions between cell populations. We carry out model selection using the common model selection criteria for parametric models, the Akaike and the Bayesian information criteria (AIC and BIC).

The remainder of the paper is organized as follows. In Section 2, we introduce the conditional spatio-temporal lattice model for multivariate count data and develop maximum likelihood inference tools. In the same section, we discuss the asymptotic properties of our estimator and standard errors. In Section 3, we study the performance of our methodology using simulated data, and compare it to that of the multivariate conditional autoregressive (MCAR) model. In Section 4, we apply our method, as well as the MCAR model to analyze datasets from an in-vitro experiment, where cancer cells are co-cultured with fibroblasts. In Section 5, we conclude and give final remarks.

## 2.1 Multicolor spatial autoregressive model on the lattice

Let ${L}\subset {\mathbb{N}}^{2}$ be a discrete lattice. In the context of our application, the lattice is obtained by tiling a microscope image into ${n}_{{L}}$ tiles, denoted by ${{L}}_{n}\left(\subset {L}\right)$. The total number of tiles ${n}_{{L}}$ is a monotonically increasing function of $n$. One can choose various forms of lattice, for example, regular or hexagonal lattices. For simplicity, we tile the image into $n\times n$ regular rectangular tiles, which makes ${n}_{{L}}={n}^{2}$. An example of a tiled image with $n=10$ is shown in Figure 1(a). We write $i\sim j$ for a pair of neighboring tiles $\left\{i,j\right\}$, that is, if tiles $i$ and $j$ share a border or coincide ($i=j$). Each tile may contain cells of different colors; thus, we let ${C}=\left\{1,\dots ,{n}_{{C}}\right\}$ be a finite set of colors and denote by ${n}_{{C}}$ the total number of colors. Let $\mathbit{Y}=\left\{{\mathbit{Y}}_{t},t=1,\dots ,T\right\}$ be the sample of observations, where ${\mathbit{Y}}_{t}=\left\{{\mathbit{Y}}_{t}^{\left(c\right)},c\in {C}\right\}$ is the collection of observations at time point $t$, and ${\mathbit{Y}}_{t}^{\left(c\right)}={\left({Y}_{1,t}^{\left(c\right)},\dots ,{Y}_{{n}_{{L}},t}^{\left(c\right)}\right)}^{\mathrm{\top }}$ is the vector of observed frequencies for color $c$ on the lattice ${{L}}_{n}$ at time $t$. The joint distribution for the spatio-temporal process on the lattice is difficult to specify, due to local spatial interactions for neighboring tiles and global interactions occurring at the level of the entire image. An additional issue is that cells tend to be clustered together due to the cell division process and other biological mechanisms; thus it is not uncommon to observe low counts in a considerable portion of tiles. In typical longitudinal experiments, the number of time points seldom goes beyond $50$ due to experimental, storage and processing costs, while ${n}_{{L}}$ can be relatively large. We therefore work under the framework where $T$ is assumed to be finite, while ${n}_{{L}}$ is allowed to grow to infinity.

We suppose that, conditionally on ${\mathbit{Y}}_{t-1}$, the count for the $i$th tile ${Y}_{i,t}^{\left(c\right)}$ follows a Poisson distribution, ${Y}_{i,t}^{\left(c\right)}|{\mathbit{Y}}_{t-1}\sim \text{Pois}\left({\lambda }_{i,t}^{\left(c\right)}\right)$, with intensity modeled through the canonical log-link ${v}_{i,t}^{\left(c\right)}=\log {\lambda }_{i,t}^{\left(c\right)}$, where ${v}_{i,t}^{\left(c\right)}$ takes the following spatial autoregressive form:

$${v}_{i,t}^{\left(c\right)}={\alpha }^{\left(c\right)}+\sum _{c\mathrm{\prime }\in {C}}{\beta }^{\left(c|c\mathrm{\prime }\right)}{S}_{i,t-1}^{\left(c\mathrm{\prime }\right)},\qquad (1)$$

$${S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}=\frac{1}{{n}_{i}}\sum _{j\in {{L}}_{n}:\,i\sim j}\log \left(1+{Y}_{j,t-1}^{\left(c\mathrm{\prime }\right)}\right),\qquad (2)$$

for all $c\in {C}$ and $t=1,\dots ,T$, with ${n}_{i}=\mathrm{\#}\left\{j:i\sim j,j\in {{L}}_{n}\right\}$ being the number of tiles in the neighborhood of tile $i$ (including tile $i$ itself). Although we adopt a regular grid for simplicity, the model is readily applicable to other tiling strategies. Changing the tiling strategy would only change the realisations of ${S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}$ in (2).
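A minimal NumPy sketch of the neighborhood statistic in (2) and the linear predictor in (1) is given below. It assumes a regular $n\times n$ grid and reads "share a border" as edge-sharing (up to four neighbors plus the tile itself); the function names and the array layout `(n_colors, n, n)` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def neighborhood_stat(Y_prev):
    """Compute S_{i,t-1}^{(c')} as in (2) on a regular n x n grid.

    Y_prev : counts at time t-1, shape (n_colors, n, n)
    returns: array of the same shape; each entry is the average of
             log(1 + Y) over the tile itself and the tiles sharing an edge
             with it, so n_i is 5, 4 or 3 depending on the tile position.
    """
    Z = np.log1p(np.asarray(Y_prev, dtype=float))
    # Zero-pad so that positions outside the image contribute nothing.
    Zp = np.pad(Z, ((0, 0), (1, 1), (1, 1)))
    total = (Zp[:, 1:-1, 1:-1] + Zp[:, :-2, 1:-1] + Zp[:, 2:, 1:-1]
             + Zp[:, 1:-1, :-2] + Zp[:, 1:-1, 2:])
    # n_i: apply the same stencil to an array of ones.
    ones = np.pad(np.ones(Z.shape[1:]), 1)
    n_i = (ones[1:-1, 1:-1] + ones[:-2, 1:-1] + ones[2:, 1:-1]
           + ones[1:-1, :-2] + ones[1:-1, 2:])
    return total / n_i

def log_intensity(Y_prev, alpha, B):
    """Linear predictor v_{i,t}^{(c)} in (1); B[c, c'] holds beta^{(c|c')}."""
    alpha = np.asarray(alpha, dtype=float)
    B = np.asarray(B, dtype=float)
    S = neighborhood_stat(Y_prev)                 # shape (n_colors, n, n)
    return alpha[:, None, None] + np.einsum("cd,dij->cij", B, S)
```

A different tiling or neighborhood definition would only require replacing `neighborhood_stat`; the linear predictor is unchanged.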

Here, we assume that the counts for different tiles at time $t$ are independent conditionally on the information from time $t-1$, i.e.

$$P\left({Y}_{i,t}^{\left(c\right)},{Y}_{j,t}^{\left(c\mathrm{\prime }\right)}|{\mathbit{Y}}_{t-1}\right)=P\left({Y}_{i,t}^{\left(c\right)}|{\mathbit{Y}}_{t-1}\right)P\left({Y}_{j,t}^{\left(c\mathrm{\prime }\right)}|{\mathbit{Y}}_{t-1}\right),$$

for all $c,c\mathrm{\prime }\in {C}$, $t=1,\dots ,T$, and $i,j\in {{L}}_{n}$, $i\ne j$. This does not mean that ${Y}_{i,t}^{\left(c\right)}$ and ${Y}_{j,t}^{\left(c\mathrm{\prime }\right)}$ are marginally independent, but rather that their spatio-temporal dependence arises through the structure of the intensity ${\lambda }_{i,t}^{\left(c\right)}$ in (1). Conditional independence is a commonly used assumption for spatio-temporal models in a non-Gaussian setting [3, 28], since it is exceedingly difficult to work with multivariate non-Gaussian distributions [8].

The elements of the parameter vector $\mathbit{\alpha }={\left({\alpha }^{\left(1\right)},\dots ,{\alpha }^{\left({n}_{{C}}\right)}\right)}^{\mathrm{\top }}$ are main effects corresponding to a baseline average count for cells of different colors. The spatio-temporal interactions are measured by the statistic ${S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}$ in (2), which essentially measures, on the log scale, the average number of cells of color $c\mathrm{\prime }$ in the neighborhood of tile $i$ at time $t-1$. Hence, the autoregressive parameter ${\beta }^{\left(c|c\mathrm{\prime }\right)}$ is interpreted as the positive or negative change in the average number of cells of color $c$ due to interactions with cells of color $c\mathrm{\prime }$ in neighbouring tiles. A positive (or negative) sign of ${\beta }^{\left(c|c\mathrm{\prime }\right)}$ means that the presence of cells of color $c\mathrm{\prime }$ in neighboring tiles promotes (or inhibits) the growth of cells of color $c$. The spatio-temporal effects ${\beta }^{\left(c|c\mathrm{\prime }\right)},c,c\mathrm{\prime }\in {C},$ are collected in the ${n}_{{C}}\times {n}_{{C}}$ weighted incidence matrix ${B}$. This may be used to generate weighted directed graphs, as shown in the example of Figure 2, where the nodes of the directed graph correspond to cell types, and the directed edges are negative or positive spatio-temporal interactions between cell types.
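To make the graph reading of ${B}$ concrete, the sketch below lists the directed edges implied by a coefficient matrix. It assumes that rows of the array index the affected color $c$ and columns the influencing color $c\mathrm{\prime }$, so that an edge runs from $c\mathrm{\prime }$ to $c$; the function name, the labels and the example matrix are hypothetical.

```python
import numpy as np

def interaction_edges(B, labels, tol=0.0):
    """List directed edges c' -> c for entries of B with |beta^{(c|c')}| > tol.

    B      : square array, B[c, c'] = effect of color c' on color c (assumed layout)
    labels : one name per color
    returns: list of tuples (source, target, weight, "promotes" | "inhibits")
    """
    B = np.asarray(B, dtype=float)
    edges = []
    for c in range(B.shape[0]):           # affected color (row)
        for cp in range(B.shape[1]):      # influencing color (column)
            if abs(B[c, cp]) > tol:
                sign = "promotes" if B[c, cp] > 0 else "inhibits"
                edges.append((labels[cp], labels[c], float(B[c, cp]), sign))
    return edges

# Hypothetical two-color example: color "B" inhibits "A", "A" has no effect on "B".
example_B = np.array([[0.5, -0.2],
                      [0.0, 0.3]])
edges = interaction_edges(example_B, labels=["A", "B"])
```

Because the edges are directed, the influence of $c\mathrm{\prime }$ on $c$ and of $c$ on $c\mathrm{\prime }$ appear as separate edges and need not agree in sign or size.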

Equation (1) could be extended to a more specific form, for example, ${v}_{i,t}^{\left(c\right)}={\alpha }^{\left(c\right)}+\sum _{c\mathrm{\prime }\in {C}}\left[{\beta }_{1}^{\left(c|c\mathrm{\prime }\right)}{S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}+{\beta }_{0}^{\left(c|c\mathrm{\prime }\right)}\log \left(1+{Y}_{i,t-1}^{\left(c\mathrm{\prime }\right)}\right)\right]$, where ${\beta }_{1}^{\left(c|c\mathrm{\prime }\right)}$ is interpreted as the effect that cells of color $c\mathrm{\prime }$ in neighbouring (but not the same) tiles have on the growth of cells of color $c$, and ${\beta }_{0}^{\left(c|c\mathrm{\prime }\right)}$ as the effect of cells of color $c\mathrm{\prime }$ in the same tile. However, we stick to the model in (1) because we have no evidence showing that the more complex model is advantageous from a model selection viewpoint.

We choose to work with a log-linear form for the autoregressive equation of ${v}_{i,t}^{\left(c\right)}$ in eq. (1), where we apply a logarithmic transform and add $1$ to the counts at time $t-1$, ${Y}_{i,t-1}^{\left(c\right)}$. It offers several advantages compared to the more commonly used linear form. First, ${\lambda }_{i,t}^{\left(c\right)}$ and ${Y}_{i,t-1}^{\left(c\right)}$ are transformed on the same scale. Moreover, this model can accommodate both positive and negative correlations, while it is not possible to account for positive association in a stationary model if past counts are directly included as explanatory variables. For example, with the model ${v}_{i,t}=\alpha +\beta {Y}_{i,t-1}$ for a single color, the intensity would be ${\lambda }_{i,t}=\exp \left(\alpha \right)\exp \left(\beta {Y}_{i,t-1}\right),$ which may lead to instability of the Poisson means if $\beta >0$ since ${\lambda }_{i,t}$ is allowed to increase exponentially fast. Finally, adding $1$ to ${Y}_{i,t-1}^{\left(c\right)}$ is for coping with zero data values, since $\log \left({Y}_{i,t-1}^{\left(c\right)}\right)$ is not defined when ${Y}_{i,t-1}^{\left(c\right)}=0$, which arises often, and it maps zeros of ${Y}_{i,t-1}^{\left(c\right)}$ into zeros of $\log \left(1+{Y}_{i,t-1}^{\left(c\right)}\right)$.
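The contrast between the two forms can be seen in a simplified special case, used here only for illustration: a single color and a neighborhood reduced to the tile itself, so that the statistic in (2) becomes $\log \left(1+{Y}_{i,t-1}\right)$. The numerical values $\alpha =-0.1$, $\beta =0.7$ and ${Y}_{i,t-1}=20$ are arbitrary choices for the example.

```latex
\begin{align*}
\text{log-linear:}\quad v_{i,t} &= \alpha + \beta \log(1 + Y_{i,t-1})
  &&\Longrightarrow\quad \lambda_{i,t} = e^{\alpha}\,(1 + Y_{i,t-1})^{\beta}
  \approx 0.905 \times 21^{0.7} \approx 7.6, \\
\text{linear:}\quad v_{i,t} &= \alpha + \beta\, Y_{i,t-1}
  &&\Longrightarrow\quad \lambda_{i,t} = e^{\alpha}\, e^{\beta Y_{i,t-1}}
  = e^{13.9} \approx 1.1 \times 10^{6}.
\end{align*}
```

In the log-linear form the intensity grows polynomially in the previous count, whereas in the linear form it grows exponentially, which is the source of the instability for $\beta >0$.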

## 2.2 Likelihood inference

Let $\mathbit{\theta }$ be the overall parameter vector $\mathbit{\theta }={\left({\mathbit{\alpha }}^{\mathrm{\top }},\text{vec}{\left({B}\right)}^{\mathrm{\top }}\right)}^{\mathrm{\top }}\in {\mathbb{R}}^{p}$, where $\mathbit{\alpha }$ is the ${n}_{{C}}$-dimensional vector defined in Section 2.1, ${B}$ is the ${n}_{{C}}\times {n}_{{C}}$ matrix of color interaction effects, and $p={n}_{{C}}\left(1+{n}_{{C}}\right)$ is the total number of parameters. In this section, we develop a weighted maximum likelihood estimator for our model. The weighted likelihood is

$${L}_{n}\left(\mathbit{\theta }\right)=\prod _{t=1}^{T}\prod _{c\in {C}}\prod _{i\in {{L}}_{n}}P{\left({Y}_{i,t}^{\left(c\right)}|{\mathbit{Y}}_{t-1};\mathbit{\theta }\right)}^{{w}_{i,t}^{\left(c\right)}}=\prod _{t=1}^{T}\prod _{c\in {C}}\prod _{i\in {{L}}_{n}}{\left({e}^{-{\lambda }_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)}\frac{{{\lambda }_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)}^{{y}_{i,t}^{\left(c\right)}}}{{y}_{i,t}^{\left(c\right)}!}\right)}^{{w}_{i,t}^{\left(c\right)}},\qquad (3)$$

where ${\lambda }_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)$ is the expected number of cells with color $c$ in tile $i$ at time $t$, defined through (1), and the weights ${w}_{i,t}^{\left(c\right)}$ are given constants. The weighted maximum likelihood estimator (MLE), $\stackrel{ˆ}{\mathbit{\theta }}$, is obtained by maximizing the weighted log-likelihood function, which, up to an additive term that does not depend on $\mathbit{\theta }$, is

$${\mathrm{\ell }}_{n}\left(\mathbit{\theta }\right)=\sum _{i\in {{L}}_{n}}\sum _{t=1}^{T}\sum _{c\in {C}}{w}_{i,t}^{\left(c\right)}\left[{Y}_{i,t}^{\left(c\right)}{v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)-\exp \left\{{v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)\right\}\right],\qquad (4)$$

where ${v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)\equiv \log {\lambda }_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)$. Equivalently, $\stackrel{ˆ}{\mathbit{\theta }}$ is found by solving the weighted estimating equations

$$0={\mathbit{u}}_{n}\left(\mathbit{\theta }\right)\equiv \frac{1}{{n}_{{L}}}\mathrm{\nabla }{\mathrm{\ell }}_{n}\left(\mathbit{\theta }\right)=\frac{1}{{n}_{{L}}}\sum _{i\in {{L}}_{n}}\sum _{t=1}^{T}{w}_{i,t}^{\left(c\right)}{\mathbit{\gamma }}_{i,t}\left(\mathbit{\theta }\right)\otimes \mathrm{\nabla }{\mathbit{v}}_{i,t},\qquad (5)$$

where ${\mathbit{\gamma }}_{i,t}\left(\mathbit{\theta }\right)=\left({y}_{i,t}^{\left(1\right)}-exp\left\{{v}_{i,t}^{\left(1\right)}\left(\mathbit{\theta }\right)\right\},\dots ,{y}_{i,t}^{\left({n}_{{C}}\right)}-exp\left\{{v}_{i,t}^{\left({n}_{{C}}\right)}\left(\mathbit{\theta }\right)\right\}\right)$, $\otimes$ denotes the Kronecker product, $\mathrm{\nabla }$ is the gradient operator with respect to $\mathbit{\theta }$ and $\mathrm{\nabla }{\mathbit{v}}_{i,t}\equiv \mathrm{\nabla }{\mathbit{v}}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)=\left(1,{S}_{i,t-1}^{\left(1\right)},\dots ,{S}_{i,t-1}^{\left({n}_{{C}}\right)}{\right)}^{\mathrm{\top }}$.

Specific weights could be used to address the presence of outliers. Following Ferrari and Vecchia [29] and La Vecchia, Camponovo, and Ferrari [30], the influence of strong outliers could be avoided by taking weights of the form ${w}_{i,t}^{\left(c\right)}=\exp \left\{-\left(1-q\right)\left[{Y}_{i,t}^{\left(c\right)}{v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)-\exp \left({v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)\right)\right]\right\}$ with $q$ being a tuning constant smaller than 1. However, for the current application we use constant weights all equal to 1.

Our empirical results show that this choice performs reasonably well in terms of estimation accuracy in all our numerical examples and guarantees optimal variance for the estimator $\stackrel{ˆ}{\mathbit{\theta }}$ under correct model specification. The solution to eq. (5) is obtained by a standard Fisher scoring algorithm, which is found to be stable and converges fast in all our numerical examples.
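Since the parameters $\left({\alpha }^{\left(c\right)},{\beta }^{\left(c|1\right)},\dots ,{\beta }^{\left(c|{n}_{{C}}\right)}\right)$ enter only the terms of (4) for color $c$, the estimating equations can be solved one color at a time when the weights are fixed. The following is a minimal sketch of Fisher scoring for one color, not the authors' code: the function name, the intercept-only starting value and the stopping rule are assumptions, and the toy data at the end are simulated purely for illustration.

```python
import numpy as np

def fisher_scoring(X, y, w=None, max_iter=50, tol=1e-8):
    """Weighted Poisson log-linear fit by Fisher scoring.

    X : design matrix with rows (1, S^{(1)}, ..., S^{(n_C)}), stacked over
        tiles i and times t (first column must be the intercept)
    y : counts Y_{i,t}^{(c)} for one color c, stacked in the same order
    w : fixed weights w_{i,t}^{(c)} (defaults to 1)
    returns: (theta_hat, H_hat), where H_hat = sum_i w_i * lambda_i * x_i x_i^T
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    theta = np.zeros(X.shape[1])
    # Start from the intercept-only fit to keep the first steps moderate.
    theta[0] = np.log(max(np.average(y, weights=w), 1e-8))
    for _ in range(max_iter):
        lam = np.exp(X @ theta)
        score = X.T @ (w * (y - lam))              # gradient of (4) for color c
        H = X.T @ (X * (w * lam)[:, None])         # expected information
        step = np.linalg.solve(H, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    lam = np.exp(X @ theta)
    H = X.T @ (X * (w * lam)[:, None])
    return theta, H

# Toy check on simulated data with one covariate (illustrative values only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5000), rng.uniform(0.0, 2.0, size=5000)])
theta_true = np.array([-0.1, 0.7])
y = rng.poisson(np.exp(X @ theta_true))
theta_hat, H_hat = fisher_scoring(X, y)
```

For the canonical log-link the observed and expected information coincide given the covariates, so Fisher scoring here is the same iteration as Newton-Raphson.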

Finally, in practical applications it is also important to address the question of how to select an appropriate model by retaining only the meaningful spatio-temporal interactions between cell populations, thus avoiding over-parametrized models. Model selection plays an important role by balancing goodness-of-fit and model complexity. Here, we select non-zero model parameters based on traditional model selection approaches: the Akaike information criterion, $\text{AIC}=-2{\mathrm{\ell }}_{n}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)+2p$, and the Bayesian information criterion, $\text{BIC}=-2{\mathrm{\ell }}_{n}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)+p\log \left({n}_{{L}}T\right)$.
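The two criteria are simple to compute once the maximized log-likelihood is available. The sketch below is illustrative only; the helper names are hypothetical, and `poisson_loglik` includes the $-\log \left(y!\right)$ term, which is constant in $\mathbit{\theta }$ and therefore does not affect the ranking of candidate models fitted to the same data.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, v):
    """Unweighted Poisson log-likelihood for counts y and log-intensities v."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    # log(y!) computed as lgamma(y + 1), element by element.
    log_fact = np.array([lgamma(k + 1.0) for k in y.ravel()]).reshape(y.shape)
    return float(np.sum(y * v - np.exp(v) - log_fact))

def aic_bic(loglik, p, n_obs):
    """Return (AIC, BIC) with AIC = -2 l + 2 p and BIC = -2 l + p log(n_obs).

    p     : number of free (non-zero) parameters in the candidate model
    n_obs : n_L * T, the number of tile-by-time observations
    """
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + p * np.log(n_obs)
    return aic, bic
```

Candidate models that set some ${\beta }^{\left(c|c\mathrm{\prime }\right)}$ to zero can then be compared by refitting each and keeping the one with the smallest criterion value.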

## 2.3 Asymptotic properties and standard errors

In this section, we give an overview of the asymptotic behavior of the estimator introduced in Section 2.2. In our setting we consider a fixed number of time points, $T$, whilst the lattice ${{L}}_{n}$ is allowed to increase. This reflects the notion that the statistician is allowed to choose an increasingly fine tiling grid as the number of cells increases. If the regularity conditions stated in the Appendix hold, then $\sqrt{{n}_{{L}}}{\mathbit{H}}_{n}{\left({\mathbit{\theta }}_{0}\right)}^{1/2}\left({\stackrel{ˆ}{\mathbit{\theta }}}_{n}-{\mathbit{\theta }}_{0}\right)$ converges in distribution to a $p$-variate normal distribution with zero mean vector and identity covariance matrix, as ${n}_{{L}}\to \mathrm{\infty }$, with ${\mathbit{H}}_{n}\left(\mathbit{\theta }\right)$ given in (6). Asymptotic normality of ${\stackrel{ˆ}{\mathbit{\theta }}}_{n}$ follows by applying the limit theorems for M-estimators for nonlinear spatial models developed by Jenish and Prucha [31]. One condition required to ensure this behaviour is that ${\mathbit{Y}}_{t}$ has constant entries at the initial time point $t=0$, which is quite realistic since typically cells are seeded randomly at the beginning of the experiment. Our proofs mostly check $\alpha$-mixing conditions and ${L}_{2}$-uniform integrability of the score functions ${\mathbit{u}}_{i,t}\left(\mathbit{\theta }\right)$, which ensure a pointwise law of large numbers; together with stochastic equicontinuity, this gives the uniform version of the law of large numbers required by Jenish and Prucha [31].

The asymptotic variance of $\stackrel{ˆ}{\mathbit{\theta }}$ is ${\mathbit{V}}_{n}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)={\mathbit{H}}_{n}^{-1}\left({\mathbit{\theta }}_{0}\right)$, where ${\mathbit{H}}_{n}\left(\mathbit{\theta }\right)$ is the $p\times p$ Hessian matrix

$${\mathbit{H}}_{n}\left(\mathbit{\theta }\right)=-E\left[{\mathrm{\nabla }}^{2}{\mathrm{\ell }}_{n}\left(\mathbit{\theta }\right)\right]=-E\left(\sum _{i\in {{L}}_{n}}\mathrm{\nabla }{\mathbit{u}}_{i}\left(\mathbit{\theta }\right)\right),\qquad (6)$$

with ${\mathbit{u}}_{i}\left(\mathbit{\theta }\right)={\mathbit{u}}_{i,1}\left(\mathbit{\theta }\right)+\cdots +{\mathbit{u}}_{i,T}\left(\mathbit{\theta }\right)$ being the partial score function for the $i$th tile. Direct evaluation of ${\mathbit{H}}_{n}\left(\mathbit{\theta }\right)$ may be challenging since the expectation in (6) is intractable. Thus, we estimate ${\mathbit{H}}_{n}\left(\mathbit{\theta }\right)$ by the block-diagonal empirical counterpart

$${\stackrel{ˆ}{\mathbit{H}}}_{n}\left(\mathbit{\theta }\right)=\left(\begin{array}{cccc}{\stackrel{ˆ}{\mathbit{H}}}^{\left(1\right)}\left(\mathbit{\theta }\right)& \mathbf{0}& \cdots & \mathbf{0}\\ \mathbf{0}& {\stackrel{ˆ}{\mathbit{H}}}^{\left(2\right)}\left(\mathbit{\theta }\right)& \cdots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}& \mathbf{0}& \cdots & {\stackrel{ˆ}{\mathbit{H}}}^{\left({n}_{{C}}\right)}\left(\mathbit{\theta }\right)\end{array}\right),$$

where

$${\stackrel{ˆ}{\mathbit{H}}}^{\left(c\right)}\left(\mathbit{\theta }\right)=\sum _{i\in {{L}}_{n}}\sum _{t=1}^{T}{w}_{i,t}^{\left(c\right)}\exp \left[{v}_{i,t}^{\left(c\right)}\left(\mathbit{\theta }\right)\right]\left[\mathrm{\nabla }{\mathbit{v}}_{i,t}\right]{\left[\mathrm{\nabla }{\mathbit{v}}_{i,t}\right]}^{\mathrm{\top }}.\qquad (7)$$

Note that the above estimators approximate the quantities in formula (6) by conditional expectations. Our numerical results suggest that the above variance approximation yields confidence intervals with coverage close to the nominal level $\left(1-\alpha \right)$. Besides the above formulas, we also consider confidence intervals obtained by a parametric bootstrap approach. Specifically, we generate $B$ bootstrap samples ${\mathbit{Y}}_{\left(1\right)}^{\ast },\dots ,{\mathbit{Y}}_{\left(B\right)}^{\ast }$ by sampling at subsequent times from the conditional model specified in eqs. (1) and (2) with $\mathbit{\theta }=\stackrel{ˆ}{\mathbit{\theta }}$. From such bootstrap samples, we obtain bootstrapped estimators, ${\stackrel{ˆ}{\mathbit{\theta }}}_{\left(1\right)}^{\ast },\dots ,{\stackrel{ˆ}{\mathbit{\theta }}}_{\left(B\right)}^{\ast }$, which are used to estimate $var\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$ by the usual covariance estimator ${\stackrel{ˆ}{\mathbit{V}}}_{boot}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)=\sum _{b=1}^{B}\left({\stackrel{ˆ}{\mathbit{\theta }}}_{\left(b\right)}^{\ast }-{\stackrel{‾}{\mathbit{\theta }}}^{\ast }\right){\left({\stackrel{ˆ}{\mathbit{\theta }}}_{\left(b\right)}^{\ast }-{\stackrel{‾}{\mathbit{\theta }}}^{\ast }\right)}^{\mathrm{\top }}/\left(B-1\right)$, where ${\stackrel{‾}{\mathbit{\theta }}}^{\ast }=\sum _{b=1}^{B}{\stackrel{ˆ}{\mathbit{\theta }}}_{\left(b\right)}^{\ast }/B$. Finally, a $\left(1-\alpha \right)100\mathrm{\%}$ confidence interval for ${\mathbit{\theta }}_{j}$ is obtained as ${\stackrel{ˆ}{\mathbit{\theta }}}_{j}\pm {z}_{1-\alpha /2}{\left\{\stackrel{ˆ}{\mathbit{V}}\right\}}_{jj}^{1/2}$, where ${z}_{q}$ is the $q$-quantile of a standard normal distribution, and $\stackrel{ˆ}{\mathbit{V}}$ is an estimate of $var\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$ obtained by either eq. (7) or bootstrap resampling.
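Both routes to a confidence interval reduce to a few array operations. The sketch below is a hedged illustration: `wald_ci` takes an information matrix such as the one in (7) and inverts it, and `bootstrap_cov` computes the sample covariance of bootstrap replicates that are assumed to have been obtained already by refitting the model to simulated data. The function names and argument layout are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def wald_ci(theta_hat, H_hat, level=0.95):
    """Symmetric normal-theory intervals theta_j +/- z * sqrt(V_jj), V = H^{-1}."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    V = np.linalg.inv(np.asarray(H_hat, dtype=float))
    se = np.sqrt(np.diag(V))
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return theta_hat - z * se, theta_hat + z * se

def bootstrap_cov(theta_star):
    """Sample covariance of bootstrap estimates.

    theta_star : array of shape (B, p), one bootstrapped estimate per row
    """
    theta_star = np.asarray(theta_star, dtype=float)
    centered = theta_star - theta_star.mean(axis=0)
    return centered.T @ centered / (theta_star.shape[0] - 1)
```

To use the bootstrap covariance in place of the inverse information, take the square roots of its diagonal entries as standard errors in the same interval formula.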

## 3 Monte Carlo simulations

In our Monte Carlo experiments, we generate data from a Poisson model as follows. At time $t=0$, we populate ${n}_{{L}}$ tiles using equal counts for cells of different colors. For $t=1,\dots ,T$, observations are drawn from the multivariate Poisson model ${Y}_{i,t}^{\left(c\right)}|{\mathbit{Y}}_{t-1}\sim \text{Poisson}\left({\lambda }_{i,t}^{\left(c\right)}\right),c\in {C}.$ Recall that the rate ${\lambda }_{i,t}^{\left(c\right)}$ defined in Section 2.1 contains autoregressive coefficients ${\beta }^{\left(c|c\mathrm{\prime }\right)}$, which are collected in the ${n}_{{C}}×{n}_{{C}}$ matrix ${B}$.

We assess the performance of the MLE under different settings concerning the size and sparsity of ${B}$. Consider the three models with the following choices of ${B}$:

${{B}}_{\mathbf{1}}=\left(\begin{array}{ccc}0.7& -0.7& 0.7\\ 0.7& 0.7& -0.7\\ -0.7& 0.7& 0.7\end{array}\right),{{B}}_{\mathbf{2}}=\left(\begin{array}{ccc}0.05& -0.15& 0.25\\ 0.35& 0.45& -0.55\\ -0.65& 0.75& 0.85\end{array}\right),{{B}}_{\mathbf{3}}=\left(\begin{array}{ccc}0.7& -0.7& 0.7\\ 0& 0.7& 0\\ 0& 0& 0.7\end{array}\right).$

Denote Model $i$ as the model corresponding to ${{B}}_{i},i=1,2,3$. In Model 1, all the effects in ${B}$ have the same size; in Model 2, the effects have decreasing sizes; Model 3 is the same as Model 1, but with some interactions exactly equal to zero.

We set ${\alpha }^{\left(1\right)}=\cdots ={\alpha }^{\left({n}_{{C}}\right)}=-0.1$ for all three models. The above parameter choices reflect the situation where the generated process $\mathbit{Y}$ has a moderate growth.
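The data-generating mechanism described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used for the experiments: the neighbourhood statistic is taken to be the sum of $log\left(1+y\right)$ over a tile and its four nearest neighbours, which is the form appearing in eq. (9) of the Appendix, tiles outside the image are assumed to contribute nothing, and the function names are hypothetical.

```python
import numpy as np


def neighbourhood_log_sums(y):
    """S[c, i, j]: sum of log(1 + y) over tile (i, j) and its 4 nearest neighbours.

    y has shape (n_colors, n, n). Tiles outside the image contribute zero
    (an assumption about boundary handling).
    """
    logy = np.log1p(np.asarray(y, dtype=float))
    padded = np.pad(logy, ((0, 0), (1, 1), (1, 1)))  # zero padding at the border
    return (
        padded[:, 1:-1, 1:-1]   # the tile itself
        + padded[:, :-2, 1:-1]  # neighbour above
        + padded[:, 2:, 1:-1]   # neighbour below
        + padded[:, 1:-1, :-2]  # neighbour to the left
        + padded[:, 1:-1, 2:]   # neighbour to the right
    )


def simulate_counts(alpha, B, y0, T, rng):
    """Draw Y_1, ..., Y_T given Y_0 from the conditional Poisson model.

    alpha : (n_colors,) intercepts.
    B     : (n_colors, n_colors) matrix with B[c, c'] = beta^(c|c').
    y0    : (n_colors, n, n) initial counts.
    Returns an array of shape (T + 1, n_colors, n, n).
    """
    alpha = np.asarray(alpha, dtype=float)
    B = np.asarray(B, dtype=float)
    path = [np.asarray(y0)]
    for _ in range(T):
        S = neighbourhood_log_sums(path[-1])
        # log-rate: alpha^(c) + sum over c' of beta^(c|c') * S^(c')
        log_rate = alpha[:, None, None] + np.einsum("cd,dij->cij", B, S)
        path.append(rng.poisson(np.exp(log_rate)))
    return np.stack(path)
```

For instance, `simulate_counts([-0.1] * 3, 0.1 * np.eye(3), np.ones((3, 25, 25), dtype=int), 10, np.random.default_rng(0))` draws one path on a $25×25$ lattice; the coefficient matrix in this call is a small illustrative choice, not one of ${{B}}_{1}$ to ${{B}}_{3}$.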

In Table 1 and Table 2, we show results based on 1000 Monte Carlo runs generated from Models 1-3, for $n=25,{n}_{{C}}=3$ and $T=10$ and $25$. In Table 1, we show Monte Carlo estimates of squared bias and variance of $\stackrel{ˆ}{\mathbit{\theta }}$. Both squared bias and variance of our estimator are quite small in all three models, and decrease as $T$ gets larger. The variances of Model 2 are slightly larger than those in the other two models due to the increasing difficulty in estimating parameters close to zero.

Table 1:

Monte Carlo estimates for squared bias $\left(×{10}^{-6}\right)$ and variance $\left(×{10}^{-4}\right)$ of the MCLE for three models with time points $T=10,25.$ Simulation standard errors are shown in parentheses. The three models differ in terms of the coefficients ${\beta }^{\left(c|c\mathrm{\prime }\right)},c,c\mathrm{\prime }\in {C}$, as described in Section 3: Non-zero equal effects (Model 1), non-zero decreasing interactions (Model 2), and sparse effects (Model 3). For all models, ${\alpha }^{\left(c\right)}=-0.1,c=1,2,3$. Estimates are based on 1000 Monte Carlo runs.

In Table 2, we report the coverage probability for symmetric confidence intervals of the form $\stackrel{ˆ}{\mathbit{\theta }}±{z}_{1-\alpha /2}\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$, where ${z}_{q}$ is the $q$-quantile of a standard normal distribution, with $\alpha =0.01,0.05,0.10.$ The standard error, $\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$, is obtained as the square root of the diagonal elements of either the estimate ${\stackrel{ˆ}{\mathbit{V}}}_{est}$ of ${\mathbit{V}}_{n}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$ or the parametric bootstrap estimate ${\stackrel{ˆ}{\mathbit{V}}}_{boot}$, both described in Section 2.3. The coverage probabilities of the confidence intervals are very close to the nominal level for both methods.

Table 2:

Monte Carlo estimates for the coverage probability of $\left(1-\alpha \right)100\mathrm{%}$ confidence intervals $\stackrel{ˆ}{\mathbit{\theta }}±{z}_{1-\alpha /2}\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$, with $\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$ obtained using the bootstrap (${\stackrel{ˆ}{\mathbit{V}}}_{boot}$) and sandwich (${\stackrel{ˆ}{\mathbit{V}}}_{est}$) estimators in Section 2.3. The three models differ in terms of the coefficients ${\beta }^{\left(c|c\mathrm{\prime }\right)},c,c\mathrm{\prime }\in {C}$, as described in Section 3: Non-zero equal effects (Model 1), non-zero decreasing interactions (Model 2), and sparse effects (Model 3). For all models, ${\alpha }^{\left(c\right)}=-0.1,c=1,2,3$. Estimates are based on 1000 Monte Carlo runs.

In Table 3, we show results for the model selection based on 1000 Monte Carlo samples from Model 3 using the AIC and the BIC given in Section 2 for $n=25$ and $T=10,25$. We report the Type A error (a term is not selected when it actually belongs to the true model) and the Type B error (a term is selected when it is not in the true model). For both AIC and BIC, model selection is more accurate for larger $T$. As expected, AIC tends to over-select, and BIC outperforms AIC, with zero Type A error and very low Type B error.
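To make the two error types concrete, the sketch below computes them from boolean selection masks. It is only an illustration: the function name is hypothetical, and pooling the errors over all terms and all Monte Carlo runs before taking a percentage is an assumption, since the exact averaging convention is not spelled out here.

```python
import numpy as np


def selection_error_rates(true_support, selected_supports):
    """Percentage Type A and Type B errors over Monte Carlo runs.

    true_support      : boolean array, True where a term is in the true model.
    selected_supports : boolean array with one extra leading axis (one slice
                        per run), True where the criterion selected the term.
    """
    true_support = np.asarray(true_support, dtype=bool)
    selected = np.asarray(selected_supports, dtype=bool)
    # Type A: a term of the true model is not selected.
    type_a = 100.0 * np.mean(~selected[:, true_support])
    # Type B: a term outside the true model is selected.
    type_b = 100.0 * np.mean(selected[:, ~true_support])
    return type_a, type_b
```

With the sparsity pattern of ${{B}}_{3}$ as `true_support`, a run that selects every term contributes only Type B errors, while a run that drops a non-zero term contributes a Type A error.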

Table 3:

Monte Carlo estimates for $\mathrm{%}$ Type A error (a term is not selected when it actually belongs to the true model) and $\mathrm{%}$ Type B error (a term is selected when it is not in the true model) using AIC and BIC criteria. Results are based on 1000 Monte Carlo samples generated from Model 3 with $n=25$ and $T=10,25$.

Finally, we compare the performance of our model with the following multivariate conditional autoregressive (MCAR) model proposed by Leroux, Lei, and Breslow [32]: $\begin{array}{rl}& {Y}_{i,t}^{\left(c\right)}\sim \text{Pois}\left(exp\left({\mathbit{x}}_{i,t}^{T}\mathbit{\beta }+{\mathbit{Z}}_{i}\right)\right),\end{array}$

where ${\mathbit{Z}}_{i},i\in {{L}}_{n}$ are random effects with conditional distribution ${\mathbit{Z}}_{i}|{\mathbit{Z}}_{-i}\sim N\left(\frac{\rho \sum _{j\sim i:j\in {{L}}_{n}}{\mathbit{Z}}_{j}}{\rho {n}_{i}+1-\rho },\frac{{\mathbf{\Sigma }}_{Z}}{\rho {n}_{i}+1-\rho }\right),$

where $\rho$ is a spatial autocorrelation parameter, with $\rho =0$ corresponding to independence and $\rho =1$ to the intrinsic model; ${\mathbf{\Sigma }}_{Z}$ is an ${n}_{{C}}T×{n}_{{C}}T$ between-variable covariance matrix, which is assumed to have no fixed structure; and ${n}_{i}$ is the number of tiles in the neighborhood of tile $i$ as defined in Section 2.1. Let $\mathbit{\beta }=\left({\alpha }^{T},\text{vec}\left({B}{\right)}^{T}{\right)}^{T}$ be a vector of regression parameters, where ${B}$ is defined in Section 2.1 and $\alpha$ is the intercept. Let the covariate ${\mathbit{x}}_{i,t}$ be an ${n}_{{C}}^{2}$-dimensional vector consisting of ${n}_{{C}}$ vectors of the form $\left({S}_{i,t-1}^{\left(1\right)},\dots ,{S}_{i,t-1}^{\left({n}_{{C}}\right)}\right)$, where ${S}_{i,t-1}^{\left(c\right)}$, defined in eq. (2), carries the information from the neighbouring tiles at the previous time point.

An independent Gaussian prior, $N\left(0,100000\right)$, is specified for each regression parameter in $\mathbit{\beta }$. A uniform prior on the unit interval, $U\left(0,1\right)$, is specified for $\rho$. For the covariance matrix ${\mathbf{\Sigma }}_{Z}$, we assume an inverse Wishart distribution with identity scale matrix and ${n}_{{C}}T$ degrees of freedom.

To evaluate the performance of the MLE under our model and of the estimators obtained from the MCAR model, we generate $1000$ data sets from Model 1. Estimation of the MCAR model is done by MCMC sampling, using the R package CARBayes by Lee [33]. Table 4 shows Monte Carlo estimates of squared bias, variance, the coverage probability of $95\mathrm{%}$ confidence intervals and computation time for $n,T\in \left\{10,25\right\}$ and ${n}_{{C}}=1,2,3$. Two of the settings are the same as those shown for Model 1 in Table 1: $n=25,{n}_{{C}}=3,T=10$ and $n=25,{n}_{{C}}=3,T=25$. For the MCAR model, we show results for two MCMC settings: (1) MCAR1, with $1000$ MCMC samples generated and $200$ discarded as the burn-in period; (2) MCAR2, with $5000$ samples generated and $100$ discarded. Coverage probabilities for our model are computed from intervals of the form $\stackrel{ˆ}{\mathbit{\theta }}±{z}_{0.975}\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$, where ${z}_{q}$ is the $q$-quantile of a standard normal distribution. The standard error, $\stackrel{ˆ}{sd}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$, is obtained by taking the square root of the diagonal elements of ${\mathbit{V}}_{n}\left(\stackrel{ˆ}{\mathbit{\theta }}\right)$ described in Section 2.3.

Table 4:

Monte Carlo estimates for squared bias $\left(×{10}^{-6}\right)$, variance $\left(×{10}^{-4}\right)$, the coverage probability of $95\mathrm{%}$ confidence intervals and computation time for $n,T\in \left\{10,25\right\}$ and ${n}_{{C}}=1,2,3$, for the MLE of our model and for MCAR. In MCAR1, $1000$ MCMC samples are generated and $200$ discarded as the burn-in period; in MCAR2, $5000$ samples are generated and $100$ discarded. True values of the regression parameters are given by ${{B}}_{1}$. Estimates are obtained from $1000$ Monte Carlo runs.

Overall, our method performs better than MCAR on the kind of data that we generate, especially when $n$ and/or $T$ is small, with much smaller bias and variance as well as shorter computation time. The performance of MCAR improves significantly as the model gets more complicated (i.e. larger ${n}_{{C}}$) and as $n$ and $T$ increase. In the case where $n=25,T=25$ and ${n}_{{C}}=3$, it performs almost as well as our model; however, it takes almost an hour to obtain the estimates, while our method requires less than a minute. Moreover, for the coverage probabilities to reach the nominal level, MCAR seems to require a larger MCMC sample size as the model gets more complicated, while those of our model are stable and close to the nominal level in all cases.

## 4 Analysis of the cancer cell growth data

Cancer cell behaviour is believed to be determined by several factors including genetic profile and differentiation state. However, the presence of other cancer cells and non-cancer cells has also been shown to have a great impact on overall tumor behaviour [34, 35]. It is therefore important to be able to dissect and quantify these interactions in complex culture systems. The data sets in this section represent a cancer cell-fibroblast co-culture experiment. The data sets analyzed consist of counts of cell types (different cancer cell populations expressing different fluorescent proteins, and non-fluorescent fibroblasts) from 9 subsequent images taken at an 8-hour frequency over a period of 3 days using the Operetta high-content imager (Perkin Elmer). Information regarding cell type (fluorescent profile) and spatial coordinates for each individual cell were extracted using the associated software (Harmony, Perkin Elmer).

Each image was subsequently tiled using a $25×25$ regular grid.

We choose the number of tiles to balance the fit of the model against the ability to capture local impacts between cell populations. More specifically, decreasing the tile size enables one to detect local impacts between cell populations, which is one of the objectives of our analysis. However, if the tiles are too small, most tiles will contain no cells, and the conditional Poisson model will not fit the data well. On the other hand, when the tiles are too large, the model fits the data well (the conditional Poisson is approximately a conditional normal model), but we lose information on local impacts. We recommend an average of 0 to 20 cells per tile, since for such a choice our diagnostic and goodness-of-fit analyses suggest that the conditional Poisson model fits the data well whilst enabling us to measure local correlation effects between populations.
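The tiling step can be illustrated with a short sketch that bins cell coordinates into a regular grid and counts cells of each type per tile. The function name, the integer coding of cell types and the assumption that coordinates lie in a rectangle $\left[0,\text{width}\right]×\left[0,\text{height}\right]$ are all choices made for this example; the actual export format of the imaging software is not described here.

```python
import numpy as np


def tile_counts(x, y, cell_type, n_types, width, height, n_tiles=25):
    """Count cells of each type in every tile of an n_tiles x n_tiles grid.

    x, y      : cell coordinates, assumed to lie in [0, width] x [0, height].
    cell_type : integer label in {0, ..., n_types - 1} for each cell.
    Returns an integer array of shape (n_types, n_tiles, n_tiles).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cell_type = np.asarray(cell_type)
    x_edges = np.linspace(0.0, width, n_tiles + 1)
    y_edges = np.linspace(0.0, height, n_tiles + 1)
    counts = np.zeros((n_types, n_tiles, n_tiles), dtype=int)
    for c in range(n_types):
        keep = cell_type == c
        # 2-D histogram of the coordinates of cells with label c.
        hist, _, _ = np.histogram2d(x[keep], y[keep], bins=[x_edges, y_edges])
        counts[c] = hist.astype(int)
    return counts
```

The average number of cells per tile, `counts.sum(axis=0).mean()`, can then be compared across candidate grid sizes before fitting the model.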

## 4.1 Cancer cell-fibroblast co-culture experiment

In this experiment, cancer cells are co-cultured with fibroblasts, a predominant cell type in the tumor microenvironment, believed to affect tumor progression, partly due to interactions with and activation by cancer cells [34]. In this experiment, fibroblasts (F) are non-fluorescent whereas cancer cells fluoresce either in the red (R) or green (G) channels due to the experimental expression of mCherry or GFP proteins, respectively. Cells were initially seeded at a ratio of 1:1:2 (R:G:F).

Model selection and inference.   We applied our methodology to quantify the magnitude and direction of the impacts that the considered cell types have on each other's growth. To select the relevant terms in the intensity expression (1), we carry out model selection using the BIC model selection criterion. In Table 5, we show estimated parameters for the full and the BIC models, with bootstrap $95\mathrm{%}$ confidence intervals in parentheses. Figure 2 illustrates estimated spatio-temporal impacts between cell types using a directed graph. The solid and dashed arrows represent, respectively, significant and non-significant impacts between cell types at the $95\mathrm{%}$ confidence level. Significant impacts coincide with parameters selected by BIC.

The interactions within each cell type (${\stackrel{ˆ}{\beta }}^{\left(c|c\right)},c=R,G,F$) are significant, which is consistent with healthy growing cells. As anticipated, the effects ${\stackrel{ˆ}{\beta }}^{\left(c|c\right)}$ for the cancer cells are larger than those for the slower growing fibroblasts. The validity of the estimated parameters is also supported by the similar sizes of the parameters for the green and red cancer cells. This is expected, since the red and green cancer cells are biologically identical except for the fluorescent protein they express. Interestingly, the estimated effects within both types of cancer cells (${\stackrel{ˆ}{\beta }}^{\left(c|c\right)},c=R,G$) are larger than the impacts they have on one another (${\stackrel{ˆ}{\beta }}^{\left(G|R\right)}$ and ${\stackrel{ˆ}{\beta }}^{\left(R|G\right)}$). This is not surprising, since ${\stackrel{ˆ}{\beta }}^{\left(c|c\right)}\left(c=R,G\right)$ reflects not only impacts between cells from the same cell population, but also cell proliferation. The fact that we are able to detect the impacts between the red and green cancer cells confirms that our methodology is sensitive enough to detect biologically relevant impacts, even though no interactions were found between the cancer cells and the fibroblasts. The latter might be due to the fact that we used normal fibroblasts that had not previously been in contact with cancer cells and thus had not been activated to support tumor progression, as is the case with cancer-activated fibroblasts.

Figure 2:

Directed graph showing fitted spatio-temporal interactions between GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F). The solid and dashed arrows represent respectively the significant and not significant interactions between cell types at the $95\mathrm{%}$ confidence level.

Goodness-of-fit and one-step ahead prediction   To illustrate the goodness-of-fit of the estimated model, we generate cell counts for each type in each tile, ${\stackrel{ˆ}{y}}_{i,t}^{\left(c\right)}$, from the Pois(${\stackrel{ˆ}{\lambda }}_{i,t}^{\left(c\right)}$) distribution for $t\ge 1$, where ${\stackrel{ˆ}{\lambda }}_{i,t}^{\left(c\right)}$ is computed using observations at time $t-1$, with parameters estimated from the entire dataset. In Figure 4, we compare the observed and generated cell counts for GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F) across the entire image. The solid and dashed curves for all cell types are close, suggesting that the model fits the data reasonably well. As anticipated, the overall growth rates for the red and green cancer cells are similar, and noticeably larger than the growth rate for fibroblasts.

To assess the prediction performance of our method, we consider one-step-ahead forecasting using parameters estimated from a moving window of five time points. In Figure 3, we show quantiles of observed cell counts against predicted counts for each tile. The upper and lower $95\mathrm{%}$ confidence bounds are computed non-parametrically by taking ${\stackrel{ˆ}{F}}_{1}^{-1}\left({\stackrel{ˆ}{F}}_{0}\left({y}_{t}^{\left(c\right)}\right)-0.95\right)$ and ${\stackrel{ˆ}{F}}_{1}^{-1}\left({\stackrel{ˆ}{F}}_{0}\left({y}_{t}^{\left(c\right)}\right)+0.95\right)$, where ${\stackrel{ˆ}{F}}_{0}$ and ${\stackrel{ˆ}{F}}_{1}$ are the empirical distributions of the observations and predictions at time $t$ respectively [36]. The identity line falls within the confidence bands in each plot, indicating a satisfactory prediction performance.

Figure 3:

QQ-plots for cell growth, comparing observed (horizontal axis) and one-time ahead predicted (vertical axis) cell counts per tile on the entire image at times $t=6,7,8$ for GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F). One-time ahead predictions are based on the model fitted using a moving window of five time points.

Comparison with MCAR model   Next, we compare the estimates as well as the goodness-of-fit on the real data with the MCAR model. Parameter estimates are shown in Table 5, with $95\mathrm{%}$ confidence intervals given in parentheses. Results from both models are mostly consistent with each other. Specifically, both models show that impacts within each cell type (${\stackrel{ˆ}{\beta }}^{\left(c|c\right)},c=R,G,F$) are significant, the effects ${\stackrel{ˆ}{\beta }}^{\left(c|c\right)}$ for cancer cells are larger than those for the slower growing fibroblasts, the green and red cancer cells have a positive impact on each other, and cancer cells have no impact on fibroblasts. The only difference is that the MCAR model shows a negative impact of fibroblasts on the green cancer cells only, while our model detects no significant impact on either type of cancer cell. Since the red and green cancer cells are biologically identical except for the fluorescent protein they express, we expect a symmetrical result for both types of cancer cells.

Table 5:

Estimated parameters for the full model, the BIC model and the MCAR model based on the cancer cell growth data described in Section 4. Bootstrap $95\mathrm{%}$ confidence intervals based on $50$ bootstrap samples are given in parentheses.

In Figure 4, apart from the observed (solid curve) and generated (dashed curve) cell counts from our model, we also show the generated cell counts from the MCAR model (dotted curve) for the green cancer cells (G), red cancer cells (R) and fibroblasts (F) across the entire image. Compared to the dotted curves, the dashed curves are slightly closer to the solid ones, which means our model seems more appropriate for analysing this type of data than the MCAR model.

Figure 4:

Goodness-of-fit of the estimated models. Observed (solid) and predicted (dashed for our model and dotted for the MCAR model) number of GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F) for the entire image. Predicted cell counts for each cell type in each tile, ${\stackrel{ˆ}{y}}_{i,t}^{\left(c\right)}$, are generated from the conditional Poisson model with intensity ${\stackrel{ˆ}{\lambda }}_{i,t}^{\left(c\right)}$ defined in eqs. (1) and (2), where the coefficients ${\stackrel{ˆ}{\beta }}^{\left(c|c\mathrm{\prime }\right)}$ are estimated from the entire dataset.

## 5 Conclusion and final remarks

In this paper, we introduced a conditional spatial autoregressive model and accompanying inference tools for multivariate spatio-temporal cell count data. The new methodology enables one to measure the overall cell growth rate in longitudinal experiments and spatio-temporal interactions with either homogeneous or heterogeneous cell populations. The proposed inference approach is computationally tractable and strikes a good balance between computational feasibility and statistical accuracy. Numerical findings from simulated and real data in Sections 3 and 4 confirm the validity of the proposed approach in terms of prediction, goodness-of-fit and estimation accuracy.

The data sets described in this paper serve as a proof-of-concept that the proposed methodology works. However, the potential applications and the relevant questions that the methodology can help to answer in cancer cell biology are plentiful. To build on the examples given in this paper, the methodology can be used to study interactions between cancer cells and a wide range of cancer-relevant cell types such as cancer-activated fibroblasts, macrophages, and other immune cells when co-cultured. Since a substantial proportion of cancer cells in tumors are in close proximity to other cell types that have been shown to affect tumor progression, using these co-cultures is more representative of the situation in a patient compared to studying cancer cells on their own. In addition to giving the final cell number, the presented approach can dissect which cell types affect the growth of others, and to what extent, in complex heterogeneous populations. This could be relevant in a drug discovery setting to determine if a drug affects cancer cell growth due to internal effects (on other cancer cells) or by interfering with the interaction between the cancer cells and other cell types. Drugs with different targets and mechanisms of action are particularly sought after, as they provide a wider target profile, increasing the chance of patients responding as well as reducing the risk of tumors becoming resistant. The impact of different genes and associated pathways in different cell types in relation to inter-cellular interactions can also be studied by genetically modifying the cell type(s) in question before mixing the cells together. This could be beneficial to identify new potential drug targets. Our approach is also applicable in other kinds of studies where local spatial cell-cell interactions are believed to affect cell growth, such as studies of neurodegenerative diseases [37] and wound healing/tissue regeneration [38].
In addition to evaluating cell growth, our approach can also be used to study transitions between cellular phenotypes upon interaction with other cell types, provided that the different phenotypes studied can be distinguished from one another based on the image data. Finally, it is worth noting that issues may arise when cells become too confluent/dense, as this may lead to segmentation problems in the imaging system. If the cells become completely confluent, they are likely to progressively stop growing. If one wants to measure over a longer period of time, experiments can be performed in larger wells/plates or with smaller starting cell numbers.

Our method offers several practical advantages to researchers interested in analysing multivariate count data on heterogeneous cell populations. First, the conditional Poisson model does not require tracking individual cells across time, a process that is often difficult to automate due to cell movement, morphology changes at subsequent time points, and additional complications related to storage of large data files. Second, we are able to quantify local spatio-temporal interactions between different cell populations from a very simple experimental set-up where the different cell populations are grown together in a single experimental condition (co-culture). An alternative, solely experimentally-based strategy would require monitoring the different cell types alone and together at different cell densities (number of cells per condition) in order to make inferences in terms of potential interactions. However, such an approach would give no possibility of evaluating the spatial relations in the co-culture conditions and would still restrict the number of simultaneously tested cell types to two.

In the future, we foresee several useful extensions of the current methodology, possibly enabling the treatment of more complex experimental settings. First, complex experiments involving a large number of cell populations, ${n}_{{C}}$, would imply an over-parametrized model. Clearly, this large number of parameters would be detrimental to both statistical accuracy and reliable optimization of the likelihood objective function ${\mathrm{\ell }}_{n}\left(\theta \right)$ in (4). To address these issues, we plan to explore a penalized likelihood of the form ${\mathrm{\ell }}_{n}\left(\theta \right)-{\text{pen}}_{\lambda }\left(\theta \right)$, where ${\text{pen}}_{\lambda }\left(\theta \right)$ is a nonnegative sparsity-inducing penalty function. For example, in a different likelihood setting, Bardic et al. [39] consider the ${L}_{1}$-type penalty ${\text{pen}}_{\lambda }\left(\theta \right)=\lambda \sum |\theta |,$ $\lambda >0$.

Second, for certain experiments, it would be desirable to modify the statistics in eq. (2) to include additional information on cell growth such as the distance between heterogeneous cells, and covariates describing cell morphology.

Third, it would be useful to develop a more principled way to select the tile size and number, and to consider tiling the microscope image into a hexagonal lattice, which is a more natural choice in real applications, since the distances between neighboring tiles would be more even than those of a regular lattice.

Finally, although numerical results (not reported here) show that our method is quite robust in the presence of mild outliers (with around $5\mathrm{%}$ of contaminated data), we expect that severe or numerous outliers will have some influence on the estimates, since the Poisson score function is unbounded. To address this problem, the log-likelihood scores in eq. (5) should be replaced by a robust alternative. Following Ferrari and Vecchia [29] and La Vecchia et al. [30], robustness can be obtained by the so-called $q$-entropy estimation method, obtained simply by replacing the usual logarithm in the log-likelihood estimating equation by the $q$-logarithm function ${log}_{q}\left(x\right)=\left({x}^{1-q}-1\right)/\left(1-q\right)$ if $q\ne 1$, and ${log}_{q}\left(x\right)=log\left(x\right)$ if $q=1$, for all $x>0$. This ensures a bounded influence function for the implied estimator and therefore guarantees control of the bias under contamination.
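For reference, the $q$-logarithm itself is simple to compute. The sketch below only implements the function as defined above; how it would enter the estimating equations of the present model is not specified here, and the function name is chosen for illustration.

```python
import math


def log_q(x, q):
    """q-logarithm: (x**(1 - q) - 1) / (1 - q) for q != 1, and log(x) for q == 1."""
    if x <= 0:
        raise ValueError("log_q is defined for x > 0 only")
    if q == 1:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)
```

As $q$ approaches $1$, `log_q(x, q)` approaches the ordinary logarithm, so the usual log-likelihood is recovered as a limiting case.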

## Acknowledgements

The authors wish to acknowledge support from the Australian National Health and Medical Research Council grants 1049561, 1064987 and 1069024 to Frédéric Hollande. Christina Mølck is supported by the Danish Cancer Society.

## Appendix

In the first part of this section, we provide technical lemmas required to prove asymptotic properties of the estimator ${\stackrel{ˆ}{\mathbit{\theta }}}_{n}.$

Denote ${E}_{t}\left[\cdot \right]$ as the expectation with respect to ${\mathbit{Y}}_{t}=\left\{{\mathbit{Y}}_{i,t},i\in {{L}}_{n}\right\},$ and $E\left[\cdot \right]$ as the expectation of $\mathbit{Y}=\left\{{\mathbit{Y}}_{t},t=1,\dots ,T\right\}$. Let ${N}_{i,r}$ be the set of tiles in the neighborhood of tile $i$, with radius $r$. Specifically, for two locations $i$ and $j$, we say $j\in {N}_{i,r}$ if $\parallel i-j\parallel \le r.$ Thus, the neighborhood defined in Section 2 is of radius $1$, i.e. $\left\{j:j\sim i\right\}=\left\{j:j\in {N}_{i,1}\right\}$. Denote ${n}_{r}=\underset{i\in {{L}}_{n}}{max}|{N}_{i,r}|={r}^{2}+r+1$. Actually, for any tile $i$ that is not on the boundary of the image, $|{N}_{i,r}|={n}_{r}.$

In the remainder of this paper we use the following assumptions:

• A.1: The parameter space $\mathbf{\Theta }$ is a compact subset of ${\mathbb{R}}^{p}$, and ${\mathbit{\theta }}_{0}$ is the unique maximiser of $\mathrm{\ell }\left(\mathbit{\theta }\right)=\underset{{n}_{{L}}\to \mathrm{\infty }}{lim}{\mathrm{\ell }}_{n}\left(\mathbit{\theta }\right).$

• A.2: The $\left({n}_{{C}}+1\right)×{n}_{{L}}T$ matrix $\left(\mathrm{\nabla }{\mathbit{v}}_{1,1},\mathrm{\nabla }{\mathbit{v}}_{1,2},\dots ,\mathrm{\nabla }{\mathbit{v}}_{1,T},\mathrm{\nabla }{\mathbit{v}}_{2,1},\dots ,\mathrm{\nabla }{\mathbit{v}}_{n,T}\right)$ is full rank.

#### Lemma 1.

Let ${Y}_{1},\dots ,{Y}_{n}$ be independent Poisson random variables with means ${\lambda }_{1},\dots ,{\lambda }_{n}$ respectively, where $n$ is a finite positive integer. Then for any positive integer $h$, $E\left[\underset{i=1,\dots ,n}{max}{Y}_{i}^{h}\right]\le {n}^{h}\underset{i=1,\dots ,n}{max}E\left[{Y}_{i}^{h}\right].$

#### Proof.

$\begin{array}{rl}E\left[{max}_{i=1,\dots ,n}{Y}_{i}^{h}\right]& ⩽E\left[{\left(\sum _{i=1}^{n}{Y}_{i}\right)}^{h}\right]\\ & ⩽{n}^{h-1}E\left[\sum _{i=1}^{n}{Y}_{i}^{h}\right]\phantom{\rule{1em}{0ex}}\text{(convexity)}\\ & ⩽{n}^{h}{max}_{i=1,\dots ,n}E\left[{Y}_{i}^{h}\right].\end{array}$
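The step labelled "(convexity)" can be spelled out as follows. This is a standard argument added for the reader's convenience: it applies Jensen's inequality to the function $x\mapsto {x}^{h}$, which is convex on $\left[0,\mathrm{\infty }\right)$ for any integer $h\ge 1$, and it uses only the nonnegativity of the ${Y}_{i}$.

```latex
\left(\sum_{i=1}^{n} Y_i\right)^{h}
  = n^{h}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)^{h}
  \le n^{h}\cdot\frac{1}{n}\sum_{i=1}^{n} Y_i^{h}
  = n^{h-1}\sum_{i=1}^{n} Y_i^{h}.
```

Taking expectations gives the second inequality of the proof, and the third follows from $E\left[\sum _{i=1}^{n}{Y}_{i}^{h}\right]\le n\,{max}_{i=1,\dots ,n}E\left[{Y}_{i}^{h}\right]$.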

#### Lemma 2

Denote ${\stackrel{˜}{Y}}_{{N}_{i,r},t}=\underset{j\in {N}_{i,r},c\in {C}}{max}{Y}_{j,t}^{\left(c\right)}$, with corresponding observation ${\stackrel{˜}{y}}_{{N}_{i,r},t}$ and conditional mean ${\stackrel{˜}{\lambda }}_{{N}_{i,r},t}$, then $E\left[{\left({\stackrel{˜}{Y}}_{{N}_{i,r},t}+1\right)}^{B}\right]\le {w}_{r,t}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}{\left(1+{\stackrel{˜}{y}}_{{N}_{i,r+t},0}\right)}^{Bk},\phantom{\rule{1em}{0ex}}t=1,2,\dots ,T$(8)

where $\begin{array}{c}{f}_{t}\left(k\right)=\sum _{h=⌈k/B⌉}^{{B}^{t-1}}{e}^{\stackrel{˜}{\alpha }h}g\left(k,Bh\right){f}_{t-1}\left(h\right),\phantom{\rule{1em}{0ex}}g\left(a,b\right)=\sum _{h=a}^{b}\left(\begin{array}{c}b\\ h\end{array}\right)\left\{\begin{array}{c}h\\ a\end{array}\right\},\\ {f}_{1}\left(k\right)=g\left(k,B\right)=\sum _{h=k}^{B}\left(\begin{array}{c}B\\ h\end{array}\right)\left\{\begin{array}{c}h\\ k\end{array}\right\},\phantom{\rule{1em}{0ex}}{w}_{r,t}=\prod _{k=0}^{t-1}{n}_{r+k}{2}^{{n}_{r+k}},\end{array}$

$\left\{\cdot \right\}$ denotes a Stirling number of the second kind, $\stackrel{˜}{\alpha }=\underset{c\in {C}}{max}{\alpha }^{\left(c\right)}$, and $B=\underset{c}{max}\left(\sum _{c\mathrm{\prime }\in {C}}{\beta }^{\left(c|c\mathrm{\prime }\right)}\right){n}_{1}.$
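The Stirling numbers enter through the standard identity $E\left[{Y}^{h}\right]=\sum _{k=0}^{h}\left\{\begin{array}{c}h\\ k\end{array}\right\}{\lambda }^{k}$ for $Y\sim \text{Poisson}\left(\lambda \right)$, which in turn gives $E\left[{\left(Y+1\right)}^{b}\right]=\sum _{a=0}^{b}g\left(a,b\right){\lambda }^{a}$. The sketch below evaluates these quantities for small integer arguments; it is meant only as a numerical check of the combinatorial terms, assumes integer exponents, and uses function names chosen for illustration.

```python
import math


def stirling2(h, k):
    """Stirling number of the second kind via S(h,k) = k*S(h-1,k) + S(h-1,k-1)."""
    if h == k:
        return 1  # includes S(0, 0) = 1
    if k == 0 or k > h:
        return 0
    return k * stirling2(h - 1, k) + stirling2(h - 1, k - 1)


def poisson_raw_moment(h, lam):
    """E[Y^h] for Y ~ Poisson(lam), computed as sum_k S(h, k) * lam**k."""
    return sum(stirling2(h, k) * lam ** k for k in range(h + 1))


def g(a, b):
    """g(a, b) = sum_{h=a}^{b} C(b, h) * S(h, a), the coefficient of lam**a in E[(Y+1)^b]."""
    return sum(math.comb(b, h) * stirling2(h, a) for h in range(a, b + 1))
```

For example, `poisson_raw_moment(2, lam)` equals `lam + lam**2`, and the values `g(0, 2)`, `g(1, 2)`, `g(2, 2)` are the coefficients of $E\left[{\left(Y+1\right)}^{2}\right]=1+3\lambda +{\lambda }^{2}$.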

#### Proof

$\begin{array}{ll}{\lambda }_{i,t}^{\left(c\right)}& =exp\left[{\alpha }^{\left(c\right)}+\sum _{c\mathrm{\prime }\in {C}}{\beta }^{\left(c|c\mathrm{\prime }\right)}\sum _{j\in {N}_{i,1}}log\left({y}_{j,t-1}^{\left(c\mathrm{\prime }\right)}+1\right)\right]⩽{e}^{\stackrel{˜}{\alpha }}{\left({\stackrel{˜}{y}}_{{N}_{i,1},t-1}+1\right)}^{B}.\end{array}$(9)

Similarly, for any $c\in {C}$, we have ${\lambda }_{{N}_{i,r},t}^{\left(c\right)}\le {e}^{\stackrel{˜}{\alpha }}{\left({\stackrel{˜}{y}}_{{N}_{i,r+1},t-1}+1\right)}^{B},$ since $\left\{j\mathrm{\prime }\in {N}_{j,1};j\in {N}_{i,r},i\in {{L}}_{n},r>0\right\}=\left\{j\in {N}_{i,r+1};i\in {{L}}_{n},r>0\right\}$.

Next, we proceed by induction.

For $T=1$, by the conditional independence assumption and Lemma 1, we have $\begin{array}{rl}& {E}_{T-1}\left[{E}_{T}\left({\left({\stackrel{˜}{Y}}_{{N}_{i,r},T}+1\right)}^{B}|{\mathbit{Y}}_{T-1}\right)\right]={E}_{T-1}\left[\sum _{h=0}^{B}\left(\genfrac{}{}{0pt}{}{B}{h}\right){E}_{T}\left(\underset{j\in {N}_{i,r},c\in {C}}{max}{{Y}_{j,T}^{\left(c\right)}}^{h}|{\mathbit{Y}}_{T-1}\right)\right]\\ & <{n}_{r}{2}^{{n}_{r}}{E}_{T-1}\left[\sum _{k=0}^{B}\sum _{h=k}^{B}\left(\genfrac{}{}{0pt}{}{B}{h}\right)\left\{\genfrac{}{}{0pt}{}{h}{k}\right\}{\stackrel{˜}{\lambda }}_{{N}_{i,r},T}^{k}\right]\le {w}_{r,1}\sum _{k=0}^{B}{f}_{1}\left(k\right){e}^{k\stackrel{˜}{\alpha }}{E}_{T-1}\left[{\left(1+{\stackrel{˜}{Y}}_{{N}_{i,r+1},T-1}\right)}^{Bk}\right].\end{array}$

Since $T-1=0$ and ${Y}_{t}$ has constant entries at time point $0$, ${E}_{T-1}\left[{\left(1+{\stackrel{˜}{Y}}_{{N}_{i,r+1},T-1}\right)}^{Bk}\right]={\left(1+{\stackrel{˜}{y}}_{{N}_{i,r+1},0}\right)}^{Bk}.$

Suppose eq. (8) is true for $T=t$, then for $T=t+1$, we have $\begin{array}{rl}& {E}_{T-t-1}{E}_{T-t}{E}_{T-t+1}\dots {E}_{T}\left[{\left({\stackrel{˜}{Y}}_{{N}_{i,r},T}+1\right)}^{B}|{\mathbf{Y}}_{T-1},\dots ,{\mathbf{Y}}_{T-t-1}\right]\\ & \phantom{\rule{2em}{0ex}}⩽{E}_{T-t-1}\left\{{w}_{r,t}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}{E}_{T-t}\left[{\left(1+{\stackrel{˜}{Y}}_{{N}_{i,r+t},T-t}\right)}^{Bk}|{\mathbf{Y}}_{T-t-1}\right]\right\}\\ & \phantom{\rule{2em}{0ex}}={w}_{r,t}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}\left\{\sum _{k\mathrm{\prime }=0}^{Bk}\left(\begin{array}{c}Bk\\ k\mathrm{\prime }\end{array}\right){E}_{T-t-1}\left[{E}_{T-t}\left({\stackrel{˜}{Y}}_{{N}_{i,r+t},T-t}^{k\mathrm{\prime }}|{\mathbf{Y}}_{T-t-1}\right)\right]\right\}\\ & \phantom{\rule{2em}{0ex}}⩽{w}_{r,t}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}\left\{\sum _{k\mathrm{\prime }=0}^{Bk}\left(\begin{array}{c}Bk\\ k\mathrm{\prime }\end{array}\right){E}_{T-t-1}\left[{n}_{r+t}{2}^{{n}_{r+t}}{max}_{j\in {N}_{i,r+t},c\in {C}}{E}_{T-t}\left({{Y}_{j,T-t}^{\left(c\right)}}^{k\mathrm{\prime }}|{\mathbf{Y}}_{T-t-1}\right)\right]\right\}\\ & \phantom{\rule{2em}{0ex}}={w}_{r,t+1}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}\left[\sum _{k\mathrm{\prime }\mathrm{\prime }=0}^{Bk}\sum _{k\mathrm{\prime }=k\mathrm{\prime }\mathrm{\prime }}^{Bk}\left(\begin{array}{c}Bk\\ k\mathrm{\prime }\end{array}\right)\left\{\begin{array}{c}k\mathrm{\prime }\\ k\mathrm{\prime }\mathrm{\prime }\end{array}\right\}{E}_{T-t-1}\left({\stackrel{˜}{\lambda }}_{{N}_{i,r+t},T-t}^{k\mathrm{\prime }\mathrm{\prime }}\right)\right]\\ & \phantom{\rule{2em}{0ex}}⩽{w}_{r,t+1}\sum _{k\mathrm{\prime }\mathrm{\prime }=0}^{{B}^{t+1}}\sum _{k=⌈k\mathrm{\prime }\mathrm{\prime }/B⌉}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}g\left(k\mathrm{\prime }\mathrm{\prime },Bk\right){e}^{k\mathrm{\prime }\mathrm{\prime }\stackrel{˜}{\alpha }}{E}_{T-t-1}\left[{\left(1+{\stackrel{˜}{Y}}_{{N}_{i,r+t+1},T-t-1}\right)}^{Bk\mathrm{\prime }\mathrm{\prime }}\right]\\ & \phantom{\rule{2em}{0ex}}={w}_{r,t+1}\sum _{k\mathrm{\prime }\mathrm{\prime }=0}^{{B}^{t+1}}{f}_{t+1}\left(k\mathrm{\prime }\mathrm{\prime }\right){e}^{k\mathrm{\prime }\mathrm{\prime }\stackrel{˜}{\alpha }}{\left(1+{\stackrel{˜}{y}}_{{N}_{i,r+t+1},0}\right)}^{Bk\mathrm{\prime }\mathrm{\prime }}\end{array}$

#### Lemma 3.

Given Assumption A.1, for any finite constant $a,b\ge 0$ and $\theta \in \mathrm{\Theta },$ $E\left({{\lambda }_{i,t}^{\left(c\right)}}^{a}{{S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}}^{b}\right)<\mathrm{\infty },\phantom{\rule{1em}{0ex}}\mathrm{\forall }c,c\mathrm{\prime }\in {C},i\in {{L}}_{n},t=1,\dots ,T.$

#### Proof.

By the definition of ${f}_{t}\left(k\right)$ given in Lemma 2, we know that ${f}_{t}\left(k\right)$ is bounded for all bounded $t$ under assumption A.1. Thus, Lemma 2 implies $\begin{array}{rl}E\left({{\lambda }_{i,t}^{\left(c\right)}}^{a}{{S}_{i,t-1}^{\left(c\mathrm{\prime }\right)}}^{b}\right)& =E\left[{\left(\sum _{j\in {N}_{i,1}}log\left(1+{Y}_{j,t-1}^{\left(c\mathrm{\prime }\right)}\right)\right)}^{b}{{\lambda }_{i,t}^{\left(c\right)}}^{a}\right]\\ & \le E\left[\left(1+{\stackrel{˜}{Y}}_{{N}_{i,1},t-1}{\right)}^{bB}{{\lambda }_{i,t}^{\left(c\right)}}^{a}\right]\le E\left[{e}^{a\stackrel{ˉ}{\alpha }}\left(1+{\stackrel{˜}{Y}}_{{N}_{i,1},t-1}{\right)}^{\left(a+b\right)B}\right]\\ & \le {e}^{a\stackrel{˜}{\alpha }}{w}_{1,t}\sum _{k=0}^{{B}^{t}}{f}_{t}\left(k\right){e}^{k\stackrel{˜}{\alpha }}{\left(1+{\stackrel{˜}{y}}_{{N}_{i,1+t},0}\right)}^{Bk}<\mathrm{\infty }.\end{array}$

For simplicity, define the distance between tiles $i$ and $j$ as $d(i,j)=r$ if $r-1<\|i-j\|\le r.$

#### Lemma 4.

For any $i\in L_{n}$ and $t_{1}=1,\dots ,T$, $\text{Cov}\left(Y_{i,t_{1}},Y_{j,t_{2}}\right)=0\quad \text{for all }j\in L_{n},\ t_{2}=1,\dots ,T,\ \text{if }d(i,j)>t_{1}+t_{2},$

and $\left|\left\{(j,t_{2}):\text{Cov}\left(Y_{j,t_{2}},Y_{i,t_{1}}\right)\ne 0;\ j\in L_{n},\ t_{2}=1,\dots ,T\right\}\right|\le T\left(8T^{2}+4T+1\right).$

#### Proof.

Let ${N}_{i,t}^{\ast }=\left\{j:\text{Cov}\left({Y}_{j,0},{Y}_{i,t}\right)\ne 0;j\in {L}\right\}$ be the collection of counts in tiles at time $0$ that are correlated with the count in tile $i$ at time $t$ (${Y}_{i,t}$). Due to the neighborhood structure in the autoregressive term described in Section 2, one can easily tell that ${N}_{i,t}^{\ast }$ is a neighbourhood around tile $i$, with the radius equal to $t$.

Since the counts at time $0$ are constants, we have $\text{Cov}\left(Y_{i,t_{1}},Y_{j,t_{2}}\right)=0$ whenever $N_{i,t_{1}}^{\ast}\cap N_{j,t_{2}}^{\ast}=\varnothing$, which holds when $d(i,j)>t_{1}+t_{2}.$

For any $\left(i,{t}_{1}\right)\in {D}_{n}$, $\left\{\left(j,{t}_{2}\right):{N}_{i,{t}_{1}}^{\ast }\cap {N}_{j,{t}_{2}}^{\ast }\ne \mathrm{\varnothing }\right\}$ is a neighborhood around tile $i$, with a radius ${t}_{1}+{t}_{2}$.

Since $n_{r}=2r^{2}+2r+1,$ we have $\left|\left\{(j,t_{2}):N_{i,t_{1}}^{\ast}\cap N_{j,t_{2}}^{\ast}\ne \varnothing\right\}\right|\le T\left|\left\{j:N_{i,T}^{\ast}\cap N_{j,T}^{\ast}\ne \varnothing\right\}\right|=Tn_{2T}=T\left(8T^{2}+4T+1\right).$
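The counting step can be checked numerically. The sketch below is only an illustration: it assumes a two-dimensional square lattice with the Manhattan norm, which is the setting in which a radius-$r$ neighbourhood contains $2r^{2}+2r+1$ tiles. The function names `neighbourhood_size` and `correlated_pair_bound` are hypothetical.

```python
from itertools import product


def neighbourhood_size(r: int) -> int:
    """Count lattice tiles (dx, dy) with |dx| + |dy| <= r around a centre tile.

    Assumption: 2-D square lattice with the Manhattan norm.
    """
    return sum(
        1
        for dx, dy in product(range(-r, r + 1), repeat=2)
        if abs(dx) + abs(dy) <= r
    )


def correlated_pair_bound(T: int) -> int:
    """Bound T * n_{2T} on the number of pairs (j, t2) correlated with (i, t1)."""
    return T * neighbourhood_size(2 * T)


# The brute-force count agrees with the closed form n_r = 2 r^2 + 2 r + 1.
for r in range(6):
    assert neighbourhood_size(r) == 2 * r**2 + 2 * r + 1

# The bound agrees with T (8 T^2 + 4 T + 1).
for T in range(1, 6):
    assert correlated_pair_bound(T) == T * (8 * T**2 + 4 * T + 1)
```

The bound does not depend on the lattice size, which is what makes the covariance sums in the later proofs grow only linearly in the number of tiles.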

In the second part of this section, we study the asymptotic properties of the estimator ${\stackrel{ˆ}{\theta }}_{n}$.

#### Proposition 1

(Existence and uniqueness) If assumption A.3 holds, then there exists a unique maximizer of $\ell_{n}(\boldsymbol{\theta})$, denoted by $\hat{\boldsymbol{\theta}}_{n}$.

#### Proof.

First, since $\mathbf{\Theta}$ is compact and $\ell_{n}(\boldsymbol{\theta})$ is continuous, at least one maximiser of $\ell_{n}(\boldsymbol{\theta})$ exists. Next, we prove that the maximiser is unique. The $p\times p$ Hessian matrix of $-\ell_{n}(\boldsymbol{\theta})$ can be written as the block-diagonal matrix $\boldsymbol{H}_{n}(\boldsymbol{\theta})=-\nabla^{2}\ell_{n}(\boldsymbol{\theta})=\left(\begin{array}{cccc}\boldsymbol{H}_{n}^{(1)}(\boldsymbol{\theta})& \mathbf{0}& \cdots & \mathbf{0}\\ \mathbf{0}& \boldsymbol{H}_{n}^{(2)}(\boldsymbol{\theta})& \cdots & \mathbf{0}\\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}& \mathbf{0}& \cdots & \boldsymbol{H}_{n}^{(n_{C})}(\boldsymbol{\theta})\end{array}\right),$

where $\boldsymbol{H}_{n}^{(c)}(\boldsymbol{\theta})=\sum_{i\in L_{n}}\sum_{t=1}^{T}\exp\left[\boldsymbol{v}_{i,t}^{(c)}(\boldsymbol{\theta})\right]\left[\nabla\boldsymbol{v}_{i,t}\right]\left[\nabla\boldsymbol{v}_{i,t}\right]^{\top}$ is an $(n_{C}+1)\times(n_{C}+1)$ matrix. Each matrix $\left[\nabla\boldsymbol{v}_{i,t}\right]\left[\nabla\boldsymbol{v}_{i,t}\right]^{\top}$ is positive semidefinite with rank 1. By Assumption A.2, $\sum_{i\in L_{n}}\sum_{t=1}^{T}\left[\nabla\boldsymbol{v}_{i,t}\right]\left[\nabla\boldsymbol{v}_{i,t}\right]^{\top}$ is full rank, which means that $\boldsymbol{H}_{n}^{(c)}(\boldsymbol{\theta})$ is positive definite for all $c\in C$ and $\boldsymbol{\theta}\in\mathbf{\Theta}$, since $\exp\left[\boldsymbol{v}_{i,t}^{(c)}(\boldsymbol{\theta})\right]>0.$ This shows that $-\ell_{n}(\boldsymbol{\theta})$ is strictly convex, which implies that $\hat{\boldsymbol{\theta}}_{n}$ is unique.
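The linear-algebra step in this proof can be illustrated with a small numerical sketch. It is not the authors' code: the gradient vectors and the positive weights are simulated stand-ins for $\nabla\boldsymbol{v}_{i,t}$ and $\exp[\boldsymbol{v}_{i,t}^{(c)}(\boldsymbol{\theta})]$, and the name `weighted_outer_sum` is hypothetical. It shows that a single outer product has rank 1, while a positively weighted sum of outer products is positive definite once the vectors span the whole space.

```python
import numpy as np


def weighted_outer_sum(grads: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Return sum_k weights[k] * grads[k] grads[k]^T for row vectors grads[k]."""
    return (grads * weights[:, None]).T @ grads


rng = np.random.default_rng(0)
grads = rng.normal(size=(50, 3))        # simulated gradient vectors, one per (i, t)
weights = np.exp(rng.normal(size=50))   # exp(.) > 0, so every weight is positive

single = np.outer(grads[0], grads[0])   # one term of the sum
H = weighted_outer_sum(grads, weights)  # analogue of one diagonal block of the Hessian

rank_single = np.linalg.matrix_rank(single)
min_eig = np.linalg.eigvalsh(H).min()

print("rank of a single outer product:", rank_single)
print("smallest eigenvalue of the weighted sum:", min_eig)
```

A strictly positive smallest eigenvalue corresponds to strict convexity of the negative log-likelihood in the block considered.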

#### Proposition 2

(Consistency) If the regularity assumption A.1 holds, then $\hat{\boldsymbol{\theta}}_{n}\stackrel{p}{\to}\boldsymbol{\theta}_{0}$ as $n_{L}\to\infty$.

#### Proof.

We proceed by verifying the conditions of Theorem 2 in [31]. First, we show that the score functions are $L_{p}$-uniformly integrable for $p<3$, i.e. $\underset{n\to\infty}{\lim}\,\underset{\substack{i\in L_{n}\\ t=1,\dots ,T}}{\sup}\,\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}E\left[\boldsymbol{u}_{i,t}^{p}(\boldsymbol{\theta})I\left(\boldsymbol{u}_{i,t}(\boldsymbol{\theta})>k\right)\right]\to\mathbf{0},\quad\text{as }k\to\infty.$ (10)

The general form of each entry of $\boldsymbol{u}_{i,t}(\boldsymbol{\theta})$ is $\left(\lambda_{i,t}^{(c)}-y_{i,t}^{(c)}\right)S_{i,t-1}^{(c')}$. Taking $p=3$ and using the conditional moments $E_{t}\left[Y_{i,t}^{(c)}\mid\mathbf{Y}_{t-1}\right]=\lambda_{i,t}^{(c)}$, $E_{t}\left[{Y_{i,t}^{(c)}}^{2}\mid\mathbf{Y}_{t-1}\right]=\lambda_{i,t}^{(c)}+{\lambda_{i,t}^{(c)}}^{2}$ and $E_{t}\left[{Y_{i,t}^{(c)}}^{3}\mid\mathbf{Y}_{t-1}\right]={\lambda_{i,t}^{(c)}}^{3}+3{\lambda_{i,t}^{(c)}}^{2}+\lambda_{i,t}^{(c)}$, we have $\begin{array}{l}E\left[\left(\left(\lambda_{i,t}^{(c)}-Y_{i,t}^{(c)}\right)S_{i,t-1}^{(c')}\right)^{3}\right]\\ \quad =E_{1}\dots E_{t-2}E_{t-1}\left[E_{t}\left[\left(\left(\lambda_{i,t}^{(c)}-Y_{i,t}^{(c)}\right)S_{i,t-1}^{(c')}\right)^{3}\mid\mathbf{Y}_{t-1}\right]\mid\mathbf{Y}_{t-2}\right]\dots \\ \quad =E_{1}\dots E_{t-2}E_{t-1}\left[{S_{i,t-1}^{(c')}}^{3}\left[{\lambda_{i,t}^{(c)}}^{3}-3{\lambda_{i,t}^{(c)}}^{2}E_{t}\left[Y_{i,t}^{(c)}\mid\mathbf{Y}_{t-1}\right]+3\lambda_{i,t}^{(c)}E_{t}\left[{Y_{i,t}^{(c)}}^{2}\mid\mathbf{Y}_{t-1}\right]-E_{t}\left[{Y_{i,t}^{(c)}}^{3}\mid\mathbf{Y}_{t-1}\right]\right]\mid\mathbf{Y}_{t-2}\right]\dots \\ \quad =E_{1}\dots E_{t-2}E_{t-1}\left[-{S_{i,t-1}^{(c')}}^{3}\lambda_{i,t}^{(c)}\mid\mathbf{Y}_{t-2}\right]\dots ,\end{array}$ (a)

which is finite by Lemma 3. This gives us the $L_{3}$-boundedness of $\boldsymbol{u}_{i,t}(\boldsymbol{\theta})$, i.e. $\underset{n\to\infty}{\lim}\,\underset{\substack{i\in L_{n}\\ t=1,\dots ,T}}{\sup}\,\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}E\left[{\boldsymbol{u}_{i,t}^{(c)}(\boldsymbol{\theta})}^{3}\right]<\infty,$

which implies $L_{p}$-uniform integrability for $p<3$.
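The third-moment computation above relies on the first three raw moments of a count with conditional mean $\lambda$. As a hedged check, the sketch below assumes that the conditional distribution is Poisson (the moment identities used in the display are those of a Poisson variable) and evaluates the moments by direct summation of the probability mass function. The helper `poisson_raw_moment` and the truncation point `y_max` are illustrative choices.

```python
import math


def poisson_raw_moment(lam: float, order: int, y_max: int = 200) -> float:
    """Approximate E[Y**order] for Y ~ Poisson(lam) by summing the pmf up to y_max.

    The pmf is updated recursively, P(Y = y + 1) = P(Y = y) * lam / (y + 1),
    which avoids overflowing factorials.
    """
    pmf = math.exp(-lam)  # P(Y = 0)
    total = 0.0
    for y in range(y_max + 1):
        total += pmf * y**order
        pmf *= lam / (y + 1)
    return total


for lam in (0.5, 2.0, 7.5):
    m1 = poisson_raw_moment(lam, 1)
    m2 = poisson_raw_moment(lam, 2)
    m3 = poisson_raw_moment(lam, 3)
    assert math.isclose(m1, lam, rel_tol=1e-9)
    assert math.isclose(m2, lam + lam**2, rel_tol=1e-9)
    assert math.isclose(m3, lam**3 + 3 * lam**2 + lam, rel_tol=1e-9)
```

All three moments are polynomials in $\lambda$, so bounding the conditional third moment of the score reduces to bounding products of powers of $\lambda_{i,t}^{(c)}$ and $S_{i,t-1}^{(c')}$, which is the role of Lemma 3.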

Second, we show the stochastic equicontinuity of ${\mathbit{u}}_{i,t}\left(y;\mathbit{\theta }\right)$, i.e. $\underset{n\to \mathrm{\infty }}{lim}\underset{\begin{array}{c}i\in {{L}}_{n}\\ t=1,\dots ,T\end{array}}{sup}P\left(\underset{\begin{array}{c}\mathbit{\theta },\mathbit{\theta }\mathrm{\prime }\in \mathbf{\Theta }\\ \parallel \mathbit{\theta }-\mathbit{\theta }\mathrm{\prime }\parallel <\delta \end{array}}{sup}|{\mathbit{u}}_{i,t}\left(\mathbit{\theta }\right)-{\mathbit{u}}_{i,t}\left(\mathbit{\theta }\mathrm{\prime }\right)|>ϵ\right)=\mathbf{0}.$

The gradient $\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})$ is a $p\times p$ matrix, each column of which is either $\frac{\partial\boldsymbol{\gamma}_{i,t}(\boldsymbol{\theta})}{\partial\beta^{(c|c')}}\otimes\nabla\boldsymbol{v}_{i,t}$ or $\frac{\partial\boldsymbol{\gamma}_{i,t}(\boldsymbol{\theta})}{\partial\alpha^{(c)}}\otimes\nabla\boldsymbol{v}_{i,t}$, where $\frac{\partial\boldsymbol{\gamma}_{i,t}(\boldsymbol{\theta})}{\partial\beta^{(c|c')}}=\left(0,\dots ,0,\lambda_{i,t}^{(c)}S_{i,t}^{(c)},0,\dots\right),\quad\text{and}\quad\frac{\partial\boldsymbol{\gamma}_{i,t}(\boldsymbol{\theta})}{\partial\alpha^{(c)}}=\left(0,\dots ,0,\lambda_{i,t}^{(c)},0,\dots\right).$

Thus, the non-zero entries of $E\,\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}\left[\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right]$ have the general form $E\,\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}\left[\lambda_{i,t}^{(c)}S_{i,t}^{(c)}S_{i,t}^{(c')}\right]$, which are bounded by an argument analogous to Lemma 3.

Third, we check the $\alpha$-mixing conditions. Let $U$ and $V$ be two subsets of $D_{n}$, and let $\sigma(U)=\sigma\left\{Y_{i,t};(i,t)\in U\right\}$ be the $\sigma$-algebra generated by the random variables $Y_{i,t},(i,t)\in U$.

Define $\alpha(U,V)=\sup\left\{|P(A\cap B)-P(A)P(B)|;A\in\sigma(U),B\in\sigma(V)\right\}.$

Then the $\alpha$-mixing coefficient for the random field $\left\{Y_{i,t},i\in L_{n},t=1,\dots ,T\right\}$ is defined as $\alpha(k,l,m)=\sup\left\{\alpha(U,V),|U|\le k,|V|\le l,d(U,V)\ge m\right\}.$

Following Bai et al. [40], in an $a$-dimensional space, we need

(a) $\exists\,\delta>0$ such that $\sum_{m=1}^{\infty}m^{a-1}\alpha(1,1,m)^{\delta/(2+\delta)}<\infty,$

(b) for $k+l\le 4$, $\sum_{m=1}^{\infty}m^{a-1}\alpha(k,l,m)<\infty,$

(c) $\exists\,\epsilon>0$ such that $\alpha(1,\infty,m)=O\left(m^{-a-\epsilon}\right),$

where $k,l,m\in\mathbb{N}$ and $d(U,V)=\min\left\{\|i-j\|:i\in U,j\in V\right\}$ is the distance between sets $U$ and $V$.

For any fixed $i_{1},\dots ,i_{k}\in L_{n}$, $k<\infty$, and $t_{1}=0,\dots ,T$, consider $U=\left\{Y_{i_{1},t_{1}}=y_{i_{1},t_{1}},\dots ,Y_{i_{k},t_{1}}=y_{i_{k},t_{1}}\right\}$ and $V=\left\{Y_{j,t_{2}}=y_{j,t_{2}};j\in L_{n},t_{2}=0,\dots ,T\right\}$; then $|U|=k$ and $|V|\to\infty$ as $n\to\infty.$ By Lemma 4, we have $P\left(Y_{i,t_{1}}=y_{i,t_{1}},Y_{j,t_{2}}=y_{j,t_{2}}\right)-P\left(Y_{i,t_{1}}=y_{i,t_{1}}\right)P\left(Y_{j,t_{2}}=y_{j,t_{2}}\right)=0,$ if $d(i,j)>t_{1}+t_{2}$. Thus, $\alpha(U,V)=0$ for any $|U|=k$, provided that $d(U,V)>2T$; that is, $\alpha(k,\infty,m)=0$ if $m>2T$.

This implies all three mixing conditions.
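To make this last step explicit, the following short derivation is a sketch that uses only the finite-range result above together with the standard bound $\alpha(U,V)\le 1/4$, which is not stated in the original argument. Since $\alpha(k,l,m)\le\alpha(k,\infty,m)=0$ for $m>2T$ and $T$ is fixed, the series in (a) and (b) reduce to finite sums: $\begin{array}{rl}\sum_{m=1}^{\infty}m^{a-1}\alpha(1,1,m)^{\delta/(2+\delta)}& =\sum_{m=1}^{2T}m^{a-1}\alpha(1,1,m)^{\delta/(2+\delta)}\le\sum_{m=1}^{2T}m^{a-1}<\infty,\\ \sum_{m=1}^{\infty}m^{a-1}\alpha(k,l,m)& =\sum_{m=1}^{2T}m^{a-1}\alpha(k,l,m)\le\frac{1}{4}\sum_{m=1}^{2T}m^{a-1}<\infty.\end{array}$ For (c), $m^{a+\epsilon}\alpha(1,\infty,m)=0$ for all $m>2T$, so that $\alpha(1,\infty,m)=O\left(m^{-a-\epsilon}\right)$ for every $\epsilon>0$.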

Finally, by Theorem 3 in Jenish and Prucha [31], uniform integrability in eq. (10) and mixing condition (a) ensure that the score functions $\boldsymbol{u}_{i,t}(\boldsymbol{y};\boldsymbol{\theta})$ satisfy a pointwise law of large numbers in the sense that $\frac{1}{n_{L}}\sum_{i\in L_{n}}\sum_{t=1}^{T}\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}\left(\boldsymbol{u}_{i,t}(\boldsymbol{y};\boldsymbol{\theta})-E\boldsymbol{u}_{i,t}(\boldsymbol{y};\boldsymbol{\theta})\right)\stackrel{p}{\to}\mathbf{0},\quad\text{as }n_{L}\to\infty,$

for all $\mathbit{\theta }\in \mathbf{\Theta }.$

#### Proposition 3.

If the regularity assumptions A.1 and A.2 hold, then $\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{1/2}\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)$ converges in distribution to a $p$-variate normal distribution with zero mean vector and identity covariance matrix, as $n_{L}\to\infty$.

#### Proof.

First, we show the uniform law of large numbers for $\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta})$: $\underset{\boldsymbol{\theta}}{\sup}\left\|\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta})-E\left[\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta})\right]\right\|\stackrel{p}{\to}\mathbf{0},\quad\text{as }n_{L}\to\infty,$ (11)

where $\boldsymbol{u}_{n}(\boldsymbol{\theta})=\nabla\ell_{n}(\boldsymbol{\theta})/n_{L}$ as defined in Section 2. Note that $\begin{array}{rl}\text{Var}\left(\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta})\right)& =\frac{1}{n_{L}^{2}}\text{Var}\left(\sum_{i\in L_{n}}\sum_{t=1}^{T}\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)\\ & =\frac{1}{n_{L}^{2}}\sum_{i\in L_{n}}\sum_{t=1}^{T}\text{Var}\left(\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)\\ & \quad +\frac{1}{n_{L}^{2}}\sum_{i\in L_{n}}\sum_{t_{1}=1}^{T}\sum_{\substack{j\in L_{n}\\ j\ne i}}\sum_{\substack{t_{2}=1\\ t_{2}\ne t_{1}}}^{T}\text{Cov}\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta}),\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right).\end{array}$ (12)

The first term in eq. (12) is $O\left(n_{L}^{-1}\right)$, since $\text{Var}\left(\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)\le E\left[\left(\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)^{2}\right]$, which is shown to be finite in the proof of Proposition 2.

For the second term in eq. (12), by Lemma 4 we have $\begin{array}{rl}& \frac{1}{n_{L}^{2}}\sum_{i\in L_{n}}\sum_{t_{1}=1}^{T}\sum_{\substack{j\in L_{n}\\ j\ne i}}\sum_{\substack{t_{2}=1\\ t_{2}\ne t_{1}}}^{T}\text{Cov}\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta}),\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right)\\ & \quad \le\frac{1}{n_{L}^{2}}\sum_{i\in L_{n}}\sum_{t_{1}=1}^{T}T\left(8T^{2}+4T+1\right)\max_{\substack{j:d(i,j)\le 2T\\ t_{2}\ne t_{1}}}\text{Cov}\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta}),\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right),\end{array}$

where $\text{Cov}\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta}),\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right)\le E\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta})\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right)\le E\left(\nabla\boldsymbol{u}_{i,t_{1}}(\boldsymbol{\theta})\right)^{2}+E\left(\nabla\boldsymbol{u}_{j,t_{2}}(\boldsymbol{\theta})\right)^{2}$ is finite by Lemma 2. Thus, the second term in eq. (12) is also of order $O\left(n_{L}^{-1}\right)$ elementwise, which means $\text{Var}\left(\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta})\right)\to\mathbf{0}$ as $n\to\infty.$ Therefore, eq. (11) follows by Chebyshev's inequality.
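The mechanism behind this variance bound, namely that finite-range dependence keeps the variance of an average at order $1/n$, can be seen in a one-dimensional analogue. The sketch below is an illustration under assumptions that are not part of the paper: a stationary sequence whose autocovariance vanishes beyond a fixed lag, with hypothetical autocovariance values, and a hypothetical helper `variance_of_mean`.

```python
from typing import Sequence


def variance_of_mean(n: int, autocov: Sequence[float]) -> float:
    """Exact variance of the mean of n terms of a stationary sequence.

    autocov[h] is the autocovariance at lag h; lags beyond len(autocov) - 1
    are assumed to have zero covariance (finite-range dependence).
    """
    q = len(autocov) - 1
    total = n * autocov[0]
    for h in range(1, min(q, n - 1) + 1):
        total += 2 * (n - h) * autocov[h]  # (n - h) pairs at lag h, counted twice
    return total / n**2


autocov = [1.0, 0.5, 0.25]  # hypothetical values; dependence range q = 2
bound = autocov[0] + 2 * sum(abs(g) for g in autocov[1:])

for n in (10, 100, 1000, 10000):
    var_n = variance_of_mean(n, autocov)
    # n * Var(mean) stays below a constant that does not depend on n.
    assert n * var_n <= bound
    print(n, var_n)
```

In the lattice setting of the proof, the constant analogous to `bound` is controlled by the count $T(8T^{2}+4T+1)$ from Lemma 4 and the moment bounds on $\nabla\boldsymbol{u}_{i,t}$.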

Second, $\boldsymbol{V}_{n}(\boldsymbol{\theta})=\frac{1}{n_{L}}\text{Var}\left(\sum_{i\in L_{n}}\sum_{t=1}^{T}\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)=-\frac{1}{n_{L}}E\left(\sum_{i\in L_{n}}\sum_{t=1}^{T}\nabla\boldsymbol{u}_{i,t}(\boldsymbol{\theta})\right)=\frac{1}{n_{L}}\boldsymbol{H}_{n}(\boldsymbol{\theta})$, which is shown to be positive definite under Assumption A.2 in Proposition 1. Thus, together with the uniform integrability in eq. (10) and the mixing conditions, Theorem 1 in [31] gives $\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta})^{-1/2}\boldsymbol{u}_{n}(\boldsymbol{\theta})\to N\left(\mathbf{0},\boldsymbol{I}_{p}\right).$ (13)

Finally, by Taylor's expansion, $\begin{array}{c}\boldsymbol{u}_{n}\left(\hat{\boldsymbol{\theta}}_{n}\right)=\mathbf{0}=\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})+\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)+\frac{1}{2}\nabla^{2}\boldsymbol{u}_{n}\left(\tilde{\boldsymbol{\theta}}_{n}\right)\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)^{2}\\ \Rightarrow\mathbf{0}=\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{-1/2}\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})+\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{-1/2}\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)+\\ \frac{1}{2}\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{-1/2}\nabla^{2}\boldsymbol{u}_{n}\left(\tilde{\boldsymbol{\theta}}_{n}\right)\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)^{2},\end{array}$ (14)

where $\tilde{\boldsymbol{\theta}}_{n}$ is a vector with elements between $\hat{\boldsymbol{\theta}}_{n}$ and $\boldsymbol{\theta}_{0}.$ Since $\hat{\boldsymbol{\theta}}_{n}=\boldsymbol{\theta}_{0}+o_{p}(\mathbf{1})$ by Proposition 2, we have $\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)^{2}=\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)o_{p}(\mathbf{1}).$ The second derivative $\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta})$ is a $p\times p\times p$ array, with entries being either $0$ or $\lambda_{i,t}^{(c)}S_{i,t-1}^{(c_{1})}S_{i,t-1}^{(c_{2})}S_{i,t-1}^{(c_{3})}$, where $i=1,\dots ,n$, $t=1,\dots ,T$, and $c,c_{1},c_{2},c_{3}\in C.$ Due to the structure of $\lambda_{i,t}^{(c)}$ and $S_{i,t-1}^{(c)}$ in Section 2, all non-zero elements in $\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta})$ are monotone with respect to $\boldsymbol{\theta}.$ Thus, there exists $\boldsymbol{\theta}_{s}\in\mathbf{\Theta}$ such that $\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta}_{s})\ge\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta})$ for all $\boldsymbol{\theta}\in\mathbf{\Theta}.$ Therefore, we have $E\,\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta})=\underset{\boldsymbol{\theta}\in\mathbf{\Theta}}{\sup}E\nabla^{2}\boldsymbol{u}_{n}(\boldsymbol{\theta})$, which can be shown to be finite by an argument analogous to Lemma 3.

Thus, eq. (14) can be written as $\begin{array}{rl}\mathbf{0}=& \sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{-1/2}\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})+\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{-1/2}\left(\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})+o_{p}(\mathbf{1})\right)\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right).\end{array}$

By eq. (11), $\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})\stackrel{p}{\to}E\left[\nabla\boldsymbol{u}_{n}(\boldsymbol{\theta}_{0})\right]=-\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})$, since $\ell_{n}(\boldsymbol{\theta})$ is the full likelihood. Therefore, by eqs. (13) and (14), we have $\sqrt{n_{L}}\boldsymbol{V}_{n}(\boldsymbol{\theta}_{0})^{1/2}\left(\hat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}_{0}\right)\stackrel{d}{\to}N\left(\mathbf{0},\boldsymbol{I}_{p}\right).$
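In practice, a result of this type is what justifies Wald-type standard errors computed from the inverse of the observed Hessian. The following sketch illustrates the idea on a deliberately simplified problem and is not the authors' implementation: it fits a Poisson log-linear model to simulated independent data by Newton-Raphson, with a uniform covariate standing in for the autoregressive statistic. The function `fit_poisson_loglinear`, the sample size and the parameter values are all hypothetical.

```python
import numpy as np


def fit_poisson_loglinear(X: np.ndarray, y: np.ndarray,
                          n_iter: int = 50, tol: float = 1e-10):
    """Newton-Raphson MLE for y_k ~ Poisson(exp(X_k @ theta)).

    Returns the estimate and the negative Hessian of the log-likelihood
    evaluated at the estimate.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(X @ theta)
        score = X.T @ (y - lam)             # gradient of the log-likelihood
        hessian = (X * lam[:, None]).T @ X  # sum_k lam_k x_k x_k^T
        step = np.linalg.solve(hessian, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    lam = np.exp(X @ theta)
    hessian = (X * lam[:, None]).T @ X
    return theta, hessian


rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
theta0 = np.array([0.5, 0.8])               # hypothetical true parameter
y = rng.poisson(np.exp(X @ theta0))

theta_hat, H = fit_poisson_loglinear(X, y)
std_err = np.sqrt(np.diag(np.linalg.inv(H)))  # Wald standard errors
z = (theta_hat - theta0) / std_err            # approximately standard normal

print("estimate:", theta_hat)
print("standard errors:", std_err)
print("standardized errors:", z)
```

Because the negative Hessian is a positively weighted sum of outer products, as in the proof of Proposition 1, the Newton system is well posed whenever the design matrix has full column rank.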

## References

• [1]

Medema JP, Vermeulen L. Microenvironmental regulation of stem cells in intestinal homeostasis and cancer. Nature. 2011;474:318–326.

• [2]

Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Series B Methodol. 1974;192–236.

• [3]

Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. J Am Stat Assoc. 1997;92:607–617.

• [4]

Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. 1999.

• [5]

Quick H, Waller LA, Casper M. A multivariate space–time model for analysing county level heart disease death rates by race and sex. J R Stat Soc: Ser C. Appl Stat. 2017.

• [6]

Sansó B, Schmidt AM, Nobre AA, et al. Bayesian spatio-temporal models based on discrete convolutions. Can J Stat. 2008;36:239–258.

• [7]

Quick H, Waller LA, Casper M. Hierarchical multivariate space-time methods for modeling counts with an application to stroke mortality data. arXiv preprint arXiv:1602.04528. 2016.

• [8]

Cressie N, Wikle CK. Statistics for spatio-temporal data. John Wiley & Sons, 2011.

• [9]

Cox DR, Gudmundsson G, Lindgren G, Bondesson L, Harsaae E, Laake P, Juselius K, Lauritzen SL. Statistical analysis of time series: Some recent developments [with discussion and reply]. Scand J Stat. 1981;93–115.

• [10]

Bradley JR, Holan SH, Wikle CK. Multivariate spatio-temporal models for high-dimensional areal data with application to longitudinal employer-household dynamics. Ann Appl Stat. 2015;9:1761–1791.

• [11]

Bradley JR, Holan SH, Wikle CK. Multivariate spatio-temporal survey fusion with application to the american community survey and local area unemployment statistics. Stat. 2016;5:224–233.

• [12]

Shaddick G, Wakefield J. Modelling daily multivariate pollutant data at multiple sites. J R Stat Soc: Ser C. Appl Stat. 2002;51:351–372.

• [13]

Wikle CK, Berliner LM, Cressie N. Hierarchical bayesian space-time models. Environ Ecol Stat. 1998;5:117–154.

• [14]

Holan S, Wikle C. Hierarchical dynamic generalized linear mixed models for discrete-valued spatio-temporal data. Handbook of Discrete-Valued Time Series, 2015.

• [15]

Mugglin AS, Cressie N, Gemmell I. Hierarchical statistical modelling of influenza epidemic dynamics in space and time. Stat Med. 2002;21:2703–2721.

• [16]

Bradley JR, Holan SH, Wikle CK. Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data. Bayesian Anal. 2017.

• [17]

Davis RA, Dunsmuir WT, Streett SB. Observation-driven models for poisson counts. Biometrika. 2003;90:777–790.

• [18]

Schrödle B, Held L, Rue H. Assessing the impact of a movement network on the spatiotemporal spread of infectious diseases. Biometrics. 2012;68:736–744.

• [19]

Paul M, Held L, Toschke AM. Multivariate modelling of infectious disease surveillance data. Stat Med. 2008;27:6250–6267.

• [20]

Zeger SL, Qaqish B. Markov regression models for time series: a quasi-likelihood approach. Biometrics. 1988;1019–1031.

• [21]

Fokianos K, Tjøstheim D. Log-linear poisson autoregression. J Multivariate Anal. 2011;102:563–578.

• [22]

Fokianos K, Rahbek A, Tjøstheim D. Poisson autoregression. J Am Stat Assoc. 2009;104:1430–1439.

• [23]

Dunsmuir WT, Scott DJ, et al. The glarma package for observation driven time series regression of counts. J Stat Softw. 2015;67:1–36.

• [24]

Kedem B, Fokianos K. Regression models for time series analysis, vol. 488. John Wiley & Sons, 2005.

• [25]

Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat Modell. 2005;5:187–199.

• [26]

Paul M, Held L. Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Stat Med. 2011;30:1118–1136.

• [27]

Knorr-Held L, Richardson S. A hierarchical model for space–time surveillance data on meningococcal disease incidence. J R Stat Soc: Ser C. Appl Stat. 2003;52:169–183.

• [28]

Wikle CK, Anderson CJ. Climatological analysis of tornado report counts using a hierarchical bayesian spatiotemporal model. J Geophys Res Atmos. 2003;108.

• [29]

Ferrari D, La Vecchia D. On robust estimation via pseudo-additive information. Biometrika. 2011;99:238–244.

• [30]

La Vecchia D, Camponovo L, Ferrari D. Robust heart rate variability analysis by generalized entropy minimization. Comput Stat Data Anal. 2015;82:137–151.

• [31]

Jenish N, Prucha IR. Central limit theorems and uniform laws of large numbers for arrays of random fields. J Econom. 2009;150:86–98.

• [32]

Leroux BG, Lei X, Breslow N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Statistical models in epidemiology, the environment, and clinical trials, 179–191. Springer, 2000.

• [33]

Lee D. CARBayes: An R package for Bayesian spatial modeling with conditional autoregressive priors. J Stat Softw. 2013;55:1–24.

• [34]

Kalluri R, Zeisberg M. Fibroblasts in cancer. Nat Rev Cancer. 2006;6:392–401.

• [35]

Tabassum DP, Polyak K. Tumorigenesis: it takes a village. Nat Rev Cancer. 2015;15:473–483.

• [36]

Koenker R. Quantile regression. No. 38, Cambridge university press, 2005.

• [37]

Garden GA, La Spada AR. Intercellular (mis) communication in neurodegenerative disease. Neuron. 2012;73:886–901.

• [38]

Leoni G, Neumann P, Sumagin R, et al. Wound repair: role of immune–epithelial interactions. Mucosal Immunol. 2015;8:959–968.

• [39]

Bradic J, Fan J, Wang W. Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J R Stat Soc Series B Stat Methodol. 2011;73:325–349.

• [40]

Bai Y, Song PX, Raghunathan T. Joint composite estimating functions in spatiotemporal models. J R Stat Soc Series B Stat Methodol. 2012;74:799–824.

Revised: 2018-05-08

Accepted: 2018-06-19

Published Online: 2018-07-07

This article was supported by the Australian National Health and Medical Research Council (1049561, 1064987 and 1069024)

Citation Information: The International Journal of Biostatistics, Volume 14, Issue 2, 20180008, ISSN (Online) 1557-4679,


© 2018 Walter de Gruyter GmbH, Berlin/Boston.