**Almost-sure**. An event is almost-sure if its probability equals 1.

**Example**. If $O\sim N(0,1)$ then the event $O\ne 0$ is almost-sure. Yet, by symmetry, the mass of the standard Gaussian law $N(0,1)$ concentrates around 0: whatever the length $\ell>0$, the interval of length $\ell$ with the largest probability of containing *O* is the interval $[-\ell/2,\ell/2]$ centered at 0 with half-length $\ell/2$.

**Example**. The estimator $\vartheta_{n}$ of $\vartheta(P)$ is strongly consistent if the event $\lim_{n\to \infty}|\vartheta_{n}-\vartheta(P)|=0$ is almost-sure under *P*.

**Bernoulli law**. The random variable *A* is drawn from the Bernoulli law with parameter $p\in [0,1]$ if *A* can only take the values 0 and 1, in such a way that $A=1$ with probability *p* (and, therefore, $A=0$ with probability $1-p$).

**Central limit theorem**. A central limit theorem is a theorem providing assumptions which guarantee that the sum of a large number of random variables behaves like a Gaussian random variable. Typically, if $O_{1},\dots,O_{n}$ are real-valued and independent random variables such that $P\{O_{i}\}=0$ for each $i\le n$ and $\sum_{i=1}^{n}P\{O_{i}^{2}\}=1$, and if moreover no $O_{i}$ contributes too heavily to the sum, then $\sum_{i=1}^{n}O_{i}$ approximately follows the standard Gaussian law $N(0,1)$.
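A minimal simulation sketch of this phenomenon (the uniform summands and the sample sizes are illustrative choices, not part of any theorem's statement): each summand is a centered, rescaled uniform variable with variance $1/n$, so the variances sum to 1 and the total sum approximately follows $N(0,1)$.

```python
import math
import random

random.seed(0)

def clt_sum(n=100):
    # Each O_i = (U_i - 1/2) * sqrt(12/n) has mean 0 and variance 1/n
    # (Var(U - 1/2) = 1/12 for U uniform on [0, 1]), so the n variances
    # sum to 1 and the total sum is approximately N(0, 1).
    c = math.sqrt(12.0 / n)
    return sum((random.random() - 0.5) * c for _ in range(n))

draws = [clt_sum() for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum(x * x for x in draws) / len(draws)
# For N(0, 1), the probability of the interval [-1.96, 1.96] is about 0.95.
within = sum(abs(x) <= 1.96 for x in draws) / len(draws)
```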

**Conditional independence**. Consider a collection $\{O_{i}:i\in I\}$ of random variables indexed by *I*. Let $I_{1},I_{2},I_{3}\subset I$ be subsets of *I*. We say that $\mathcal{O}_{1}=\{O_{i}:i\in I_{1}\}$ is conditionally independent from $\mathcal{O}_{2}=\{O_{i}:i\in I_{2}\}$ given $\mathcal{O}_{3}=\{O_{i}:i\in I_{3}\}$ if the joint conditional law of $(\mathcal{O}_{1},\mathcal{O}_{2})$ given $\mathcal{O}_{3}$ is the product of the two conditional laws of $\mathcal{O}_{1}$ and $\mathcal{O}_{2}$ given $\mathcal{O}_{3}$. If $I_{3}=\varnothing$ is empty, so that $\mathcal{O}_{3}=\varnothing$ too, then conditional independence coincides with independence; in general, however, conditional independence does not imply independence, nor the other way around.

**Conditional law**. Consider a random variable $O\sim P$ which decomposes as $O=(W,Y)$. The conditional law of *Y* given *W* is the law of the random variable *Y* when the realization of *W* is given (known).

**Example**. $W\in [0,1]$ and *Y* is drawn from the Bernoulli law with parameter $1/3$ if $W\le 1/2$ and $3/5$ if $W>1/2$.
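This example can be sketched in code. One assumption is needed: the example only states $W\in[0,1]$, so we draw *W* from the uniform law on $[0,1]$ for illustration.

```python
import random

random.seed(1)

def draw():
    w = random.random()              # W uniform on [0, 1]: an assumption,
                                     # the example only states W is in [0, 1]
    p = 1 / 3 if w <= 0.5 else 3 / 5
    y = 1 if random.random() < p else 0
    return w, y

sample = [draw() for _ in range(100000)]
# Empirical conditional frequencies of Y = 1 on each half of [0, 1].
low = [y for w, y in sample if w <= 0.5]
high = [y for w, y in sample if w > 0.5]
f_low = sum(low) / len(low)      # close to 1/3
f_high = sum(high) / len(high)   # close to 3/5
```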

**Conditionally on**. See “conditional law” and “conditional independence”.

**Confidence interval**. A confidence interval for $\vartheta(P)$ with level $(1-\alpha)\in [0,1]$ is a random interval, built from *n* observations drawn from the law *P*, so as to contain $\vartheta(P)$ with probability at least $(1-\alpha)$. At a fixed level $(1-\alpha)$, *(i)* the better of two confidence intervals is the narrower one, and *(ii)* the larger the number *n* of observations used to build a confidence interval, the narrower it is. When the level increases, the resulting confidence interval gets wider. Confidence intervals are often built by using an estimator $\vartheta_{n}$ of $\vartheta(P)$ as a pivot, *i.e.*, under the form $[\vartheta_{n}-c_{n},\vartheta_{n}+c_{n}]$ for a well-chosen, possibly random, half-length $c_{n}$.
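A sketch of the pivot-based construction, under illustrative assumptions: the half-length $c_n$ uses the Gaussian quantile 1.96 (targeting level 95% via a central limit theorem), and the data are drawn from the uniform law on $[0,1]$, whose mean is $1/2$.

```python
import math
import random

random.seed(2)

def ci_mean(obs, z=1.96):
    # Pivot-based interval [theta_n - c_n, theta_n + c_n] where theta_n is
    # the empirical mean and c_n = z * (sample sd) / sqrt(n); z = 1.96
    # targets level 95% via the central limit theorem.
    n = len(obs)
    m = sum(obs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in obs) / (n - 1))
    c = z * s / math.sqrt(n)
    return m - c, m + c

# Coverage check: the true mean of Uniform[0, 1] is 1/2.
hits = 0
for _ in range(2000):
    lo, hi = ci_mean([random.random() for _ in range(100)])
    hits += lo <= 0.5 <= hi
coverage = hits / 2000   # close to 0.95
```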

**Confounding**. The relationship between two variables is subject to confounding, or confounded, whenever their probabilistic dependence, possibly conditioned on a third variable, cannot be interpreted causally.

**Consistent (estimator)**. Consistency is an asymptotic notion: an estimator $\vartheta_{n}$ of $\vartheta(P)$ is consistent if it converges in some sense to $\vartheta(P)$ when the number *n* of observations upon which its construction relies goes to infinity. The estimator is weakly consistent if, for every fixed error $\epsilon>0$, the probability that $\vartheta_{n}$ is at least $\epsilon$ away from $\vartheta(P)$ goes to 0 when *n* goes to infinity: $\lim_{n\to \infty}P\{|\vartheta_{n}-\vartheta(P)|\ge \epsilon\}=0$. It is strongly consistent if $\vartheta_{n}$ converges to $\vartheta(P)$ almost-surely: $P\{\lim_{n\to \infty}|\vartheta_{n}-\vartheta(P)|=0\}=1$. Strong consistency implies weak consistency, but the reverse is not true.
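Weak consistency can be watched at work in a small simulation (the Uniform[0,1] law, the error $\epsilon=0.05$, and the sample sizes are all illustrative choices): the probability that the empirical mean misses the true mean $1/2$ by at least $\epsilon$ shrinks as *n* grows.

```python
import random

random.seed(3)

def miss_prob(n, eps=0.05, reps=2000):
    # Monte Carlo estimate of P{|theta_n - theta(P)| >= eps} when theta_n
    # is the empirical mean of n Uniform[0, 1] draws (true mean 1/2).
    miss = 0
    for _ in range(reps):
        m = sum(random.random() for _ in range(n)) / n
        miss += abs(m - 0.5) >= eps
    return miss / reps

# Weak consistency: the miss probability vanishes as n grows.
probs = [miss_prob(n) for n in (10, 100, 1000)]
```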

**Contingency table**. A contingency table, term coined by K. Pearson in 1904, is a two (or more)-entry table where one reports the frequencies associated with two (or more) categorical variables of interest. The origin of contingency tables goes back to the research conducted by P.C.A. Louis to demonstrate the therapeutic inefficacy of bloodletting [67].

**Example**. Consider $n=50$ random variables $O_{1},\dots,O_{n}$ such that each $O_{i}$ consists of a couple $(A_{i},Y_{i})\in \{0,1\}^{2}$. The following contingency table

|         | $Y=1$ | $Y=0$ |
|---------|-------|-------|
| $A=1$   | 18    | 12    |
| $A=0$   | 7     | 13    |

teaches us that, among these *n* observations, 18 (respectively, 12, 7, and 13) feature a couple $({A}_{i},{Y}_{i})$ equal to $(1,1)$ (respectively, $(1,0)$, $(0,1)$, and $(0,0)$).
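The table can be rebuilt mechanically from the 50 couples. A minimal sketch, where the sample is reconstructed from the counts of the example:

```python
from collections import Counter

# The 50 couples (A_i, Y_i) of the example, reconstructed from the table.
data = [(1, 1)] * 18 + [(1, 0)] * 12 + [(0, 1)] * 7 + [(0, 0)] * 13
counts = Counter(data)

# Print the contingency table row by row.
for a in (1, 0):
    print(f"A={a}: Y=1 -> {counts[(a, 1)]:2d}, Y=0 -> {counts[(a, 0)]:2d}")
```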

**Correlation coefficient**. The correlation coefficient of two real-valued random variables is a measure of their probabilistic dependence on a linear scale. If *X* and *Y* are independent then their correlation coefficient equals 0. The reverse is not true.
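The last claim admits a classical illustration (the choice $Y=X^{2}$ with *X* uniform on $[-1,1]$ is one standard construction, not the only one): *Y* is fully determined by *X*, yet their correlation coefficient is 0 because the dependence is not linear.

```python
import random

random.seed(4)

def corr(xs, ys):
    # Empirical correlation coefficient: covariance over the product of sds.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

xs = [random.uniform(-1, 1) for _ in range(100000)]
ys = [x * x for x in xs]   # Y is fully determined by X (maximal dependence)...
r = corr(xs, ys)           # ...yet the correlation coefficient is close to 0
```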

**Empirical measure**. Given *n* observations $O_{1},\dots,O_{n}$, the empirical measure is the law $P_{n}$ such that, if $O\sim P_{n}$ is drawn from $P_{n}$, then $O=O_{i}$ with probability $n^{-1}$ for each $1\le i\le n$.

**Estimator**. An estimator is a random variable obtained by combining the observations yielded by an experiment for the sake of estimating a feature of interest of the experiment.

**Example**. Consider $O_{1},\dots,O_{n}$ independent random variables drawn from a common law *P*. The empirical mean $n^{-1}\sum_{i=1}^{n}O_{i}$ is an estimator of the mean $\vartheta(P)=P\{O\}$ of $O\sim P$. If *O* is real-valued and if $P\{|O|\}$ is finite then the empirical mean is a strongly consistent estimator (by the strong law of large numbers).

**Feature**. See “parameter”.

**Gaussian law**. The real-valued random variable *O* is drawn from the standard Gaussian law $N(0,1)$ if, for all $a\le b$, the probability that $O\in [a,b]$ equals the area between $a$ and $b$ under the Gauss curve of equation $t\mapsto (\sqrt{2\pi})^{-1}\exp(-t^{2}/2)$. This law is particularly important because it naturally appears as a limit law of sequences of experiments in theorems referred to as “central limit theorems”.
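The area under the Gauss curve can be approximated numerically. A minimal sketch using a midpoint Riemann sum (the interval $[-1.96,1.96]$ is chosen because its probability under $N(0,1)$ is close to 0.95):

```python
import math

def gauss_density(t):
    # The Gauss curve t -> exp(-t^2 / 2) / sqrt(2 * pi).
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def prob(a, b, steps=100000):
    # Midpoint Riemann sum for the area under the curve between a and b.
    h = (b - a) / steps
    return h * sum(gauss_density(a + (i + 0.5) * h) for i in range(steps))

p = prob(-1.96, 1.96)   # close to 0.95
```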

**Independence**. Consider a collection $\{O_{i}:i\in I\}$ of random variables indexed by *I*. Let $I_{1},I_{2}\subset I$ be two subsets of *I*. We say that $\mathcal{O}_{1}=\{O_{i}:i\in I_{1}\}$ is independent from $\mathcal{O}_{2}=\{O_{i}:i\in I_{2}\}$ if the values of the realizations of the first set are not influenced by the values of the realizations of the second set. More formally, $\mathcal{O}_{1}$ is independent from $\mathcal{O}_{2}$ if the joint law of $(\mathcal{O}_{1},\mathcal{O}_{2})$ is the product of the two marginal laws of $\mathcal{O}_{1}$ and $\mathcal{O}_{2}$, or, equivalently, if the conditional law of $\mathcal{O}_{2}$ given $\mathcal{O}_{1}$ coincides with the marginal law of $\mathcal{O}_{2}$, and the other way around.

**Example**. Let $(W,Y)\in \{0,1\}^{2}$ be such that $P(W=Y=1)=1/10$, $P(W=1,Y=0)=1/15$, $P(W=0,Y=1)=1/2$, and $P(W=Y=0)=1/3$.

The marginal law of *Y* is the Bernoulli law with parameter $P(Y=1)=P(Y=1\text{ and }(W=1\text{ or }W=0))=P(W=Y=1)+P(W=0,Y=1)=3/5$. The marginal law of *W* is the Bernoulli law with parameter $P(W=1)=P(W=1\text{ and }(Y=1\text{ or }Y=0))=P(W=Y=1)+P(W=1,Y=0)=1/6$.

Note that $P(W=1)P(Y=1)=1/10=P(W=Y=1)$, $P(W=1)P(Y=0)=1/15=P(W=1,Y=0)$, $P(W=0)P(Y=1)=1/2=P(W=0,Y=1)$, and $P(W=0)P(Y=0)=1/3=P(W=Y=0)$. Thus, *W* and *Y* are independent under *P*.
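These checks can be carried out exactly with rational arithmetic. A minimal sketch of the verification that the joint law factorizes as the product of the marginal laws:

```python
from fractions import Fraction as F

# The joint law of (W, Y) from the example.
joint = {(1, 1): F(1, 10), (1, 0): F(1, 15),
         (0, 1): F(1, 2),  (0, 0): F(1, 3)}

# Marginal laws, obtained by summing the joint law over the other entry.
pW = {w: joint[(w, 1)] + joint[(w, 0)] for w in (0, 1)}
pY = {y: joint[(1, y)] + joint[(0, y)] for y in (0, 1)}

# Independence: the joint law is the product of the two marginal laws.
independent = all(joint[(w, y)] == pW[w] * pY[y]
                  for w in (0, 1) for y in (0, 1))
```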

**Inference**. Statistical inference is the process of drawing conclusions from data. Statistical inference relies on mathematical procedures developed in the framework of the theory of statistics, which builds upon the theory of probability, for the sake of analyzing the structure of a random experiment based on its observation. The analysis is typically expressed in terms of pointwise or confidence-interval-based estimation, or hypotheses testing, or regression.

**Joint law**. Consider a random variable *O* which decomposes as $O=(W,Y)$. The joint law of *O* is the law of the couple $(W,Y)$.

**Law**. The law *P* of a random variable *O* is the exhaustive description of how chance produces a realization of *O*. We write $O\sim P$ to indicate that *O* is drawn from the law *P*.

**Law of large numbers**. A law of large numbers is a probabilistic theorem providing assumptions which guarantee that the empirical mean $n^{-1}\sum_{i=1}^{n}O_{i}$ of *n* random variables $O_{1},\dots,O_{n}$ sharing a common law *P* converges to their common mean $P\{O\}$. We say that such a law is “weak” if the convergence takes place in probability, *i.e.*, if, for any fixed margin of error, the probability that the gap separating the empirical mean from its theoretical counterpart exceeds this error goes to 0 when *n* goes to infinity. It was J. Bernoulli who first formalized this law, back in 1690. A weak law of large numbers notably holds when the random variables $O_{1},\dots,O_{n}$ are real-valued, independent, and such that $P\{|O|\}$ is finite.

We say that such a law is “strong” if the convergence takes place almost surely, *i.e.*, if, with probability 1, the empirical mean converges to its theoretical counterpart when *n* goes to infinity. If a strong law holds then a weak law necessarily holds too. The reverse is not true. A. Kolmogorov proved in 1929 that a strong law of large numbers notably holds when $O_{1},\dots,O_{n}$ are real-valued, independent, and such that $P\{|O|\}$ is finite.

For B. Gnedenko and A. Kolmogorov [68],

> In fact, all epistemologic value of the theory of probability is based on this: that large-scale random phenomena, in their collective action, create strict non-random regularity.

**Likelihood**. The likelihood of an observation *O* under a law *P* capable of producing *O* quantifies how likely it is that *O* was actually drawn from *P*. The more likely it is, the larger the likelihood. The maximum likelihood principle builds upon this interpretation: given two laws of identical complexity, both capable of producing the observation *O*, one should prefer the law which maximizes the likelihood. If the two laws have differing complexities then the comparison of their likelihoods requires a preliminary adjustment based on a parsimony principle, the more complex law being naturally advantaged over the simpler one.

**Marginal law**. Consider a random variable $O\sim P$ which decomposes as $O=(W,Y)$. The marginal law of *Y* is the law of the random variable *Y* extracted from *O*. This expression originates from the vocabulary of contingency tables.

**Example**. Let $(W,Y)\in \{0,1\}^{2}$ be such that $P(W=Y=1)=1/10$, $P(W=1,Y=0)=1/5$, $P(W=0,Y=1)=3/10$, and $P(W=Y=0)=2/5$. Then $P(W=1)=P(W=1\text{ and }(Y=1\text{ or }Y=0))=P(W=Y=1)+P(W=1,Y=0)=3/10$, so the marginal law of *W* is the Bernoulli law with parameter $3/10$. Likewise, $P(Y=1)=P(Y=1\text{ and }(W=1\text{ or }W=0))=P(W=Y=1)+P(W=0,Y=1)=2/5$, so the marginal law of *Y* is the Bernoulli law with parameter $2/5$.

**Model**. A model is a collection of laws from which the observation *O* may be drawn. A model is said to be parametric if its elements are identified by a finite-dimensional parameter.

**Example**. Let $\mathcal{M}$ be the non-parametric model consisting of all laws compatible with the definition of the observation *O*. A subset $\{P(\epsilon):\epsilon \in [-1,1]\}\subset \mathcal{M}$ of candidate laws $P(\epsilon)$ identified by the real parameter $\epsilon$ is a parametric model. Since it is one-dimensional, it is often called a “path”.

**Parameter**. The value of a functional defined on a model and evaluated at a law belonging to that model.

**Example**. $\theta(\mathbb{P})$ for $\theta:\mathbb{M}\to \Theta$ or $\vartheta(P)$ for $\vartheta:\mathcal{M}\to \Theta$.

**Random variable**. A description, possibly non-exhaustive, of the result of a random experiment, *i.e.*, of a reproducible experiment subject to chance.

**Example**. The experiment consisting in flipping a balanced coin in a well-defined experimental setting is a random experiment (it is reproducible and we cannot be certain of its outcome). The result of each coin toss is described by the random variable which takes the value 1 for tails and 0 otherwise. The law of this random variable is the Bernoulli law with parameter $1/2$.

**Regression**. Given observations $O_{1},\dots,O_{n}$ of a generic datum $O=(W,Y)$, regressing *Y* on *W* consists in inferring from the observations information on how *Y* depends on *W*. Typically, regressing *Y* on *W* means explaining the mean of the random variable *Y* conditionally on *W*, *i.e.*, expressing the mean of *Y* as a function of *W*.

**Example**. If $Y\in \{0,1\}$ then regressing *Y* on *W* amounts to estimating the conditional probability $P(Y=1|W)$ that *Y* equals 1 given the value of *W*. This example is an instance of regression in the aforementioned typical sense since $P(Y=1|W)$ coincides with the conditional mean $P\{Y|W\}$ of *Y* given *W*.
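A crude regression of this kind can be sketched by averaging *Y* within bins of *W*. The data-generating law below is an illustrative assumption: *W* uniform on $[0,1]$ and $P(Y=1|W=w)=w$, so the conditional mean on each bin is close to the bin's midpoint.

```python
import random

random.seed(5)

# Illustrative choice: W uniform on [0, 1] and P(Y = 1 | W = w) = w.
data = []
for _ in range(200000):
    w = random.random()
    data.append((w, 1 if random.random() < w else 0))

# Crude regression of Y on W: average Y within 10 equal-width bins of W,
# estimating the conditional mean P{Y | W} on each bin.
bins = [[] for _ in range(10)]
for w, y in data:
    bins[min(int(w * 10), 9)].append(y)
estimates = [sum(b) / len(b) for b in bins]   # close to 0.05, 0.15, ..., 0.95
```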

**Substitution estimator**. Given a functional $\vartheta:\mathcal{M}\to \Theta$ of interest, an estimator $\vartheta_{n}$ of the parameter $\vartheta(P)$ is a substitution estimator if it can be written as $\vartheta_{n}=\vartheta(P_{n})$ for a law $P_{n}$ approaching *P*.

**Example**. Consider $\vartheta:\mathcal{M}\to \mathbb{R}$ such that $\vartheta(P)=P\{O\}$ where $\mathcal{M}$ is a set of laws which all admit a finite mean. Let $O_{1},\dots,O_{n}$ be independent random variables with a common law *P* and let $P_{n}$ be the empirical measure. The empirical mean $n^{-1}\sum_{i=1}^{n}O_{i}=\vartheta(P_{n})$ is a substitution estimator of the mean $\vartheta(P)$.
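The same substitution recipe applies to other functionals. A sketch for the variance $\vartheta(P)=P\{O^{2}\}-(P\{O\})^{2}$ evaluated at the empirical measure (the Uniform[0,1] data, whose variance is $1/12$, are an illustrative choice):

```python
import random

random.seed(6)

def substitution_variance(obs):
    # theta(P) = P{O^2} - (P{O})^2 (the variance), evaluated at the
    # empirical measure P_n: each expectation under P becomes an average
    # over the sample, with equal weight 1/n on each observation.
    n = len(obs)
    m1 = sum(obs) / n
    m2 = sum(x * x for x in obs) / n
    return m2 - m1 * m1

obs = [random.random() for _ in range(200000)]
v = substitution_variance(obs)   # true variance of Uniform[0, 1] is 1/12
```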

**Uniform law**. The real-valued random variable *O* is drawn from the uniform law on $[A,B]$ if for all $A\le a\le b\le B$, the probability that $O\in [a,b]$ equals the ratio $(b-a)/(B-A)$.
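The defining ratio can be checked by simulation (the endpoints below are arbitrary illustrative values):

```python
import random

random.seed(7)

A, B = 2.0, 5.0   # illustrative support [A, B]
a, b = 2.5, 4.0   # sub-interval of [A, B]
n = 100000
# Empirical frequency of draws falling in [a, b].
freq = sum(a <= random.uniform(A, B) <= b for _ in range(n)) / n
target = (b - a) / (B - A)   # = 0.5
```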
