Identifiability

In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of the model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.

A model that fails to be identifiable is said to be non-identifiable or unidentifiable: two or more parametrizations are observationally equivalent. In some cases, even though a model is non-identifiable, it is still possible to learn the true values of a certain subset of the model parameters; in this case the model is said to be partially identifiable. In other cases, it may be possible to learn the location of the true parameter up to a certain finite region of the parameter space, in which case the model is set identifiable.

Aside from strictly theoretical exploration of the model properties, identifiability can be referred to in a wider scope when a model is tested with experimental data sets, using identifiability analysis.^[1]

Definition

Let ${\mathcal {P}}=\{P_{\theta }:\theta \in \Theta \}$ be a statistical model where the parameter space $\Theta$ is either finite- or infinite-dimensional. We say that ${\mathcal {P}}$ is identifiable if the mapping $\theta \mapsto P_{\theta }$ is one-to-one:^[2]

P_{\theta _{1}}=P_{\theta _{2}}\quad \Rightarrow \quad \theta _{1}=\theta _{2}\quad \ {\text{for all }}\theta _{1},\theta _{2}\in \Theta .

This definition means that distinct values of θ should correspond to distinct probability distributions: if θ₁≠θ₂, then also P_θ₁≠P_θ₂.^[3] If the distributions are defined in terms of the probability density functions (pdfs), then two pdfs should be considered distinct only if they differ on a set of non-zero measure (for example two functions ƒ₁(x) = 1_{0 ≤ x < 1} and ƒ₂(x) = 1_{0 ≤ x ≤ 1} differ only at a single point x = 1 — a set of measure zero — and thus cannot be considered as distinct pdfs).
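The one-to-one condition can be checked by brute force for models with finitely many outcomes, over a finite grid of parameter values. The helpers `equivalence_classes` and `is_identifiable` and the two toy Bernoulli models below are hypothetical illustrations, not part of any library. Exact `Fraction` arithmetic is used so that equal probabilities compare equal. A check on a grid can reveal observationally equivalent parameters, but it cannot establish identifiability over a continuous Θ.

```python
from collections import defaultdict
from fractions import Fraction


def equivalence_classes(model, thetas):
    """Group parameter values by the distribution they generate.

    `model(theta)` must return a discrete pmf as a dict {outcome: probability}.
    Parameters in the same group are observationally equivalent.
    """
    classes = defaultdict(list)
    for theta in thetas:
        pmf = model(theta)
        # Drop zero-probability outcomes so equal distributions get equal keys.
        key = frozenset((x, p) for x, p in pmf.items() if p != 0)
        classes[key].append(theta)
    return list(classes.values())


def is_identifiable(model, thetas):
    # On this finite grid: identifiable iff theta -> P_theta is one-to-one.
    return all(len(group) == 1 for group in equivalence_classes(model, thetas))


# Hypothetical toy models for X in {0, 1}.
def bernoulli(p):
    return {1: p, 0: 1 - p}


def bernoulli_squared(theta):
    # Success probability theta**2, so theta and -theta give the same pmf.
    return {1: theta ** 2, 0: 1 - theta ** 2}


grid = [Fraction(k, 4) for k in range(-4, 5)]  # -1, -3/4, ..., 1
print(is_identifiable(bernoulli, [t for t in grid if t >= 0]))
print(is_identifiable(bernoulli_squared, grid))
print(equivalence_classes(bernoulli_squared, grid))
```

On this grid, `bernoulli_squared` groups each pair ±θ into a single class, which is exactly a failure of the one-to-one condition.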

Identifiability of the model in the sense of invertibility of the map $\theta \mapsto P_{\theta }$ is equivalent to being able to learn the model's true parameter if the model can be observed indefinitely long. Indeed, if {X_t} ⊆ S is a sequence of independent and identically distributed observations from the model, then by the strong law of large numbers,

{\frac {1}{T}}\sum _{t=1}^{T}\mathbf {1} _{\{X_{t}\in A\}}\ {\xrightarrow {\text{a.s.}}}\ \Pr[X_{t}\in A],

for every measurable set A ⊆ S (here 1_{...} is the indicator function). Thus, with an infinite number of observations we will be able to find the true probability distribution P₀ in the model, and since the identifiability condition above requires that the map $\theta \mapsto P_{\theta }$ be invertible, we will also be able to find the true value of the parameter that generated the distribution P₀.
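A quick simulation illustrates this argument. The sketch assumes a hypothetical Bernoulli(p) model with p = 0.3 and the event A = {1}. The running frequency of 1{X_t ∈ A} settles near Pr[X_t ∈ A] = p. Because p ↦ Bernoulli(p) is one-to-one, that limit pins down the parameter. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p_true = 0.3      # hypothetical true parameter of a Bernoulli(p) model
T = 1_000_000     # number of observations

x = rng.random(T) < p_true              # X_t = 1 with probability p_true
indicator = x.astype(float)             # 1{X_t in A} with A = {1}
running_freq = np.cumsum(indicator) / np.arange(1, T + 1)

for t in (10, 1_000, 100_000, T):
    print(t, running_freq[t - 1])

# In this model P_p({1}) = p, so the map p -> P_p is one-to-one and the
# limiting frequency directly identifies the parameter.
p_hat = running_freq[-1]
```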

Examples

Example 1

Let ${\mathcal {P}}$ be the normal location-scale family:

{\mathcal {P}}={\Big \{}\ f_{\theta }(x)={\tfrac {1}{{\sqrt {2\pi }}\sigma }}e^{-{\frac {1}{2\sigma ^{2}}}(x-\mu )^{2}}\ {\Big |}\ \theta =(\mu ,\sigma ):\mu \in \mathbb {R} ,\,\sigma \!>0\ {\Big \}}.

Then

{\begin{aligned}&f_{\theta _{1}}=f_{\theta _{2}}\\[6pt]\Longleftrightarrow {}&{\frac {1}{{\sqrt {2\pi }}\sigma _{1}}}\exp \left(-{\frac {1}{2\sigma _{1}^{2}}}(x-\mu _{1})^{2}\right)={\frac {1}{{\sqrt {2\pi }}\sigma _{2}}}\exp \left(-{\frac {1}{2\sigma _{2}^{2}}}(x-\mu _{2})^{2}\right)\\[6pt]\Longleftrightarrow {}&{\frac {1}{\sigma _{1}^{2}}}(x-\mu _{1})^{2}+2\ln \sigma _{1}={\frac {1}{\sigma _{2}^{2}}}(x-\mu _{2})^{2}+2\ln \sigma _{2}\\[6pt]\Longleftrightarrow {}&x^{2}\left({\frac {1}{\sigma _{1}^{2}}}-{\frac {1}{\sigma _{2}^{2}}}\right)-2x\left({\frac {\mu _{1}}{\sigma _{1}^{2}}}-{\frac {\mu _{2}}{\sigma _{2}^{2}}}\right)+\left({\frac {\mu _{1}^{2}}{\sigma _{1}^{2}}}-{\frac {\mu _{2}^{2}}{\sigma _{2}^{2}}}+2\ln \sigma _{1}-2\ln \sigma _{2}\right)=0\end{aligned}}

This expression is equal to zero for almost all x only when all its coefficients are equal to zero (a nonzero polynomial has only finitely many roots). That is only possible when |σ₁| = |σ₂| and μ₁ = μ₂. Since the scale parameter σ is restricted to be greater than zero, we conclude that the model is identifiable: ƒ_θ₁ = ƒ_θ₂ ⇔ θ₁ = θ₂.
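The algebra can be double-checked numerically in plain Python; the function names are hypothetical. `quadratic_coefficients` returns the coefficients of the final quadratic in x. These come from taking −2 ln of each normal density, which produces the terms 2 ln σ₁ and 2 ln σ₂ in the constant coefficient. The loop then confirms that the quadratic equals −2 ln ƒ_θ₁(x) + 2 ln ƒ_θ₂(x) at a few points.

```python
import math


def quadratic_coefficients(mu1, s1, mu2, s2):
    """Coefficients (a, b, c) of a*x**2 + b*x + c = 0 in the last step."""
    a = 1 / s1**2 - 1 / s2**2
    b = -2 * (mu1 / s1**2 - mu2 / s2**2)
    c = mu1**2 / s1**2 - mu2**2 / s2**2 + 2 * math.log(s1) - 2 * math.log(s2)
    return a, b, c


def neg2_log_pdf(x, mu, s):
    # -2 * ln f_theta(x) = ln(2*pi) + 2*ln(sigma) + (x - mu)**2 / sigma**2
    return math.log(2 * math.pi) + 2 * math.log(s) + (x - mu) ** 2 / s**2


theta1, theta2 = (0.5, 1.0), (-1.0, 2.0)   # (mu, sigma) pairs, chosen arbitrarily
a, b, c = quadratic_coefficients(*theta1, *theta2)
for x in (-2.0, 0.0, 3.0):
    lhs = a * x**2 + b * x + c
    rhs = neg2_log_pdf(x, *theta1) - neg2_log_pdf(x, *theta2)
    assert math.isclose(lhs, rhs, abs_tol=1e-9)

# Identical parameters give the zero polynomial; different ones do not.
print(quadratic_coefficients(0.5, 1.0, 0.5, 1.0))
print(quadratic_coefficients(*theta1, *theta2))
```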

Example 2

Let ${\mathcal {P}}$ be the standard linear regression model:

y=\beta 'x+\varepsilon ,\quad \mathrm {E} [\,\varepsilon \mid x\,]=0

(where ′ denotes matrix transpose). Then the parameter β is identifiable if and only if the matrix $\mathrm {E} [xx']$ is invertible. Thus, invertibility of $\mathrm {E} [xx']$ is the identification condition in this model.
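The following NumPy sketch illustrates this condition using simulated regressors; the data and the helper `sample_Exx` are hypothetical. When one regressor is an exact multiple of another, the sample analogue of E[xx′] is singular. In that case, different values of β produce the same β′x and hence the same distribution of y. With linearly independent regressors the matrix is invertible, and β solves the normal equations E[xx′]β = E[xy].

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

# Identified design: two linearly independent regressors.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_good = np.column_stack([x1, x2])

# Non-identified design: the second regressor is exactly twice the first.
X_bad = np.column_stack([x1, 2 * x1])


def sample_Exx(X):
    # Sample analogue of E[x x'] (k-by-k), with one observation per row of X.
    return X.T @ X / X.shape[0]


print(np.linalg.matrix_rank(sample_Exx(X_good)))  # full rank: invertible
print(np.linalg.matrix_rank(sample_Exx(X_bad)))   # rank-deficient: singular

# Under X_bad, beta = (1, 0) and beta = (0, 0.5) give the same beta'x for
# every observation, so they imply the same distribution of y.
beta_a = np.array([1.0, 0.0])
beta_b = np.array([0.0, 0.5])
print(np.allclose(X_bad @ beta_a, X_bad @ beta_b))

# Under X_good, solve E[xx'] beta = E[xy] with sample moments.
beta_true = np.array([1.5, -0.7])
y = X_good @ beta_true + rng.normal(size=n)   # noise satisfies E[eps | x] = 0
beta_hat = np.linalg.solve(sample_Exx(X_good), X_good.T @ y / n)
print(beta_hat)
```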

Example 3

Suppose ${\mathcal {P}}$ is the classical errors-in-variables linear model:

{\begin{cases}y=\beta x^{*}+\varepsilon ,\\x=x^{*}+\eta ,\end{cases}}

where (ε,η,x*) are jointly normal independent random variables with zero expected value and unknown variances, and only the variables (x,y) are observed. Then this model is not identifiable;^[4] only the product βσ²_∗ is (where σ²_∗ is the variance of the latent regressor x*). This is also an example of a set identifiable model. Although the exact value of β cannot be learned, when β > 0 we can guarantee that it lies somewhere in the interval (β_yx, 1/β_xy), where β_yx is the coefficient in the OLS regression of y on x, and β_xy is the coefficient in the OLS regression of x on y.^[5]
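Because (x, y) is zero-mean and jointly normal, its distribution is determined by Var(x), Cov(x, y) and Var(y). The model ties these three moments to four unknowns: β, σ²_∗, and the variances of η and ε. The plain-Python sketch below uses hypothetical function names and parameter values. It exhibits two parameter vectors with the same βσ²_∗ that imply identical moments. It also computes the interval (β_yx, 1/β_xy) from population moments. Assuming β > 0, as in both vectors, each true β lies inside that interval.

```python
import math


def implied_moments(beta, var_xstar, var_eta, var_eps):
    """Var(x), Cov(x, y), Var(y) implied by the errors-in-variables model."""
    var_x = var_xstar + var_eta
    cov_xy = beta * var_xstar
    var_y = beta**2 * var_xstar + var_eps
    return var_x, cov_xy, var_y


def beta_bounds(var_x, cov_xy, var_y):
    b_yx = cov_xy / var_x   # OLS slope of y on x
    b_xy = cov_xy / var_y   # OLS slope of x on y
    return b_yx, 1 / b_xy


# Two hypothetical parameter vectors (beta, var_xstar, var_eta, var_eps).
theta_1 = (1.0, 2.0, 1.0, 1.0)
theta_2 = (0.8, 2.5, 0.5, 1.4)

m1 = implied_moments(*theta_1)
m2 = implied_moments(*theta_2)

# Equal moments mean equal distributions of the observed (x, y).
print(all(math.isclose(a, b) for a, b in zip(m1, m2)))
print(beta_bounds(*m1))   # the same interval for both parameter vectors
```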

If we abandon the normality assumption and require that x* not be normally distributed, retaining only the independence condition ε ⊥ η ⊥ x*, then the model becomes identifiable.^[4]
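Reiersøl's result is general, but one special case is easy to simulate. Suppose x* has nonzero skewness (E[x*³] ≠ 0), and all three variables are independent with zero mean. Then E[x y²] = β² E[x*³] and E[x² y] = β E[x*³], so their ratio identifies β. This third-moment ratio is a moment-based estimator chosen here for illustration, not the construction used in the cited proof. The NumPy sketch assumes a centred exponential x* and hypothetical noise variances, and contrasts the ratio with the attenuated OLS slope.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 1_000_000
beta = 2.0   # hypothetical true slope

# Skewed latent regressor: centred Exp(1) has mean 0, variance 1, E[x*^3] = 2.
x_star = rng.exponential(1.0, size=n) - 1.0
eta = rng.normal(0.0, 0.5, size=n)   # measurement error in x
eps = rng.normal(0.0, 0.5, size=n)   # equation error in y

x = x_star + eta
y = beta * x_star + eps
xc, yc = x - x.mean(), y - y.mean()

# OLS is attenuated: plim = beta * Var(x*) / (Var(x*) + Var(eta)) = 1.6 here.
beta_ols = np.mean(xc * yc) / np.mean(xc**2)

# E[x y^2] / E[x^2 y] = beta^2 E[x*^3] / (beta E[x*^3]) = beta.
beta_third = np.mean(xc * yc**2) / np.mean(xc**2 * yc)

print(beta_ols, beta_third)
```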

Software

In the case of parameter estimation in partially observed dynamical systems, the profile likelihood can also be used for structural and practical identifiability analysis.^[6] An implementation of this approach is available in the MATLAB toolbox PottersWheel.
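The idea behind profile-likelihood analysis can be sketched on two toy models in Python with NumPy. This is not the PottersWheel interface; the models, data and helper names are hypothetical. For each fixed value of a parameter a, the likelihood is maximised over the remaining parameter b, here in closed form. In y ~ N(a·b, 1) only the product is identified, so the profile over a is flat. In y_i ~ N(a + b·t_i, 1) the profile has a clear peak.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
t = np.linspace(0.0, 1.0, 50)   # hypothetical design points


def gauss_loglik(y, mean):
    # Log-likelihood of y_i ~ N(mean_i, 1), dropping the constant term.
    return -0.5 * np.sum((y - mean) ** 2)


# Model 1 (structurally non-identifiable): y_i ~ N(a * b, 1).
y1 = 1.2 + rng.normal(size=t.size)


def profile_product(a):
    b_hat = y1.mean() / a          # argmax over b for fixed a != 0
    return gauss_loglik(y1, a * b_hat)


# Model 2 (identifiable): y_i ~ N(a + b * t_i, 1).
y2 = 0.5 + 2.0 * t + rng.normal(size=t.size)


def profile_line(a):
    b_hat = np.sum(t * (y2 - a)) / np.sum(t * t)   # argmax over b for fixed a
    return gauss_loglik(y2, a + b_hat * t)


a_grid = np.linspace(0.2, 3.0, 15)
flat = np.array([profile_product(a) for a in a_grid])
peaked = np.array([profile_line(a) for a in a_grid])

print(flat.max() - flat.min())                 # numerically zero: flat profile
print(a_grid[np.argmax(peaked)], peaked.max() - peaked.min())
```

A flat profile signals structural non-identifiability. A profile that is peaked but too shallow to exceed a likelihood-ratio threshold would instead point to practical non-identifiability.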

References

Citations

^ Raue, A.; Kreutz, C.; Maiwald, T.; Bachmann, J.; Schilling, M.; Klingmüller, U.; Timmer, J. (2009-08-01). "Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood". Bioinformatics. 25 (15): 1923–1929. doi:10.1093/bioinformatics/btp358. PMID 19505944.
^ Lehmann & Casella 1998, Definition 1.5.2
^ van der Vaart 1998, p. 62
^ Reiersøl 1950
^ Casella & Berger 2001, p. 583
^ Raue, A.; Kreutz, C.; Maiwald, T.; Bachmann, J.; Schilling, M.; Klingmüller, U.; Timmer, J. (2009). "Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood". Bioinformatics. 25 (15): 1923–1929. doi:10.1093/bioinformatics/btp358. PMID 19505944.

Sources

Casella, George; Berger, Roger L. (2002), Statistical Inference (2nd ed.), ISBN 0-534-24312-6, LCCN 2001025794
Hsiao, Cheng (1983), Identification, Handbook of Econometrics, Vol. 1, Ch.4, North-Holland Publishing Company
Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), Springer, ISBN 0-387-98502-6
Reiersøl, Olav (1950), "Identifiability of a linear relation between variables which are subject to error", Econometrica, 18 (4): 375–389, doi:10.2307/1907835, JSTOR 1907835
van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge University Press, ISBN 978-0-521-49603-2