Proof of Stein's example

Stein's example is an important result in decision theory, which can be stated as follows:

The ordinary decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk in dimension at least 3.

The following is an outline of its proof. ^[1] The reader is referred to the main article for more information.

Sketched proof

The risk function of the decision rule $d(\mathbf{x}) = \mathbf{x}$ is

$$R(\theta, d) = \operatorname{E}_\theta\left[|\mathbf{\theta} - \mathbf{X}|^2\right]$$

$$= \int (\mathbf{\theta} - \mathbf{x})^T (\mathbf{\theta} - \mathbf{x}) \left(\frac{1}{2\pi}\right)^{n/2} e^{(-1/2)(\mathbf{\theta} - \mathbf{x})^T (\mathbf{\theta} - \mathbf{x})} \, m(dx)$$

$$= n.$$
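The value $n$ can be sanity-checked by simulation. The following is a minimal sketch, assuming $\mathbf{X} \sim N(\theta, I_n)$ as in the integral above; the function name `risk_identity_rule`, the sample size, the seed, and the mean vector are illustrative choices, not part of the proof.

```python
import random

def risk_identity_rule(theta, num_samples=20000, seed=0):
    """Monte Carlo estimate of E_theta |theta - X|^2 for X ~ N(theta, I_n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Draw one observation X with independent unit-variance coordinates.
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        # Squared error loss of the rule d(x) = x.
        total += sum((t - v) ** 2 for t, v in zip(theta, x))
    return total / num_samples

theta = [1.0, -2.0, 0.5, 3.0]  # hypothetical mean vector, n = 4
estimate = risk_identity_rule(theta)
print(estimate)  # should be close to n = 4, up to Monte Carlo error
```

The estimate does not depend on the chosen `theta` beyond sampling noise, which matches the fact that the risk of $d$ is constant in $\theta$.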

Now consider the decision rule

$$d'(\mathbf{x}) = \mathbf{x} - \frac{\alpha}{|\mathbf{x}|^2} \mathbf{x},$$

where $\alpha = n-2$. We will show that $d'$ is a better decision rule than $d$. The risk function is

$$R(\theta, d') = \operatorname{E}_\theta\left[\left|\mathbf{\theta} - \mathbf{X} + \frac{\alpha}{|\mathbf{X}|^2} \mathbf{X}\right|^2\right]$$

$$= \operatorname{E}_\theta\left[|\mathbf{\theta} - \mathbf{X}|^2 + 2(\mathbf{\theta} - \mathbf{X})^T \frac{\alpha}{|\mathbf{X}|^2} \mathbf{X} + \frac{\alpha^2}{|\mathbf{X}|^4} |\mathbf{X}|^2\right]$$

$$= \operatorname{E}_\theta\left[|\mathbf{\theta} - \mathbf{X}|^2\right] + 2\alpha \operatorname{E}_\theta\left[\frac{(\mathbf{\theta} - \mathbf{X})^T \mathbf{X}}{|\mathbf{X}|^2}\right] + \alpha^2 \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right]$$

This is a quadratic in $\alpha$. We may simplify the middle term by considering a general "well-behaved" function $h: \mathbf{x} \mapsto h(\mathbf{x}) \in \mathbb{R}$ and using integration by parts. For $1 \leq i \leq n$, and for any continuously differentiable $h$ growing sufficiently slowly for large $x_i$, we have:

$$\operatorname{E}_\theta\left[(\theta_i - X_i) h(\mathbf{X}) \mid X_j = x_j \ (j \neq i)\right] = \int (\theta_i - x_i) h(\mathbf{x}) \left(\frac{1}{2\pi}\right)^{1/2} e^{-(1/2)(x_i - \theta_i)^2} \, m(dx_i)$$

$$= \left[h(\mathbf{x}) \left(\frac{1}{2\pi}\right)^{1/2} e^{-(1/2)(x_i - \theta_i)^2}\right]_{x_i = -\infty}^{\infty} - \int \frac{\partial h}{\partial x_i}(\mathbf{x}) \left(\frac{1}{2\pi}\right)^{1/2} e^{-(1/2)(x_i - \theta_i)^2} \, m(dx_i)$$

$$= -\operatorname{E}_\theta\left[\frac{\partial h}{\partial x_i}(\mathbf{X}) \mid X_j = x_j \ (j \neq i)\right].$$

Therefore,

$$\operatorname{E}_\theta\left[(\theta_i - X_i) h(\mathbf{X})\right] = -\operatorname{E}_\theta\left[\frac{\partial h}{\partial x_i}(\mathbf{X})\right].$$

(This result is known as Stein's lemma.)
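Stein's lemma can be illustrated numerically for a function that does satisfy the smoothness and growth conditions. The sketch below assumes $i$ is the first coordinate and uses a hypothetical test function $h(\mathbf{x}) = \sin(x_1) + x_1 x_2$ (written `x[0]`, `x[1]` in the code); this function, the mean vector, and the sample size are illustrative assumptions and are not the choices made in the proof.

```python
import math
import random

def h(x):
    # Hypothetical smooth test function with polynomial growth
    # (not the h used in the proof).
    return math.sin(x[0]) + x[0] * x[1]

def dh_dx0(x):
    # Partial derivative of h with respect to x[0], computed by hand.
    return math.cos(x[0]) + x[1]

def stein_lemma_sides(theta, num_samples=50000, seed=0):
    """Monte Carlo estimates of E[(theta_0 - X_0) h(X)] and -E[dh/dx_0 (X)]."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(num_samples):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X ~ N(theta, I)
        lhs += (theta[0] - x[0]) * h(x)
        rhs -= dh_dx0(x)
    return lhs / num_samples, rhs / num_samples

lhs, rhs = stein_lemma_sides([0.5, -1.0, 2.0])
print(lhs, rhs)  # the two estimates should agree up to Monte Carlo error
```

For this particular test function the right-hand side can also be computed exactly as $1 - \cos(0.5)\,e^{-1/2}$, which gives an independent reference value for both estimates.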

Now, we choose

$$h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}.$$

If $h$ met the "well-behaved" condition (it does not, but this can be remedied; see below), we would have

$$\frac{\partial h}{\partial x_i} = \frac{1}{|\mathbf{x}|^2} - \frac{2x_i^2}{|\mathbf{x}|^4}$$

and so

$$\operatorname{E}_\theta\left[\frac{(\mathbf{\theta} - \mathbf{X})^T \mathbf{X}}{|\mathbf{X}|^2}\right] = \sum_{i=1}^{n} \operatorname{E}_\theta\left[(\theta_i - X_i) \frac{X_i}{|\mathbf{X}|^2}\right]$$

$$= -\sum_{i=1}^{n} \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} - \frac{2X_i^2}{|\mathbf{X}|^4}\right]$$

$$= -(n-2) \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right].$$
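The last step uses $\sum_i X_i^2 = |\mathbf{X}|^2$, so the sum of the $n$ partial derivatives collapses to $(n-2)/|\mathbf{X}|^2$. Both the partial derivative formula and this sum can be checked at a single point away from the origin. The following sketch uses a central finite difference; the helper names, the step size, and the test point are assumptions made for illustration.

```python
def h(x, i):
    """h(x) = x_i / |x|^2, the function chosen in the proof (defined for x != 0)."""
    return x[i] / sum(v * v for v in x)

def dh_closed_form(x, i):
    """Claimed partial derivative: 1/|x|^2 - 2 x_i^2 / |x|^4."""
    norm_sq = sum(v * v for v in x)
    return 1.0 / norm_sq - 2.0 * x[i] ** 2 / norm_sq ** 2

def dh_central_difference(x, i, step=1e-6):
    """Numerical partial derivative of h with respect to x_i."""
    up, down = list(x), list(x)
    up[i] += step
    down[i] -= step
    return (h(up, i) - h(down, i)) / (2.0 * step)

x = [1.0, -2.0, 0.5, 3.0]  # arbitrary test point away from the origin, n = 4
n = len(x)
norm_sq = sum(v * v for v in x)

# Compare the closed form with the finite difference, coordinate by coordinate.
for i in range(n):
    print(i, dh_closed_form(x, i), dh_central_difference(x, i))

# The sum over i should equal (n - 2) / |x|^2.
total = sum(dh_closed_form(x, i) for i in range(n))
print(total, (n - 2) / norm_sq)
```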

Then, returning to the risk function of $d'$:

$$R(\theta, d') = n - 2\alpha(n-2) \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right] + \alpha^2 \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right].$$

This quadratic in $\alpha$ is minimized at

$$\alpha = n-2,$$

giving

$$R(\theta, d') = R(\theta, d) - (n-2)^2 \operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right],$$

which, since $\operatorname{E}_\theta\left[1/|\mathbf{X}|^2\right]$ is strictly positive (and finite for $n \geq 3$), satisfies

$$R(\theta, d') < R(\theta, d)$$

for every $\theta$, making $d$ an inadmissible decision rule.
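The inequality can also be observed in simulation. The sketch below estimates the risks of $d$ and $d'$ from the same draws of $\mathbf{X} \sim N(\theta, I_n)$, together with $\operatorname{E}_\theta[1/|\mathbf{X}|^2]$, so that the risk identity can be compared against the empirical values. The function name `simulate_risks`, the dimension $n = 5$, the mean vector, and the seed are illustrative assumptions.

```python
import random

def simulate_risks(theta, num_samples=20000, seed=0):
    """Monte Carlo estimates of R(theta, d), R(theta, d') and E[1/|X|^2]."""
    rng = random.Random(seed)
    n = len(theta)
    alpha = n - 2
    loss_d = loss_shrunk = inv_norm_sq = 0.0
    for _ in range(num_samples):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X ~ N(theta, I_n)
        norm_sq = sum(v * v for v in x)
        shrink = 1.0 - alpha / norm_sq  # d'(x) = shrink * x
        loss_d += sum((t - v) ** 2 for t, v in zip(theta, x))
        loss_shrunk += sum((t - shrink * v) ** 2 for t, v in zip(theta, x))
        inv_norm_sq += 1.0 / norm_sq
    return (loss_d / num_samples,
            loss_shrunk / num_samples,
            inv_norm_sq / num_samples)

theta = [1.0] * 5  # hypothetical mean vector, n = 5
risk_d, risk_shrunk, inv = simulate_risks(theta)
n = len(theta)
print(risk_d, risk_shrunk)            # risk of d' should be below the risk of d
print(risk_d - (n - 2) ** 2 * inv)    # should be close to the estimated risk of d'
```

The gap between the two risks shrinks as $|\theta|$ grows, because $\operatorname{E}_\theta[1/|\mathbf{X}|^2]$ becomes small, but by the identity above it stays strictly positive.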

It remains to justify the use of

$$h(\mathbf{X}) = \frac{\mathbf{X}}{|\mathbf{X}|^2}.$$

This function is not continuously differentiable, since it is singular at $\mathbf{x} = 0$. However, the function

$$h(\mathbf{X}) = \frac{\mathbf{X}}{\varepsilon + |\mathbf{X}|^2}$$

is continuously differentiable, and after following the algebra through and letting $\varepsilon \to 0$, one obtains the same result.
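For the regularized function, the sum of partial derivatives works out to $n/(\varepsilon + |\mathbf{x}|^2) - 2|\mathbf{x}|^2/(\varepsilon + |\mathbf{x}|^2)^2$, which tends to $(n-2)/|\mathbf{x}|^2$ at any fixed $\mathbf{x} \neq 0$. The sketch below illustrates only this pointwise convergence and the fact that the regularized function is finite at the origin; it does not verify the exchange of limit and expectation that the full argument needs. The helper names and test values are assumptions.

```python
def h_eps(x, eps):
    """Regularized vector field x / (eps + |x|^2); finite everywhere when eps > 0."""
    norm_sq = sum(v * v for v in x)
    return [v / (eps + norm_sq) for v in x]

def div_h_eps(x, eps):
    """Sum over i of d/dx_i [x_i / (eps + |x|^2)]."""
    n = len(x)
    norm_sq = sum(v * v for v in x)
    return n / (eps + norm_sq) - 2.0 * norm_sq / (eps + norm_sq) ** 2

x = [1.0, -2.0, 0.5]  # arbitrary test point, n = 3
limit = (len(x) - 2) / sum(v * v for v in x)

# As eps decreases, the regularized sum approaches (n - 2) / |x|^2.
for eps in (1.0, 1e-2, 1e-4, 1e-8):
    print(eps, div_h_eps(x, eps), limit)

# At the origin the regularized field is well defined (no division by zero).
print(h_eps([0.0, 0.0, 0.0], 1e-2))
```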

References

^ Samworth, Richard (December 2012). "Stein's Paradox" (PDF). Eureka. 62: 38–41.