Karhunen–Loève theorem

In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem,^[1]^[2] is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as the Hotelling transform and the eigenvector transform, and is closely related to principal component analysis (PCA), a technique widely used in image processing and in data analysis in many fields.^[3]

Stochastic processes given by infinite series of this form were first considered by Damodar Dharmananda Kosambi.^[4]^[5] There exist many such expansions of a stochastic process: if the process is indexed over $[a,b]$, any orthonormal basis of $L^{2}([a,b])$ yields an expansion of it in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis, in the sense that it minimizes the total mean squared error.

In contrast to a Fourier series, where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think of the Karhunen–Loève transform as adapting to the process in order to produce the best possible basis for its expansion.

For a centered stochastic process $\{X_{t}\}_{t\in [a,b]}$ (centered means $\mathbf {E} [X_{t}]=0$ for all $t\in [a,b]$) satisfying a technical continuity condition, $X_{t}$ admits a decomposition

X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)

where the $Z_{k}$ are pairwise uncorrelated random variables and the functions $e_{k}$ are continuous real-valued functions on $[a,b]$ that are pairwise orthogonal in $L^{2}([a,b])$. The expansion is therefore sometimes said to be bi-orthogonal, since the random coefficients $Z_{k}$ are orthogonal in the probability space while the deterministic functions $e_{k}$ are orthogonal in the time domain. The general case of a process $X_{t}$ that is not centered can be reduced to the centered case by considering $X_{t}-\mathbf {E} [X_{t}]$, which is a centered process.

Moreover, if the process is Gaussian, then the random variables $Z_{k}$ are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on $[0,1]$ is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.

Formulation

Throughout this article, we consider a square-integrable zero-mean random process $X_{t}$ defined over a probability space $(\Omega ,F,\mathbf {P} )$ and indexed over a closed interval $[a,b]$, with covariance function $K_{X}(s,t)$. We thus have:

\forall t\in [a,b]\qquad X_{t}\in L^{2}(\Omega ,F,\mathbf {P} ),

\forall t\in [a,b]\qquad \mathbf {E} [X_{t}]=0,

\forall t,s\in [a,b]\qquad K_{X}(s,t)=\mathbf {E} [X_{s}X_{t}].

We associate to $K_{X}$ a linear operator $T_{K_{X}}$ defined in the following way:

{\begin{aligned}&T_{K_{X}}&:L^{2}([a,b])&\to L^{2}([a,b])\\&&:f\mapsto T_{K_{X}}f&=\int _{a}^{b}K_{X}(s,\cdot )f(s)\,ds\end{aligned}}

Since $T_{K_{X}}$ is a linear operator, it makes sense to talk about its eigenvalues $\lambda _{k}$ and eigenfunctions $e_{k}$, which are found by solving the homogeneous Fredholm integral equation of the second kind

\int _{a}^{b}K_{X}(s,t)e_{k}(s)\,ds=\lambda _{k}e_{k}(t)
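In practice the Fredholm equation is rarely solvable in closed form. The following is a minimal numerical sketch, assuming a midpoint-rule (Nyström-type) discretization of the integral; the function name `kl_eigenpairs` and the exponential example kernel are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def kl_eigenpairs(kernel, a, b, n_grid=200, n_modes=5):
    """Approximate eigenpairs of the covariance operator on [a, b].

    Midpoint-rule discretization: the integral equation
    int K(s, t) e(s) ds = lambda e(t) becomes (K * h) e = lambda e.
    """
    h = (b - a) / n_grid
    t = a + h * (np.arange(n_grid) + 0.5)        # midpoints of the grid cells
    K = kernel(t[:, None], t[None, :])            # K[i, j] = K_X(t_i, t_j)
    eigvals, eigvecs = np.linalg.eigh(K * h)      # symmetric eigenproblem
    order = np.argsort(eigvals)[::-1][:n_modes]   # largest eigenvalues first
    lam = eigvals[order]
    e = eigvecs[:, order] / np.sqrt(h)            # so that sum(e_k**2) * h == 1
    return t, lam, e

# Hypothetical example kernel (an assumption chosen for illustration).
def exp_kernel(s, t):
    return np.exp(-np.abs(s - t))

t, lam, e = kl_eigenpairs(exp_kernel, 0.0, 1.0)
gram = e.T @ e * (t[1] - t[0])    # discrete L2 inner products of the e_k
```

The rescaling by `1 / sqrt(h)` makes the discrete eigenvectors orthonormal with respect to the quadrature rule, mirroring the orthonormality of the $e_{k}$ in $L^{2}([a,b])$.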

Statement of the theorem

Theorem. Let $X_{t}$ be a zero-mean square-integrable stochastic process defined over a probability space $(\Omega ,F,\mathbf {P} )$ and indexed over a closed and bounded interval $[a,b]$, with continuous covariance function $K_{X}(s,t)$.

Then $K_{X}(s,t)$ is a Mercer kernel and, letting $e_{k}$ be an orthonormal basis of $L^{2}([a,b])$ formed by the eigenfunctions of $T_{K_{X}}$ with respective eigenvalues $\lambda _{k}$, $X_{t}$ admits the following representation

X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)

where the convergence is in $L^{2}$, uniform in $t$, and

Z_{k}=\int _{a}^{b}X_{t}e_{k}(t)\,dt

Furthermore, the random variables $Z_{k}$ have zero mean, are uncorrelated and have variance $\lambda _{k}$:

\mathbf {E} [Z_{k}]=0,~\forall k\in \mathbb {N} \qquad {\mbox{and}}\qquad \mathbf {E} [Z_{i}Z_{j}]=\delta _{ij}\lambda _{j},~\forall i,j\in \mathbb {N}

Note that by generalizations of Mercer's theorem we can replace the interval $[a,b]$ with other compact spaces $C$ and the Lebesgue measure on $[a,b]$ with a Borel measure whose support is $C$.

Proof

The covariance function $K_{X}$ satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set $\lambda _{k}$, $e_{k}(t)$ of eigenvalues and eigenfunctions of $T_{K_{X}}$ forming an orthonormal basis of $L^{2}([a,b])$, and $K_{X}$ can be expressed as

K_{X}(s,t)=\sum _{k=1}^{\infty }\lambda _{k}e_{k}(s)e_{k}(t)

The process $X_{t}$ can be expanded in terms of the eigenfunctions $e_{k}$ as:

X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)

where the coefficients (random variables) $Z_{k}$ are given by the projection of $X_{t}$ on the respective eigenfunctions

Z_{k}=\int _{a}^{b}X_{t}e_{k}(t)\,dt

We may then derive

{\begin{aligned}\mathbf {E} [Z_{k}]&=\mathbf {E} \left[\int _{a}^{b}X_{t}e_{k}(t)\,dt\right]=\int _{a}^{b}\mathbf {E} [X_{t}]e_{k}(t)\,dt=0\\[8pt]\mathbf {E} [Z_{i}Z_{j}]&=\mathbf {E} \left[\int _{a}^{b}\int _{a}^{b}X_{t}X_{s}e_{j}(t)e_{i}(s)\,dt\,ds\right]\\&=\int _{a}^{b}\int _{a}^{b}\mathbf {E} \left[X_{t}X_{s}\right]e_{j}(t)e_{i}(s)\,dt\,ds\\&=\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)e_{j}(t)e_{i}(s)\,dt\,ds\\&=\int _{a}^{b}e_{i}(s)\left(\int _{a}^{b}K_{X}(s,t)e_{j}(t)\,dt\right)\,ds\\&=\lambda _{j}\int _{a}^{b}e_{i}(s)e_{j}(s)\,ds\\&=\delta _{ij}\lambda _{j}\end{aligned}}

where we have used the fact that the $e_{k}$ are eigenfunctions of $T_{K_{X}}$ and are orthonormal.

Let us now show that the convergence is in $L^{2}$. Let

S_{N}=\sum _{k=1}^{N}Z_{k}e_{k}(t).

Then:

{\begin{aligned}\mathbf {E} \left[\left|X_{t}-S_{N}\right|^{2}\right]&=\mathbf {E} \left[X_{t}^{2}\right]+\mathbf {E} \left[S_{N}^{2}\right]-2\mathbf {E} \left[X_{t}S_{N}\right]\\&=K_{X}(t,t)+\mathbf {E} \left[\sum _{k=1}^{N}\sum _{\ell =1}^{N}Z_{k}Z_{\ell }e_{k}(t)e_{\ell }(t)\right]-2\mathbf {E} \left[X_{t}\sum _{k=1}^{N}Z_{k}e_{k}(t)\right]\\&=K_{X}(t,t)+\sum _{k=1}^{N}\lambda _{k}e_{k}(t)^{2}-2\mathbf {E} \left[\sum _{k=1}^{N}\int _{a}^{b}X_{t}X_{s}e_{k}(s)e_{k}(t)\,ds\right]\\&=K_{X}(t,t)-\sum _{k=1}^{N}\lambda _{k}e_{k}(t)^{2}\end{aligned}}

which goes to 0 by Mercer's theorem. The last equality uses $\int _{a}^{b}K_{X}(s,t)e_{k}(s)\,ds=\lambda _{k}e_{k}(t)$, so that the cross term equals $2\sum _{k=1}^{N}\lambda _{k}e_{k}(t)^{2}$.
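The vanishing of $K_{X}(t,t)-\sum _{k\leq N}\lambda _{k}e_{k}(t)^{2}$ can be observed numerically. The sketch below assumes the kernel $K(s,t)=\min(s,t)$ on $[0,1]$ (the standard Wiener process), whose eigenpairs are known in closed form: $\lambda _{k}=1/((k-{\tfrac {1}{2}})^{2}\pi ^{2})$ and $e_{k}(t)={\sqrt {2}}\sin((k-{\tfrac {1}{2}})\pi t)$. The function name `mercer_residual` is hypothetical.

```python
import numpy as np

def mercer_residual(t, n_terms):
    """K(t, t) - sum_{k<=N} lambda_k e_k(t)^2 for the kernel K(s, t) = min(s, t)."""
    k = np.arange(1, n_terms + 1)
    freq = (k - 0.5) * np.pi
    lam = 1.0 / freq**2                           # eigenvalues lambda_k
    e_sq = 2.0 * np.sin(np.outer(t, freq))**2     # e_k(t)^2, shape (len(t), N)
    return t - e_sq @ lam                         # K(t, t) = min(t, t) = t

t = np.linspace(0.0, 1.0, 11)
res_10 = np.max(np.abs(mercer_residual(t, 10)))
res_1000 = np.max(np.abs(mercer_residual(t, 1000)))
```

The maximum over the grid shrinks as more terms are kept, consistent with the uniform convergence in $t$ asserted by the theorem.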

Properties of the Karhunen–Loève transform

Special case: Gaussian distribution

Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian (centered) random variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. The variables $Z_{i}$ have a joint Gaussian distribution and are stochastically independent if the original process $\{X_{t}\}_{t}$ is Gaussian.

In the Gaussian case, since the variables $Z_{i}$ are independent, we can say more:

\lim _{N\to \infty }\sum _{i=1}^{N}e_{i}(t)Z_{i}(\omega )=X_{t}(\omega )

almost surely.

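The Karhunen–Loève transform decorrelates the process

This follows from the fact that the $Z_{k}$ are pairwise uncorrelated, $\mathbf {E} [Z_{i}Z_{j}]=\delta _{ij}\lambda _{j}$ (and independent in the Gaussian case).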

The Karhunen–Loève expansion minimizes the total mean squared error

In the introduction, we mentioned that the truncated Karhunen–Loève expansion is the best approximation of the original process in the sense that it minimizes the total mean-square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.

More specifically, given any orthonormal basis $\{f_{k}\}$ of $L^{2}([a,b])$, we may decompose the process $X_{t}$ as:

X_{t}(\omega )=\sum _{k=1}^{\infty }A_{k}(\omega )f_{k}(t)

where

A_{k}(\omega )=\int _{a}^{b}X_{t}(\omega )f_{k}(t)\,dt

and we may approximate $X_{t}$ by the finite sum

{\hat {X}}_{t}(\omega )=\sum _{k=1}^{N}A_{k}(\omega )f_{k}(t)

for some integer $N$.

Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean-square error (provided we have arranged the eigenvalues in decreasing order).

[Proof]

Consider the error resulting from the truncation at the $N$-th term in the following orthonormal expansion:

\varepsilon _{N}(t)=\sum _{k=N+1}^{\infty }A_{k}(\omega )f_{k}(t)

The mean-square error $\varepsilon _{N}^{2}(t)$ can be written as:

{\begin{aligned}\varepsilon _{N}^{2}(t)&=\mathbf {E} \left[\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }A_{i}(\omega )A_{j}(\omega )f_{i}(t)f_{j}(t)\right]\\&=\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }\mathbf {E} \left[\int _{a}^{b}\int _{a}^{b}X_{t}X_{s}f_{i}(t)f_{j}(s)\,ds\,dt\right]f_{i}(t)f_{j}(t)\\&=\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }f_{i}(t)f_{j}(t)\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{i}(t)f_{j}(s)\,ds\,dt\end{aligned}}

We then integrate this last equality over $[a,b]$. The orthonormality of the $f_{k}$ yields:

\int _{a}^{b}\varepsilon _{N}^{2}(t)\,dt=\sum _{k=N+1}^{\infty }\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s)\,ds\,dt

The problem of minimizing the total mean-square error thus comes down to minimizing the right-hand side of this equality subject to the constraint that the $f_{k}$ be normalized. We hence introduce $\beta _{k}$, the Lagrange multipliers associated with these constraints, and aim at minimizing the following function:

Er[f_{k}(t),k\in \{N+1,\ldots \}]=\sum _{k=N+1}^{\infty }\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s)\,ds\,dt-\beta _{k}\left(\int _{a}^{b}f_{k}(t)f_{k}(t)\,dt-1\right)

Differentiating with respect to $f_{i}(t)$ (this is a functional derivative) and setting the derivative to 0 yields:

{\frac {\partial Er}{\partial f_{i}(t)}}=\int _{a}^{b}\left(\int _{a}^{b}K_{X}(s,t)f_{i}(s)\,ds-\beta _{i}f_{i}(t)\right)\,dt=0

which is satisfied in particular when

\int _{a}^{b}K_{X}(s,t)f_{i}(s)\,ds=\beta _{i}f_{i}(t).

In other words, when the $f_{k}$ are chosen to be the eigenfunctions of $T_{K_{X}}$, hence resulting in the KL expansion.

Explained variance

An important observation is that since the random coefficients $Z_{k}$ of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of $X_{t}$ is simply the sum of the variances of the individual components of the sum:

\operatorname {var} [X_{t}]=\sum _{k=1}^{\infty }e_{k}(t)^{2}\operatorname {var} [Z_{k}]=\sum _{k=1}^{\infty }\lambda _{k}e_{k}(t)^{2}

Integrating over $[a,b]$ and using the orthonormality of the $e_{k}$, we obtain that the total variance of the process is:

\int _{a}^{b}\operatorname {var} [X_{t}]\,dt=\sum _{k=1}^{\infty }\lambda _{k}

In particular, the total variance of the $N$-truncated approximation is

\sum _{k=1}^{N}\lambda _{k}.

As a result, the $N$-truncated expansion explains

{\frac {\sum _{k=1}^{N}\lambda _{k}}{\sum _{k=1}^{\infty }\lambda _{k}}}

of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an $N\in \mathbb {N}$ such that

{\frac {\sum _{k=1}^{N}\lambda _{k}}{\sum _{k=1}^{\infty }\lambda _{k}}}\geq 0.95.
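The truncation rule can be sketched in a few lines. The helper name `smallest_n_for_variance` is hypothetical, and the example spectrum is an assumption: it uses the eigenvalues $1/((k-{\tfrac {1}{2}})^{2}\pi ^{2})$ of the kernel $\min(s,t)$ on $[0,1]$, which sum to $1/2$.

```python
import numpy as np

def smallest_n_for_variance(eigvals, alpha, total=None):
    """Smallest N with sum(eigvals[:N]) / total >= alpha.

    `eigvals` must be sorted in decreasing order; `total` defaults to
    sum(eigvals), i.e. the spectrum is assumed to be fully listed.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    total = eigvals.sum() if total is None else total
    ratio = np.cumsum(eigvals) / total
    if ratio[-1] < alpha:
        raise ValueError("not enough eigenvalues to reach the threshold")
    return int(np.searchsorted(ratio, alpha, side="left")) + 1

# Illustration with the kernel min(s, t) on [0, 1] (standard Wiener process).
k = np.arange(1, 51)
lam = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)
n_95 = smallest_n_for_variance(lam, 0.95, total=0.5)
```

For this spectrum the first four terms explain about 94.96% of the variance and the first five about 95.96%, so `n_95` is 5.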

The Karhunen–Loève expansion has the minimum representation entropy property

Given a representation $X_{t}=\sum _{k=1}^{\infty }W_{k}\varphi _{k}(t)$, for some orthonormal basis $\varphi _{k}(t)$ and random $W_{k}$, we let $p_{k}=\mathbb {E} [|W_{k}|^{2}]/\mathbb {E} [|X_{t}|_{L^{2}}^{2}]$, so that $\sum _{k=1}^{\infty }p_{k}=1$. We may then define the representation entropy to be $H(\{\varphi _{k}\})=-\sum _{k}p_{k}\log(p_{k})$. Then we have $H(\{\varphi _{k}\})\geq H(\{e_{k}\})$, for all choices of $\varphi _{k}$. That is, the KL expansion has minimal representation entropy.

Proof:

Denote the coefficients obtained for the basis $e_{k}(t)$ as $p_{k}$, and for $\varphi _{k}(t)$ as $q_{k}$.

Choose $N\geq 1$. Note that since the $e_{k}$ minimize the mean squared error, we have that

\mathbb {E} \left|\sum _{k=1}^{N}Z_{k}e_{k}(t)-X_{t}\right|_{L^{2}}^{2}\leq \mathbb {E} \left|\sum _{k=1}^{N}W_{k}\varphi _{k}(t)-X_{t}\right|_{L^{2}}^{2}

Expanding the right-hand side, we get:

\mathbb {E} \left|\sum _{k=1}^{N}W_{k}\varphi _{k}(t)-X_{t}\right|_{L^{2}}^{2}=\mathbb {E} |X_{t}|_{L^{2}}^{2}+\sum _{k=1}^{N}\sum _{\ell =1}^{N}\mathbb {E} [W_{\ell }\varphi _{\ell }(t)W_{k}^{*}\varphi _{k}^{*}(t)]_{L^{2}}-\sum _{k=1}^{N}\mathbb {E} [W_{k}\varphi _{k}X_{t}^{*}]_{L^{2}}-\sum _{k=1}^{N}\mathbb {E} [X_{t}W_{k}^{*}\varphi _{k}^{*}(t)]_{L^{2}}

Using the orthonormality of $\varphi _{k}(t)$, and expanding $X_{t}$ in the $\varphi _{k}(t)$ basis, we get that the right-hand side is equal to:

\mathbb {E} |X_{t}|_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|W_{k}|^{2}]

We may perform an identical analysis for the $e_{k}(t)$, and so rewrite the above inequality as:

\mathbb {E} |X_{t}|_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|Z_{k}|^{2}]\leq \mathbb {E} |X_{t}|_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|W_{k}|^{2}]

Subtracting the common first term, and dividing by $\mathbb {E} [|X_{t}|_{L^{2}}^{2}]$, we obtain that:

\sum _{k=1}^{N}p_{k}\geq \sum _{k=1}^{N}q_{k}

Because this holds for every $N$ and both sequences sum to 1, the sequence $(p_{k})$ majorizes $(q_{k})$; since the entropy is Schur-concave, this implies that:

-\sum _{k=1}^{\infty }p_{k}\log(p_{k})\leq -\sum _{k=1}^{\infty }q_{k}\log(q_{k})
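A finite-dimensional analogue of this inequality can be checked numerically. The sketch below replaces the process by a zero-mean random vector with an arbitrary covariance matrix (an assumption made for illustration) and compares the representation entropy in the eigenvector basis with that in another orthonormal basis; the function name `representation_entropy` is hypothetical.

```python
import numpy as np

def representation_entropy(cov, basis):
    """Entropy of the normalized energies p_k = E|<X, basis_k>|^2 / trace(cov).

    `basis` holds orthonormal vectors as columns; X is a zero-mean random
    vector with covariance matrix `cov` (finite-dimensional analogue).
    """
    energies = np.einsum("ik,ij,jk->k", basis, cov, basis)  # basis_k^T cov basis_k
    p = energies / np.trace(cov)
    p = p[p > 1e-15]                                         # treat 0 log 0 as 0
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
cov = A @ A.T                                    # an arbitrary covariance matrix
_, kl_basis = np.linalg.eigh(cov)                # KL basis: eigenvectors of cov
other_basis, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # another orthonormal basis

h_kl = representation_entropy(cov, kl_basis)
h_other = representation_entropy(cov, other_basis)
```

Here `h_kl` should not exceed `h_other`, whichever orthonormal basis is drawn for comparison.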

Linear Karhunen–Loève approximations

Consider a whole class of signals we want to approximate over the first $M$ vectors of a basis. These signals are modeled as realizations of a random vector $Y[n]$ of size $N$. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loève bases that diagonalize the covariance matrix of $Y$. The random vector $Y$ can be decomposed in an orthogonal basis

\left\{g_{m}\right\}_{0\leq m<N}

as follows:

Y=\sum _{m=0}^{N-1}\left\langle Y,g_{m}\right\rangle g_{m},

where each

\left\langle Y,g_{m}\right\rangle =\sum _{n=0}^{N-1}{Y[n]}g_{m}^{*}[n]

is a random variable. The approximation from the first $M\leq N$ vectors of the basis is

Y_{M}=\sum _{m=0}^{M-1}\left\langle Y,g_{m}\right\rangle g_{m}

The energy conservation in an orthogonal basis implies

\varepsilon [M]=\mathbf {E} \left\{\left\|Y-Y_{M}\right\|^{2}\right\}=\sum _{m=M}^{N-1}\mathbf {E} \left\{\left|\left\langle Y,g_{m}\right\rangle \right|^{2}\right\}

This error is related to the covariance of $Y$ defined by

R[n,m]=\mathbf {E} \left\{Y[n]Y^{*}[m]\right\}

For any vector $x[n]$ we denote by $K$ the covariance operator represented by this matrix,

\mathbf {E} \left\{\left|\langle Y,x\rangle \right|^{2}\right\}=\langle Kx,x\rangle =\sum _{n=0}^{N-1}\sum _{m=0}^{N-1}R[n,m]x[n]x^{*}[m]

The error $\varepsilon [M]$ is therefore a sum of the last $N-M$ coefficients of the covariance operator

\varepsilon [M]=\sum _{m=M}^{N-1}{\left\langle Kg_{m},g_{m}\right\rangle }

The covariance operator $K$ is Hermitian and positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.

Theorem (Optimality of Karhunen–Loève basis). Let $K$ be a covariance operator. For all $M\geq 1$, the approximation error

\varepsilon [M]=\sum _{m=M}^{N-1}\left\langle Kg_{m},g_{m}\right\rangle

is minimum if and only if

\left\{g_{m}\right\}_{0\leq m<N}

is a Karhunen–Loève basis ordered by decreasing eigenvalues.

\left\langle Kg_{m},g_{m}\right\rangle \geq \left\langle Kg_{m+1},g_{m+1}\right\rangle ,\qquad 0\leq m<N-1
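The optimality statement can be illustrated on a small example. The sketch below assumes a covariance matrix with entries $0.9^{|n-m|}$ (chosen only for illustration) and compares $\varepsilon [M]$ in the eigenvector basis, ordered by decreasing eigenvalue, with $\varepsilon [M]$ in the canonical basis; `linear_approx_error` is a hypothetical helper name.

```python
import numpy as np

def linear_approx_error(cov, basis, M):
    """epsilon[M] = sum_{m >= M} <K g_m, g_m> for orthonormal columns g_m of `basis`."""
    diag = np.einsum("im,ij,jm->m", basis.conj(), cov, basis).real
    return float(diag[M:].sum())

N = 8
n = np.arange(N)
cov = 0.9 ** np.abs(n[:, None] - n[None, :])     # example covariance (an assumption)

eigvals, eigvecs = np.linalg.eigh(cov)
kl_basis = eigvecs[:, ::-1]                       # decreasing eigenvalue order
dirac_basis = np.eye(N)                           # canonical basis, for comparison

errors_kl = [linear_approx_error(cov, kl_basis, M) for M in range(1, N + 1)]
errors_dirac = [linear_approx_error(cov, dirac_basis, M) for M in range(1, N + 1)]
```

For every $M$, the entry of `errors_kl` is the sum of the $N-M$ smallest eigenvalues and is no larger than the corresponding entry of `errors_dirac`.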

Non-linear approximation in bases

Linear approximations project the signal on M vectors a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these non-linear approximations. A signal $f\in \mathrm {H}$ is approximated with M vectors selected adaptively in an orthonormal basis for $\mathrm {H}$

\mathrm {B} =\left\{g_{m}\right\}_{m\in \mathbb {N} }

Let $f_{M}$ be the projection of f over M vectors whose indices are in $I_{M}$:

f_{M}=\sum _{m\in I_{M}}\left\langle f,g_{m}\right\rangle g_{m}

The approximation error is the sum of the remaining coefficients

\varepsilon [M]=\left\|f-f_{M}\right\|^{2}=\sum _{m\notin I_{M}}\left|\left\langle f,g_{m}\right\rangle \right|^{2}

To minimize this error, the indices in $I_{M}$ must correspond to the M vectors having the largest inner product amplitude

\left|\left\langle f,g_{m}\right\rangle \right|.

These are the vectors that correlate best with f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort

\left\{\left|\left\langle f,g_{m}\right\rangle \right|\right\}_{m\in \mathbb {N} }

in decreasing order

\left|\left\langle f,g_{m_{k}}\right\rangle \right|\geq \left|\left\langle f,g_{m_{k+1}}\right\rangle \right|.

The best non-linear approximation is

f_{M}=\sum _{k=1}^{M}\left\langle f,g_{m_{k}}\right\rangle g_{m_{k}}

It can also be written as inner product thresholding:

f_{M}=\sum _{m=0}^{\infty }\theta _{T}\left(\left\langle f,g_{m}\right\rangle \right)g_{m}

with

T=\left|\left\langle f,g_{m_{M}}\right\rangle \right|,\qquad \theta _{T}(x)={\begin{cases}x&|x|\geq T\\0&|x|<T\end{cases}}

The non-linear error is

\varepsilon [M]=\left\|f-f_{M}\right\|^{2}=\sum _{k=M+1}^{\infty }\left|\left\langle f,g_{m_{k}}\right\rangle \right|^{2}
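The best M-term selection amounts to sorting the coefficients by magnitude and keeping the largest ones. A minimal sketch, assuming the inner products $\langle f,g_{m}\rangle$ have already been computed in some orthonormal basis (the function name `best_m_term` and the sample coefficients are illustrative):

```python
import numpy as np

def best_m_term(coeffs, M):
    """Keep the M coefficients of largest magnitude and zero out the others.

    `coeffs` are the inner products <f, g_m> in some orthonormal basis.
    Returns the thresholded coefficients and the error ||f - f_M||^2.
    """
    coeffs = np.asarray(coeffs)
    order = np.argsort(np.abs(coeffs))[::-1]      # indices m_k, decreasing |<f, g_m>|
    kept = np.zeros_like(coeffs)
    kept[order[:M]] = coeffs[order[:M]]
    error = float(np.sum(np.abs(coeffs[order[M:]]) ** 2))
    return kept, error

coeffs = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
kept, err = best_m_term(coeffs, 2)
```

With these sample values the two retained coefficients are -3.0 and 2.0, and the error is the energy of the three discarded ones, 0.01 + 0.25 + 0.04 = 0.30.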

This error goes quickly to zero as M increases if the sorted values of $\left|\left\langle f,g_{m_{k}}\right\rangle \right|$ have a fast decay as k increases. This decay is quantified by computing the $\ell ^{p}$ norm of the signal inner products in B:

\|f\|_{\mathrm {B} ,p}=\left(\sum _{m=0}^{\infty }\left|\left\langle f,g_{m}\right\rangle \right|^{p}\right)^{\frac {1}{p}}

The following theorem relates the decay of $\varepsilon [M]$ to $\|f\|_{\mathrm {B} ,p}$.

Theorem (decay of error). If $\|f\|_{\mathrm {B} ,p}<\infty$ with $p < 2$ then

\varepsilon [M]\leq {\frac {\|f\|_{\mathrm {B} ,p}^{2}}{{\frac {2}{p}}-1}}M^{1-{\frac {2}{p}}}

and

\varepsilon [M]=o\left(M^{1-{\frac {2}{p}}}\right).

Conversely, if $\varepsilon [M]=o\left(M^{1-{\frac {2}{p}}}\right)$ then

$\|f\|_{\mathrm {B} ,q}<\infty$ for any $q > p$ .

Non-optimality of Karhunen–Loève bases

To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is a random shift modulo N of a deterministic signal f[n] of zero mean

\sum _{n=0}^{N-1}f[n]=0

Y[n]=f[(n-P){\bmod {N}}]

The random shift P is uniformly distributed on [0, N − 1]:

\Pr(P=p)={\frac {1}{N}},\qquad 0\leq p<N

Clearly

\mathbf {E} \{Y[n]\}={\frac {1}{N}}\sum _{p=0}^{N-1}f[(n-p){\bmod {N}}]=0

and

R[n,k]=\mathbf {E} \{Y[n]Y[k]\}={\frac {1}{N}}\sum _{p=0}^{N-1}f[(n-p){\bmod {N}}]f[(k-p){\bmod {N}}]={\frac {1}{N}}f\circledast {\bar {f}}[n-k],\quad {\bar {f}}[n]=f[-n]

where $\circledast$ denotes circular convolution. Hence

R[n,k]=R_{Y}[n-k],\qquad R_{Y}[k]={\frac {1}{N}}f\circledast {\bar {f}}[k]

Since $R_{Y}$ is N-periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with $R_{Y}$ and is therefore diagonalized in the discrete Fourier Karhunen–Loève basis

\left\{{\frac {1}{\sqrt {N}}}e^{i2\pi mn/N}\right\}_{0\leq m<N}.

The power spectrum is the Fourier transform of $R_{Y}$:

P_{Y}[m]={\hat {R}}_{Y}[m]={\frac {1}{N}}\left|{\hat {f}}[m]\right|^{2}

Example: Consider an extreme case where $f[n]=\delta [n]-\delta [n-1]$. A theorem stated above guarantees that the Fourier Karhunen–Loève basis produces a smaller expected approximation error than a canonical basis of Diracs $\left\{g_{m}[n]=\delta [n-m]\right\}_{0\leq m<N}$. Indeed, we do not know a priori the abscissa of the non-zero coefficients of Y, so there is no particular Dirac that is better adapted to perform the approximation. But the Fourier vectors cover the whole support of Y and thus absorb a part of the signal energy.

\mathbf {E} \left\{\left|\left\langle Y[n],{\frac {1}{\sqrt {N}}}e^{i2\pi mn/N}\right\rangle \right|^{2}\right\}=P_{Y}[m]={\frac {4}{N}}\sin ^{2}\left({\frac {\pi m}{N}}\right)

Selecting higher frequency Fourier coefficients yields a better mean-square approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for non-linear approximations. If $f[n]=\delta [n]-\delta [n-1]$ then the discrete Fourier basis is extremely inefficient because f and hence Y have an energy that is almost uniformly spread among all Fourier vectors. In contrast, since f has only two non-zero coefficients in the Dirac basis, a non-linear approximation of Y with $M \geq 2$ gives zero error.^[6]
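The example can be reproduced numerically. The sketch below assumes $N=16$ and one fixed realization of the shift (both illustrative choices): it checks the power spectrum against the closed form $(4/N)\sin ^{2}(\pi m/N)$ and compares the non-linear approximation error with $M=2$ in the Dirac and Fourier bases.

```python
import numpy as np

N = 16
f = np.zeros(N)
f[0], f[1] = 1.0, -1.0                      # f[n] = delta[n] - delta[n-1]

# Power spectrum P_Y[m] = |f_hat[m]|^2 / N, compared with (4/N) sin^2(pi m / N).
m = np.arange(N)
power = np.abs(np.fft.fft(f)) ** 2 / N
closed_form = 4.0 / N * np.sin(np.pi * m / N) ** 2

def nonlinear_error(coeffs, M):
    """Energy left after keeping the M largest-magnitude coefficients."""
    mags = np.sort(np.abs(coeffs))[::-1]
    return float(np.sum(mags[M:] ** 2))

shift = 5                                    # one realization of the random shift P
y = np.roll(f, shift)                        # Y[n] = f[(n - P) mod N]
dirac_coeffs = y                             # coefficients in the Dirac basis
fourier_coeffs = np.fft.fft(y) / np.sqrt(N)  # coefficients in the orthonormal DFT basis

err_dirac = nonlinear_error(dirac_coeffs, 2)
err_fourier = nonlinear_error(fourier_coeffs, 2)
```

In the Dirac basis two coefficients capture the whole signal, so `err_dirac` is zero, while in the Fourier basis most of the energy remains after keeping two coefficients.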

Principal component analysis

We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind

\int _{a}^{b}K_{X}(s,t)e_{k}(s)\,ds=\lambda _{k}e_{k}(t).

However, when applied to a discrete and finite process $\left(X_{n}\right)_{n\in \{1,\ldots ,N\}}$ , the problem takes a much simpler form and standard algebra can be used to carry out the calculations.

Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.

We henceforth consider a random N-dimensional vector $X=\left(X_{1}~X_{2}~\ldots ~X_{N}\right)^{T}$ . As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.

As in the continuous version, we assume that X is centered, otherwise we can let $X:=X-\mu _{X}$ (where $\mu _{X}$ is the mean vector of X) which is centered.

Let us adapt the procedure to the discrete case.

Covariance matrix

Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.

Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:

\Sigma _{ij}=\mathbf {E} [X_{i}X_{j}],\qquad \forall i,j\in \{1,\ldots ,N\}

Rewriting the above integral equation to suit the discrete case, we observe that it turns into:

\sum _{j=1}^{N}\Sigma _{ij}e_{j}=\lambda e_{i}\quad \Leftrightarrow \quad \Sigma e=\lambda e

where $e=(e_{1}~e_{2}~\ldots ~e_{N})^{T}$ is an N-dimensional vector.

The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.

Since Σ is a positive semi-definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of $\mathbb {R} ^{N}$, and we write $\{\lambda _{i},\varphi _{i}\}_{i\in \{1,\ldots ,N\}}$ for this set of eigenvalues and corresponding eigenvectors, listed in decreasing order of $\lambda _{i}$. Let also $\Phi$ be the orthogonal matrix whose columns are these eigenvectors:

{\begin{aligned}\Phi &:=\left(\varphi _{1}~\varphi _{2}~\ldots ~\varphi _{N}\right)\\\Phi ^{T}\Phi &=I\end{aligned}}

Principal component transform

It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance matrix. In this case, we hence have:

X=\sum _{i=1}^{N}\langle \varphi _{i},X\rangle \varphi _{i}=\sum _{i=1}^{N}\varphi _{i}^{T}X\varphi _{i}

In a more compact form, the principal component transform of X is defined by:

{\begin{cases}Y=\Phi ^{T}X\\X=\Phi Y\end{cases}}

The i-th component of Y is $Y_{i}=\varphi _{i}^{T}X$, the projection of X on $\varphi _{i}$, and the inverse transform $X=\Phi Y$ yields the expansion of X on the space spanned by the $\varphi _{i}$:

X=\sum _{i=1}^{N}Y_{i}\varphi _{i}=\sum _{i=1}^{N}\langle \varphi _{i},X\rangle \varphi _{i}

As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some $K\in \{1,\ldots ,N\}$ such that

{\frac {\sum _{i=1}^{K}\lambda _{i}}{\sum _{i=1}^{N}\lambda _{i}}}\geq \alpha

where α is the explained variance threshold we wish to set.

We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation (MDEE).^[7]
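The discrete procedure can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions: the covariance matrix is estimated from samples (one per row), the data are synthetic, and the helper name `pca_transform` is hypothetical.

```python
import numpy as np

def pca_transform(samples, alpha=0.95):
    """Principal component transform of the rows of `samples`.

    Returns the mean, the eigenvalues (decreasing), the matrix Phi whose
    columns are the eigenvectors, the coefficients Y = Phi^T X for each
    centered sample, and the smallest K reaching the variance threshold.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean                         # X := X - mu_X
    sigma = centered.T @ centered / len(samples)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals, phi = eigvals[::-1], eigvecs[:, ::-1]    # decreasing eigenvalues
    coeffs = centered @ phi                           # row i holds Y = Phi^T X_i
    ratio = np.cumsum(eigvals) / eigvals.sum()
    K = int(np.searchsorted(ratio, alpha)) + 1
    return mean, eigvals, phi, coeffs, K

rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 2))
mixing = np.array([[3.0, 0.0, 1.0, 0.5], [0.0, 2.0, 1.0, -0.5]])
data = latent @ mixing + 0.01 * rng.standard_normal((500, 4))   # synthetic data

mean, eigvals, phi, coeffs, K = pca_transform(data)
reconstruction = mean + coeffs[:, :K] @ phi[:, :K].T             # truncated inverse
```

The coefficients are decorrelated, since their sample covariance is the diagonal matrix of eigenvalues, and the truncated inverse transform reconstructs the data from the first K components only.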

Examples

The Wiener process

There are numerous equivalent characterizations of the Wiener process which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process W_t with covariance function

K_{W}(t,s)=\operatorname {cov} (W_{t},W_{s})=\min(s,t).

We restrict the time domain to [a, b]=[0,1] without loss of generality.

The eigenvectors of the covariance kernel are easily determined. These are

e_{k}(t)={\sqrt {2}}\sin \left(\left(k-{\tfrac {1}{2}}\right)\pi t\right)

and the corresponding eigenvalues are

\lambda _{k}={\frac {1}{(k-{\frac {1}{2}})^{2}\pi ^{2}}}.

[Proof]

In order to find the eigenvalues and eigenvectors, we need to solve the integral equation:

{\begin{aligned}\int _{a}^{b}K_{W}(s,t)e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\\\int _{0}^{1}\min(s,t)e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\\\int _{0}^{t}se(s)\,ds+t\int _{t}^{1}e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\end{aligned}}

Differentiating once with respect to t yields:

\int _{t}^{1}e(s)\,ds=\lambda e'(t)

A second differentiation produces the following differential equation:

-e(t)=\lambda e''(t)

The general solution of which has the form:

e(t)=A\sin \left({\frac {t}{\sqrt {\lambda }}}\right)+B\cos \left({\frac {t}{\sqrt {\lambda }}}\right)

where A and B are two constants to be determined with the boundary conditions. Setting t = 0 in the initial integral equation gives e(0) = 0 which implies that B = 0 and similarly, setting t = 1 in the first differentiation yields e' (1) = 0, whence:

\cos \left({\frac {1}{\sqrt {\lambda }}}\right)=0

which in turn implies that the eigenvalues of $T_{K_{W}}$ are:

\lambda _{k}=\left({\frac {1}{(k-{\frac {1}{2}})\pi }}\right)^{2},\qquad k\geq 1

The corresponding eigenfunctions are thus of the form:

e_{k}(t)=A\sin \left((k-{\frac {1}{2}})\pi t\right),\qquad k\geq 1

A is then chosen so as to normalize e_k:

\int _{0}^{1}e_{k}^{2}(t)\,dt=1\quad \implies \quad A={\sqrt {2}}

This gives the following representation of the Wiener process:

Theorem. There is a sequence {Z_i}_i of independent Gaussian random variables with mean zero and variance 1 such that

W_{t}={\sqrt {2}}\sum _{k=1}^{\infty }Z_{k}{\frac {\sin \left(\left(k-{\frac {1}{2}}\right)\pi t\right)}{\left(k-{\frac {1}{2}}\right)\pi }}.

Note that this representation is only valid for $t\in [0,1].$ On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L² norm and uniform in t.
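The series gives a direct way to draw approximate sample paths. The following is a sketch, assuming the series is truncated after a finite number of terms (which slightly underestimates the variance); the function name `wiener_kl_paths` and the parameter values are illustrative.

```python
import numpy as np

def wiener_kl_paths(t, n_terms, n_paths, rng):
    """Sample paths of the truncated KL expansion of the Wiener process on [0, 1]."""
    k = np.arange(1, n_terms + 1)
    freq = (k - 0.5) * np.pi
    basis = np.sqrt(2.0) * np.sin(np.outer(t, freq)) / freq   # sqrt(lambda_k) e_k(t)
    Z = rng.standard_normal((n_terms, n_paths))               # independent N(0, 1)
    return basis @ Z                                          # shape (len(t), n_paths)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
paths = wiener_kl_paths(t, n_terms=200, n_paths=20000, rng=rng)
var_at_1 = paths[-1].var()       # should be close to Var(W_1) = 1
```

Every path starts at zero because each basis function vanishes at $t=0$, and the empirical variance at $t=1$ should be close to 1.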

The Brownian bridge

Similarly the Brownian bridge $B_{t}=W_{t}-tW_{1}$ which is a stochastic process with covariance function

K_{B}(t,s)=\min(t,s)-ts

can be represented as the series

B_{t}=\sum _{k=1}^{\infty }Z_{k}{\frac {{\sqrt {2}}\sin(k\pi t)}{k\pi }}
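
The series implies that the covariance min(t, s) − ts equals the sum of 2 sin(kπt) sin(kπs)/(kπ)² over k. A small Python sketch, with an assumed truncation level `n_terms` and hypothetical function names, can be used to check this numerically at a chosen pair of times.

```python
import numpy as np

def bridge_cov_series(t, s, n_terms=5000):
    """Truncated series for the Brownian bridge covariance at times t and s."""
    k = np.arange(1, n_terms + 1)
    terms = 2.0 * np.sin(k * np.pi * t) * np.sin(k * np.pi * s) / (k * np.pi) ** 2
    return float(np.sum(terms))

def bridge_cov_exact(t, s):
    """Closed-form covariance K_B(t, s) = min(t, s) - t s."""
    return min(t, s) - t * s

approx = bridge_cov_series(0.3, 0.7)
exact = bridge_cov_exact(0.3, 0.7)
print(approx, exact)
```

At t = 1 every term of the series vanishes, which reflects the fact that the bridge is pinned to zero at both endpoints.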

Applications

Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A). Karhunen–Loève expansion is closely related to the Singular Value Decomposition. The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector valued stochastic process then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.

Applications in signal estimation and detection

Detection of a known continuous signal S(t)

In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis test is used to detect a continuous signal S(t) in the channel output X(t), where N(t) is the channel noise, usually assumed to be a zero-mean Gaussian process with correlation function $R_{N}(t,s)=E[N(t)N(s)]$ :

H:X(t)=N(t),

K:X(t)=N(t)+S(t),\quad t\in (0,T)

Signal detection in white noise

When the channel noise is white, its correlation function is

R_{N}(t)={\tfrac {1}{2}}N_{0}\delta (t),

and it has constant power spectral density. In a physically realizable channel the noise power is finite, so the spectrum is band-limited:

S_{N}(f)={\begin{cases}{\frac {N_{0}}{2}}&|f|<\omega \\0&|f|>\omega \end{cases}}

Then the noise correlation function is a sinc function with zeros at ${\frac {n}{2\omega }},n\in \mathbf {Z} ,n\neq 0.$ Noise samples taken at this spacing are uncorrelated and Gaussian, hence independent. Thus we can take samples from X(t) with time spacing

\Delta t={\frac {1}{2\omega }}{\text{ within }}(0,T).

Let $X_{i}=X(i\,\Delta t)$ . We have a total of $n={\frac {T}{\Delta t}}=T(2\omega )=2\omega T$ independent observations $\{X_{1},X_{2},\ldots ,X_{n}\}$ to develop the likelihood-ratio test. Defining the signal samples $S_{i}=S(i\,\Delta t)$ , the problem becomes

H:X_{i}=N_{i},

K:X_{i}=N_{i}+S_{i},i=1,2,\ldots ,n.

The log-likelihood ratio, with $\sigma ^{2}$ the variance of each noise sample and $\lambda$ a threshold, is

{\mathcal {L}}({\underline {x}})={\frac {\sum _{i=1}^{n}(2S_{i}x_{i}-S_{i}^{2})}{2\sigma ^{2}}}\Leftrightarrow \Delta t\sum _{i=1}^{n}S_{i}x_{i}=\sum _{i=1}^{n}S(i\,\Delta t)x(i\,\Delta t)\,\Delta t\gtrless \lambda

As $\Delta t\to 0$ , let:

G=\int _{0}^{T}S(t)x(t)\,dt.

Then G is the test statistic and the Neyman–Pearson optimum detector is

G({\underline {x}})>G_{0}\Rightarrow K,\quad <G_{0}\Rightarrow H.
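
In practice the integral defining G is evaluated on sampled data. The following Python sketch approximates G by a Riemann sum and applies the threshold rule; the sinusoidal test signal, the noise samples and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def correlator_statistic(signal, observation, dt):
    """Riemann-sum approximation of G = integral over (0, T) of S(t) x(t) dt."""
    return float(np.sum(signal * observation) * dt)

def decide(statistic, threshold):
    """Return 'K' (signal present) if the statistic exceeds the threshold, else 'H'."""
    return "K" if statistic > threshold else "H"

T, n = 1.0, 1000
dt = T / n
t = (np.arange(n) + 0.5) * dt                 # midpoint sampling of (0, T)
S = np.sin(2.0 * np.pi * t)                   # assumed known signal
energy = correlator_statistic(S, S, dt)       # E = integral of S(t)^2

# Stand-in noise samples, used only to exercise the detector.
noise = np.random.default_rng(0).normal(scale=0.1, size=n)
G_under_K = correlator_statistic(S, S + noise, dt)
G_under_H = correlator_statistic(S, noise, dt)
print(decide(G_under_K, energy / 2), decide(G_under_H, energy / 2))
```

Since G is linear in the observation, the statistic under K differs from the one under H by exactly the signal energy E.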

As G is Gaussian, we can characterize it by finding its mean and variance. Then we get

H:G\sim N\left(0,{\tfrac {1}{2}}N_{0}E\right)

K:G\sim N\left(E,{\tfrac {1}{2}}N_{0}E\right)

where

E=\int _{0}^{T}S^{2}(t)\,dt

is the signal energy.

The false alarm probability is

\alpha =\int _{G_{0}}^{\infty }N\left(0,{\tfrac {1}{2}}N_{0}E\right)\,dG\Rightarrow G_{0}={\sqrt {{\tfrac {1}{2}}N_{0}E}}\Phi ^{-1}(1-\alpha )

and the probability of detection is

\beta =\int _{G_{0}}^{\infty }N\left(E,{\tfrac {1}{2}}N_{0}E\right)\,dG=1-\Phi \left({\frac {G_{0}-E}{\sqrt {{\tfrac {1}{2}}N_{0}E}}}\right)=\Phi \left({\sqrt {\frac {2E}{N_{0}}}}-\Phi ^{-1}(1-\alpha )\right),

where Φ is the cumulative distribution function of a standard normal (Gaussian) variable.
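
The threshold and detection probability can be evaluated directly from these two formulas. The sketch below uses `statistics.NormalDist` from the Python standard library for Φ and its inverse; the function name and the numerical values of α, E and N_0 are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def np_white_noise_detector(alpha, energy, n0):
    """Threshold G0 and detection probability beta for a given false alarm level."""
    std_normal = NormalDist()                       # standard normal: Phi and Phi^{-1}
    quantile = std_normal.inv_cdf(1.0 - alpha)      # Phi^{-1}(1 - alpha)
    g0 = sqrt(0.5 * n0 * energy) * quantile         # G0 = sqrt(N0 E / 2) Phi^{-1}(1 - alpha)
    beta = std_normal.cdf(sqrt(2.0 * energy / n0) - quantile)
    return g0, beta

g0, beta = np_white_noise_detector(alpha=0.05, energy=2.0, n0=1.0)
print(g0, beta)
```

For a fixed false alarm level, the detection probability depends on the signal only through the ratio 2E/N_0.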

Signal detection in colored noise

When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function $R_{N}(t,s)=E[N(t)N(s)],$ we cannot obtain independent discrete observations by sampling at evenly spaced times. Instead, we can use the K–L expansion to decorrelate the noise process and obtain independent Gaussian observation 'samples'. The K–L expansion of N(t) is

N(t)=\sum _{i=1}^{\infty }N_{i}\Phi _{i}(t),\quad 0<t<T,

where $N_{i}=\int _{0}^{T}N(t)\Phi _{i}(t)\,dt$ and the orthonormal basis functions $\{\Phi _{i}(t)\}$ are generated by the kernel $R_{N}(t,s)$ , i.e., they are solutions to

\int _{0}^{T}R_{N}(t,s)\Phi _{i}(s)\,ds=\lambda _{i}\Phi _{i}(t),\quad \operatorname {var} [N_{i}]=\lambda _{i}.

Expand the signal S(t) in the same basis:

S(t)=\sum _{i=1}^{\infty }S_{i}\Phi _{i}(t),

where $S_{i}=\int _{0}^{T}S(t)\Phi _{i}(t)\,dt$ , then

X_{i}=\int _{0}^{T}X(t)\Phi _{i}(t)\,dt=N_{i}

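under H and $N_{i}+S_{i}$ under K. Let ${\overline {X}}=\{X_{1},X_{2},\dots \}$ . The $N_{i}$ are independent Gaussian random variables with zero mean and variance $\lambda _{i}$ , so:

Under H, the $\{X_{i}\}$ are independent Gaussian random variables with zero mean, and

f_{H}[x(t)\mid 0<t<T]=f_{H}({\underline {x}})=\prod _{i=1}^{\infty }{\frac {1}{\sqrt {2\pi \lambda _{i}}}}\exp \left(-{\frac {x_{i}^{2}}{2\lambda _{i}}}\right).

Under K, the $\{X_{i}-S_{i}\}$ are independent Gaussian random variables with zero mean, and

f_{K}[x(t)\mid 0<t<T]=f_{K}({\underline {x}})=\prod _{i=1}^{\infty }{\frac {1}{\sqrt {2\pi \lambda _{i}}}}\exp \left(-{\frac {(x_{i}-S_{i})^{2}}{2\lambda _{i}}}\right).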

Hence, the log-LR is given by

{\mathcal {L}}({\underline {x}})=\sum _{i=1}^{\infty }{\frac {2S_{i}x_{i}-S_{i}^{2}}{2\lambda _{i}}}
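
A finite-dimensional analogue makes this formula easy to check. In the Python sketch below the kernel is replaced by an assumed covariance matrix `R`, whose orthonormal eigenvectors play the role of the Φ_i; the matrix, the signal and the function names are hypothetical. The sum over decorrelated coordinates is compared with the equivalent closed form sᵀR⁻¹x − ½ sᵀR⁻¹s.

```python
import numpy as np

def log_lr_kl(R, s, x):
    """Log-likelihood ratio computed coordinate by coordinate in the eigenbasis of R."""
    lam, phi = np.linalg.eigh(R)      # eigenvalues lam, orthonormal eigenvectors (columns)
    s_i = phi.T @ s                   # signal coefficients S_i
    x_i = phi.T @ x                   # observation coefficients x_i
    return float(np.sum((2.0 * s_i * x_i - s_i ** 2) / (2.0 * lam)))

def log_lr_direct(R, s, x):
    """Same quantity without diagonalizing: s^T R^{-1} x - (1/2) s^T R^{-1} s."""
    w = np.linalg.solve(R, s)         # w = R^{-1} s
    return float(w @ x - 0.5 * (w @ s))

n = 50
idx = np.arange(n)
# Assumed noise covariance: a white floor plus an exponentially correlated part.
R = 0.5 * np.eye(n) + np.exp(-0.2 * np.abs(idx[:, None] - idx[None, :]))
s = np.sin(2.0 * np.pi * idx / n)
x = s + np.random.default_rng(1).multivariate_normal(np.zeros(n), R)
print(log_lr_kl(R, s, x), log_lr_direct(R, s, x))
```

The agreement of the two values illustrates that the expansion only changes coordinates: each term is weighted by the inverse of the noise variance λ_i in that coordinate.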

and the optimum detector is

G=\sum _{i=1}^{\infty }{\frac {S_{i}x_{i}}{\lambda _{i}}}>G_{0}\Rightarrow K,\quad <G_{0}\Rightarrow H.

Define

k(t)=\sum _{i=1}^{\infty }{\frac {S_{i}}{\lambda _{i}}}\Phi _{i}(t),\quad 0<t<T,

then $G=\int _{0}^{T}k(t)x(t)\,dt.$

How to find k(t)

Since

\int _{0}^{T}R_{N}(t,s)k(s)\,ds=\sum _{i=1}^{\infty }{\frac {S_{i}}{\lambda _{i}}}\int _{0}^{T}R_{N}(t,s)\Phi _{i}(s)\,ds=\sum _{i=1}^{\infty }S_{i}\Phi _{i}(t)=S(t),

k(t) is the solution to

\int _{0}^{T}R_{N}(t,s)k(s)\,ds=S(t).
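
This integral equation can be solved numerically on a grid. The Python sketch below is a minimal illustration under a stated assumption: the noise is taken to be white noise of level N_0/2 plus a colored component with a hypothetical covariance e^{−5|t−s|}, so that the discretized equation is a well-conditioned linear system. The function names and parameter values are not from the text.

```python
import numpy as np

def solve_matched_filter(t, S, n0, colored_cov):
    """Solve (N0/2) k(t) + integral of C(t, s) k(s) ds = S(t) on a uniform grid."""
    dt = t[1] - t[0]
    C = colored_cov(t[:, None], t[None, :])          # colored part of R_N on the grid
    A = 0.5 * n0 * np.eye(t.size) + C * dt           # delta term becomes the identity
    return np.linalg.solve(A, S)

T, n = 1.0, 200
t = (np.arange(n) + 0.5) * (T / n)
S = np.sin(2.0 * np.pi * t)

# Pure white noise: the colored part is identically zero, so k = (2 / N0) S.
k_white = solve_matched_filter(t, S, n0=2.0, colored_cov=lambda a, b: 0.0 * (a - b))
# White plus exponentially correlated noise (assumed covariance).
k_col = solve_matched_filter(t, S, n0=2.0, colored_cov=lambda a, b: np.exp(-5.0 * np.abs(a - b)))
```

With a purely colored kernel and no white component the discretized system can become badly conditioned, which is one reason the white-plus-colored model is used here.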

If N(t) is wide-sense stationary,

\int _{0}^{T}R_{N}(t-s)k(s)\,ds=S(t),

which is known as the Wiener–Hopf equation. The equation can be solved by taking the Fourier transform, but the solution is not practically realizable, since an infinite spectrum requires spectral factorization. A special case in which k(t) is easy to calculate is white Gaussian noise:

\int _{0}^{T}{\frac {N_{0}}{2}}\delta (t-s)k(s)\,ds=S(t)\Rightarrow k(t)=CS(t),\quad 0<t<T.

The corresponding impulse response is h(t) = k(T − t) = CS(T − t). Setting C = 1 recovers the result of the previous section on detecting a signal in white noise.

Test threshold for Neyman–Pearson detector

Since X(t) is a Gaussian process,

G=\int _{0}^{T}k(t)x(t)\,dt,

is a Gaussian random variable that can be characterized by its mean and variance.

{\begin{aligned}\mathbf {E} [G\mid H]&=\int _{0}^{T}k(t)\mathbf {E} [x(t)\mid H]\,dt=0\\\mathbf {E} [G\mid K]&=\int _{0}^{T}k(t)\mathbf {E} [x(t)\mid K]\,dt=\int _{0}^{T}k(t)S(t)\,dt\equiv \rho \\\mathbf {E} [G^{2}\mid H]&=\int _{0}^{T}\int _{0}^{T}k(t)k(s)R_{N}(t,s)\,dt\,ds=\int _{0}^{T}k(t)\left(\int _{0}^{T}k(s)R_{N}(t,s)\,ds\right)dt=\int _{0}^{T}k(t)S(t)\,dt=\rho \\\operatorname {var} [G\mid H]&=\mathbf {E} [G^{2}\mid H]-(\mathbf {E} [G\mid H])^{2}=\rho \\\mathbf {E} [G^{2}\mid K]&=\int _{0}^{T}\int _{0}^{T}k(t)k(s)\mathbf {E} [x(t)x(s)\mid K]\,dt\,ds=\int _{0}^{T}\int _{0}^{T}k(t)k(s)(R_{N}(t,s)+S(t)S(s))\,dt\,ds=\rho +\rho ^{2}\\\operatorname {var} [G\mid K]&=\mathbf {E} [G^{2}\mid K]-(\mathbf {E} [G\mid K])^{2}=\rho +\rho ^{2}-\rho ^{2}=\rho \end{aligned}}

Hence, we obtain the distributions of G under H and K:

H:G\sim N(0,\rho )

K:G\sim N(\rho ,\rho )

The false alarm error is

\alpha =\int _{G_{0}}^{\infty }N(0,\rho )\,dG=1-\Phi \left({\frac {G_{0}}{\sqrt {\rho }}}\right).

So the test threshold for the Neyman–Pearson optimum detector is

G_{0}={\sqrt {\rho }}\Phi ^{-1}(1-\alpha ).

Its power of detection is

\beta =\int _{G_{0}}^{\infty }N(\rho ,\rho )\,dG=\Phi \left({\sqrt {\rho }}-\Phi ^{-1}(1-\alpha )\right)

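When the noise is a white Gaussian process with $R_{N}(t,s)={\tfrac {1}{2}}N_{0}\delta (t-s)$ , the filter is $k(t)={\tfrac {2}{N_{0}}}S(t)$ and

\rho =\int _{0}^{T}k(t)S(t)\,dt={\frac {2}{N_{0}}}\int _{0}^{T}S(t)^{2}\,dt={\frac {2E}{N_{0}}},

which recovers the detection probability obtained for white noise.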

Prewhitening

For some types of colored noise, a typical practice is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, suppose N(t) is a wide-sense stationary colored noise with correlation function

R_{N}(\tau )={\frac {BN_{0}}{4}}e^{-B|\tau |}

and power spectral density

S_{N}(f)={\frac {N_{0}}{2(1+({\frac {w}{B}})^{2})}}

The transfer function of the prewhitening filter is

H(f)=1+j{\frac {w}{B}}.
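
That this filter whitens the noise can be verified by checking that |H|² S_N is constant in frequency. The short Python sketch below does so on an assumed frequency grid; the values of N_0 and B and the function names are illustrative assumptions.

```python
import numpy as np

def colored_psd(w, n0, B):
    """Power spectral density N0 / (2 (1 + (w / B)^2)) of the colored noise."""
    return n0 / (2.0 * (1.0 + (w / B) ** 2))

def prewhitening_filter(w, B):
    """Transfer function H = 1 + j w / B of the prewhitening filter."""
    return 1.0 + 1j * w / B

n0, B = 2.0, 10.0
w = np.linspace(-100.0, 100.0, 2001)
output_psd = np.abs(prewhitening_filter(w, B)) ** 2 * colored_psd(w, n0, B)
print(output_psd.min(), output_psd.max())
```

The output spectrum equals N_0/2 at every frequency, which is the spectrum of white noise.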

Detection of a Gaussian random signal in additive white Gaussian noise (AWGN)

When the signal we want to detect from the noisy channel is also random, for example a white Gaussian process X(t), we can still use the K–L expansion to obtain an independent sequence of observations. In this case, the detection problem is described as follows:

H_{0}:Y(t)=N(t)

H_{1}:Y(t)=N(t)+X(t),\quad 0<t<T.

X(t) is a random process with correlation function $R_{X}(t,s)=E\{X(t)X(s)\}$ .

The K–L expansion of X(t) is

X(t)=\sum _{i=1}^{\infty }X_{i}\Phi _{i}(t),

where

X_{i}=\int _{0}^{T}X(t)\Phi _{i}(t)\,dt

and $\Phi _{i}(t)$ are solutions to

\int _{0}^{T}R_{X}(t,s)\Phi _{i}(s)ds=\lambda _{i}\Phi _{i}(t).

So the $X_{i}$ form an independent sequence of random variables with zero mean and variance $\lambda _{i}$ . Expanding Y(t) and N(t) in the basis $\Phi _{i}(t)$ , we get

Y_{i}=\int _{0}^{T}Y(t)\Phi _{i}(t)\,dt=\int _{0}^{T}[N(t)+X(t)]\Phi _{i}(t)\,dt=N_{i}+X_{i},

where

N_{i}=\int _{0}^{T}N(t)\Phi _{i}(t)\,dt.

As N(t) is Gaussian white noise, the $N_{i}$ form an i.i.d. sequence of random variables with zero mean and variance ${\tfrac {1}{2}}N_{0}$ , and the problem simplifies to

H_{0}:Y_{i}=N_{i}

H_{1}:Y_{i}=N_{i}+X_{i}

The Neyman–Pearson optimal test is

\Lambda ={\frac {f_{Y}\mid H_{1}}{f_{Y}\mid H_{0}}}=Ce^{\sum _{i=1}^{\infty }{\frac {y_{i}^{2}}{2}}{\frac {\lambda _{i}}{{\tfrac {1}{2}}N_{0}({\tfrac {1}{2}}N_{0}+\lambda _{i})}}},

so the log-likelihood ratio, with $K=\ln C$ , is

{\mathcal {L}}=\ln(\Lambda )=K+\sum _{i=1}^{\infty }{\tfrac {1}{2}}y_{i}^{2}{\frac {\lambda _{i}}{{\frac {N_{0}}{2}}\left({\frac {N_{0}}{2}}+\lambda _{i}\right)}}.

Since

{\widehat {X}}_{i}={\frac {\lambda _{i}}{{\frac {N_{0}}{2}}+\lambda _{i}}}Y_{i}

is just the minimum-mean-square estimate of $X_{i}$ given the $Y_{i}$ ,

{\mathcal {L}}=K+{\frac {1}{N_{0}}}\sum _{i=1}^{\infty }Y_{i}{\widehat {X}}_{i}.
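
The data-dependent part of this log-likelihood ratio can be computed from a finite set of coefficients. The Python sketch below assumes the minimum-mean-square coefficient estimate X̂_i = λ_i Y_i / (N_0/2 + λ_i) for a zero-mean Gaussian X_i of variance λ_i observed in independent Gaussian noise of variance N_0/2; the sample values and the function name are hypothetical.

```python
import numpy as np

def gaussian_signal_llr_term(y, lam, n0):
    """Return (1/N0) * sum_i y_i * xhat_i together with the estimates xhat_i."""
    x_hat = lam / (0.5 * n0 + lam) * y            # assumed MMSE estimate of X_i from Y_i
    return float(np.sum(y * x_hat) / n0), x_hat

y = np.array([1.0, -0.5, 2.0])                    # observed coefficients Y_i (made up)
lam = np.array([4.0, 1.0, 0.25])                  # signal variances lambda_i (made up)
n0 = 2.0
stat, x_hat = gaussian_signal_llr_term(y, lam, n0)

# Same quantity written as a weighted sum of squared coefficients.
direct = float(np.sum(0.5 * y ** 2 * lam / (0.5 * n0 * (0.5 * n0 + lam))))
print(stat, direct)
```

Coefficients with λ_i much larger than N_0/2 are passed almost unchanged, while those dominated by noise are shrunk towards zero before being correlated with the data.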

K–L expansion has the following property: If

f(t)=\sum f_{i}\Phi _{i}(t),g(t)=\sum g_{i}\Phi _{i}(t),

where

f_{i}=\int _{0}^{T}f(t)\Phi _{i}(t)\,dt,\quad g_{i}=\int _{0}^{T}g(t)\Phi _{i}(t)\,dt.

then

\sum _{i=1}^{\infty }f_{i}g_{i}=\int _{0}^{T}g(t)f(t)\,dt.
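
In finite dimensions this is the statement that an orthonormal change of basis preserves inner products. The Python sketch below checks it with the eigenvectors of an assumed covariance matrix standing in for the Φ_i; the matrix and the random test vectors are hypothetical.

```python
import numpy as np

n = 64
idx = np.arange(n)
# Assumed covariance matrix; its eigenvectors form an orthonormal basis of R^n.
R = np.exp(-0.1 * np.abs(idx[:, None] - idx[None, :]))
_, phi = np.linalg.eigh(R)                 # columns of phi are orthonormal

rng = np.random.default_rng(0)
f = rng.standard_normal(n)
g = rng.standard_normal(n)

f_i = phi.T @ f                            # expansion coefficients of f
g_i = phi.T @ g                            # expansion coefficients of g
lhs = float(np.sum(f_i * g_i))             # sum over coefficients
rhs = float(f @ g)                         # inner product in the original coordinates
print(lhs, rhs)
```

The identity relies on the basis being complete as well as orthonormal, which holds here because `eigh` returns a full set of n eigenvectors.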

So let

{\widehat {X}}(t\mid T)=\sum _{i=1}^{\infty }{\widehat {X}}_{i}\Phi _{i}(t),\quad {\mathcal {L}}=K+{\frac {1}{N_{0}}}\int _{0}^{T}Y(t){\widehat {X}}(t\mid T)\,dt.

A noncausal filter Q(t,s) can be used to obtain the estimate through

{\widehat {X}}(t\mid T)=\int _{0}^{T}Q(t,s)Y(s)\,ds.

By the orthogonality principle, Q(t,s) satisfies

\int _{0}^{T}Q(t,s)R_{X}(s,\lambda )\,ds+{\tfrac {N_{0}}{2}}Q(t,\lambda )=R_{X}(t,\lambda ),\quad 0<\lambda <T,\ 0<t<T.

However, for practical reasons, it's necessary to further derive the causal filter h(t,s), where h(t,s) = 0 for s > t, to get estimate ${\widehat {X}}(t\mid t)$ . Specifically,

Q(t,s)=h(t,s)+h(s,t)-\int _{0}^{T}h(\lambda ,t)h(s,\lambda )\,d\lambda

See also

Principal component analysis
Polynomial chaos

Notes

^ Sapatnekar, Sachin (2011), "Overcoming variations in nanometer-scale technologies", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1 (1): 5–18, Bibcode:2011IJEST...1....5S, CiteSeerX 10.1.1.300.5659, doi:10.1109/jetcas.2011.2138250
^ Ghoman, Satyajit; Wang, Zhicun; Chen, PC; Kapania, Rakesh (2012), A POD-based Reduced Order Design Scheme for Shape Optimization of Air Vehicles
^ Karhunen–Loeve transform (KLT), Computer Image Processing and Analysis (E161) lectures, Harvey Mudd College
^ Raju, C.K. (2009), "Kosambi the Mathematician", Economic and Political Weekly, 44 (20): 33–45
^ Kosambi, D. D. (1943), "Statistics in Function Space", Journal of the Indian Mathematical Society, 7: 76–88, MR 0009816.
^ A wavelet tour of signal processing-Stéphane Mallat
^ X. Tang, “Texture information in run-length matrices,” IEEE Transactions on Image Processing, vol. 7, No. 11, pp. 1602–1609, Nov. 1998

References

Stark, Henry; Woods, John W. (1986). Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall, Inc. ISBN 978-0-13-711706-2. OL 21138080M.
Ghanem, Roger; Spanos, Pol (1991). Stochastic finite elements: a spectral approach. Springer-Verlag. ISBN 978-0-387-97456-9. OL 1865197M.
Guikhman, I.; Skorokhod, A. (1977). Introduction a la Théorie des Processus Aléatoires. Éditions MIR.
Simon, B. (1979). Functional Integration and Quantum Physics. Academic Press.
Karhunen, Kari (1947). "Über lineare Methoden in der Wahrscheinlichkeitsrechnung". Ann. Acad. Sci. Fennicae. Ser. A. I. Math.-Phys. 37: 1–79.
Loève, M. (1978). Probability theory. Vol. II, 4th ed. Graduate Texts in Mathematics. 46. Springer-Verlag. ISBN 978-0-387-90262-3.
Dai, G. (1996). "Modal wave-front reconstruction with Zernike polynomials and Karhunen–Loeve functions". JOSA A. 13 (6): 1218. Bibcode:1996JOSAA..13.1218D. doi:10.1364/JOSAA.13.001218.
Wu, B.; Zhu, J.; Najm, F. (2005). "A Non-parametric Approach for Dynamic Range Estimation of Nonlinear Systems". Proceedings of the Design Automation Conference: 841–844.
Wu, B.; Zhu, J.; Najm, F. (2006). "Dynamic Range Estimation". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 25 (9): 1618–1636.
Jorgensen, Palle E. T.; Song, Myung-Sin (2007). "Entropy Encoding, Hilbert Space and Karhunen–Loeve Transforms". Journal of Mathematical Physics. 48 (10): 103503. arXiv:math-ph/0701056. Bibcode:2007JMP....48j3503J. doi:10.1063/1.2793569.

External links

Mathematica KarhunenLoeveDecomposition function.
E161: Computer Image Processing and Analysis notes by Prof. Ruye Wang at Harvey Mudd College
