Time Series Analysis
Best MSE (Mean Square Error) Predictor
Among all possible predictor functions \(f(X_{n})\), we look for the \(f\) that minimizes \(\mathbb{E}\big[\big(X_{n+h} - f(X_{n})\big)^{2} \big]\). Such a predictor, denoted \(m(X_{n})\), is called the best MSE predictor, i.e.,
\[m(X_{n}) = \mathop{\mathrm{argmin}}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big] \]
We know that the minimizer \(\mathop{\mathrm{argmin}}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big]\) is:
\[\mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \]
Proof:
Minimizing \(\mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big]\) given \(X_{n}\) is in fact equivalent to minimizing the conditional expectation:
\[\mathop{\mathrm{argmin}}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} \big] \iff \mathop{\mathrm{argmin}}\limits_{f} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big] \]
- Arguably a more rigorous formulation is \(\mathop{\mathrm{argmin}}\limits_{f} ~ \mathbb{E}\Big[\Big(X_{n+h} - f\big( X_{n}\big)\Big)^{2} ~ \big| ~ \mathcal{F}_{n}\Big]\), where \(\left\{ \mathcal{F}_{t}\right\}_{t\geq 0}\) is the natural filtration generated by \(\left\{ X_{t} \right\}_{t\geq 0}\), but whatever.
The right-hand side expands as:
\[\begin{align*} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big] & = \mathbb{E}\big[X_{n+h}^{2} ~ \big| ~ X_{n}\big] - 2f(X_{n})\mathbb{E}\big[X_{n+h} ~ \big| ~ X_{n}\big] + f^{2}(X_{n}) \end{align*} \]
Now, since:
\[\begin{align*} \text{Var}(X_{n+h} ~ | ~ X_{n}) & = \mathbb{E}\Big[ \big( X_{n+h} - \mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \big)^{2} ~ \Big| ~ X_{n} \Big] \\ & = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - 2\mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n} \big] + \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \\ & = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \end{align*} \]
which gives:
\[\text{Var}(X_{n+h} ~ | ~ X_{n}) = \mathbb{E}\big[ X_{n+h}^{2} ~ \big| ~ X_{n} \big] - \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \]
Therefore,
\[\begin{align*} \mathbb{E}\big[ \big( X_{n+h} - f(X_{n}) \big)^{2} ~ \big| ~ X_{n} \big] & = \text{Var}(X_{n+h} ~ | ~ X_{n}) + \mathbb{E}^{2}\big[ X_{n+h} ~ \big| ~ X_{n}\big] - 2f(X_{n})\mathbb{E}\big[X_{n+h} ~ \big| ~ X_{n}\big] + f^{2}(X_{n}) \\ & = \text{Var}(X_{n+h} ~ | ~ X_{n}) + \Big( \mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n}\big] - f(X_{n}) \Big)^{2} \end{align*} \]
Since the variance \(\text{Var}(X_{n+h} ~ | ~ X_{n})\) does not depend on \(f\), the optimal solution \(m(X_{n})\) is immediate:
\[m(X_{n}) = \mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big] \]
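As a quick sanity check on this result, the sketch below draws a large sample from a bivariate Gaussian pair \((X_{n+h}, X_{n})\) (Gaussianity is chosen purely so the conditional mean has a closed form) and compares the MSE of the conditional-mean predictor against an arbitrary competitor. All parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters: mean mu, variance gamma(0), correlation rho(h).
mu, gamma0, rho_h = 0.0, 1.0, 0.7

rng = np.random.default_rng(42)
cov = gamma0 * np.array([[1.0, rho_h], [rho_h, 1.0]])
x_nh, x_n = rng.multivariate_normal([mu, mu], cov, size=100_000).T

# m(X_n) = E[X_{n+h} | X_n]; for a Gaussian pair this is mu + rho(h)(X_n - mu).
best = mu + rho_h * (x_n - mu)
# An arbitrary competing predictor f(X_n), chosen for comparison.
other = 0.9 * x_n + 0.1

mse_best = np.mean((x_nh - best) ** 2)
mse_other = np.mean((x_nh - other) ** 2)
# mse_best should be close to gamma(0) * (1 - rho(h)^2) and below mse_other.
```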
Now suppose \(\left\{ X_{t} \right\}\) is a stationary Gaussian time series, i.e.,
\[\begin{pmatrix} X_{n+h}\\ X_{n} \end{pmatrix} \sim N\begin{pmatrix} \begin{pmatrix} \mu \\ \mu \end{pmatrix}, ~ \begin{pmatrix} \gamma(0) & \gamma(h) \\ \gamma(h) & \gamma(0) \end{pmatrix} \end{pmatrix} \]
Then we have:
\[X_{n+h} ~ \big| ~ X_{n} \sim N\Big( \mu + \rho(h)\big(X_{n} - \mu\big), ~ \gamma(0)\big(1 - \rho^{2}(h)\big) \Big) \]
where \(\rho(h)\) is the ACF of \(\left\{ X_{t} \right\}\). Therefore,
\[\mathbb{E}\big[ X_{n+h} ~ \big| ~ X_{n} \big] = m(X_{n}) = \mu + \rho(h) \big( X_{n} - \mu \big) \]
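This formula can also be checked numerically: for a Gaussian pair the conditional mean is linear, so the least-squares line of \(X_{n+h}\) on \(X_{n}\) should have slope \(\rho(h)\) and intercept \((1 - \rho(h))\,\mu\). A minimal sketch, with all parameter values assumed for illustration:

```python
import numpy as np

# Illustrative parameters for the stationary Gaussian pair above.
mu, gamma0, rho_h = 2.0, 1.5, 0.6

rng = np.random.default_rng(0)
cov = gamma0 * np.array([[1.0, rho_h], [rho_h, 1.0]])
x_nh, x_n = rng.multivariate_normal([mu, mu], cov, size=200_000).T

# Least-squares fit of X_{n+h} on X_n estimates the conditional mean line.
slope, intercept = np.polyfit(x_n, x_nh, 1)
# slope should be close to rho(h); intercept close to (1 - rho(h)) * mu.
```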
Note:
If \(\left\{ X_{t} \right\}\) is a Gaussian time series, the best MSE predictor can always be computed. If \(\left\{ X_{t} \right\}\) is not Gaussian, however, the computation is usually very involved.
For this reason, we usually do not look for the best MSE predictor, but for the best linear predictor instead.
Best Linear Predictor (BLP)
Under the BLP assumption, we look for a predictor of the form \(f(X_{n}) = aX_{n} + b\).
The objective is then:
\[\text{minimize: } ~ S(a,b) = \mathbb{E} \big[ \big( X_{n+h} - aX_{n} - b \big)^{2} \big] \]
Derivation:
Take partial derivatives with respect to \(a\) and \(b\). First, with respect to \(b\):
\[\begin{align*} \frac{\partial}{\partial b} S(a, b) & = \frac{\partial}{\partial b} \mathbb{E} \big[ \big( X_{n+h} - aX_{n} - b \big)^{2} \big] \\ & = -2 \mathbb{E} \big[ X_{n+h} - aX_{n} - b \big] \end{align*} \]
Setting
\[\frac{\partial}{\partial b} S(a, b) = 0 \]
gives:
\[\begin{align*} -2 \cdot & \mathbb{E} \big[ X_{n+h} - aX_{n} - b \big] = 0 \\ \implies & \qquad \mathbb{E}[X_{n+h}] - a\mathbb{E}[X_{n}] - b = 0\\ \implies & \qquad \mu - a\mu - b = 0 \\ \implies & \qquad b^{\star} = (1 - a^{\star}) \mu \end{align*} \]
Substituting back and taking the partial derivative with respect to \(a\):
\[\begin{align*} \frac{\partial}{\partial a} S(a, b) & = \frac{\partial}{\partial a} \mathbb{E} \big[ \big( X_{n+h} - aX_{n} - (1 - a)\mu \big)^{2} \big] \\ & = \frac{\partial}{\partial a} \mathbb{E} \Big[ \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)^{2} \Big] \\ & = 2\,\mathbb{E} \Big[ - \big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] \end{align*} \]
Setting
\[\frac{\partial}{\partial a} S(a, b) = 0 \]
gives:
\[\begin{align*} & \mathbb{E} \Big[ - \big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] = 0 \\ \implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \Big( \big(X_{n+h} - \mu \big) - \big( X_{n} - \mu \big) a \Big)\Big] = 0 \\ \implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \big(X_{n+h} - \mu \big) - a \big( X_{n} - \mu \big) \big( X_{n} - \mu \big) \Big] = 0 \\ \implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mu \big) \big(X_{n+h} - \mu \big) \Big] = a \cdot \mathbb{E} \Big[\big( X_{n} - \mu \big) \big( X_{n} - \mu \big) \Big] \\ \implies & \qquad \mathbb{E} \Big[\big( X_{n} - \mathbb{E}[X_{n}] \big) \big(X_{n+h} - \mathbb{E}[X_{n+h}] \big) \Big] = a \cdot \mathbb{E} \Big[\big( X_{n} - \mathbb{E}[X_{n}] \big)^{2} \Big] \\ \implies & \qquad \text{Cov}(X_{n}, X_{n+h}) = a \cdot \text{Var}(X_{n}) \\ \implies & \qquad a^{\star} = \frac{\gamma(h)}{\gamma(0)} = \rho(h) \end{align*} \]
Putting it all together, the BLP of the time series \(\left\{ X_{t} \right\}\) is:
\[f(X_{n}) = l(X_{n}) = \mu + \rho(h) \big( X_{n} - \mu \big) \]
and the MSE associated with the BLP is:
\[\begin{align*} \text{MSE} & = \mathbb{E}\big[ \big( X_{n+h} - l(X_{n}) \big)^{2} \big] \\ & = \mathbb{E} \Big[ \Big( X_{n+h} - \mu - \rho(h) \big( X_{n} - \mu \big) \Big)^{2} \Big] \\ & = \gamma(0) \cdot \big( 1 - \rho^{2}(h) \big) \end{align*} \]
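A minimal sketch verifying \(a^{\star} = \rho(h)\), \(b^{\star} = (1 - a^{\star})\mu\), and the MSE formula numerically. A Gaussian AR(1) path is used here only as a convenient stationary series (for an AR(1), \(\rho(h) = \phi^{h}\)); the model choice and all parameter values are assumptions for illustration.

```python
import numpy as np

# Assumed AR(1) model: X_t - mu = phi (X_{t-1} - mu) + eps_t, horizon h.
mu, phi, sigma, h = 1.0, 0.5, 1.0, 2

rng = np.random.default_rng(1)
n = 400_000
x = np.empty(n)
x[0] = mu
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + sigma * rng.standard_normal()

gamma0 = np.var(x)                         # sample gamma(0)
rho_h = np.corrcoef(x[:-h], x[h:])[0, 1]   # sample rho(h); theory: phi**h
a_star, b_star = rho_h, (1 - rho_h) * mu   # BLP coefficients a*, b*

# MSE of the BLP l(X_n) = a* X_n + b*; theory: gamma(0) * (1 - rho(h)^2).
mse = np.mean((x[h:] - (a_star * x[:-h] + b_star)) ** 2)
```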