Issues with regressions involving time series data
Losing IID conditions: Laws of large numbers and central limit theorems have to be modified.
Spurious regressions: Do regressions help in making conditional predictions? Do regressions uncover meaningful relationships?
Unit roots: What happens to regressions when variables are trending?
And many more we do not get to cover…
Motivation: Moving beyond IID
IID is restrictive for economic data: we have time series, spatial, and panel data.
For a start, we can drop “identical” and move to the concept of stationarity. You were introduced to this idea already in Chapter 4 of SDAFE.
We also need to understand how to weaken the assumption of independence.
Motivation: Is OLS working?
Suppose you are interested in estimating the parameters of a first-order autoregression or AR(1) process \[Y_{t}=\beta_0^*+\beta_1^* Y_{t-1}+u_{t},\] where \(u_t\) is the error from best linear prediction.
To give you a sense of what the data on \(\left\{ Y_{t}\right\}_{t=1}^{n}\) would look like, you will see some pictures under the following settings:
\(\beta_0^*=0\) and \(\beta_1^*\) can be 0, 0.5, 0.95, and 1
\(u_{t}\sim N\left(0,1\right)\) and \(Y_{0}\sim N\left(0,1\right)\)
Motivation: Is OLS working?
You will see two plots side-by-side.
One is a time-series plot where \(Y_{t}\) is plotted against \(t\).
The other is a scatterplot where \(Y_{t}\) is plotted against \(Y_{t-1}\).
To enhance comparability, I fix the set of randomly drawn \(u_{t}\)’s and \(Y_{0}\)’s across processes.
I generate 500 observations for each process.
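Below is a minimal R sketch of the data-generating process just described (the seed, helper names, and plotting details are my own choices, not the exact code behind the slides).

set.seed(20240101)                   ## hypothetical seed, chosen for illustration
n     <- 500
u     <- rnorm(n)                    ## the same shocks are reused for all four processes
y0    <- rnorm(1)                    ## the same starting value Y_0 is reused as well
betas <- c(0, 0.5, 0.95, 1)

simulate_ar1 <- function(beta1, u, y0) {
  y      <- numeric(length(u))
  y_prev <- y0
  for (t in seq_along(u)) {
    y[t]   <- beta1 * y_prev + u[t]  ## beta0* = 0 throughout
    y_prev <- y[t]
  }
  y
}

op <- par(mfrow = c(1, 2))
for (b in betas) {
  y <- simulate_ar1(b, u, y0)
  plot(y, type = "l", xlab = "t", ylab = "Y_t",
       main = paste0("Time series plot, beta1* = ", b))
  plot(y[-length(y)], y[-1], xlab = "Y_{t-1}", ylab = "Y_t",
       main = paste0("Y_t against Y_{t-1}, beta1* = ", b))
}
par(op)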
Simulated data from \(Y_t=0*Y_{t-1}+u_t\)
Simulated data from \(Y_t=0.5Y_{t-1}+u_t\)
Simulated data from \(Y_t=0.95Y_{t-1}+u_t\)
Simulated data from \(Y_t=Y_{t-1}+u_t\)
Design of Monte Carlo
Now, let us evaluate the performance of OLS when we generate multiple “instances” of the first-order autoregression given earlier.
Examine center and spread of the sampling distribution of the OLS estimator of \(\beta_1^*\).
A 5% significance level was used for testing the null that \(\beta_1^*\) is equal to the value in the indicated column.
OLS seems to be working well, but its performance suffers as \(\beta_1^*\) gets closer to 1.
The OLS estimator of \(\beta_1^*\) seems to be downward biased, i.e., it systematically underestimates \(\beta_1^*\).
The case of \(\beta_1^*=1\) is extremely different from \(0<\beta_1^* <1\).
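Here is a minimal R sketch of such a Monte Carlo (the seed, the number of replications, and the use of standard normal critical values for the 5% test are my own choices); it reports the mean of the OLS estimates, the bias, and the rejection rate of the test of the true \(\beta_1^*\).

set.seed(20240102)                   ## hypothetical seed
n_obs  <- 500
n_reps <- 2000                       ## number of Monte Carlo replications (my choice)

ols_ar1 <- function(beta1, n) {
  u  <- rnorm(n)
  y0 <- rnorm(1)
  y  <- numeric(n)
  y[1] <- beta1 * y0 + u[1]
  for (t in 2:n) y[t] <- beta1 * y[t - 1] + u[t]
  fit <- summary(lm(y[-1] ~ y[-n]))  ## OLS of Y_t on Y_{t-1}
  est <- fit$coefficients[2, "Estimate"]
  se  <- fit$coefficients[2, "Std. Error"]
  ## 5% two-sided test of the true beta1*, using standard normal critical values
  c(est = est, reject = abs((est - beta1) / se) > qnorm(0.975))
}

for (b in c(0, 0.5, 0.95, 1)) {
  out <- replicate(n_reps, ols_ar1(b, n_obs))
  cat(sprintf("beta1* = %.2f: mean = %.4f, bias = %.4f, rejection rate = %.3f\n",
              b, mean(out["est", ]), mean(out["est", ]) - b, mean(out["reject", ])))
}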
Curiosities
A curiosity
The plot you just saw is for the case where \(n=640\) and \(\beta_1^*=1\).
The blue curve is the standard normal.
In this case, you reject the null \(H_{0}:\;\beta_1^*=1\) more often than you should. Therefore, you need new critical values to decide whether there is evidence in support of or against the null.
Sampling distribution of \(t\)-statistic under the null of a unit root
Dickey and Fuller (1979) have shown that when testing the null of a unit root, the asymptotic distribution of the test statistic under the null is nonstandard.
Their results further indicate that the asymptotic distribution of the test statistic under the null depends on whether deterministic terms (e.g., an intercept or a time trend) are included in the autoregression, and on the form of the null being tested.
Sampling distribution of \(t\)-statistic under the null of a unit root
For more on the nonstandard behavior in the unit root case, see Chang and Park (2002).
It is also important to know how to generate your own critical values for the testing in these nonstandard situations. See, for example, Cheung and Lai (1995).
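As an illustration of how such critical values can be simulated, here is a minimal R sketch for the simplest Dickey-Fuller case (no intercept, no trend); the seed, sample size, and number of replications are my own choices.

set.seed(20240103)                   ## hypothetical seed
n_obs  <- 500
n_reps <- 5000                       ## number of simulated random walks (my choice)

df_tstat <- function(n) {
  y    <- cumsum(rnorm(n))           ## a pure random walk: the null of a unit root
  dy   <- diff(y)
  ylag <- y[-n]
  fit  <- summary(lm(dy ~ ylag - 1)) ## regress Delta Y_t on Y_{t-1}, no intercept
  fit$coefficients["ylag", "t value"]
}

tstats <- replicate(n_reps, df_tstat(n_obs))
quantile(tstats, probs = c(0.01, 0.05, 0.10))  ## simulated left-tail critical values
qnorm(c(0.01, 0.05, 0.10))                     ## standard normal quantiles, for comparison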
Granger and Newbold (1974) also show that measures of fit from spurious regressions will typically indicate very good fit even if the two variables are truly unrelated.
This is yet another instance where standard measures of fit like the R-squared have to be interpreted with caution.
Nonsense regressions can also happen in the context of IID data. Try simulating a case where there are many unrelated \(X\)’s included relative to sample size.
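For concreteness, here is a minimal R sketch resembling Case 2: two independent random walks regressed on each other in levels. The seed is my own choice, and these simulated series need not match the x2 and y2 generated earlier.

set.seed(20240104)           ## hypothetical seed
n  <- 1000
x2 <- cumsum(rnorm(n))       ## a random walk
y2 <- cumsum(rnorm(n))       ## an independent random walk, unrelated to x2
summary(lm(y2 ~ x2))         ## levels regression: R-squared and t-statistics
                             ## tend to look misleadingly "good"

In repeated simulations of this kind, the \(t\)-statistic on x2 is “significant” far more often than the nominal 5% level, which is the phenomenon documented by Granger and Newbold (1974).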
Spurious regressions: solutions
There are two broad ways of solving the spurious regression problem:
Run a regression in first differences.
Explore a cointegration analysis of the two series (“equilibrium relationship” exists between the two series).
The first option is the most appropriate course of action given our simulation setting; the result is shown below.
Spurious regressions: a solution for Case 2
summary(dyn$lm(diff(y2)~diff(x2)))
Call:
lm(formula = dyn(diff(y2) ~ diff(x2)))
Residuals:
    Min      1Q  Median      3Q     Max
 -3.080  -0.669  -0.011   0.683   3.791

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0792     0.0311    2.55    0.011 *
diff(x2)     -0.0230     0.0320   -0.72    0.472
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.975 on 997 degrees of freedom
Multiple R-squared: 0.000518, Adjusted R-squared: -0.000484
F-statistic: 0.517 on 1 and 997 DF, p-value: 0.472
A path beyond IID in the time series case
Replace identical distribution with a time series concept called stationarity. In the IID case, the distribution of every random variable is the same.
Because we want to cover cases with dependence, we need a concept that allows dependence while ensuring that the distributions stay the same in some sense.
In the IID case, we can essentially “shuffle” the random variables.
A path beyond IID in the time series case
To have some dependence, we have to prevent “shuffling”. So, we look at “blocks” of random variables. These “blocks” need to have the same joint distribution over time. This is the key intuition for stationarity.
Next we have to find a way to avoid imposing independence. Some insight can be obtained from the proof that the sample mean is consistent for the population mean.
Getting a sense of how to weaken independence
Assume stationarity along with finite second moments. As a result, \(\mathbb{E}\left(Z_{t}\right)\) does not depend on \(t\).
Recall that \(\mathbb{E}\left(\overline{Z}\right)=\mathbb{E}\left(Z_{t}\right)\) even under dependence.
Next, if we assume that for all \(q\) and \(r\), \(\mathsf{Cov}\left(Z_{q},Z_{r}\right)\) depends only on the lag \(q-r\), then \[\mathsf{Var}\left(\overline{Z}\right) =\frac{1}{n}\left[\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right].\]
\(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\) is called the \(j\)th-order autocovariance.
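To see where this expression comes from (a standard derivation, not shown on the original slides), expand the variance of the sample mean and group the covariance terms by lag: \[\begin{aligned}\mathsf{Var}\left(\overline{Z}\right) &=\frac{1}{n^{2}}\sum_{q=1}^{n}\sum_{r=1}^{n}\mathsf{Cov}\left(Z_{q},Z_{r}\right)\\ &=\frac{1}{n^{2}}\left[n\,\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(n-j\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right]\\ &=\frac{1}{n}\left[\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right],\end{aligned}\] since for each lag \(j\geq1\) there are \(n-j\) pairs with \(q-r=j\) and another \(n-j\) pairs with \(r-q=j\).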
Getting a sense of how to weaken independence
One way for \(\mathsf{Var}\left(\overline{Z}\right)\to0\) as \(n\to\infty\) is when \(\mathsf{Var}\left(Z_{t}\right)\) is bounded and …
Getting a sense of how to weaken independence
… when \[\begin{aligned}&\left\vert \sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \\ & \leq\sum_{j=1}^{n-1}\left\vert 1-\frac{j}{n}\right\vert \left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \\
& \leq \sum_{j=1}^{n-1}\left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert
\end{aligned}\] is bounded as \(n\to\infty\).
Getting a sense of how to weaken independence
Thus, under certain conditions on the autocovariances, it is possible to show that \(\overline{Z}\overset{p}{\to}\mathbb{E}\left(Z_{t}\right)\).
What you saw is the simplest version of a law of large numbers without independence imposed.
It is possible to have a slightly more complicated version of this ergodic theorem under nonstationarity; see the very accessible note by Shalizi (2022).
What are the options to control these autocovariances?
Assume that \(\left\{Z_{t}\right\}\) is an IID sequence of random variables with finite second moments.
Under this assumption, \(\mathsf{Var}\left(Z_{t}\right)<\infty\) and \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=0\) for all \(j\neq0\).
What are the options to control these autocovariances?
Remove the independence assumption, as you only need zero autocovariances.
Consider processes called martingale difference sequences (MDS).
If \(\left\{Z_{t}\right\}\) is an MDS, then \(\mathbb{E}\left(Z_{t}\mid Z_{t-1},Z_{t-2},\ldots\right)=0\).
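To make the distinction concrete, here is a minimal R sketch (parameter values, the seed, and variable names are my own choices) of an ARCH(1)-type process: it is an MDS, so its levels are serially uncorrelated, yet it is not IID because its squares are predictable from the past.

set.seed(20240105)                    ## hypothetical seed
n    <- 5000
eps  <- rnorm(n)
z    <- numeric(n)
sig2 <- numeric(n)
sig2[1] <- 1
z[1]    <- sqrt(sig2[1]) * eps[1]
for (t in 2:n) {
  sig2[t] <- 0.2 + 0.7 * z[t - 1]^2   ## conditional variance depends on the past
  z[t]    <- sqrt(sig2[t]) * eps[t]   ## E(z_t | past) = 0, so {z_t} is an MDS
}
acf(z,   lag.max = 10, plot = FALSE)  ## autocorrelations of the levels: near zero
acf(z^2, lag.max = 10, plot = FALSE)  ## autocorrelations of the squares: clearly nonzero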
What are the options to control these autocovariances?
Accept that \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\) may be nonzero for \(j\neq0\).
Consider processes called covariance stationary processes.
If \(\left\{ Z_{t}\right\}\) is covariance stationary, then (i) \(\mathbb{E}\left(Z_{t}\right)\) does not depend on \(t\), and (ii) \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=\gamma\left(j\right)\) is a function of \(j\) only, for all \(t\).
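As a concrete check (a minimal sketch, not from the original slides), a stationary AR(1) with \(\left|\beta_1^*\right|<1\) is covariance stationary, with the standard autocovariances \(\gamma\left(j\right)=\beta_1^{*j}\sigma_u^2/\left(1-\beta_1^{*2}\right)\); the code below compares sample and theoretical values for \(\beta_1^*=0.5\).

set.seed(20240106)                    ## hypothetical seed
beta1 <- 0.5
n     <- 100000                       ## large n so sample quantities settle down
y     <- arima.sim(model = list(ar = beta1), n = n)  ## stationary AR(1), sd(u) = 1
sample_gamma <- acf(y, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]
theory_gamma <- beta1^(0:5) / (1 - beta1^2)          ## gamma(j) = beta1^j / (1 - beta1^2)
round(cbind(sample_gamma, theory_gamma), 3)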
Independence and predictability
One way to look at the processes from the previous slides is to understand how these processes capture predictability.
Recall that if \(\left\{ Z_{t}\right\}\) is an IID sequence, then \(Z_{t}|Z_{t-1},Z_{t-2},\ldots,Z_{1}\sim Z_{t}\).
This means that knowing the past values of \(Z_{t}\) does not provide any new information.
In this sense, IID sequences are essentially sequences that are completely unpredictable.
Independence and predictability
Compare this with the weaker form of unpredictability of an MDS.
Note that the expression \(\mathbb{E}\left(Z_{t}\mid Z_{t-1},Z_{t-2},\ldots\right)=0\) means that our best prediction of \(Z_{t}\) given all past information is zero.
That means \(Z_{t}\) is unpredictable in mean only; other features, such as its conditional variance, may still be predictable from the past (as in ARCH models).