Issues with regressions involving time series data
Losing IID conditions: Laws of large numbers and central limit theorems have to be modified.
Spurious regressions: Do regressions help in making conditional predictions? Do regressions uncover meaningful relationships?
Unit roots: What happens to regressions when variables are trending?
And many more we do not get to cover…
Motivation: Moving beyond IID
IID is restrictive for economic data: we have time series, spatial, and panel data.
For a start, we can drop “identical” and move to the concept of stationarity. You were introduced to this idea already in Chapter 4 of SDAFE.
We also need to understand how to weaken the assumption of independence.
Motivation: Is OLS working?
Suppose you are interested in estimating the parameters of a first-order autoregression or AR(1) process \[Y_{t}=\beta_0^*+\beta_1^* Y_{t-1}+u_{t},\] where \(u_t\) is the error from best linear prediction.
To give you a sense of what the data on \(\left\{ Y_{t}\right\}_{t=1}^{n}\) would look like, you will see some pictures under the following settings:
\(\beta_0^*=0\) and \(\beta_1^*\) can be 0, 0.5, 0.95, and 1
\(u_{t}\sim N\left(0,1\right)\) and \(Y_{0}\sim N\left(0,1\right)\)
Motivation: Is OLS working?
You will see two plots side-by-side.
One is a time-series plot where \(Y_{t}\) is plotted against \(t\).
The other is a scatterplot where \(Y_{t}\) is plotted against \(Y_{t-1}\).
To enhance comparability, I fix the set of randomly drawn \(u_{t}\)’s and \(Y_{0}\)’s across processes.
I generate 500 observations for each process.
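Below is a minimal R sketch of the data-generating process just described (the seed, helper names, and plotting details are my own choices, not the exact code behind the slides).

set.seed(20240101)                   ## hypothetical seed, chosen for illustration
n     <- 500
u     <- rnorm(n)                    ## the same shocks are reused for all four processes
y0    <- rnorm(1)                    ## the same starting value Y_0 is reused as well
betas <- c(0, 0.5, 0.95, 1)

simulate_ar1 <- function(beta1, u, y0) {
  y      <- numeric(length(u))
  y_prev <- y0
  for (t in seq_along(u)) {
    y[t]   <- beta1 * y_prev + u[t]  ## beta0* = 0 throughout
    y_prev <- y[t]
  }
  y
}

op <- par(mfrow = c(1, 2))
for (b in betas) {
  y <- simulate_ar1(b, u, y0)
  plot(y, type = "l", xlab = "t", ylab = "Y_t",
       main = paste0("Time series plot, beta1* = ", b))
  plot(y[-length(y)], y[-1], xlab = "Y_{t-1}", ylab = "Y_t",
       main = paste0("Y_t against Y_{t-1}, beta1* = ", b))
}
par(op)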
Simulated data from \(Y_t=0*Y_{t-1}+u_t\)
Simulated data from \(Y_t=0.5Y_{t-1}+u_t\)
Simulated data from \(Y_t=0.95Y_{t-1}+u_t\)
Simulated data from \(Y_t=Y_{t-1}+u_t\)
Design of Monte Carlo
Now, let us evaluate the performance of OLS when we generate multiple “instances” of the first-order autoregression given earlier.
Examine center and spread of the sampling distribution of the OLS estimator of \(\beta_1^*\).
A 5% significance level was used for testing the null that \(\beta_1^*\) is equal to the value in the indicated column.
OLS seems to be working well, but its performance suffers as \(\beta_1^*\) gets closer to 1.
The OLS estimator of \(\beta_1^*\) seems to be downward biased, i.e., it systematically underestimates \(\beta_1^*\).
The case of \(\beta_1^*=1\) is extremely different from \(0<\beta_1^* <1\).
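Here is a minimal R sketch of such a Monte Carlo (the seed, the number of replications, and the use of standard normal critical values for the 5% test are my own choices); it reports the mean of the OLS estimates, the bias, and the rejection rate of the test of the true \(\beta_1^*\).

set.seed(20240102)                   ## hypothetical seed
n_obs  <- 500
n_reps <- 2000                       ## number of Monte Carlo replications (my choice)

ols_ar1 <- function(beta1, n) {
  u  <- rnorm(n)
  y0 <- rnorm(1)
  y  <- numeric(n)
  y[1] <- beta1 * y0 + u[1]
  for (t in 2:n) y[t] <- beta1 * y[t - 1] + u[t]
  fit <- summary(lm(y[-1] ~ y[-n]))  ## OLS of Y_t on Y_{t-1}
  est <- fit$coefficients[2, "Estimate"]
  se  <- fit$coefficients[2, "Std. Error"]
  ## 5% two-sided test of the true beta1*, using standard normal critical values
  c(est = est, reject = abs((est - beta1) / se) > qnorm(0.975))
}

for (b in c(0, 0.5, 0.95, 1)) {
  out <- replicate(n_reps, ols_ar1(b, n_obs))
  cat(sprintf("beta1* = %.2f: mean = %.4f, bias = %.4f, rejection rate = %.3f\n",
              b, mean(out["est", ]), mean(out["est", ]) - b, mean(out["reject", ])))
}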
Curiosities
A curiosity
The plot you just saw is for the case where \(n=640\) and \(\beta_1^*=1\).
The blue curve is the standard normal.
In this case, you reject the null \(H_{0}:\;\beta_1^*=1\) more often than you should. Therefore, you need new critical values to decide whether there is evidence in support of or against the null.
Sampling distribution of \(t\)-statistic under the null of a unit root
Dickey and Fuller (1979) have shown that when testing the null of a unit root, the asymptotic distribution of the test statistic under the null is nonstandard.
Their results further indicate that the asymptotic distribution of the test statistic under the null depends on whether deterministic terms (e.g., an intercept or a time trend) are included in the autoregression, and on the form of the null being tested.
Sampling distribution of \(t\)-statistic under the null of a unit root
For more on the nonstandard behavior in the unit root case, see Chang and Park (2002).
It is also important to know how to generate your own critical values for the testing in these nonstandard situations. See, for example, Cheung and Lai (1995).
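As an illustration of how such critical values can be simulated, here is a minimal R sketch for the simplest Dickey-Fuller case (no intercept, no trend); the seed, sample size, and number of replications are my own choices.

set.seed(20240103)                   ## hypothetical seed
n_obs  <- 500
n_reps <- 5000                       ## number of simulated random walks (my choice)

df_tstat <- function(n) {
  y    <- cumsum(rnorm(n))           ## a pure random walk: the null of a unit root
  dy   <- diff(y)
  ylag <- y[-n]
  fit  <- summary(lm(dy ~ ylag - 1)) ## regress Delta Y_t on Y_{t-1}, no intercept
  fit$coefficients["ylag", "t value"]
}

tstats <- replicate(n_reps, df_tstat(n_obs))
quantile(tstats, probs = c(0.01, 0.05, 0.10))  ## simulated left-tail critical values
qnorm(c(0.01, 0.05, 0.10))                     ## standard normal quantiles, for comparison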
Granger and Newbold (1974) also show that measures of fit from spurious regressions will typically indicate very good fit even if the two variables are truly unrelated.
This is yet another instance where standard measures of fit like the R-squared have to be interpreted with caution.
Nonsense regressions can also happen in the context of IID data. Try simulating a case where there are many unrelated \(X\)’s included relative to sample size.
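For concreteness, here is a minimal R sketch resembling Case 2: two independent random walks regressed on each other in levels. The seed is my own choice, and these simulated series need not match the x2 and y2 generated earlier.

set.seed(20240104)           ## hypothetical seed
n  <- 1000
x2 <- cumsum(rnorm(n))       ## a random walk
y2 <- cumsum(rnorm(n))       ## an independent random walk, unrelated to x2
summary(lm(y2 ~ x2))         ## levels regression: R-squared and t-statistics
                             ## tend to look misleadingly "good"

In repeated simulations of this kind, the \(t\)-statistic on x2 is “significant” far more often than the nominal 5% level, which is the phenomenon documented by Granger and Newbold (1974).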
Spurious regressions: solutions
There are two broad ways of solving the spurious regression problem:
Run a regression in first differences.
Explore a cointegration analysis of the two series (“equilibrium relationship” exists between the two series).
The first option is the most appropriate course of action given our simulation setting; the result is shown below.
Spurious regressions: a solution for Case 2
summary(dyn$lm(diff(y2)~diff(x2)))
Call:
lm(formula = dyn(diff(y2) ~ diff(x2)))
Residuals:
    Min      1Q  Median      3Q     Max
 -3.080  -0.669  -0.011   0.683   3.791

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0792     0.0311    2.55    0.011 *
diff(x2)     -0.0230     0.0320   -0.72    0.472
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.975 on 997 degrees of freedom
Multiple R-squared: 0.000518, Adjusted R-squared: -0.000484
F-statistic: 0.517 on 1 and 997 DF, p-value: 0.472
A path beyond IID in the time series case
Replace identical distribution with a time series concept called stationarity. In the IID case, the distribution of every random variable is the same.
Because we want to cover cases with dependence, we need a concept that allows dependence while ensuring that the distributions stay the same in some sense.
In the IID case, we can essentially “shuffle” the random variables.
A path beyond IID in the time series case
To have some dependence, we have to prevent “shuffling”. So, we look at “blocks” of random variables. These “blocks” need to have the same joint distribution over time. This is the key intuition for stationarity.
Next we have to find a way to avoid imposing independence. Some insight can be obtained from the proof that the sample mean is consistent for the population mean.
Getting a sense of how to weaken independence
Assume stationarity along with finite second moments. As a result, \(\mathbb{E}\left(Z_{t}\right)\) does not depend on \(t\).
Recall that \(\mathbb{E}\left(\overline{Z}\right)=\mathbb{E}\left(Z_{t}\right)\) even under dependence.
Next, if we assume that for all \(q\) and \(r\), \(\mathsf{Cov}\left(Z_{q},Z_{r}\right)\) depends only on the lag \(q-r\), then \[\mathsf{Var}\left(\overline{Z}\right) =\frac{1}{n}\left[\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right].\]
\(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\) is called the \(j\)th-order autocovariance.
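To see where this expression comes from (a standard derivation, not shown on the original slides), expand the variance of the sample mean and group the covariance terms by lag: \[\begin{aligned}\mathsf{Var}\left(\overline{Z}\right) &=\frac{1}{n^{2}}\sum_{q=1}^{n}\sum_{r=1}^{n}\mathsf{Cov}\left(Z_{q},Z_{r}\right)\\ &=\frac{1}{n^{2}}\left[n\,\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(n-j\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right]\\ &=\frac{1}{n}\left[\mathsf{Var}\left(Z_{t}\right)+2\sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right],\end{aligned}\] since for each lag \(j\geq1\) there are \(n-j\) pairs with \(q-r=j\) and another \(n-j\) pairs with \(r-q=j\).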
Getting a sense of how to weaken independence
One way for \(\mathsf{Var}\left(\overline{Z}\right)\to0\) as \(n\to\infty\) is when \(\mathsf{Var}\left(Z_{t}\right)\) is bounded and …
Getting a sense of how to weaken independence
… when \[\begin{aligned}&\left\vert \sum_{j=1}^{n-1}\left(1-\frac{j}{n}\right)\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \\ & \leq\sum_{j=1}^{n-1}\left\vert 1-\frac{j}{n}\right\vert \left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert \\
& \leq \sum_{j=1}^{n-1}\left\vert \mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\right\vert
\end{aligned}\] is bounded as \(n\to\infty\).
Getting a sense of how to weaken independence
Thus, under certain conditions on the autocovariances, it is possible to show that \(\overline{Z}\overset{p}{\to}\mathbb{E}\left(Z_{t}\right)\).
What you saw is the simplest version of a law of large numbers without independence imposed.
It is possible to have a slightly more complicated version of this ergodic theorem under nonstationarity; see the very accessible note by Shalizi (2022).
What are the options to control these autocovariances?
Assume that \(\left\{Z_{t}\right\}\) is an IID sequence of random variables with finite second moments.
Under this assumption, \(\mathsf{Var}\left(Z_{t}\right)<\infty\) and \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=0\) for all \(j\neq0\).
What are the options to control these autocovariances?
Remove the independence assumption, as you only need zero autocovariances.
Consider processes called martingale difference sequences (MDS).
If \(\left\{Z_{t}\right\}\) is an MDS, then \(\mathbb{E}\left(Z_{t}\mid Z_{t-1},Z_{t-2},\ldots\right)=0\).
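To make the distinction concrete, here is a minimal R sketch (parameter values, the seed, and variable names are my own choices) of an ARCH(1)-type process: it is an MDS, so its levels are serially uncorrelated, yet it is not IID because its squares are predictable from the past.

set.seed(20240105)                    ## hypothetical seed
n    <- 5000
eps  <- rnorm(n)
z    <- numeric(n)
sig2 <- numeric(n)
sig2[1] <- 1
z[1]    <- sqrt(sig2[1]) * eps[1]
for (t in 2:n) {
  sig2[t] <- 0.2 + 0.7 * z[t - 1]^2   ## conditional variance depends on the past
  z[t]    <- sqrt(sig2[t]) * eps[t]   ## E(z_t | past) = 0, so {z_t} is an MDS
}
acf(z,   lag.max = 10, plot = FALSE)  ## autocorrelations of the levels: near zero
acf(z^2, lag.max = 10, plot = FALSE)  ## autocorrelations of the squares: clearly nonzero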
What are the options to control these autocovariances?
Accept that \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)\) may be nonzero for \(j\neq0\).
Consider processes called covariance stationary processes.
If \(\left\{ Z_{t}\right\}\) is covariance stationary, then (i) \(\mathbb{E}\left(Z_{t}\right)\) does not depend on \(t\), and (ii) \(\mathsf{Cov}\left(Z_{t},Z_{t-j}\right)=\gamma\left(j\right)\) is a function of \(j\) only, for all \(t\).
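As a concrete check (a minimal sketch, not from the original slides), a stationary AR(1) with \(\left|\beta_1^*\right|<1\) is covariance stationary, with the standard autocovariances \(\gamma\left(j\right)=\beta_1^{*j}\sigma_u^2/\left(1-\beta_1^{*2}\right)\); the code below compares sample and theoretical values for \(\beta_1^*=0.5\).

set.seed(20240106)                    ## hypothetical seed
beta1 <- 0.5
n     <- 100000                       ## large n so sample quantities settle down
y     <- arima.sim(model = list(ar = beta1), n = n)  ## stationary AR(1), sd(u) = 1
sample_gamma <- acf(y, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]
theory_gamma <- beta1^(0:5) / (1 - beta1^2)          ## gamma(j) = beta1^j / (1 - beta1^2)
round(cbind(sample_gamma, theory_gamma), 3)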
Independence and predictability
One way to look at the processes from the previous slides is to understand how these processes capture predictability.
Recall that if \(\left\{ Z_{t}\right\}\) is an IID sequence, then \(Z_{t}|Z_{t-1},Z_{t-2},\ldots,Z_{1}\sim Z_{t}\).
This means that knowing the past values of \(Z_{t}\) does not provide any new information.
In this sense, IID sequences are essentially sequences that are completely unpredictable.
Independence and predictability
Compare this with the weaker form of unpredictability of an MDS.
Note that the expression \(\mathbb{E}\left(Z_{t}\mid Z_{t-1},Z_{t-2},\ldots\right)=0\) means that our best prediction of \(Z_{t}\) given all past information is zero.
That means \(Z_{t}\) is unpredictable in mean only; other features, such as its conditional variance, may still be predictable from the past (as in ARCH models).