\[r_t=\ln\left(\frac{P_{t}}{P_{t-1}}\right)\] \[\hat{r}_t=\mu_t+\sigma_t\cdot\epsilon_t\qquad\text{where}\quad\epsilon_t\sim\text{i.i.d.}(0,1)\]
\[\widehat{r_{t+1}}=\widehat{\mu_{t+1}}+\widehat{\sigma_{t+1}}\cdot\epsilon_{t+1}\]
\[\widehat{r_t}=a_{t-1}+b_{t-1}\cdot{r_{t-1}}\]
\[\left\{\begin{aligned}\widehat{r_{t+1}}&=a_t+b_t\cdot{r_t}\\\mu_t&=\frac{1}{t}\cdot\sum_{i=1}^{t}r_i\end{aligned}\right.\quad\Rightarrow\quad\widehat{\mu_{t+1}}=a_t+b_t\cdot\mu_t\]
\[\left\{\begin{aligned}\hat{r}_t&=\mu_t+\sigma_t\cdot\epsilon_t\\{\widehat{\sigma_{t}}}^2&=\omega+\alpha\cdot(\sigma_{t-1}\cdot\epsilon_{t-1})^2+\beta\cdot{\sigma_{t-1}}^2\end{aligned}\right.\]
\[\begin{aligned}{\widehat{\sigma_{t+1}}}^2&=\omega_t+\alpha_t\cdot(\sigma_t\cdot\epsilon_t)^2+\beta_t\cdot{\sigma_t}^2\\&=\omega_t+\alpha_t\cdot(r_t-\mu_t)^2+\beta_t\cdot{\sigma_t}^2\end{aligned}\]\[\Rightarrow\widehat{\sigma_{t+1}}=\sqrt{\omega_t+\alpha_t\cdot(r_t-\mu_t)^2+\beta_t\cdot{\sigma_t}^2}\]
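As a minimal numeric sketch of these two recursions (every parameter and state value below is made up for illustration, not a fitted result):
a = 1e-4; b = -0.04                        #AR(1) coefficients (hypothetical)
omega = 1e-6; alpha = 0.08; beta = 0.90    #GARCH(1,1) coefficients (hypothetical)
r_t = 0.012; mu_t = 2e-4; sigma_t = 0.011  #current return, mean and volatility state
mu_next = a + b * mu_t                     #AR(1) mean forecast
sigma_next = sqrt(omega + alpha * (r_t - mu_t)^2 + beta * sigma_t^2)  #GARCH(1,1) volatility forecast
r_next = mu_next + sigma_next * rnorm(1)   #one simulated draw of r_{t+1}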
arima model:
          ar1     intercept 
-4.156632e-02  9.576804e-05 
linear model:
 (Intercept)           r_x 
9.959056e-05 -4.157858e-02 
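The wrappers get_arima_coef() and get_lm_coef() are not defined in this section; minimal sketches consistent with the outputs above could look like this (hypothetical implementations, assuming stats::arima() and dplyr::lag()):
get_arima_coef = function(r) {
  #AR(1) with intercept; $coef is the named coefficient vector
  arima(r, order = c(1, 0, 0))$coef
}
get_lm_coef = function(r) {
  #OLS of each return on its lag, mirroring the AR(1) specification
  x_log_returns = dplyr::lag(r)
  coef(lm(r ~ x_log_returns))
}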
compare the two models:
STI_modeling_df = STI_df %>%
  dplyr::filter(split_type %>% str_detect(pattern = "Model"))
r = STI_modeling_df %>% pull(`Log Return`)
curr_arima_coef = get_arima_coef(r)
curr_arima_coef
    intercept           ar1 
-7.819124e-05 -4.576361e-02 
rm(r, curr_arima_coef)
STI_df %>% pull(`Log Return`) %>% length()
[1] 3373
STI_df %>% pull(`Log Return`) %>% extract(1:3373) %>% get_arima_coef()
    intercept           ar1 
 9.576804e-05 -4.156632e-02 
STI_df %>% pull(`Log Return`) %>% extract(2:3373) %>% get_arima_coef()
    intercept           ar1 
 9.577186e-05 -4.156582e-02 
STI_df %>% pull(`Log Return`) %>% extract(2:3372) %>% get_arima_coef()
    intercept           ar1 
 9.509012e-05 -4.157595e-02 
STI_df %>% pull(`Log Return`) %>% extract(3:3373) %>% get_arima_coef()
    intercept           ar1 
 9.715037e-05 -4.156207e-02 
STI_df %>% pull(`Log Return`) %>% get_lm_coef()
  (Intercept) x_log_returns 
 9.959056e-05 -4.157858e-02 
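The small gap between the two intercepts is a parameterization difference: arima() reports the process mean \(\mu\) as intercept, while the lagged regression reports the true intercept \(a=\mu\cdot(1-b)\). Checking with the values printed above:
mu = 9.576804e-05   #arima "intercept", i.e. the process mean
b = -4.156632e-02   #ar1 coefficient
mu * (1 - b)        #about 9.975e-05, close to lm's 9.959056e-05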
\[\hat{r}_{t}=\mu_{t}+\sigma_{t}\cdot\epsilon_{t}\] We have:
AR(1) \[\hat\mu_{t}=a_{t-1}+b_{t-1}\cdot\mu_{t-1}\]
GARCH(1,1) \[{\hat\sigma_{t}}^2=\omega_{t-1}+\alpha_{t-1}\cdot{\mu_{t-1}}^2+\beta_{t-1}\cdot{\sigma_{t-1}}^2\]
Therefore, \[\hat{r}_{t}=\hat\mu_{t}+\hat\sigma_{t}\cdot\epsilon_{t}=(a_{t-1}+b_{t-1}\cdot\mu_{t-1})+\sqrt{\omega_{t-1}+\alpha_{t-1}\cdot{\mu_{t-1}}^2+\beta_{t-1}\cdot{\sigma_{t-1}}^2}\cdot{\epsilon_{t}}\]
#read data online
goog=read.csv("https://raw.githubusercontent.com/HarryLu95/dalu/master/Project%20Files/FM%20Research/GOOG.csv")
The original data has 7 columns: Date, Open, High, Low, Close, Adj.Close, and Volume. Day and Log.Return (\(r_t\)) are added to the data frame: \[r_t=\ln\left(\frac{P_t}{P_{t-1}}\right)\] where \(P\) is the adjusted close price.
# load package: fGarch
library(fGarch)
#plot the adjusted value
plot.ts(goog$Adj.Close,ylab="Adjusted Value",main="Alphabet Inc. Adjusted Stock Value from Aug/19/2004 to Jan/12/2018")
plot.ts(diff(goog$Adj.Close),main="Differencing Level 1")
#compute log return value
l=dim(goog)[1]
goog$Day=1:l
goog$Log.Return=NA
for (i in 2:l){
  goog$Log.Return[i]=log(goog$Adj.Close[i]/goog$Adj.Close[i-1])
}
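Equivalently, the loop can be replaced by a vectorized one-liner that produces the same series:
#same result without the loop: first differences of the log prices
goog$Log.Return=c(NA,diff(log(goog$Adj.Close)))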
#plot the log return value
plot.ts(goog$Log.Return,ylab="Log Return",main="Alphabet Inc. Log Return")
The first \(n\) observations (g1) are used for modeling, and the rest (g2) for back testing.
#divide the date into two groups, the second part is used for back testing
n=as.integer(2800)
#n=floor(l/6*5)
goog$g1=NA
goog$g2=NA
goog$g1[2:n]=goog$Log.Return[2:n]
goog$g2[(n+1):l]=goog$Log.Return[(n+1):l]
A linear model is fitted on the rolling Log.Return window to estimate \(a\) and \(b\), and then \(\hat\mu\).
#compute the value of mu hat
goog$mu.hat=NA
for (i in n:(l-1)){
  #AR(1) fit: regress r_t on r_{t-1}, matching the model above; "fit" avoids masking lm()
  fit=lm(goog$Log.Return[3:i]~goog$Log.Return[2:(i-1)])
  a=coef(fit)[[1]]
  b=coef(fit)[[2]]
  mu=mean(goog$Log.Return[2:i])
  goog$mu.hat[i+1]=a+b*mu
}
#plot.ts(goog$mu.hat[(n+1):l],main="mu hat",ylab="")
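Alternatively, \(a\) and \(b\) could be taken from arima() as in the comparison earlier; a sketch for one window (recall that arima() reports the process mean, not the intercept):
#hypothetical alternative: derive a and b from arima() instead of lm()
fit=arima(goog$Log.Return[2:n],order=c(1,0,0))
b=fit$coef[["ar1"]]
a=fit$coef[["intercept"]]*(1-b)  #convert the reported mean back to the intercept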
A GARCH(1,1) model is fitted on the rolling Log.Return window to estimate \(\omega\), \(\alpha\) and \(\beta\), and then \(\hat\sigma^2\).
#compute the value of sigma hat square
goog$sigma.hat.square=NA
for (i in n:(l-1)){
  g=garchFit(~garch(1,1),goog$Log.Return[2:i],trace=FALSE)
  omega=g@fit$coef[["omega"]]
  alpha=g@fit$coef[["alpha1"]]
  beta=g@fit$coef[["beta1"]]
  mu=mean(goog$Log.Return[2:i])
  sigma=sd(goog$Log.Return[2:i])
  goog$sigma.hat.square[i+1]=omega+alpha*mu^2+beta*sigma^2
}
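As a sanity check on this hand-rolled recursion, fGarch can produce its own one-step-ahead forecast; predict() on a fitted fGARCH object returns a data frame whose standardDeviation column is the model's \(\hat\sigma_{t+1}\):
#cross-check: fGarch's built-in one-step volatility forecast for the first window
g=garchFit(~garch(1,1),goog$Log.Return[2:n],trace=FALSE)
predict(g,n.ahead=1)$standardDeviation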
#compute the value of sigma hat
goog$sigma.hat=sqrt(goog$sigma.hat.square)
#compute r hat with an independent epsilon draw for each day, and plot the back testing log return with the r hat value (predicted value)
goog$r.hat=goog$mu.hat+goog$sigma.hat*rnorm(l)
plot(goog$Log.Return[(n+1):l],type="l",lty=3,col=1,ylim=c(-0.06,0.08),ylab="")
lines(goog$r.hat[(n+1):l],lty=1,col=2)
legend("topright",legend=c("original data","predicted data"),lty=c(3,1),col=c(1,2))
In this model, the Log.Return series is divided into two groups: one with relatively high volatility and one with relatively low volatility.
Method 1: Set a pair of threshold levels, \(n_1\) and \(n_2\), where \(n_1>n_2\). Any Log.Return value greater than \(n_1\) or less than \(n_2\) is put into group High, and the rest into group Low.
#find the thresholds, plot out the graph
n1=mean(goog$Log.Return[2:l])+0.02
n2=mean(goog$Log.Return[2:l])-0.02
count=0
#shrink the (n2,n1) band by 0.001 per step until at least 40% of the returns fall outside it
while (count<l*0.4){
  count=0
  for (i in 2:l){
    if (goog$Log.Return[i]>n1|goog$Log.Return[i]<n2){
      count=count+1
    }
  }
  if (count<l*0.4){
    n1=n1-0.001
    n2=n2+0.001
  }
}
print(count)
[1] 1375
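The inner counting loop is equivalent to a single vectorized expression:
#count of returns outside the (n2, n1) band, without the inner loop
count=sum(goog$Log.Return>n1|goog$Log.Return<n2,na.rm=TRUE)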
plot(goog$Log.Return[2:l],type="l",lty=3,ylab="Log Return",main="Alphabet Inc. Log Return (Grouping)")
abline(h=n1,col="red",lty=1)
abline(h=n2,col="blue",lty=1)
legend("topright",legend=c("Log Return","n1","n2"),col=c("black","red","blue"),lty=c(3,1,1))
#get the data of group high and group low
High=c()
Low=c()
for (i in 2:l){
  if (goog$Log.Return[i]>n1|goog$Log.Return[i]<n2){
    High=c(High,goog$Log.Return[i])
  } else{
    Low=c(Low,goog$Log.Return[i])
  }
}
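The same grouping can be written with logical indexing instead of growing vectors inside a loop:
#vectorized equivalent of the grouping loop above
r=goog$Log.Return[2:l]
is.high=r>n1|r<n2
High=r[is.high]
Low=r[!is.high]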