(I) Background

  • Instructor: Hong-Kun Zhang, Department of Mathematics and Statistics, University of Massachusetts Amherst (Summer 2016).
  • The research ran from May 2016 to July 2016. The original data was the Straits Times Index (^STI) from Yahoo Finance.
  • In this research, the model “Markov Switching - AR(1) - GARCH(1,1)” was built and used to forecast the log return of the Straits Times Index.
  • The historical data for the Straits Times Index is no longer available, so Alphabet Inc. (GOOG) data is used in this research summary.
  • The most recent update was performed on November 22, 2020. The data spans August 19, 2004 to January 12, 2018.

(II) Mathematics

1. Estimation of log return

\[r_t=\ln\left(\frac{P_{t}}{P_{t-1}}\right)\] \[\hat{r}_t=\mu_t+\sigma_t\cdot\epsilon_t\qquad\text{where}\quad\epsilon_t\sim\text{i.i.d.}(0,1)\]

  • In this research, we need to estimate \(\mu_{t+1}\) and \(\sigma_{t+1}\) from the data observed at times \(1\) through \(t\).

\[\widehat{r_{t+1}}=\widehat{\mu_{t+1}}+\widehat{\sigma_{t+1}}\cdot\epsilon_{t+1}\]
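To make the log-return definition concrete, here is a minimal sketch in Python (the analysis in this document is in R; the prices below are made up):

```python
import math

# hypothetical adjusted close prices P_1..P_4
prices = [100.0, 102.0, 101.0, 103.0]

# r_t = ln(P_t / P_{t-1})
log_returns = [math.log(p / q) for q, p in zip(prices, prices[1:])]
```

A useful sanity check: log returns telescope, so their sum equals the log return over the whole period, \(\ln(P_T/P_1)\).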

2. AR(1) model

\[\widehat{r_t}=a_{t-1}+b_{t-1}\cdot{r_{t-1}}\]

  • In this research,

\[\left\{\begin{aligned}\widehat{r_{t+1}}&=a_t+b_t\cdot r_t\\\mu_t&=\frac{1}{t}\cdot\sum_{i=1}^{t}r_i\end{aligned}\right.\qquad\Rightarrow\qquad\widehat{\mu_{t+1}}=a_t+b_t\cdot\mu_t\]
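The \(\widehat{\mu_{t+1}}\) step can be sketched as follows (pure Python with a made-up return series; the research itself obtains \(a_t\) and \(b_t\) from R's arima/lm fits):

```python
# made-up log-return series r_1..r_t
r = [0.010, -0.004, 0.006, -0.002, 0.008, -0.001]

x = r[:-1]  # r_{t-1}
y = r[1:]   # r_t
n = len(x)
mx = sum(x) / n
my = sum(y) / n

# OLS intercept a and slope b for r_t = a + b * r_{t-1}
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# sample mean of the series, then mu_hat_{t+1} = a + b * mu_t
mu_t = sum(r) / len(r)
mu_hat = a + b * mu_t
```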

3. GARCH(1,1) model

\[\left\{\begin{aligned}\hat{r}_t&=\mu_t+\sigma_t\cdot\epsilon_t\\\widehat{\sigma_{t}}^2&=\omega+\alpha\cdot(\sigma_{t-1}\cdot\epsilon_{t-1})^2+\beta\cdot\sigma_{t-1}^2\end{aligned}\right.\]

  • In this research,

\[\begin{aligned}\widehat{\sigma_{t+1}}^2&=\omega_t+\alpha_t\cdot(\sigma_t\cdot\epsilon_t)^2+\beta_t\cdot\sigma_t^2\\&=\omega_t+\alpha_t\cdot(r_t-\mu_t)^2+\beta_t\cdot\sigma_t^2\end{aligned}\qquad\Rightarrow\qquad\widehat{\sigma_{t+1}}=\sqrt{\omega_t+\alpha_t\cdot(r_t-\mu_t)^2+\beta_t\cdot\sigma_t^2}\]
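The one-step variance update is simple arithmetic once the parameters are estimated; a Python sketch with hypothetical parameter values (the research estimates them in R):

```python
import math

# hypothetical parameter estimates from a GARCH(1,1) fit
omega, alpha, beta = 1e-6, 0.08, 0.90

# current observables: return, mean, and conditional standard deviation
r_t, mu_t, sigma_t = 0.012, 0.0005, 0.010

# sigma_hat_{t+1}^2 = omega + alpha * (r_t - mu_t)^2 + beta * sigma_t^2
var_next = omega + alpha * (r_t - mu_t) ** 2 + beta * sigma_t ** 2
sigma_next = math.sqrt(var_next)
```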

(III) Calculations & Simulations

1. Loading, cleaning, and transformation

2. Get log return

3. Data Splitting

  • 80% of the data is used for modeling, 20% for backtesting.
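Because this is a time series, the split is chronological rather than random: the first 80% of observations feed the model, the last 20% are held out. A minimal sketch (Python, with placeholder data):

```python
# stand-in for the log-return series, in time order
data = list(range(100))

# first 80% for modeling, last 20% for backtesting
cut = int(len(data) * 0.8)
modeling, backtest = data[:cut], data[cut:]
```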

4. Modeling

  • EDA
arima model:
          ar1     intercept 
-4.156632e-02  9.576804e-05 

linear model:
  (Intercept)           r_x 
 9.959056e-05 -4.157858e-02 

compare two models: the ar1 coefficient (-4.156632e-02) and the lm slope (-4.157858e-02) agree to four decimal places, as expected, since both fit the same lag-1 relationship.

  • Modeling
STI_modeling_df = STI_df %>%
  dplyr::filter(split_type %>% str_detect(pattern = "Model"))

r = STI_modeling_df %>% pull(`Log Return`)
curr_arima_coef = get_arima_coef(r)
curr_arima_coef
    intercept           ar1 
-7.819124e-05 -4.576361e-02 
rm(r, curr_arima_coef)
STI_df %>% pull(`Log Return`) %>% length()
[1] 3373
STI_df %>% pull(`Log Return`) %>% extract(1:3373) %>% get_arima_coef()
    intercept           ar1 
 9.576804e-05 -4.156632e-02 
STI_df %>% pull(`Log Return`) %>% extract(2:3373) %>% get_arima_coef()
    intercept           ar1 
 9.577186e-05 -4.156582e-02 
STI_df %>% pull(`Log Return`) %>% extract(2:3372) %>% get_arima_coef()
    intercept           ar1 
 9.509012e-05 -4.157595e-02 
STI_df %>% pull(`Log Return`) %>% extract(3:3373) %>% get_arima_coef()
    intercept           ar1 
 9.715037e-05 -4.156207e-02 
STI_df %>% pull(`Log Return`) %>% get_lm_coef()
  (Intercept) x_log_returns 
 9.959056e-05 -4.157858e-02 



OLD NOTE

Model: AR(1) - GARCH(1,1)

\[\hat{r}_{t}=\mu_{t}+\sigma_{t}\cdot\epsilon_{t}\] We have:

  • \(r_{t}\): Log Return of Adjusted Close Value
  • \(\mu_{t}\): Rolling Average of Log Returns
  • \(\sigma_{t}\): Standard Deviation of Log Returns
  • \(\epsilon_{t}=\cfrac{r_{t}-\mu_{t}}{\sigma_{t}}\)

AR(1) \[\hat\mu_{t}=a_{t-1}+b_{t-1}\cdot\mu_{t-1}\]

GARCH(1,1) \[{\hat\sigma_{t}}^2=\omega_{t-1}+\alpha_{t-1}\cdot{\mu_{t-1}}^2+\beta_{t-1}\cdot{\sigma_{t-1}}^2\]

Therefore, \[\hat{r}_{t}=\hat\mu_{t}+\hat\sigma_{t}\cdot\epsilon_{t}=(a_{t-1}+b_{t-1}\cdot\mu_{t-1})+\sqrt{\omega_{t-1}+\alpha_{t-1}\cdot{\mu_{t-1}}^2+\beta_{t-1}\cdot{\sigma_{t-1}}^2}\cdot{\epsilon_{t}}\]
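The combined one-step forecast above can be wrapped in a single function; a Python sketch mirroring the formula (the R code below does the same with lm and garchFit):

```python
import math

def r_hat(a, b, omega, alpha, beta, mu, sigma, eps):
    """One-step AR(1)-GARCH(1,1) forecast as in the old note:
    mean a + b*mu, s.d. sqrt(omega + alpha*mu^2 + beta*sigma^2)."""
    mu_hat = a + b * mu
    sigma_hat = math.sqrt(omega + alpha * mu ** 2 + beta * sigma ** 2)
    return mu_hat + sigma_hat * eps
```

With \(\epsilon_t=0\) the function returns the conditional mean alone; varying \(\epsilon_t\) by 1 shifts the forecast by exactly \(\hat\sigma_t\).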

#read data online
goog=read.csv("https://raw.githubusercontent.com/HarryLu95/dalu/master/Project%20Files/FM%20Research/GOOG.csv")

The original data has 7 columns: Date, Open, High, Low, Close, Adj.Close, and Volume.

Day and Log.Return (\(r_t\)) are added to the data frame. \[r_t=\ln(\frac{P_t}{P_{t-1}})\] where \(P\) represents adjusted close value.

# load package: fGarch (provides garchFit, used below)
library(fGarch)
#plot the adjusted value
plot.ts(goog$Adj.Close,ylab="Adjusted Value",main="Alphabet Inc. Adjusted Stock Value from Aug/19/2004 to Jan/12/2018")

plot.ts(diff(goog$Adj.Close),main="Differencing Level 1")

#compute log return values: r_t = ln(P_t / P_{t-1}) of the adjusted close
l=dim(goog)[1]
goog$Day=1:l
goog$Log.Return=NA
goog$Log.Return[2:l]=diff(log(goog$Adj.Close))
#plot the log return value
plot.ts(goog$Log.Return,ylab="Log Return",main="Alphabet Inc. Log Return")

The first \(n\) observations (g1) were used for modeling, and the rest (g2) were used for backtesting.

#divide the data into two groups; the second part is used for backtesting
n=as.integer(2800)
#n=floor(l/6*5)
goog$g1=NA
goog$g2=NA
goog$g1[2:n]=goog$Log.Return[2:n]
goog$g2[(n+1):l]=goog$Log.Return[(n+1):l]

Fit a linear model to the rolling Log.Return to estimate \(a\) and \(b\), and then compute \(\hat\mu\).

#compute the value of mu hat (refit the linear model on an expanding window)
goog$mu.hat=NA
for (i in n:(l-1)){
  fit=lm(goog$Log.Return[2:i]~goog$Day[2:i])
  a=fit$coefficients[[1]]
  b=fit$coefficients[[2]]
  mu=mean(goog$Log.Return[2:i])
  goog$mu.hat[i+1]=a+b*mu
}
#plot.ts(goog$mu.hat[(n+1):l],main="mu hat",ylab="")

Fit a GARCH(1,1) model to the rolling Log.Return to estimate \(\omega\), \(\alpha\), and \(\beta\), and then compute \(\hat\sigma^2\).

#compute the value of sigma hat square (refit GARCH(1,1) on an expanding window)
goog$sigma.hat.square=NA
for (i in n:(l-1)){
  g=garchFit(~garch(1,1),goog$Log.Return[2:i],trace=FALSE)
  omega=g@fit$coef[[2]]
  alpha=g@fit$coef[[3]]
  beta=g@fit$coef[[4]]
  mu=mean(goog$Log.Return[2:i])
  sigma=sd(goog$Log.Return[2:i])
  temp.sigma.hat.sq=omega+alpha*mu^2+beta*sigma^2
  goog$sigma.hat.square[i+1]=temp.sigma.hat.sq
}
#compute the value of sigma hat
goog$sigma.hat=sqrt(goog$sigma.hat.square)
#compute r hat with an independent N(0,1) shock for each day, then plot the
#back-testing log returns against the predicted values
goog$r.hat=goog$mu.hat+goog$sigma.hat*rnorm(l,0,1)
plot(goog$Log.Return[2801:l],type="l",lty=3,col=1,ylim=c(-0.06,0.08),ylab="")
par(new=TRUE)
plot(goog$r.hat[2801:l],type="l",lty=1,col=2,ylim=c(-0.06,0.08),ylab="")
legend("topright",legend=c("original data","predicted data"),lty=c(3,1),col=c(1,2))

Model: Markov Switching - AR(1) - GARCH(1,1)

In this model, the Log.Return series was divided into two groups: one with relatively high volatility and one with relatively low volatility.

Method 1: Set a pair of threshold levels, \(n_1\) and \(n_2\), where \(n_1>n_2\). Any Log.Return value greater than \(n_1\) or less than \(n_2\) is put into group High; the rest go into group Low.

#find the thresholds (shrink n1 and n2 by 0.001 until at least 40% of the returns fall outside [n2, n1]), then plot
n1=mean(goog$Log.Return[2:l])+0.02
n2=mean(goog$Log.Return[2:l])-0.02
count=0
while (count<l*0.4){
  count=0
  for (i in 2:l){
    if (goog$Log.Return[i]>n1|goog$Log.Return[i]<n2){
      count=count+1
    }
  }
  if (count<l*0.4){
    n1=n1-0.001
    n2=n2+0.001
  }
}
print(count)
[1] 1375
plot(goog$Log.Return[2:l],type="l",lty=3,ylab="Log Return",main="Alphabet Inc. Log Return (Grouping)")
abline(h=n1,col="red",lty=1)
abline(h=n2,col="blue",lty=1)
legend("topright",legend=c("Log Return","n1","n2"),col=c("black","red","blue"),lty=c(3,1,1))

#get the data of group high and group low
High=c()
Low=c()
for (i in 2:l){
  if (goog$Log.Return[i]>n1|goog$Log.Return[i]<n2){
    High=c(High,goog$Log.Return[i])
  }
  else{
    Low=c(Low,goog$Log.Return[i])
  }
}