ABSTRACT
The study proposed a Stochastic Search Variable Selection Diffuse (SSVS-Diffuse) method for selecting restrictions in Vector Autoregressive (VAR) models. This was done by eliciting a new class of Stochastic Search Variable Selection (SSVS) prior that uses a diffuse prior for the variance-covariance matrix, which allows for non-diagonal treatment of that matrix. After deriving the posterior distribution, which has no closed-form solution, the performance of the SSVS-Diffuse prior was evaluated using a Monte Carlo experiment with 50 replications. The study generated samples of sizes T = 50, 100, 200 and 500 from two-, three- and four-variable VAR models with the VAR order set at one to four, i.e. VAR(1), VAR(2), VAR(3) and VAR(4), and these models were fitted. The VAR models were simulated from a multivariate normal distribution under two scenarios: when the variables were independent, and when the variables were correlated at various levels of correlation (very high, high, moderate and low). The forecast performance under these scenarios was evaluated in two ways depending on the type of forecast: for the point forecast, the Mean Square Forecast Error (MSFE) was used as the performance measure, and for the density forecast the energy score, a multivariate performance measure, was used, since VAR models are multivariate models. The SSVS-Diffuse prior outperformed the existing classical and Bayesian VAR models, namely the classical VAR, Minnesota, SSVS-SSVS and SSVS-Wishart, in terms of density forecasts, attaining the minimum energy scores. The study further applied the SSVS-Diffuse prior, using the posterior inclusion probability, to determine the VAR coefficients that are important to include in the model. The optimal lags obtained using the SSVS-Diffuse prior were compared to the optimal lags obtained using classical lag-order selection methods, namely the Final Prediction Error (FPE), Akaike Information Criterion (AIC), Schwarz Information Criterion (SC), sequential modified LR test statistic (each test at the 5% level) (LR) and Hannan-Quinn Information Criterion (HQ). In all the cases considered, the posterior inclusion probability of the SSVS-Diffuse correctly identified the optimal lags, whereas the classical methods exhibited fluctuations, with SC and HQ failing in some of the cases considered. The study concludes by applying the SSVS-Diffuse prior to real-life data, where it outperformed the existing methods based on historical performance.
TABLE OF CONTENTS
Title page i
Declaration ii
Certification iii
Dedication iv
Acknowledgement v
Table of Contents vi
List of Tables x
Abstract xi
CHAPTER 1: INTRODUCTION 1
1.1 Background of Study 1
1.2 Statement of the Problem 5
1.3 Objectives of the Study 6
1.4 Justification of Study 6
1.5 Definitions of Terms 7
1.5.1 Bayesian terms 7
1.5.2 Vectorization 8
1.5.3 Kronecker product 8
1.5.4 VAR coefficients 8
1.5.5 Stochastic search variable selection 9
1.5.6 Markov chain Monte Carlo 9
1.5.7 Distributions 10
CHAPTER 2: LITERATURE REVIEW 12
2.1 Introduction 12
2.2 Theoretical Framework 12
2.3 Prior Elicitation in Bayesian VAR 14
2.3.1 Minnesota prior 15
2.3.2 Diffuse prior 17
2.3.3 Natural conjugate prior 18
2.3.4 Normal diffuse prior 19
2.3.5 Extended natural conjugate prior 20
2.3.6 Independent normal Wishart 21
2.4 Stochastic Search Variable Selection 26
2.5 Forecasting and Evaluation 34
2.6 Empirical Application 39
2.6.1 Sequential testing procedure 40
2.6.2 Model selection for VAR models 41
CHAPTER 3: METHODOLOGY 45
3.1 Introduction 45
3.2 Likelihood Function 45
3.3 Stochastic Search Variable Selection Diffuse Procedure 46
3.4 Gibbs Sampling for SSVS-Diffuse 51
3.4.1 Bayesian computation: Markov chain Monte Carlo diagnostics 54
3.5 Design of the Monte Carlo Studies 56
3.5.1 Energy score 57
3.5.2 Mean square forecast error 58
3.6 Determination of Optimal Lag Length 58
CHAPTER 4: RESULTS AND DISCUSSION 61
4.1 Introduction 61
4.2 Simulated Numerical Example 61
4.3 Application of the SSVS-Diffuse 92
CHAPTER 5: CONCLUSION AND RECOMMENDATIONS 99
5.1 Summary 99
5.2 Contribution to Knowledge 100
5.3 Area of Further Research 100
5.4 Conclusion 101
References 102
Appendices 105
LIST OF TABLES
2.1: Summary of some existing priors in Bayesian VAR 33
4.1: MSFE across forecast horizons 62
4.2: Density forecast across various forecast horizons 63
4.3: MSFE for evaluating point forecast of different BVAR priors 63
4.4: Energy scores for evaluating entire density for independent variables 65
4.5: Scores for evaluating density forecast with very high correlation 66
4.6: Scores for evaluating density forecast with high correlation 66
4.7: Scores for evaluating density forecast with moderate correlation 67
4.8: Scores for evaluating density forecast with low correlation 67
4.9: Predictive mean for Data 1 93
4.10: MSFE for evaluating point forecast for Data 1 94
4.11: Scores for evaluating density forecast for Data 1 94
4.12: Predictive mean for Data 2 96
4.13: MSFE for evaluating point forecast for Data 2 97
4.14: Scores for evaluating density forecast for Data 2 97
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND OF STUDY
Bayesian Statistics is a school of thought in the field of Statistics that uses the subjective view of probability to interpret uncertainty. Bayesian Statistics interprets probability as a measure of personal belief; this is in contrast to the relative frequency interpretation, which is at the heart of classical Statistics. Bayesian statistical inference specifies how belief should be changed in the light of new information: it makes available a mathematical means of combining our individual belief with the evidence at hand in order to arrive at a new belief (the posterior). Bayesian Statistics is named after an English clergyman, Rev. Thomas Bayes (1702-1761), whose article "An Essay towards solving a Problem in the Doctrine of Chances" was published posthumously.
The Bayesian method provides a complete paradigm for both statistical inference and decision making under uncertainty. It is based on a subjective view of probability, which argues that our uncertainty about anything unknown can be expressed using the rules of probability. Bayesian Statistics treats the unknown parameter that the researcher wishes to estimate as a random variable which has a probability distribution. This is against the view of classical Statistics, which sees the unknown parameter as a constant and the estimator of the unknown as a random variable. The probability statement about the unknown parameter is interpreted as a degree of belief, and the belief about the unknown parameter is updated after seeing the data by using Bayes theorem. Bayesian Statistics thus combines the past (what we know before seeing the data, i.e. the prior) with the present (the data-generating process, i.e. the likelihood) to arrive at the future (the posterior).
The distinguishing factor between Bayesian methods and classical methods is the incorporation of prior information into the model. Prior elicitation is the process of formulating personal beliefs about the parameter of interest into a probability distribution (Garthwaite et al., 2005). It is pertinent that priors are elicited correctly: a good prior will perform better than no prior, and a bad prior will be worse than no prior. A major impediment to widespread use of the Bayesian paradigm has been the determination of the appropriate form of the prior distribution, which is often an arduous task. Typically, prior distributions are specified based on information accumulated from past studies or on the opinions of subject-area experts. The prior refers to the subjective view or belief of the researcher about the parameter of interest, expressed in the form of a probability distribution. When this belief is based on information available to the researcher from previous knowledge or expert opinion, the prior is referred to as an informative prior; the informative prior usually dominates the data. The researcher may instead have little information about the parameter of interest, or even be ignorant of it. In this case, a non-informative prior distribution is used to represent "knowing little or ignorance", and the data will dominate the prior. Unlike the informative and non-informative priors, which are specified before seeing the data, the researcher's subjective view can also be formed after seeing the data; a prior of this kind is known as an empirical prior.
The posterior distribution is the marriage of the sample information and the prior information. It is of basic interest in Bayesian Statistics, as it updates the belief of the researcher after seeing the data. The posterior is proportional to the likelihood function (the data-generating process) times the prior distribution. The posterior distribution is fundamental in Bayesian Statistics because it is the fulcrum of Bayesian statistical analysis, including parameter estimation, hypothesis testing, model comparison and forecasting.
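In symbols, writing θ for the unknown parameter and y for the data, Bayes theorem gives

$$ p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta), $$

i.e. posterior ∝ likelihood × prior, since the marginal likelihood p(y) does not involve θ.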
Bayesian Statistics has continued to grow, as Bayesian methods are now applied in virtually every branch of Statistics, from Biostatistics and Econometrics to experimental design, sampling methods and techniques, and time series analysis. The application of Bayesian methods in research has increased tremendously, as the number of Bayesian journal articles, conference presentations, textbooks and statistical software packages has grown in the past few years. In general, work in Bayesian Statistics now focuses on the development of Bayesian counterparts to existing classical statistical methods, the application of Bayesian methods in data analysis, Bayesian computation, prior elicitation, and so on.
Bayesian Statistics became more prominent in the past 50 years. The earlier challenge encountered in the area was that some of the posterior distributions obtained had no closed-form solution, and this led most researchers to jettison Bayesian Statistics. The computing revolution of the past 50 years has overcome this hurdle and has led to a blossoming of Bayesian methods in many fields (Koop, 2003). Bayesian data analysis is now accessible to scientists because of recent advances in computational algorithms, software, hardware, and textbooks. Indeed, whereas the 20th century was dominated by classical Statistics, the 21st century is becoming Bayesian, according to Kruschke (2011). Poirier (2006) reported that there has been an upward growth of Bayesian methods in Statistics and Economics since 1970, which he attributed to an increase in Bayesian thinking among authors.
Bayesian Statistics has been widely applied in the area of Econometrics, leading to Bayesian Econometrics, which has enjoyed huge popularity (Zellner, 1971; Poirier, 1995; Koop, 2003; Lancaster, 2004; Geweke, 2005), to mention but a few.
There are several interdependent economic variables used in macroeconomic modeling. Sims (1980) developed the Vector Autoregressive (VAR) model as a means of modeling the interdependency among time series. VAR models are multiple time series models that are used to study the dynamic interrelationships between the series under consideration in order to carry out forecasting and structural analysis.
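In its reduced form, an n-variable VAR of order p, VAR(p), can be written as

$$ y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma), $$

where y_t is an n × 1 vector of variables, c is an n × 1 vector of intercepts, each A_i is an n × n matrix of VAR coefficients, and Σ is the n × n variance-covariance matrix of the errors.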
VAR models have become the workhorse of macroeconomic forecasting (Karlsson, 2013). VAR models often perform better than other macroeconomic models, but they are still prone to several problems, such as over-parameterization. The classical VAR model requires the estimation of many coefficients, and doing this without restrictions leads to more parameters than the available data can reliably estimate. According to Koop and Korobilis (2010), the number of parameters to be estimated is given by the formula n(1 + np), where n is the number of variables and p is the optimal lag length. For instance, for a three-variable VAR model with an optimal lag length of four there will be a total of 39 parameters to be estimated, and for a five-variable VAR with an optimal lag length of four there will be a whopping 105 parameters to be estimated. Imagine the number of parameters that would have to be estimated for a ten-variable VAR with a lag length of ten. This obviously makes VAR models over-parameterized. Over-parameterization makes estimates imprecise and has consequences for the precision of inference and the reliability of prediction. The estimates can be improved if the analyst has any information about the parameters beyond that contained in the sample; Bayesian estimation provides a convenient framework for incorporating prior information with as much weight as the analyst feels it merits (Hamilton, 1994).
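As a quick check of these counts, a minimal Python sketch (assuming, as above, that each of the n equations carries an intercept plus np lag coefficients):

```python
def var_param_count(n: int, p: int) -> int:
    """Number of coefficients in an n-variable VAR(p): each of the
    n equations has 1 intercept plus n*p lag coefficients."""
    return n * (1 + n * p)

print(var_param_count(3, 4))    # 39   (three-variable VAR(4))
print(var_param_count(5, 4))    # 105  (five-variable VAR(4))
print(var_param_count(10, 10))  # 1010 (ten-variable VAR(10))
```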
Bayesian VAR (BVAR) offers a flexible way to both reduce the dimensionality of the parameter space and incorporate additional information. A BVAR specification shrinks the dynamic parameters towards a specific representation of the data that reflects the researcher's prior beliefs, and thereby deals with the over-parameterization problem.
The Bayesian VAR literature has seen the elicitation of various priors that shrink the parameters of VAR models in order to avoid over-parameterization. The following priors give various levels of shrinkage: the Minnesota (Litterman) prior and its various modifications, the steady-state prior, hierarchical priors, and the stochastic search variable selection prior. The Stochastic Search Variable Selection prior is a shrinkage method that imposes data-based restrictions on the parameters of the VAR model.
1.2 STATEMENT OF THE PROBLEM
The Stochastic Search Variable Selection (SSVS) priors as applied in Bayesian VAR models have been reported to forecast better than the existing Minnesota prior and its various modifications, as shown in studies carried out by George et al. (2008), Koop and Korobilis (2010), Korobilis (2013) and George et al. (2018), to mention a few. Although SSVS has better forecast performance, it faces the challenge of the Normal-Wishart restriction on the variance-covariance matrix, which implies that every equation must have the same set of explanatory variables. For a researcher who wishes to place restrictions on different equations using different explanatory variables in order to avoid over-parameterization, this Normal-Wishart restriction needs to be dealt with (Koop and Korobilis, 2010). To overcome the Normal-Wishart restriction on the variance-covariance matrix, this study elicited a Stochastic Search Variable Selection Diffuse model using a diffuse prior for the variance-covariance matrix, which allows for non-diagonal treatment of the variance-covariance matrix.
1.3 OBJECTIVES OF THE STUDY
The global objective of this study is to propose a Stochastic Search Variable Selection Diffuse method that overcomes over-parameterization in VAR models by allowing restrictions on different equations involving different explanatory variables.
The specific objectives are as follows:
i. to use SSVS-Diffuse to address over-parameterization of VAR models;
ii. to identify the optimal lag length using SSVS-Diffuse;
iii. to examine the performance of the elicited prior under posterior model inference when compared with existing Bayesian VAR methods;
iv. to demonstrate the application of the proposed SSVS-Diffuse model for Bayesian VAR to real-life data.
1.4 JUSTIFICATION OF STUDY
In Vector Autoregressive models there are many parameters to estimate, and doing so without restricting some of the parameters to zero leads to over-parameterization, which affects inference. There are two parameters of interest in VAR models: the VAR coefficients and the variance-covariance matrix. The study provides a restriction on the variance-covariance matrix using a diffuse prior, thus allowing different equations of the model to have different explanatory variables. The study will also provide an alternative method for finding the optimal lag length under the Bayesian paradigm, using the posterior inclusion probability.
1.5 DEFINITIONS OF TERMS
In this section, some terminologies used in this study are defined.
1.5.1 Bayesian terms
Parameter: an attribute or quantity that is calculated from the population; examples are the population mean, population variance and population standard deviation.
Likelihood: the expression for the distribution of the data conditional on the parameter.
Prior: a probability statement about a parameter, expressed as the degree of one's belief about the parameter before observing the data. It is non-data information.
Diffuse (vague) prior: a prior that encompasses all reasonable beliefs, which can be achieved by using a uniform or flat distribution for the parameter of interest.
Natural conjugate prior: a prior from a family of density functions that, after multiplication with the likelihood, produces a posterior in the same family.
Informative prior: a prior based on the information available about a parameter of interest through expert opinion or previous study. An informative prior dominates the data.
Non-informative prior: a prior formed in the absence of information about the parameter of interest.
Hierarchical priors: priors specified as a series of priors, known as multistage priors. A two-stage prior, for example, places a prior on another prior. Hierarchical priors are more flexible than non-hierarchical priors, making the posterior distribution less sensitive to the main prior.
Hyperparameters: parameters which are themselves given a probabilistic specification in terms of further parameters; that is, the parameters of a prior distribution that itself depends on one or more parameters.
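For example, a two-stage hierarchical prior might take the form (an assumed illustration, not the prior used in this study)

$$ \theta \mid \tau^2 \sim N(0, \tau^2), \qquad \tau^2 \sim \text{Inverse-Gamma}(a, b), $$

where a and b are the hyperparameters.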
Posterior distribution: a function of the parameter given the data. It represents the belief about a parameter of interest after seeing the data and is a combination of the prior and the likelihood.
Posterior model probability: a measure of the degree of support for a particular model, i.e. the weighted probability of a model.
Posterior inclusion probability: a probability statement that shows the importance of a particular VAR coefficient in the VAR model. In this study we propose the use of the posterior inclusion probability to determine the number of important VAR coefficients in Bayesian VAR.
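In the SSVS setting, where each coefficient carries a binary indicator γ_j ∈ {0, 1} (notation assumed here for illustration), the posterior inclusion probability can be approximated from S retained Gibbs draws by the standard Monte Carlo average

$$ \Pr(\gamma_j = 1 \mid y) \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \gamma_j^{(s)}. $$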
Bayes theorem: a probability theorem linking the unconditional distribution of a parameter with its conditional distribution. It connects the likelihood function with the prior probability.
1.5.2 Vectorization: the vectorization of a matrix is a linear transformation which converts a matrix into a column vector. For a matrix C of dimension m × n, the vectorization of C, denoted vec(C), is obtained by stacking the columns of C on top of one another, resulting in a column vector of dimension mn × 1.
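A minimal NumPy illustration (the matrix C is arbitrary; note that order='F' is needed for column-stacking, since NumPy flattens row-wise by default):

```python
import numpy as np

C = np.array([[1, 3],
              [2, 4]])          # a 2 x 2 matrix

vec_C = C.flatten(order='F')    # stack the columns on top of one another
print(vec_C)                    # [1 2 3 4], the mn = 4 stacked entries
```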
1.5.3 Kronecker product, ⊗: the operation on two matrices of arbitrary size resulting in a block matrix. If C is an m × n matrix and D is a p × q matrix, then the Kronecker product C ⊗ D is the mp × nq block matrix whose (i, j)-th block is c_ij D.
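A small NumPy sketch (C and D here are arbitrary illustrative matrices):

```python
import numpy as np

C = np.array([[1, 2],
              [3, 4]])   # m x n = 2 x 2
D = np.eye(2)            # p x q = 2 x 2

K = np.kron(C, D)        # each entry c_ij is replaced by the block c_ij * D
print(K.shape)           # (4, 4), i.e. mp x nq
```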
1.5.4 VAR coefficients: numerical values that show the relationship between the current value of a variable and its own lagged values, as well as the lagged values of the other variables.
1.5.5 Stochastic search variable selection (SSVS)
Variable selection deals with which subset of variables is to be included in the act of model building. In the Bayesian paradigm, variable selection is essentially parameter estimation, i.e. the task is to estimate the marginal posterior probability that a subset should be in the model. Stochastic search means that the model space is too large to assess in a deterministic manner, so a data-based restriction on the parameters is required. SSVS is a method of restricting some of the coefficients in the model to be equal to zero in order to reduce the number of excess parameters.
Mixture distribution: a mixture distribution mixes or averages two distributions over a mixing distribution,

$$ f(y) = \int f(y \mid \theta)\, x(\theta)\, d\theta \qquad (1.5.1) $$

Here x(·) is the mixing distribution; x(·) can be discrete (in which case the integral in (1.5.1) becomes a sum) or continuous.
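As an illustrative sketch, the following draws from a two-component normal mixture of the spike-and-slab form used in SSVS-type priors; the mixing weight and standard deviations (w = 0.5, τ0 = 0.1, τ1 = 5) are assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_mixture(n, w=0.5, tau0=0.1, tau1=5.0):
    """Draw n values from the mixture (1 - w) * N(0, tau0^2) + w * N(0, tau1^2).
    The Bernoulli(w) indicator plays the role of a discrete mixing
    distribution x(.)."""
    gamma = rng.random(n) < w                     # discrete mixing draws
    return np.where(gamma,
                    rng.normal(0.0, tau1, n),     # 'slab' component
                    rng.normal(0.0, tau0, n))     # 'spike' component

samples = draw_mixture(10_000)
```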
1.5.6 Markov chain Monte Carlo (MCMC): a posterior simulation algorithm that provides iterative procedures to approximately sample from complicated posterior densities by avoiding the independence assumption. It approximates the expectations of functions of interest with their sample averages.
Gibbs sampling is the strategy of sequentially drawing from the full conditional posterior distributions.
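A minimal sketch of the idea for a bivariate normal target with correlation ρ; this is an illustrative toy example, not the SSVS-Diffuse sampler of Chapter 3:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_bivariate_normal(n_draws=5000, rho=0.8):
    """Gibbs sampling for (x1, x2) ~ N(0, [[1, rho], [rho, 1]]):
    draw each coordinate from its full conditional in turn."""
    draws = np.empty((n_draws, 2))
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1 - rho**2)            # conditional standard deviation
    for s in range(n_draws):
        x1 = rng.normal(rho * x2, sd)   # x1 | x2 ~ N(rho * x2, 1 - rho^2)
        x2 = rng.normal(rho * x1, sd)   # x2 | x1 ~ N(rho * x1, 1 - rho^2)
        draws[s] = (x1, x2)
    return draws

print(np.corrcoef(gibbs_bivariate_normal().T)[0, 1])  # close to 0.8
```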
Metropolis-Hastings is an algorithm that draws samples from any probability distribution P(x), provided the value of a function f(x) proportional to the density of P can be computed. The sample values are produced iteratively, with the distribution of the next sample depending only on the current sample value. Specifically, at each iteration the algorithm picks a candidate for the next sample value based on the current sample value; then, with some probability, the candidate is either accepted or rejected.
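A minimal random-walk Metropolis sketch, where the target f(x) ∝ exp(−x⁴/4) is an assumed toy density known only up to proportionality (with a symmetric proposal, the Hastings correction cancels):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_f(x):
    # log of an unnormalized target density, f(x) proportional to exp(-x**4 / 4)
    return -x**4 / 4.0

def metropolis(n_draws=10_000, step=1.0):
    draws = np.empty(n_draws)
    x = 0.0
    for s in range(n_draws):
        cand = x + step * rng.normal()  # propose around the current value
        # accept with probability min(1, f(cand) / f(x))
        if np.log(rng.random()) < log_f(cand) - log_f(x):
            x = cand
        draws[s] = x                    # on rejection, repeat current value
    return draws

samples = metropolis()
```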