ABSTRACT
Unconstrained optimization is an optimization process where no restriction is placed on the range of the unknown variable. Several methods can be adopted in the solution of unconstrained optimization problems. In this work, emphasis is on the Broyden class updating methods of Davidon-Fletcher-Powel (DFP) algorithm, Broyden Fletcher Goldfarb Shannon(BFGS) algorithm and their linear combinations. Linear constants ranging from 0.1 to 0.9 are simulated in MATLAB and the numerical solutions are extensively presented for the following commonly tested functions: Freudenstein and Roth function, Beales function and Woods function. The result of the simulations in MATLAB environment and the graphical analysis in Microsoft excel reveal that BFGS formula gives the best performance as it gives the same convergence rate and better average time than DFP in all the test functions. The average time and the number of iterations of the test functions increase as the linear constant increases for the Linear Combination (LC) algorithm. It is recommended that BFGS algorithm should be adopted when finding solution to unconstrained optimization problems.
TABLE OF CONTENTS
Title
page
Cover
page i
Declaration ii
Dedication iii
Certification iv
Acknowledgements v
Table
of contents vi
List
of tables vii
List of figures viii
List of abbreviations, symbols and terms ix
Abstract x
CHAPTER 1: INTRODUCTION
1.1 Background 1
1.2 Statement of the
Problem 4
1.3 Aim and Objectives 6
1.4 Scope
of the Study 6
1.5 Structure of Thesis 6
1.6 Definition
of Term 7
CHAPTER
2: LITERATURE REVIEW AND THEORETICAL FRAMEWORK
2.1 Introduction 9
2.2 Literature
Review 9
2.3 Test
functions 14
2.3.1 Freudenstein
and Roth function 14
2.3.2 Beales
function 14
2.3.3 Woods
function 14
CHAPTER 3: METHODOLOGY
3.1 Introduction 15
3.2 Derivation
of Broyden Class Updating Method for Unconstrained
Optimization Problems. 15
3.2.1 Construction of the Inverse 15
3.2.2 Rank One Correction. 17
3.3 Davidon Fletcher Powell Method 19
3.3.1 DFP algorithm (Nocedal and Yuan (1998)) 20
3.4 Broyden Fletcher Goldfarb Shannon 21
3.4.1 BFGS algorithm (Adewale and Oruh, 2013) 22
3.5 Linear Combination of DFP and BFGS 23
3.5.1 Linear combination of DFP and BFGS
algorithm (Wen Huang et al., 2014) 24
3.6
Material and Method 25
CHAPTER 4:
ANALYSIS AND DISCUSSIONS OF RESULTS
4.1 Analysis 26
4.1.1 Analytical
procedure 26
4.1.2 Numerical
procedure 51
4.2 Matlab
Simulation result 51
4.2.1 Freudenstein
and Roth function 52
4.2.2 Beales
function 58
4.2.3 Woods function
66
4.3 Graphical Analysis of Results 75
4.4 Discussions
of Results 76
CHAPTER 5:
CONCLUSION AND RECOMMENDATION
5.1 Conclusion 78
5.2 Recommendation 78
References 79
Appendices 82
LIST OF TABLES
4.1a Result generated using DFP
(Freudenstein and Roth) 53
4.1b Result generated using BFGS
(Freudenstein and Roth) 53
4.1ci Result generated using LC at
0.1(Freudenstein and Roth) 54
4.1cii Result generated using LC at
0.2(Freudenstein and Roth) 54
4.1ciii Result generated using LC at
0.3(Freudenstein and Roth) 55
4.1civ Result generated using LC at
0.4(Freudenstein and Roth) 55
4.1cv Result generated using LC at
0.5(Freudenstein and Roth) 56
4.1cvi Result generated using LC at
0.6(Freudenstein and Roth) 56
4.1cvii Result generated using LC at
0.7(Freudenstein and Roth) 57
4.1cviii Result generated using LC at
0.8(Freudenstein and Roth) 57
4.1cix Result generated using LC at
0.9(Freudenstein and Roth) 58
4.2a Result
generated using DFP 59
4.2b Result
generated using BFGS 59
4.2ci Result
generated using linear combination for linear constant 0.1 60
4.2cii Result generated using linear combination for linear
constant 0.2 60
4.2ciii Result generated using linear combination for linear
constant 0.3 61
4.2civ Result generated using linear combination for linear
constant 0.4 61
4.2cv Result
generated using linear combination for linear constant 0.5 62
4.2cvi Result generated using linear combination for linear
constant 0.6 63
4.2cvii Result generated using linear combination for linear
constant 0.7 63
4.2cviii Result generated using linear combination for linear constant 0.8 64
4.2cix Result generated using linear combination for linear
constant 0.9 65
4.3a Result
generated using DFP 66
4.3b Result
generated using BFGS 67
4.3ci Result generated using linear combination for linear
constant 0.1 67
4.3cii Result generated using linear combination for linear
constant 0.2 68
4.3ciii Result generated using linear combination for linear
constant 0.3 68
4.3civ Result generated using linear combination for linear
constant 0.4 69
4.3cv Result
generated using linear combination for linear constant 0.5 70
4.3cvi Result generated using linear
combination for linear constant 0.6 70
4.3cvii Result
generated using linear combination for linear constant 0.7 71
4.3cviii Result
generated using linear combination for linear constant 0.8 72
4.3cxi Result
generated using linear combination for linear constant 0.9 72
4.4 Average
time for the test functions 73
4.5 Number
of iterations for the test functions 74
LIST OF FIGURES
4.1a Excel Graph of Average Time of Test Functions 75
4.1b
Matlab Plot of Average Time of
Text Functions 75
4.
2a Excel Graph of Number of Iterations
of Test Functions 76
4.2b Matlab Plot of Average Time of Text
Functions 76
LIST OF
ABBREVIATION, SYMBOLS AND TERMS
Average time Time it
took the program to get the result
BFGS Broyden Fletcher Goldfarb Shannon
DFP Davidon Fletcher Powell
LBFGS Limited
memory Broyden Fletcher Goldfarb
Shannon
LC Linear
combination
NM Newton’s method
FN Fixed Newton
CN Classical
Newton
CB Classical
Broyden
QP Quadratic
Programming
SQP Sequential
Quadratic Programming
WORHP We Optimize Really Huge Problems
ODE Ordinary
Differential Equation
CPU Central
Processing Unit
CUTEr This
is an open source testing environment for optimization and linear algebra
solvers
Step length
Norm
of the gradient
Hessian matrix
A symmetric and positive definite Hessian
matrix
Inverse
Hessian approximation
A
function of each iteration
Function of x
Gradient of
function
Number
of iteration
and
Variables
CHAPTER
1
INTRODUCTION
1.1 BACKGROUND
Unconstrained optimization problems can be solved
using various methods. Amongst the methods of solution are steepest (gradient)
descent method, Newton method and quasi-Newton method. Gradient descent method
is a way to find a local minimum of a function.
The way it works is we start with an initial guess of the solution and we take
the gradient of the function at that point. We step the solution in the negative direction of the gradient and we
repeat the process. The algorithm will eventually converge where the gradient
is zero (which corresponds to a local minimum). Its brother, the gradient ascent,
finds the local maximum nearer the current solution by stepping it towards the
positive direction of the gradient. They are both first-order algorithms
because they take only the first derivative of the function. In optimization, Newton's
method is applied to the derivative
of a twice differentiable function f to find the roots of the derivative (solutions
to f ′(x)=0), also known as the critical
points of f. This
method stems from the Newton's method for solving systems of nonlinear
equations. Newton's method uses curvature
information to take a more direct route, that is, can identify better search
directions than can be obtained via the gradient method. Quasi-Newton methods are arguably the most
effective methods for finding a minimizer of a smooth nonlinear function
when second derivatives are either unavailable
or too expensive to calculate. Quasi-Newton methods build up second-derivative
information by estimating the curvature along a sequence of search directions.
Each curvature estimate is installed in an approximate Hessian by applying a
rank-one or rank-two update. One of the most successful updates is the Broyden
Fletcher Goldfarb Shanno (BFGS) formula, which is a member of the wider Broyden
class of rank-two updates. Despite the success of these methods on a wide range
of problems, it is well known that conventional quasi-Newton methods can
require a disproportionately large number of iterations and function
evaluations on some problems. This inefficiency may be caused by a poor choice
of initial approximate Hessian or may result from the search directions being
poorly defined when the Hessian is ill-conditioned (Philip and Michael, 2001).
In the quasi-Newton context, the availability of
an explicit basis for the gradient subspace makes it possible to represent the
approximate curvature in terms of a reduced approximate Hessian matrix with
order at most k + 1. Quasi-Newton algorithms that explicitly calculate a
reduced Hessian have been proposed by Fenelon (1981) and Nazareth (1986), who
also considered modified Newton methods in the same context. Siegel (1992)
proposed methods that work with a reduced inverse approximate Hessian. In
practical terms, the reduced-Hessian formulation can require significantly less
work per iteration when k is small relative to n. This property can be
exploited by forcing iterates to linger on a manifold while the objective
function is minimized to greater accuracy. While iterates linger, the search
direction is calculated from a system that is generally smaller than the
reduced Hessian. In many practical situations convergence occurs before the
dimension of the lingering subspace reaches n, resulting in substantial savings
in computing time.
More recently, Siegel (1994) proposed the conjugate-direction
scaling algorithm, which is a quasi-Newton method based on a
conjugate-direction factorization of the inverse approximate Hessian. Although
no explicit reduced Hessian is updated, the method maintains a basis for the
expanding subspaces and allows iterates to linger on a manifold. The method
also has the benefit of finite termination on a quadratic function (Leonard,
1995). More importantly, Siegel’s method includes a feature that can
considerably enhance the benefits of lingering. Siegel notes that the search
direction is the sum of two vectors: one with the scale of the estimated
derivatives and the other with the scale of the initial approximate Hessian.
Siegel suggests rescaling the second vector using newly computed approximate
curvature. Algorithms that combine lingering and rescaling have the potential
for giving significant improvements over conventional quasi-Newton methods.
Lingering forces iterates to remain on a manifold until the curvature has been
sufficiently established; rescaling ensures that the initial curvature in the
unexplored manifold is commensurate with curvature already found.
Several algorithms based on maintaining the
triangular factors of an explicit reduced Hessian have been proposed. How these
factors can be used to force iterates to linger while curvature information
continues to be accumulated along directions away from the manifold has been
demonstrated. With the Broyden Fletcher Goldfarb Shanno (BFGS) method, it is
shown that while lingering takes place, the new curvature is restricted to an
upper-trapezoidal portion of the factor of the reduced Hessian and the
remaining portion retains the diagonal structure of the initial approximate
Hessian. It follows that conjugate-direction scaling is equivalent to simply
reinitializing the diagonal part of the reduced Hessian with freshly computed
curvature information. Despite the similarities between reduced Hessian
re-initialization and conjugate direction scaling, it must be emphasized that
these methods are not the same, in the sense that they involve very different
storage and computational overheads. Moreover, the reduced-Hessian
factorization has both practical and theoretical advantages over Siegel’s
conjugate-direction factorization. On the practical side, the early search
directions can be calculated with significantly less work. This can result in a
significantly faster minimization when the dimension of the manifold grows
relatively slowly, as it does on many problems. On the theoretical side, the
simple structure exhibited by the reduced-Hessian factor allows the benefits of
re initialization to be extended to the large-scale case (Gill and Leonard,
1997).
A reduced-Hessian method allows expansion of the
manifold on which curvature information is known. Thus, when implementing
software, it is necessary either to allocate new memory dynamically as the
reduced Hessian grows or to reserve sufficient storage space in advance. In
practice, however, the order of the reduced Hessian often remains much less
than n, i.e., the problem is solved without needing room for an n × n matrix.
Notwithstanding this benefit, on very large problems it may be necessary to
explicitly limit the amount of storage used, by placing a limit on the order of
the reduced Hessian. Such limited-memory reduced-Hessian methods discard old
curvature information whenever the addition of new information causes a
predefined storage limit to be exceeded. Methods of this type have been
considered by Fenelon (1981) and Siegel (1994).
In recent years, a
class of iterative processes for solving non-linear equation or minimization
problem in
has been considered
frequently and applied to many practical problems. Its members are variously
called quasi-Newton methods, variable metric methods, modification methods, or
update methods. Most of the studies about these methods were framed in a
minimization setting. In this setting, Powell (1970 and 1978) proved a global
convergence theorem for the best known member of this class, the Davidon
Fletcher Powell (DFP) method. Only recently, Dennis and More (1977), in
continuation of earlier work by Dennis (1970 and 1971) developed highly
interesting local convergence results for a class of update methods without
assuming the minimization setting or requiring specific relaxation factors.
In
numerical analysis, Broyden’s method is a quasi-Newton method for finding roots
in k variables. Newton’s method for solving
uses the Jacobian matrix, J, at every
iteration. However, computing this Jacobian is a difficult and expensive
operation. The idea behind Broyden’s method is to compute the whole Jacobian
only at the first iteration, and to do a rank-one update at the other
iterations.
1.2 STATEMENT
OF THE PROBLEM
Many methods suggested different ways for solving unconstrained
optimization problems, among these are steepest descent method, Newton method
and quasi Newton method. Even among the quasi Newton, there are varieties of
algorithms such as Davidon Fletcher Powell’s method, Broyden Fletcher Goldfarb
Shannon’s method, etc.
To improve the existing limited memory Broyden Fletcher Goldfarb Shannon (BFGS) method, the Limited memory Broyden Fletcher Goldfarb Shannon (LBFGS) method has been developed. The limited
memory Broyden Fletcher Goldfarb
Shannon (LBFGS)
method has been specially created for solving nonlinear optimization problems
by an SQP method. In the limited memory Broyden Fletcher Goldfarb Shannon (LBFGS), the compact representation of the
limited memory in Broyden Fletcher Goldfarb
Shannon (BFGS)
formula for the Hessian matrix is replaced in quadratic programming (QP) sub
problems arising from the sequential quadratic programming (SQP) method.
Introducing a fix small number of auxiliary variables and imposing an
additional constraint, the QP problem is transformed. Then, the obtained QP
problem is solved using an interior point method. In this way, the
computational cost is less than the one needed by the limited memory BFGS. The
LBFGS method has been implemented successfully in “We Optimize Really Huge Problems” (WORHP). Its performance has been tested on
the optimal control problem arising from a robotic application, on multi-phase
optimal control problems arising from an aerospace application, and on two sets
of optimization problems arising from various applications. The results were
compared with other Hessian approximation methods available in WORHP. The test
results have shown that LBFGS is a robust and efficient method. It was able to
solve highly nonlinear optimal control problems and difficult multi-phase
optimal control problems with less computational time and less major iterations
than the other methods. The LBFGS method solved the same number or in some
cases even more problems from the CUTEr test set than the exact Hessian method
(Sonja, 2014). It is
important to compare the performance of the quasi Newton’s algorithms and also
determine the effect of their linear combination at various points.
This thesis intends to analyze and compare some algorithms for
solving unconstrained optimization problems, and determine the best method for
solving unconstrained minimization problems with the best result.
1.3 AIM AND
OBJECTIVES
The main purpose of this study is to analyze and compare some
algorithms for solving unconstrained optimization problems, and determine the
best method for solving unconstrained minimization problems with the best
result among them.
The following objectives
are stated to achieve the aim of the study.
1.
To review various methods of
solving unconstrained optimization problems.
2.
To compare the algorithms for
solving unconstrained optimization problems using the Broyden class updating
methods.
3.
To obtain the linear
combination of the Broyden class updating methods for all the tested functions.
4.
To determine the method with
the best performance for each tested function.
1.4 SCOPE OF THE STUDY
This
work is limited to unconstrained minimization problem with focus on the Broyden class updating methods (DFP and
BFGS). It is applied to the following commonly
tested functions:
·
Freudenstein and Roth function
·
Beales function
·
Woods function
1.5 STRUCTURE OF THESIS
The thesis is structured as follows: Chapter
two contains the reviewed literatures and the basic mathematical theory for nonlinear
optimization problems is formulated. Then the necessary and sufficient conditions for
the existence of the optimal solution are discussed. Afterwards some numerical
methods for finding a solution of nonlinear optimization problems are
presented. In Chapter three, the systematic methods
of achieving the aim and objectives of the research are described. The
algorithms for Davidon
Fletcher Powell (DFP), Broyden Fletcher Goldfarb Shanno (BFGS) and the linear
combination of Davidon Fletcher Powell (DFP) and Broyden Fletcher Goldfarb
Shanno (BFGS) for the minimization problem are presented. Here, the focus is
the Broyden class updating methods (DFP and BFGS).
In Chapter 4, Davidon Fletcher Powell (DFP),
Broyden Fletcher Goldfarb Shanno (BFGS) and the linear combination of Davidon
Fletcher Powell (DFP) and Broyden Fletcher Goldfarb Shanno (BFGS) are simulated
in MATLAB; graphical analysis is done in Microsoft excel and discussions of
results are made. The numerical solutions are extensively presented in MATLAB for
the following commonly tested functions:
·
Freudenstein and Roth function
·
Beales function
·
Woods function
In Chapter 5 the final conclusions are drawn and
recommendation made.
1.6 DEFINITION OF TERM
Broyden class of quasi Newton methods
A Broyden class is family of quasi Newton
methods that depend on a real parameter
Its Hessian approximation update formula is
where
stands
for the update obtained by the Broyden Fletcher Goldfarb Shanno (BFGS) method
and
for the
update of the Davidon Fletcher Powell (DFP) method. In this case, all members
of the Broyden class satisfy the well known secant equation,
central to many quasi Newton method.
Both the DFP and BFGS methods are just special
cases of Broyden family.
= 0 – DFP method and
1- BFGS
method.
Linear
combination
In
mathematics, a linear combination
is an expression constructed from
a set of terms by multiplying each
term by a constant and adding the results (e.g. a linear combination of x
and y would be any expression of the form ax + by, where a
and b are constants) (Strang, 2006). The concept of linear combinations
is central to linear algebra and related fields of mathematics. In optimization, you compute the search direction by a linear combination of the current
gradient and the previous search direction.
Login To Comment