Unconstrained Non-Linear Optimization¶
Unconstrained nonlinear optimization is a broad topic with numerous methods to choose from. DDogleg specializes in solving small to large scale problems, where large scale means around 100,000 parameters in a sparse system.
This manual will start with a summary of DDogleg's capabilities. If you have trouble understanding the summary then the next section will be helpful. There, a terse definition of the problem is given along with references to papers and books where you can learn more about the subject. DDogleg is designed so that a novice can easily use it, but to take full advantage of all of its features, many of which are not available in other packages, you need to understand the math.
Peter Abeles, “DDogleg Technical Report: Nonlinear Optimization” Revision 20181
Unconstrained Minimization Methods¶
| Method                   | Iteration | Convergence  | Singular | Negative Definite | Dense | Sparse |
|--------------------------|-----------|--------------|----------|-------------------|-------|--------|
| Quasi-Newton BFGS        | \(N^2\)   | Super Linear | Yes      | Yes               | Yes   |        |
| Trust Region BFGS Cauchy | \(N^2\)   | Linear       | Yes      | Yes               | Yes   | Yes    |
| Trust Region BFGS Dogleg | \(N^2\)   | Super Linear | [1]      | [1]               | Yes   | Yes    |
Iteration: runtime complexity of the update step, where N is the number of parameters.
Convergence: how fast it converges.
Singular: indicates that it can process singular systems.
Negative Definite: indicates that it can process negative definite systems.
Dense and Sparse: indicate whether dense and/or sparse matrices can be processed.
Unconstrained Least Squares Methods¶
| Method                 | Iteration | Convergence  | Singular | Dense | Sparse | Schur |
|------------------------|-----------|--------------|----------|-------|--------|-------|
| Trust Region LS Cauchy | \(N^3\)   | Linear       | Yes      | Yes   | Yes    | Yes   |
| Trust Region LS Dogleg | \(N^3\)   | Super Linear | Yes [2]  | Yes   | Yes    | Yes   |
| Levenberg-Marquardt    | \(N^3\)   | Super Linear | Yes      | Yes   | Yes    | Yes   |
Schur: indicates if a variant is available that uses the Schur Complement.
Sparse Structures¶
| Name             | Description                                                               |
|------------------|---------------------------------------------------------------------------|
| General          | General purpose sparse solver. Can be prone to fill-in.                   |
| Schur Complement | Huge speed up for matrices with diagonal A and D submatrices. [A B; C D]  |
[1] Switches to Cauchy in this situation and convergence slows down.
[2] For dense problems singular systems can be handled using a robust solver such as QRP or SVD. The sparse solvers currently cannot handle singular systems, but a fix is planned using LDL.
Introduction¶
This manual covers the API and is primarily example based. A technical report is provided covering the numerical and algorithmic implementation details. For those who are new to the subject of optimization and foresee it being an important part of their career, we suggest picking up a copy of “Numerical Optimization”. It covers many of the methods included in this library. The source code is also intended to be browsed and contains references to other source material.
If you wish to use the sparse solvers you need to be familiar with the strengths and weaknesses of sparse linear algebra. An excellent source to learn more about this subject, and to see how complex a problem it is, is “Direct Methods for Sparse Linear Systems”.
Apologies for this manual being incomplete. Please drop us a note if you find this library useful. Knowing that someone is using it really helps the motivation!
Recommended Reading
Kaj Madsen, Hans Bruun Nielsen, Ole Tingleff, “Methods for Non-Linear Least Squares Problems” 2nd ed., 2004 Lecture Notes
Jorge Nocedal and Stephen J. Wright, “Numerical Optimization” 2nd Ed. Springer
Timothy A. Davis, “Direct Methods for Sparse Linear Systems” 2006 SIAM
Peter Abeles, “DDogleg Technical Report: Nonlinear Optimization” Revision 20181
Usage Examples¶
Non-Linear Minimization (Quasi-Newton)
Non-Linear Least Squares (Levenberg-Marquardt)
Non-Linear Least Squares: Sparse Schur Complement
Convergence Tests¶
Deciding when an optimization is done searching and should be stopped can be a tricky issue. If you make the criteria too strict then it can take an excessive amount of time. Too loose and the solution can be poor. DDogleg attempts to provide two parameters across all of its routines, FTest and GTest.
FTest checks when the cost function stops changing significantly, and GTest checks when the gradient is no longer significant. Some libraries omit the FTest when it is known that the optimal solution will have a cost of zero, e.g. least squares; it is true that in this scenario the FTest will never trigger. However, DDogleg keeps the FTest no matter what, since it is inexpensive to compute and when fitting real-world data there is almost always noise, so the minimum isn't zero.
When you look through the low level implementations there are sometimes other parameters available. If you don't mind writing code for that specific algorithm only, you can access those directly. In general, though, they are not necessary and reasonable defaults are selected.
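As an illustration, the two tests can be sketched in plain Java. The class name and the exact formulas here are assumptions for this sketch; the formulas DDogleg uses internally may differ in detail:

```java
// Sketch of the two convergence tests. Class and method names are
// hypothetical; DDogleg's internal formulas may differ in detail.
public class ConvergenceChecks {
    // F-test: the cost function has stopped changing significantly.
    // One common form: |f_prev - f| <= ftol * |f|
    public static boolean ftest(double fPrev, double f, double ftol) {
        return Math.abs(fPrev - f) <= ftol * Math.abs(f);
    }

    // G-test: no element of the gradient is significant any more.
    public static boolean gtest(double[] gradient, double gtol) {
        double max = 0;
        for (double g : gradient)
            max = Math.max(max, Math.abs(g));
        return max <= gtol;
    }
}
```

Note how the G-test is scale free with respect to the cost value, while the F-test is relative to the current cost, which is why neither needs to know whether the minimum is zero.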
Numerical Derivatives¶
Forward (Default)
Forward-Backwards
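As a sketch of the underlying finite-difference formulas (this is not DDogleg's actual implementation; the class name, and interpreting forward-backwards as a central difference, are assumptions for illustration):

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper illustrating the two finite-difference schemes.
public class NumericalDerivative {
    // Forward difference: f'(x) ~ (f(x+h) - f(x)) / h, error O(h).
    public static double forward(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x)) / h;
    }

    // Forward-backwards (central) difference:
    // f'(x) ~ (f(x+h) - f(x-h)) / (2h), error O(h^2),
    // at the price of one extra function evaluation per variable.
    public static double central(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2*h);
    }
}
```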
Schur Complement¶
The Schur Complement is a “trick” which enables you to avoid decomposing an entire matrix when solving a linear system. Instead, only much smaller internal submatrices are decomposed. In sparse systems this trick also reduces fill-in by carefully taking advantage of the matrix's structure.
https://en.wikipedia.org/wiki/Schur_complement
Schur Complement based optimization routines are implemented by extending the SchurJacobian class. The SchurJacobian computes the left and right hand sides of the Jacobian. Internally this will be converted into an approximate Hessian:

\(H = J^T J, \quad J = [L \; R]\)

where H is the approximate Hessian, J is the full Jacobian matrix, and L and R are the left and right outputs from your Jacobian calculation. All the other implementation details are handled internally. See the JavaDoc for additional details.
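The Schur Complement trick itself can be illustrated with a tiny self-contained example. Here `SchurSolve` is a hypothetical class, and the blocks are kept scalar/diagonal so the arithmetic stays obvious:

```java
// Solve [A B; C D] [x; y] = [u; v] via the Schur complement of A,
// where A is diagonal (so inverting it is trivial) and D is 1x1:
//   S = D - C A^-1 B
//   y = S^-1 (v - C A^-1 u)
//   x = A^-1 (u - B y)
// Only S (here a scalar) ever needs to be "decomposed".
public class SchurSolve {
    public static double[] solve(double[] Adiag, double[] B, double[] C,
                                 double D, double[] u, double v) {
        int n = Adiag.length;
        double CAinvB = 0, CAinvU = 0;
        for (int i = 0; i < n; i++) {
            CAinvB += C[i]*B[i]/Adiag[i];   // C A^-1 B
            CAinvU += C[i]*u[i]/Adiag[i];   // C A^-1 u
        }
        double S = D - CAinvB;              // scalar Schur complement
        double y = (v - CAinvU)/S;
        double[] out = new double[n + 1];   // packs [x, y]
        for (int i = 0; i < n; i++)
            out[i] = (u[i] - B[i]*y)/Adiag[i];
        out[n] = y;
        return out;
    }
}
```

The same idea scales up: when A and D are large but (block) diagonal, only the much smaller Schur complement matrix needs a full decomposition.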
Weighted LeastSquares¶
Being able to directly specify a weight vector is planned for the future. For now you can scale the residuals directly and accomplish the same thing.
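For example, scaling each residual by the square root of its weight yields the same cost, since \(\sum_i w_i r_i^2 = \sum_i (\sqrt{w_i}\, r_i)^2\). A minimal sketch (the class name is hypothetical):

```java
// Scaling each residual by sqrt(w_i) turns an ordinary least-squares
// solver into a weighted one: sum w_i*r_i^2 == sum (sqrt(w_i)*r_i)^2.
public class WeightedResiduals {
    public static double[] applyWeights(double[] residuals, double[] weights) {
        double[] scaled = new double[residuals.length];
        for (int i = 0; i < residuals.length; i++)
            scaled[i] = Math.sqrt(weights[i]) * residuals[i];
        return scaled;
    }
}
```

In practice you would apply this scaling inside your residual function (and scale the corresponding Jacobian rows the same way) before handing the problem to the solver.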
Configuring¶
The easiest and strongly recommended way to create a new instance of any optimization routine is by using one of the following factories:
Each function will create a different algorithm and takes in a configuration class. These configuration classes enable you to change the most important parameters. The JavaDoc describes what each parameter does.
Customizing¶
Whether or not it's a good idea, there are times you want to customize the behavior of an optimization. For example, you might want to normalize parameters every iteration or print out additional debugging information. The code has been intentionally written to enable you to do this.
This is an advanced feature and will require browsing through the source code and being very familiar with how these algorithms work. If after some effort you're not sure how to do this, post a question on the user forum and someone will try to help.
Hessian Scaling¶
For Trust Region (including Levenberg-Marquardt) algorithms, Hessian scaling can be applied automatically. At each iteration a diagonal matrix \(D\) is found such that the scaled matrix \(D_k^{-1} B_k D_k^{-1}\) has diagonal elements of one. Configuration classes can be used to clamp the scaling so that \(d_{min} \le D_{ii} \le d_{max}\). After scaling has been applied the resulting matrix is more numerically favorable to linear algebra operations. You will not get exactly the same answer with and without Hessian scaling, since scaling changes the gradient direction. It does not change the solution to the Newton step.
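A minimal sketch of this scaling applied to a dense matrix (the class name and clamping details are assumptions for illustration):

```java
// Choose D_ii = sqrt(B_ii), clamped to [dmin, dmax], so that the scaled
// Hessian D^-1 B D^-1 has unit diagonal when no clamping is active.
public class HessianScaling {
    public static double[][] scale(double[][] B, double dmin, double dmax) {
        int n = B.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++)
            d[i] = Math.min(dmax, Math.max(dmin, Math.sqrt(B[i][i])));
        double[][] S = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                S[i][j] = B[i][j] / (d[i]*d[j]);   // (D^-1 B D^-1)_ij
        return S;
    }
}
```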
Hessian Scaling applied to the Trust Region subproblem changes the trust region into an ellipse: \(\lVert D_k p \rVert \le \Delta_k\).