Volume 2 applies the linear algebra concepts presented in Volume 1 to optimization problems that frequently occur throughout machine learning. This book blends theory with practice by not only carefully discussing the mathematical underpinnings of each optimization technique but also applying these techniques to linear programming, support vector machines (SVM), principal component analysis (PCA), and ridge regression. Volume 2 begins by discussing preliminary concepts of optimization theory such as metric spaces, derivatives, and the Lagrange multiplier technique for finding extrema of real-valued functions. The focus then shifts to the special case of optimizing a linear function over a region determined by affine constraints, namely linear programming. Highlights include careful derivations and applications of the simplex algorithm, the dual-simplex algorithm, and the primal-dual algorithm. The theoretical heart of this book is the mathematically rigorous presentation of various nonlinear optimization methods, including but not limited to gradient descent, the Karush-Kuhn-Tucker (KKT) conditions, Lagrangian duality, the alternating direction method of multipliers (ADMM), and the kernel method. These methods are carefully applied to hard margin SVM, soft margin SVM, kernel PCA, ridge regression, lasso regression, and elastic-net regression. MATLAB programs implementing these methods are included.