tailieunhanh - Bài giảng Tối ưu hóa nâng cao: Chương 8 - Hoàng Nam Dũng

Bài giảng "Tối ưu hóa nâng cao - Chương 8: Proximal gradient descent (and acceleration)" cung cấp cho người học các kiến thức: Subgradient method, decomposable functions, proximal mapping, proximal gradient descent, . | Bài giảng Tối ưu hóa nâng cao: Chương 8 - Hoàng Nam Dũng Proximal Gradient Descent (and Acceleration) Hoàng Nam Dũng Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội Last time: subgradient method Consider the problem min f (x) x with f convex, and dom(f ) = Rn . Subgradient method: choose an initial x (0) ∈ Rn , and repeat: x (k) = x (k−1) − tk · g (k−1) , k = 1, 2, 3, . . . where g (k−1) ∈ ∂f (x (k−1) ). We use pre-set rules for the step sizes (., diminshing step sizes rule). If f is Lipschitz, then subgradient method has a convergence rate O(1/ε2 ). Upside: very generic. Downside: can be slow — addressed today. 1 Outline Today I Proximal gradient descent I Convergence analysis I ISTA, matrix completion I Special cases I Acceleration 2 Decomposable functions Suppose f (x) = g (x) + h(x) where I g is convex, differentiable, dom(g ) = Rn I h is convex, not necessarily differentiable. If f were differentiable, then gradient descent update would be x + = x − t · ∇f (x) Recall motivation: minimize quadratic approximation to f around x, replace ∇2 f (x) by 1t I 1 x + = argminz f (x) + ∇f (x)T (z − x) + kz − xk22 . | {z 2t } f˜t (z) 3 Decomposable functions In our case f is not differentiable, but f = g + h, g differentiable. Why don’t we make quadratic approximation to g , leave h alone? ., update x + = argminz g˜t (z) + h(z) 1 = argminz g (x) + ∇g (x)T (z − x) + kz − xk22 + h(z) 2t 1 = argminz kz − (x − t∇g (x))k22 + h(z). 2t 1 2t kz − (x − t∇g (x))k22 stay close to gradient update for g h(z) also make h small 4 Proximal mapping The proximal mapping (or prox-operator) of a convex function h is defined as 1 proxh (x) = argminz kx − zk22 + h(z). 2 5 Proximal mapping The proximal mapping (or prox-operator) of a convex function h is defined as 1 proxh (x) = argminz kx − zk22 + h(z). 2 Examples: I h(x) = 0: proxh

TỪ KHÓA LIÊN QUAN
TÀI LIỆU MỚI ĐĂNG