tailieunhanh - Bài giảng Tối ưu hóa nâng cao: Chương 9 - Hoàng Nam Dũng
Bài giảng "Tối ưu hóa nâng cao - Chương 9: Stochastic gradient descent" cung cấp cho người học các kiến thức: Proximal gradient descent, stochastic gradient descent, convergence rates, early stopping, mini-batches. . | Bài giảng Tối ưu hóa nâng cao: Chương 9 - Hoàng Nam Dũng Stochastic Gradient Descent Hoàng Nam Dũng Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội Last time: proximal gradient descent Consider the problem min g (x) + h(x) x with g , h convex, g differentiable, and h “simple” in so much as 1 proxt (x) = argminz kx − zk22 + h(z) 2t is computable. Proximal gradient descent: let x (0) ∈ Rn , repeat: x (k) = proxtk (x (k−1) − tk ∇g (x (k−1) )), k = 1, 2, 3, . . . Step sizes tk chosen to be fixed and small, or via backtracking. If ∇g is Lipschitz with constant L, then this has convergence rate √ O(1/ε). Lastly we can accelerate this, to optimal rate O(1/ ε). 1 Outline Today: I Stochastic gradient descent I Convergence rates I Mini-batches I Early stopping 2 Stochastic gradient descent Consider minimizing an average of functions m 1 X min fi (x). x m i=1 Pm Pm As ∇ i=1 fi (x) = i=1 ∇fi (x), gradient descent would repeat m 1 X x (k) = x (k−1) − tk · ∇fi (x (k−1) ), k = 1, 2, 3, . . . m i=1 In comparison, stochastic gradient descent or SGD (or incremental gradient descent) repeats: x (k) = x (k−1) − tk · ∇fik (x (k−1) ), k = 1, 2, 3, . . . where ik ∈ {1, . . . , m} is some chosen index at iteration k. 3 Stochastic gradient descent Two rules for choosing index ik at iteration k: I Randomized rule: choose ik ∈ {1, .m} uniformly at random. I Cyclic rule: choose ik = 1, 2, . . . , m, 1, 2, . . . , m, . . . Randomized rule is more common in practice. For randomized rule, note that E[∇fik (x)] = ∇f (x), so we can view SGD as using an unbiased estimate of the gradient at each step. Main appeal of SGD: I Iteration cost is independent of m (number of functions). I Can also be a big savings in terms of memory usage. 4 Example: stochastic logistic regression Given (xi , yi ) ∈ Rp × {0, 1}, i = 1, . . . , n, recall logistic regression n
đang nạp các trang xem trước