A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Li, Zhize; Li, Jian

arXiv:1802.04477 (math)

[Submitted on 13 Feb 2018 (v1), last revised 1 Dec 2018 (this version, v4)]

Abstract: We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is the sum of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+, which recovers several existing convergence results and improves or generalizes them in terms of the number of stochastic gradient oracle calls and proximal oracle calls. In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem posed in [Reddi et al., 2016b]. ProxSVRG+ also uses far fewer proximal oracle calls than ProxSVRG [Reddi et al., 2016b]. For nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restarts, unlike ProxSVRG. Thus, it can automatically switch to faster linear convergence in any region where the objective function satisfies the PL condition locally. In this PL setting, ProxSVRG+ also improves on ProxGD and ProxSVRG/SAGA, and generalizes the results of SCSG. Finally, we conduct several experiments, and the experimental results are consistent with the theoretical results.
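To make the setting concrete, write the composite objective as F(x) = (1/n) * sum_i f_i(x) + h(x), with each f_i smooth (possibly nonconvex) and h convex but possibly nonsmooth. The following is a minimal pure-Python sketch of an SVRG-style proximal loop in the spirit of ProxSVRG+. It is not the authors' code, and several details are assumptions: the snapshot gradient is averaged over a random batch of size B (B = n gives the full gradient), minibatches of size b are drawn without replacement, each epoch runs m inner steps, and the last iterate is returned instead of a randomly selected one. The names prox_svrg_plus, soft_threshold, grad_i, B, b, m and eta are illustrative only. The demo takes h(x) = lam * ||x||_1, whose prox is coordinate-wise soft-thresholding, together with a toy least-squares f_i. Any smooth, possibly nonconvex f_i could be supplied through grad_i instead.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def soft_threshold(z: Sequence[float], tau: float) -> Vector:
    """Prox of tau * ||x||_1: shrink every coordinate toward zero by tau."""
    return [(1.0 if zj > 0 else -1.0) * max(abs(zj) - tau, 0.0) for zj in z]


def prox_svrg_plus(
    grad_i: Callable[[int, Vector], Vector],  # gradient of the smooth component f_i at x
    prox: Callable[[Vector, float], Vector],  # (z, eta) -> prox_{eta*h}(z)
    n: int,
    x0: Vector,
    eta: float,
    B: int,
    b: int,
    m: int,
    epochs: int,
    seed: int = 0,
) -> Vector:
    """Sketch of a ProxSVRG+-style loop (assumed structure, not the authors' code)."""
    rng = random.Random(seed)
    d = len(x0)
    snapshot = list(x0)
    for _ in range(epochs):
        # Snapshot gradient averaged over a random batch of size B (B = n: full gradient).
        g = [0.0] * d
        for j in rng.sample(range(n), B):
            gj = grad_i(j, snapshot)
            for k in range(d):
                g[k] += gj[k] / B
        x = list(snapshot)
        for _ in range(m):
            # Variance-reduced estimate: g plus a minibatch correction around the snapshot.
            v = list(g)
            for i in rng.sample(range(n), b):
                gx, gs = grad_i(i, x), grad_i(i, snapshot)
                for k in range(d):
                    v[k] += (gx[k] - gs[k]) / b
            # Exactly one proximal oracle call per inner iteration.
            x = prox([x[k] - eta * v[k] for k in range(d)], eta)
        snapshot = x  # the last inner iterate becomes the next snapshot
    return snapshot  # simplification: last iterate, not a randomly chosen one


# Toy demo (hypothetical data): f_i(x) = 0.5 * (a_i . x - y_i)^2, h(x) = lam * ||x||_1.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = [1.0, 0.0, 1.0, 2.0]
lam = 0.1


def residual(i: int, x: Sequence[float]) -> float:
    return sum(a * xk for a, xk in zip(A[i], x)) - y[i]


def grad_ls(i: int, x: Vector) -> Vector:
    return [residual(i, x) * a for a in A[i]]


def objective(x: Vector) -> float:
    smooth = sum(0.5 * residual(i, x) ** 2 for i in range(len(A))) / len(A)
    return smooth + lam * sum(abs(xk) for xk in x)


x_hat = prox_svrg_plus(
    grad_ls, lambda z, eta: soft_threshold(z, eta * lam),
    n=len(A), x0=[0.0, 0.0], eta=0.1, B=len(A), b=2, m=10, epochs=20,
)
print(f"F(x0) = {objective([0.0, 0.0]):.4f}, F(x_hat) = {objective(x_hat):.4f}")
```

Counting oracle calls in this sketch, each epoch costs B + 2mb stochastic gradient evaluations and m proximal calls. This bookkeeping is what the abstract's comparisons in terms of gradient and proximal oracle calls refer to, though the exact parameter choices behind the paper's bounds are not reproduced here.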

Comments:	32nd Conference on Neural Information Processing Systems (NeurIPS 2018)
Subjects:	Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1802.04477 [math.OC]
	(or arXiv:1802.04477v4 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1802.04477

Submission history

From: Zhize Li
[v1] Tue, 13 Feb 2018 06:34:22 UTC (316 KB)
[v2] Sun, 20 May 2018 18:56:56 UTC (212 KB)
[v3] Sat, 27 Oct 2018 11:31:17 UTC (211 KB)
[v4] Sat, 1 Dec 2018 20:10:29 UTC (211 KB)
