Convergence of a Relaxed Variable Splitting Method for Learning Sparse Neural Networks via $\ell_1, \ell_0$, and transformed-$\ell_1$ Penalties

Dinh, Thu; Xin, Jack

arXiv:1812.05719 (math)

[Submitted on 13 Dec 2018 (v1), last revised 25 Feb 2020 (this version, v3)]

Abstract: Sparsification of neural networks is an effective complexity reduction method for improving efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness in the loss function. The sparsity in network weight is realized during the optimization (training) process. We prove that under $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm the theoretical findings and compare the accuracy and sparsity trade-off among the penalties.
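
The idea of combining thresholding with gradient descent can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic splitting objective $f(w) + \lambda P(u) + \frac{\beta}{2}\|w - u\|^2$, and the toy quadratic loss, the function names, and the parameter values (`lam`, `beta`, `eta`, `steps`) are all hypothetical choices made for illustration. Only the $\ell_1$ (soft) and $\ell_0$ (hard) thresholding maps are shown; the transformed-$\ell_1$ case and any further details of the paper's scheme are omitted.

```python
import math

def soft_threshold(w, t):
    """Proximal map of t * ||u||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in w]

def hard_threshold(w, lam, beta):
    """Minimizer in u of lam * ||u||_0 + (beta / 2) * ||w - u||^2."""
    cut = math.sqrt(2.0 * lam / beta)  # entries at or below this magnitude are zeroed
    return [x if abs(x) > cut else 0.0 for x in w]

def splitting_sketch(grad_f, w0, lam=0.05, beta=1.0, eta=0.1, steps=500):
    """Alternate a thresholding step on u with a gradient step on w.

    Assumed objective: f(w) + lam * ||u||_1 + (beta / 2) * ||w - u||^2.
    """
    w = list(w0)
    u = soft_threshold(w, lam / beta)
    for _ in range(steps):
        # u-update: exact minimizer in u for fixed w (soft thresholding).
        u = soft_threshold(w, lam / beta)
        # w-update: one gradient step on f(w) + (beta / 2) * ||w - u||^2.
        g = grad_f(w)
        w = [wi - eta * (gi + beta * (wi - ui)) for wi, gi, ui in zip(w, g, u)]
    return w, u

# Hypothetical toy loss: f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target.
target = [1.0, 0.02, -0.5, 0.0]

def grad(w):
    return [wi - ti for wi, ti in zip(w, target)]

w, u = splitting_sketch(grad, [0.5, 0.5, 0.5, 0.5])
```

In this toy setting the smooth variable `w` stays dense while the auxiliary variable `u` carries the sparsity: small entries of `w` are set exactly to zero in `u` by the thresholding step, which is the sense in which sparsity is realized during optimization rather than by pruning afterwards.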

Subjects:	Optimization and Control (math.OC)
MSC classes:	90C26, 97R40, 68T05
Cite as:	arXiv:1812.05719 [math.OC]
	(or arXiv:1812.05719v3 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1812.05719

Submission history

From: Thu Dinh
[v1] Thu, 13 Dec 2018 22:51:09 UTC (1,537 KB)
[v2] Mon, 25 Feb 2019 23:01:17 UTC (1,725 KB)
[v3] Tue, 25 Feb 2020 00:56:47 UTC (842 KB)
