On the Provable Generalization of Recurrent Neural Networks

Wang, Lifu; Shen, Bo; Hu, Bo; Cao, Xing

arXiv:2109.14142 (cs)

[Submitted on 29 Sep 2021 (v1), last revised 26 Jan 2022 (this version, v4)]

Abstract: The Recurrent Neural Network (RNN) is a fundamental structure in deep learning. Recent works study the training process of over-parameterized neural networks and show that such networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization of RNNs with random initialization, and provide the following improvements over recent works:
1) For an RNN with input sequence $x=(X_1,X_2,\ldots,X_L)$, previous works study learning functions that are sums of $f(\beta^T_lX_l)$ and require the normalization condition $\|X_l\|\leq\epsilon$ for some very small $\epsilon$ depending on the complexity of $f$. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without this normalization condition, and show that some notable concept classes are learnable with the number of iterations and samples scaling almost-polynomially in the input length $L$.
2) Moreover, we prove a novel result for learning $N$-variable functions of the input sequence of the form $f(\beta^T[X_{l_1},\ldots,X_{l_N}])$, which do not belong to the "additive" concept class, i.e., sums of functions $f(X_l)$. We show that when either $N$ or $l_0=\max(l_1,\ldots,l_N)-\min(l_1,\ldots,l_N)$ is small, $f(\beta^T[X_{l_1},\ldots,X_{l_N}])$ is learnable with the number of iterations and samples scaling almost-polynomially in the input length $L$.
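
To make the two concept classes in the abstract concrete, the following is a minimal sketch of what such target functions look like. It is illustrative only: the function names, the choice of `tanh` as $f$, and the 0-based position indexing are assumptions made here, not details taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def additive_target(xs, betas, f=math.tanh):
    """Additive class: sum over positions l of f(beta_l^T X_l).

    xs    -- input sequence, a list of L vectors X_1..X_L
    betas -- one weight vector beta_l per position
    f     -- scalar function (tanh is an assumed placeholder)
    """
    return sum(f(dot(b, x)) for b, x in zip(betas, xs))


def n_variable_target(xs, beta, positions, f=math.tanh):
    """N-variable class: f(beta^T [X_{l_1}, ..., X_{l_N}]).

    positions -- the N indices l_1..l_N (0-based here, by assumption)
    beta      -- a single weight vector over the concatenated inputs
    """
    stacked = [v for l in positions for v in xs[l]]
    return f(dot(beta, stacked))


def position_span(positions):
    """l_0 = max(l_1..l_N) - min(l_1..l_N), the quantity in item 2."""
    return max(positions) - min(positions)
```

The difference between the classes is where $f$ is applied: the additive target applies it to each position separately and sums the results, while the N-variable target applies it once to a joint linear form over several positions, so it can depend on interactions between them.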

Comments:	Accepted to NeurIPS 2021
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2109.14142 [cs.LG]
	(or arXiv:2109.14142v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2109.14142

Submission history

From: Lifu Wang
[v1] Wed, 29 Sep 2021 02:06:33 UTC (33 KB)
[v2] Thu, 9 Dec 2021 14:32:34 UTC (31 KB)
[v3] Mon, 3 Jan 2022 15:23:58 UTC (32 KB)
[v4] Wed, 26 Jan 2022 17:10:50 UTC (34 KB)
