LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Yasunaga, Michihiro; Leskovec, Jure; Liang, Percy

Computer Science > Computation and Language

arXiv:2109.06822 (cs)

[Submitted on 14 Sep 2021 (v1), last revised 8 Oct 2021 (this version, v2)]

Abstract: Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).
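The critic criterion described in the abstract (a sentence counts as grammatical if the LM scores it above every local perturbation) can be illustrated with a minimal sketch. This is not the authors' implementation: the perturbation function (drop one word, swap adjacent words) and the scorer `toy_log_prob` are hypothetical stand-ins chosen so the example runs without a pretrained model. In practice, `log_prob` would presumably wrap a real pretrained LM and the neighborhood would be defined by a richer set of edits.

```python
from typing import Callable, List


def local_perturbations(sentence: str) -> List[str]:
    """Toy neighborhood (an assumption): drop one word or swap two adjacent words."""
    words = sentence.split()
    neighbors = set()
    for i in range(len(words)):
        dropped = words[:i] + words[i + 1:]
        if dropped:
            neighbors.add(" ".join(dropped))
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        neighbors.add(" ".join(swapped))
    neighbors.discard(sentence)  # a swap of identical words changes nothing
    return sorted(neighbors)


def lm_critic(
    sentence: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], List[str]] = local_perturbations,
) -> bool:
    """Return True if `sentence` scores strictly higher than all of its perturbations."""
    score = log_prob(sentence)
    return all(score > log_prob(p) for p in perturb(sentence))


def toy_log_prob(sentence: str) -> float:
    """Hypothetical stand-in for an LM score: rewards a fixed set of 'good' bigrams."""
    good_bigrams = {("the", "cat"), ("cat", "sleeps")}
    words = sentence.lower().split()
    return sum(1.0 if pair in good_bigrams else -1.0 for pair in zip(words, words[1:]))


if __name__ == "__main__":
    for s in ["the cat sleeps", "cat the sleeps"]:
        print(s, "->", lm_critic(s, toy_log_prob))
```

Passing the scorer and the perturbation function as arguments keeps the decision rule separate from the choice of language model and edit neighborhood, both of which are assumptions here.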

Comments:	EMNLP 2021. Code & data available at this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2109.06822 [cs.CL]
	(or arXiv:2109.06822v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.06822

Submission history

From: Michihiro Yasunaga
[v1] Tue, 14 Sep 2021 17:06:43 UTC (1,517 KB)
[v2] Fri, 8 Oct 2021 01:06:43 UTC (1,513 KB)
