Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Yamaguchi, Atsuki; Chrysostomou, George; Margatina, Katerina; Aletras, Nikolaos

arXiv:2109.01819 (cs)

[Submitted on 4 Sep 2021]

Abstract: Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder, in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives at the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other, simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE's parameters, results in only a 1% drop in GLUE scores with our best objective.
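
To make the MLM setup described in the abstract concrete, the following is a minimal sketch of the masking step only. It assumes whitespace-separated tokens and a 15% masking rate; the rate, the function name `mask_tokens`, and the use of `None` as an "ignore" label are illustrative assumptions, not details taken from the abstract. It does not model the classifier itself, and it does not cover the five alternative objectives proposed in the paper.

```python
import random

MASK = "[MASK]"
IGNORE = None  # label for positions the model is not asked to predict


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random sample of tokens with [MASK].

    Returns (inputs, labels): `inputs` is the corrupted sequence, and
    `labels` holds the original token at each masked position and
    IGNORE everywhere else. The 15% default is an assumed rate.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else IGNORE for i, tok in enumerate(tokens)]
    return inputs, labels


tokens = "the cat sat on the mat and looked at the dog".split()
inputs, labels = mask_tokens(tokens)
```

In this sketch, each non-ignored label is a vocabulary item, so the prediction at a masked position is a multi-class decision over the whole vocabulary, which is the setting the abstract contrasts with simpler token-level classification tasks.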

Comments:	Accepted at EMNLP 2021
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2109.01819 [cs.CL]
	(or arXiv:2109.01819v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.01819

Submission history

From: Atsuki Yamaguchi
[v1] Sat, 4 Sep 2021 08:52:37 UTC (5,960 KB)
