Leveraging Natural Supervision: Improving Self-Supervision for Language Pretraining

1 Jun 2024

Author:

(1) Mingda Chen.

CHAPTER 3 - IMPROVING SELF-SUPERVISION FOR LANGUAGE PRETRAINING

This chapter describes our contributions to improving self-supervised training objectives for language model pretraining. Prior work has found that the next sentence prediction loss used during pretraining is ineffective at improving downstream task performance (Yang et al., 2019; Liu et al., 2019). In Section 3.1, we propose replacing it with a sentence ordering prediction loss and show that the resulting model achieves state-of-the-art performance.
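To make the objective concrete, the sketch below builds a single sentence-ordering-prediction training example: two consecutive segments from the same document are kept in order (positive label) or swapped (negative label). The function name `make_sop_example` and the 50% swap probability are illustrative assumptions, not the exact preprocessing pipeline of the original work.

```python
import random

def make_sop_example(segment_a, segment_b, swap_prob=0.5):
    """Build one sentence-ordering-prediction example.

    segment_a and segment_b are assumed to be two consecutive text
    segments drawn from the same document. With probability
    `swap_prob` their order is reversed and the label becomes 0;
    otherwise the original order is kept and the label is 1.
    """
    if random.random() < swap_prob:
        # Swapped order -> negative example.
        return (segment_b, segment_a), 0
    # Original order -> positive example.
    return (segment_a, segment_b), 1


if __name__ == "__main__":
    random.seed(0)
    pair, label = make_sop_example(
        "The model is pretrained on unlabeled text.",
        "It is then finetuned on downstream tasks.",
    )
    print(pair, label)
```

A binary classifier on top of the encoder is then trained to predict whether the two segments appear in their original order, which forces the model to capture discourse-level coherence rather than mere topical overlap.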

Recent work has discovered that pretrained language models are capable of in-context few-shot learning (Brown et al., 2020), and that this capability can be strengthened by finetuning the models on human-annotated datasets (Mishra et al., 2021; Ye et al., 2021; Wei et al., 2022). Section 3.2 shows that training the models on self-supervised tasks can likewise improve performance on downstream tasks.
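For readers unfamiliar with the setting, here is a minimal sketch of how an in-context few-shot prompt is typically assembled: a handful of demonstration input-output pairs are concatenated in front of the test input, and the language model is asked to continue the text with the answer. The "Input:"/"Output:" template and the helper name `build_few_shot_prompt` are illustrative assumptions, not the exact format used in the cited work.

```python
def build_few_shot_prompt(demonstrations, test_input):
    """Concatenate labeled demonstrations with a test input to form
    an in-context few-shot prompt; the pretrained language model is
    expected to continue the prompt with the label for test_input.
    """
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demos = [
        ("the movie was wonderful", "positive"),
        ("a dull and lifeless plot", "negative"),
    ]
    print(build_few_shot_prompt(demos, "an unexpectedly moving film"))
```

No gradient updates are applied at inference time in this setting; the model must infer the task purely from the demonstrations in the prompt.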

The material in this chapter is adapted from Lan et al. (2020) and Chen et al. (2022b).

This paper is available on arXiv under a CC 4.0 license.