ELECTRA: Efficient Pretraining of Text Encoders as Discriminators

ELECTRA is a transformer-based language model that introduces a more sample-efficient pretraining method by replacing masked language modeling with a discriminative task called replaced-token detection. A small generator model proposes plausible replacements for some input tokens, and ELECTRA trains a discriminator to decide, for every position, whether the token is the original or a replacement. Because the model learns from all input positions rather than only the masked ones, this yields faster pretraining and improved downstream performance compared to masked-language-model approaches such as BERT.
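As a rough illustration of the replaced-token-detection objective, the sketch below corrupts a toy token sequence and records per-position labels. It is a minimal, hypothetical example in plain Python: it samples replacements uniformly at random instead of from ELECTRA's learned small generator, and none of the names correspond to code in this repository.

```python
import numpy as np

# Toy illustration of replaced-token detection. The uniform sampler below is a
# stand-in for ELECTRA's small masked-language-model generator.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]

def corrupt(tokens, replace_prob=0.15):
    """Replace a fraction of tokens and record which positions were replaced."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            fake = vocab[rng.integers(len(vocab))]
            corrupted.append(fake)
            labels.append(int(fake != tok))  # 1 = replaced, 0 = original
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = corrupt(tokens)
# The discriminator is trained with a binary cross-entropy loss to predict
# `labels` at every position, so every token contributes a training signal
# (unlike masked language modeling, where only the masked positions do).
print(corrupted, labels)
```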

Key Features

  • Uses a generator-discriminator setup for efficient pretraining
  • Achieves better performance with significantly fewer training steps
  • Outperforms BERT on several NLP benchmarks with less compute
  • Provides pretrained models and training scripts in TensorFlow
  • Suitable for a wide range of tasks including classification and QA (see the fine-tuning sketch after this list)
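
This repository's own pretraining and fine-tuning scripts are written in TensorFlow. As a hedged illustration of downstream use, the sketch below assumes the pretrained discriminator is also consumed through the Hugging Face Transformers port (the model id and classes are assumptions about that port, not part of this repository) and sets up fine-tuning for a two-class classification task.

```python
# Hypothetical fine-tuning setup via the Hugging Face Transformers port of the
# pretrained discriminator; not this repository's TensorFlow scripts.
import torch
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
model = ElectraForSequenceClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=2
)

# Encode a toy example and run a single training step.
inputs = tokenizer("A surprisingly effective pretraining method.",
                   return_tensors="pt")
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)  # returns loss and logits
outputs.loss.backward()
```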
