ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA is a sample-efficient pretraining method for transformer text encoders. Instead of masking tokens and predicting them as BERT does, ELECTRA uses a discriminative objective called replaced token detection: a small generator network fills masked positions with plausible tokens, and a larger discriminator (the model that is actually kept for fine-tuning) learns to predict, for every token in the input, whether it is the original or a replacement. Because the discriminator receives a learning signal from all input tokens rather than only the masked subset, ELECTRA matches or exceeds BERT's downstream performance with substantially less pretraining compute.
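
The training signal can be sketched in a few lines of PyTorch. The snippet below is a toy illustration of the joint objective, not the repository's actual training code: TinyGenerator, TinyDiscriminator, and all sizes are hypothetical stand-ins for the real transformer encoders, and the 50x weighting on the discriminator loss follows the paper.

```python
import torch
import torch.nn as nn

# Toy sketch of the ELECTRA objective (replaced token detection).
# TinyGenerator / TinyDiscriminator are hypothetical stand-ins for the
# real transformer encoders; only the training signal is illustrated.

VOCAB_SIZE, HIDDEN, MASK_ID = 1000, 64, 0

class TinyGenerator(nn.Module):
    """Small masked-LM head: proposes a token for each position."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, ids):
        return self.head(self.embed(ids))              # (batch, seq, vocab)

class TinyDiscriminator(nn.Module):
    """Main model: scores each token as original (0) vs. replaced (1)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, ids):
        return self.head(self.embed(ids)).squeeze(-1)  # (batch, seq)

gen, disc = TinyGenerator(), TinyDiscriminator()
tokens = torch.randint(1, VOCAB_SIZE, (2, 16))         # fake token ids

# 1) Mask ~15% of positions and let the generator fill them in.
mask = torch.rand(tokens.shape) < 0.15
mask[:, 0] = True                                      # ensure >=1 masked position
masked = tokens.masked_fill(mask, MASK_ID)
gen_logits = gen(masked)
sampled = torch.distributions.Categorical(logits=gen_logits).sample()
corrupted = torch.where(mask, sampled, tokens)

# 2) A token is labeled "replaced" only if the sample differs from the original.
labels = (corrupted != tokens).float()

# 3) The generator gets an MLM loss on masked positions; the discriminator gets
#    a binary replaced-token-detection loss on *all* positions.
mlm_loss = nn.functional.cross_entropy(gen_logits[mask], tokens[mask])
rtd_loss = nn.functional.binary_cross_entropy_with_logits(disc(corrupted), labels)
loss = mlm_loss + 50.0 * rtd_loss   # lambda = 50 weighting from the paper
loss.backward()
```

After pretraining, the generator is discarded and only the discriminator is fine-tuned on downstream tasks.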

Key Features

  • More compute-efficient than conventional masked language modeling, since the model learns from every input token rather than only the masked subset
  • Trains with a replaced-token-detection objective instead of masked-token prediction
  • Pairs a small generator with a full-size discriminator; only the discriminator is kept for fine-tuning
  • Outperforms BERT on downstream benchmarks such as GLUE and SQuAD given the same model size, data, and compute
  • Open-source implementation with pretrained models and training scripts (see the usage sketch after this list)
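
As a quick way to try a pretrained discriminator, the sketch below loads a published checkpoint through the Hugging Face `transformers` port rather than this repository's own TensorFlow scripts; the checkpoint name `google/electra-small-discriminator`, the example sentence, and the 0.5 decision threshold are illustrative assumptions, not part of this repo.

```python
# Sketch: probe a pretrained ELECTRA discriminator via the Hugging Face
# `transformers` port (not this repo's TensorFlow code). The checkpoint name
# and the 0.5 threshold are illustrative assumptions.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

# "fake" stands in for "jumps"; the discriminator should flag it as replaced.
text = "The quick brown fox fake over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # one replacement score per token

preds = (torch.sigmoid(logits) > 0.5).int()[0]   # 1 = predicted "replaced"
for tok, p in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), preds):
    print(f"{tok:>12s}  {'replaced' if p else 'original'}")
```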
