GENRE: Autoregressive Entity Retrieval

GENRE (Generative ENtity REtrieval) is a sequence-to-sequence system designed for entity retrieval tasks such as entity disambiguation, end-to-end entity linking, and document retrieval. Unlike traditional approaches that rely on dense vector similarity, GENRE generates entity names token by token in an autoregressive manner, conditioned on the input context. This method enables efficient retrieval with a reduced memory footprint and eliminates the need for negative sampling during training. GENRE achieves state-of-the-art or competitive results across more than 20 benchmark datasets.
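To make the autoregressive idea concrete, here is a minimal sketch of how a candidate entity name could be scored as the sum of per-token log-probabilities, each conditioned on the context and the tokens generated so far. This is not GENRE's code: `score_entity_name`, `toy_model`, and the four-token vocabulary are hypothetical stand-ins for a real sequence-to-sequence model.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical interface: given the input context and the name tokens produced
# so far, return log-probabilities for the next token.
NextTokenFn = Callable[[str, Sequence[str]], Dict[str, float]]


def score_entity_name(
    context: str,
    name_tokens: Sequence[str],
    next_token_log_probs: NextTokenFn,
) -> float:
    """Return log p(name | context) = sum_i log p(token_i | tokens_<i, context)."""
    total = 0.0
    for i, token in enumerate(name_tokens):
        dist = next_token_log_probs(context, name_tokens[:i])
        # A token the model assigns no mass to makes the whole name impossible.
        total += dist.get(token, -math.inf)
    return total


def toy_model(context: str, prefix: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a trained model: uniform over a tiny vocabulary."""
    vocab = ["Paris", "France", "Texas", "</s>"]
    return {token: math.log(1.0 / len(vocab)) for token in vocab}


score = score_entity_name("He moved to [START] Paris [END].", ["Paris", "</s>"], toy_model)
```

With a trained model in place of `toy_model`, ranking candidates by this score is what lets the system retrieve entities without comparing against a stored vector per entity.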

Key Features

  • Autoregressive Generation: Generates entity names left-to-right, capturing fine-grained interactions between context and entity.
  • Memory Efficiency: Model parameters scale with vocabulary size rather than with the number of entities, which keeps the memory footprint small.
  • No Negative Sampling Required: Utilizes exact softmax loss computation without the need for negative data subsampling.
  • Multilingual Support (mGENRE): Extends GENRE to support over 100 languages, treating language as a latent variable and marginalizing over it during prediction.
  • Flexible Integration: Compatible with both Fairseq and Hugging Face Transformers frameworks.
  • Pretrained Models and Datasets: Provides pretrained models and scripts for downloading relevant datasets and resources.
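The multilingual feature above describes marginalizing over language at prediction time. A minimal sketch of that step, assuming the model has already produced a log-probability for each (entity, language) name pair: the entity's score is the log of the summed probabilities across languages. The function name `marginalize_over_language` and the probability values are hypothetical and chosen only for illustration; this is not mGENRE's actual API.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def marginalize_over_language(
    pair_log_probs: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Map (entity_id, language) log-probabilities to per-entity log-probabilities.

    Computes log sum_l p(entity, l | context) with a numerically stable log-sum-exp.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for (entity_id, _language), log_prob in pair_log_probs.items():
        grouped[entity_id].append(log_prob)

    marginals: Dict[str, float] = {}
    for entity_id, log_probs in grouped.items():
        peak = max(log_probs)
        if peak == -math.inf:
            # Every language variant has zero probability.
            marginals[entity_id] = -math.inf
            continue
        marginals[entity_id] = peak + math.log(
            sum(math.exp(lp - peak) for lp in log_probs)
        )
    return marginals


# Illustrative, made-up scores for two entities named in different languages.
pairs = {
    ("Q90", "en"): math.log(0.4),
    ("Q90", "fr"): math.log(0.2),
    ("Q142", "en"): math.log(0.1),
}
marginals = marginalize_over_language(pairs)
best_entity = max(marginals, key=marginals.get)
```

Summing over languages means an entity is not penalized when its probability mass is split across several language-specific names.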
