Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL is a neural network architecture designed to overcome the limitations of fixed-length context in language modeling. By introducing a segment-level recurrence mechanism and a novel relative positional encoding scheme, it captures longer-term dependencies without disrupting temporal coherence. The model achieves state-of-the-art results on several language modeling benchmarks, improving performance on both short and long sequences.
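
The sketch below is a minimal, hypothetical illustration of segment-level recurrence (plain PyTorch, not this repository's actual code): the hidden states of the previous segment are cached and, with gradients stopped, prepended to the keys and values of the current segment, so attention can reach beyond the current segment's boundary.

    import torch

    def attend_with_memory(hidden, memory, w_q, w_k, w_v):
        # hidden: (cur_len, d_model) current segment
        # memory: (mem_len, d_model) cached states from the previous segment
        # w_q, w_k, w_v: (d_model, d_head) projection matrices
        # All names and shapes are illustrative only.

        # Prepend the stopped-gradient memory to form the extended context.
        context = torch.cat([memory.detach(), hidden], dim=0)

        q = hidden @ w_q    # queries come only from the current segment
        k = context @ w_k   # keys and values span memory + current segment
        v = context @ w_v

        attn = torch.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)
        out = attn @ v

        # The current hidden states become the memory for the next segment.
        new_memory = hidden.detach()
        return out, new_memory

At evaluation time the same cached states can be reused from one segment to the next instead of recomputing a full sliding window from scratch, which is where most of the speed-up over a vanilla Transformer comes from.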

Key Features

  • Segment-Level Recurrence: Enables learning dependencies beyond fixed-length contexts, effectively capturing long-term relationships in data.
  • Relative Positional Encoding: Encodes positions relative to one another rather than absolutely, preserving temporal coherence when hidden states are reused across segments (see the sketch after this list).
  • Dual Framework Support: Implemented in both TensorFlow and PyTorch, facilitating flexibility in deployment and experimentation.
  • State-of-the-Art Performance: Achieves superior results on benchmarks such as enwik8, text8, One Billion Word, WikiText-103, and Penn Treebank.
  • Efficient Training and Inference: Reuses cached hidden states at evaluation time, making it substantially faster than a vanilla Transformer with the same context length and suitable for large-scale language modeling.
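
As a rough sketch of the relative positional scheme, the attention score in the Transformer-XL paper decomposes into content-based and position-based terms, with two learned global biases (u and v) replacing the query's absolute-position terms. The snippet below is a simplified, hypothetical version; the released implementations fuse these terms and use a shift trick to align relative distances per query, which is omitted here for brevity.

    import torch

    def rel_attention_scores(q, k_content, r, u, v):
        # q:         (qlen, d_head) query vectors
        # k_content: (klen, d_head) content-based key vectors
        # r:         (klen, d_head) relative positional encodings, one per distance
        # u, v:      (d_head,) learned biases shared across all query positions
        # Per-query alignment of r by relative distance is not shown.
        content_score = (q + u) @ k_content.t()   # content addressing
        position_score = (q + v) @ r.t()          # relative-position addressing
        return content_score + position_score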