GPT-Neo is EleutherAI's open-source implementation of GPT-3-style transformer models, built on Mesh TensorFlow. As of August 2021, the repository has been archived in favor of GPT-NeoX, which is optimized for GPUs. GPT-Neo supports training and inference on both TPUs and GPUs and provides pre-trained models of up to 2.7B parameters trained on The Pile dataset.
While the repository is no longer maintained, it remains a valuable resource for developers and researchers interested in transformer architecture, model parallelism, and large-scale language models.
Key Features
✅ GPT-3-like Transformer Architecture
Implements autoregressive transformer models similar to OpenAI's GPT-3 using Mesh TensorFlow.
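To make "autoregressive" concrete, here is a minimal, illustrative decoding loop in plain Python/NumPy; the toy logits function stands in for the actual trained transformer and is not code from this repository.

```python
import numpy as np

def toy_next_token_logits(tokens: list[int], vocab_size: int = 16) -> np.ndarray:
    """Stand-in for a trained transformer: returns dummy logits over the vocabulary."""
    rng = np.random.default_rng(seed=len(tokens))
    return rng.normal(size=vocab_size)

def greedy_generate(prompt: list[int], steps: int) -> list[int]:
    """Autoregressive generation: predict one token, append it, and repeat."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))  # feed the prediction back as input
    return tokens

print(greedy_generate([1, 2, 3], steps=5))
```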
✅ Parallelism Support
Full support for model and data parallelism, enabling scaling on multiple TPUs and GPUs.
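As a rough illustration, Mesh TensorFlow splits computation over a logical device mesh described by a mesh shape and a layout. The snippet below is a hedged sketch in that style; the key names mirror the repository's JSON config conventions, but the specific values are assumptions rather than a tested configuration.

```python
# Illustrative only: a Mesh TensorFlow-style parallelism description.
# Key names follow the repo's JSON config conventions; the values here are
# assumptions, not a known-good configuration.
parallelism_config = {
    "mesh_shape": "x:8,y:4",                     # 32 cores arranged as an 8x4 logical mesh
    "layout": "batch:x,memory_length:y,embd:y",  # batch sharded over x (data parallelism),
                                                 # model dims sharded over y (model parallelism)
}
print(parallelism_config)
```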
✅ Advanced Attention Techniques
Includes support for the following (a brief local-attention sketch follows the list):
Local Attention
Linear Attention
Mixture of Experts (MoE)
Axial Positional Embeddings
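As a quick intuition for the first of these, local attention restricts each token to a sliding window of recent positions rather than the full causal history. The NumPy sketch below builds such a banded causal mask; it is illustrative only, not code from the repository.

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Banded causal mask: token i may attend only to positions in (i - window, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Each row shows which earlier positions that token can attend to.
print(local_causal_mask(seq_len=6, window=3).astype(int))
```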
✅ Pre-trained Models Available
Two GPT-Neo checkpoints, trained on The Pile, are available for download:
1.3B: Download
2.7B: Download
✅ Benchmark Results Provided
Evaluations on:
Language Understanding: LAMBADA, WikiText, Winogrande, HellaSwag
Scientific Reasoning: PubMedQA, MathQA, PIQA
Results are compared against GPT-2 and GPT-3 models of multiple sizes.
✅ Evaluation Harness
Comes with an open evaluation suite for reproducible benchmarking.
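For context, EleutherAI's separate lm-evaluation-harness project (installable as `lm-eval`) can reproduce this style of benchmarking against the released checkpoints. The sketch below assumes its current Python entry point; the exact API and task names may differ across harness versions.

```python
# A hedged sketch, assuming lm-evaluation-harness's Python API (lm_eval.simple_evaluate);
# arguments and task names may vary between versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",
    tasks=["lambada_openai", "hellaswag", "piqa"],
)
print(results["results"])
```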
✅ Hugging Face Integration
Easily load and use GPT-Neo models through the Hugging Face Transformers library.
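For example, generating text with the 1.3B checkpoint takes only a few lines using the Transformers text-generation pipeline:

```python
from transformers import pipeline

# Downloads the GPT-Neo 1.3B checkpoint from the Hugging Face Hub on first use.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

output = generator(
    "EleutherAI has released",
    max_length=50,
    do_sample=True,
    temperature=0.9,
)
print(output[0]["generated_text"])
```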