GPT-Neo: Open Source GPT-3 Style Language Model with TPU & GPU Support

GPT-Neo is EleutherAI's open-source implementation of GPT-3-style transformer models, built with Mesh TensorFlow. As of August 2021, the repository has been archived in favor of GPT-NeoX, which is optimized for GPUs. GPT-Neo supports training and inference on both TPUs and GPUs and provides pre-trained models of up to 2.7B parameters trained on The Pile dataset.

While the repository is no longer maintained, it remains a valuable resource for developers and researchers interested in transformer architecture, model parallelism, and large-scale language models.

Key Features

  • GPT-3-like Transformer Architecture: implements autoregressive transformer models similar to OpenAI's GPT-3 using Mesh TensorFlow.
  • Parallelism Support: full support for model and data parallelism, enabling scaling across multiple TPUs and GPUs.
  • Advanced Attention Techniques: includes support for:
      • Local Attention
      • Linear Attention
      • Mixture of Experts (MoE)
      • Axial Positional Embeddings
  • Pre-trained Models Available: two GPT-Neo checkpoints trained on The Pile are provided:
      • 1.3B: Download
      • 2.7B: Download
  • Benchmark Results Provided: evaluations cover language understanding (LAMBADA, WikiText, Winogrande, HellaSwag) and scientific reasoning (PubMedQA, MathQA, PIQA), with comparisons against GPT-2 and GPT-3 across multiple model sizes.
  • Evaluation Harness: comes with an open evaluation suite for reproducible benchmarking.
  • Hugging Face Integration: GPT-Neo models can be loaded and run through the Hugging Face Transformers library (see the sketch after this list).
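
As a quick illustration of the Hugging Face integration, the snippet below loads one of the pre-trained GPT-Neo checkpoints and generates text with the Transformers text-generation pipeline. This is a minimal sketch: the model identifier EleutherAI/gpt-neo-1.3B and the sampling parameters are example choices, not settings prescribed by this repository.

```python
# Minimal sketch: text generation with a pre-trained GPT-Neo checkpoint
# via the Hugging Face Transformers library. The model name and the
# sampling settings below are illustrative, not repo-mandated values.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

prompt = "GPT-Neo is an open-source language model that"
outputs = generator(
    prompt,
    max_length=60,    # total length of prompt plus generated tokens
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.9,  # softens the output distribution
)

print(outputs[0]["generated_text"])
```

The 2.7B checkpoint can be used the same way by swapping in EleutherAI/gpt-neo-2.7B, at the cost of higher memory requirements.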
