GPT-Neo is EleutherAI's open-source implementation of GPT-3-style transformer models, built on Mesh TensorFlow. As of August 2021, the repository has been archived in favor of GPT-NeoX, which is optimized for GPUs. GPT-Neo supports training and inference on both TPUs and GPUs and provides pre-trained models of up to 2.7B parameters trained on The Pile dataset.
While the repository is no longer maintained, it remains a valuable resource for developers and researchers interested in transformer architecture, model parallelism, and large-scale language models.
Key Features
✅ GPT-3-like Transformer Architecture
Implements autoregressive transformer models similar to OpenAI's GPT-3 using Mesh TensorFlow.
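To make "autoregressive" concrete, here is a minimal, illustrative decoding loop in plain Python/NumPy; the toy logits function stands in for the actual trained transformer and is not code from this repository.

```python
import numpy as np

def toy_next_token_logits(tokens: list[int], vocab_size: int = 16) -> np.ndarray:
    """Stand-in for a trained transformer: returns dummy logits over the vocabulary."""
    rng = np.random.default_rng(seed=len(tokens))
    return rng.normal(size=vocab_size)

def greedy_generate(prompt: list[int], steps: int) -> list[int]:
    """Autoregressive generation: predict one token, append it, and repeat."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))  # feed the prediction back as input
    return tokens

print(greedy_generate([1, 2, 3], steps=5))
```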
✅ Parallelism Support
Full support for model and data parallelism, enabling scaling on multiple TPUs and GPUs.
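As a rough illustration, Mesh TensorFlow splits computation over a logical device mesh described by a mesh shape and a layout. The snippet below is a hedged sketch in that style; the key names mirror the repository's JSON config conventions, but the specific values are assumptions rather than a tested configuration.

```python
# Illustrative only: a Mesh TensorFlow-style parallelism description.
# Key names follow the repo's JSON config conventions; the values here are
# assumptions, not a known-good configuration.
parallelism_config = {
    "mesh_shape": "x:8,y:4",                     # 32 cores arranged as an 8x4 logical mesh
    "layout": "batch:x,memory_length:y,embd:y",  # batch sharded over x (data parallelism),
                                                 # model dims sharded over y (model parallelism)
}
print(parallelism_config)
```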
✅ Advanced Attention Techniques
Includes support for the following (a brief local-attention sketch follows the list):
Local Attention
Linear Attention
Mixture of Experts (MoE)
Axial Positional Embeddings
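As a quick intuition for the first of these, local attention restricts each token to a sliding window of recent positions rather than the full causal history. The NumPy sketch below builds such a banded causal mask; it is illustrative only, not code from the repository.

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Banded causal mask: token i may attend only to positions in (i - window, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Each row shows which earlier positions that token can attend to.
print(local_causal_mask(seq_len=6, window=3).astype(int))
```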
✅ Pre-trained Models Available
Two GPT-Neo checkpoints, trained on The Pile, are available for download:
1.3B: Download
2.7B: Download
✅ Benchmark Results Provided
Evaluations on:
Language Understanding: LAMBADA, WikiText, Winogrande, HellaSwag
Scientific Reasoning: PubMedQA, MathQA, PIQA
Results are compared against GPT-2 and GPT-3 models of multiple sizes.
✅ Evaluation Harness
Comes with an open evaluation suite for reproducible benchmarking.
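For context, EleutherAI's separate lm-evaluation-harness project (installable as `lm-eval`) can reproduce this style of benchmarking against the released checkpoints. The sketch below assumes its current Python entry point; the exact API and task names may differ across harness versions.

```python
# A hedged sketch, assuming lm-evaluation-harness's Python API (lm_eval.simple_evaluate);
# arguments and task names may vary between versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",
    tasks=["lambada_openai", "hellaswag", "piqa"],
)
print(results["results"])
```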
✅ Hugging Face Integration
Easily load and use GPT-Neo models through the Hugging Face Transformers library.
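For example, generating text with the 1.3B checkpoint takes only a few lines using the Transformers text-generation pipeline:

```python
from transformers import pipeline

# Downloads the GPT-Neo 1.3B checkpoint from the Hugging Face Hub on first use.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

output = generator(
    "EleutherAI has released",
    max_length=50,
    do_sample=True,
    temperature=0.9,
)
print(output[0]["generated_text"])
```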