The Sandwich Transformer is a transformer variant that keeps the standard self-attention and feedforward sublayers but reorders them: attention sublayers are concentrated near the bottom of the network, feedforward sublayers near the top, and the remainder are interleaved as in a regular transformer. Because no sublayers are added or removed, the model keeps the parameter count, memory footprint, and speed of the baseline while improving language modeling performance.
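As a concrete illustration of the reordering, here is a minimal Python sketch (the function name and the symbols `n` and `k` are illustrative, not part of this repository) that builds the sublayer ordering for a model with `n` attention/feedforward pairs and sandwich coefficient `k`:

```python
def sandwich_ordering(n: int, k: int) -> str:
    """Return a sublayer ordering string for a sandwich transformer.

    's' = self-attention sublayer, 'f' = feedforward sublayer.
    A baseline transformer with n layers is 'sf' * n; the sandwich
    variant moves k attention sublayers to the bottom and k feedforward
    sublayers to the top, keeping 2 * n sublayers in total.
    """
    assert 0 <= k <= n
    return "s" * k + "sf" * (n - k) + "f" * k


# Example: n=16, k=6 -> 6 attention sublayers, 10 interleaved pairs, 6 feedforward sublayers
print(sandwich_ordering(16, 6))
```

The ordering string is a convenient way to describe and configure the architecture, since the total number of sublayers (and hence parameters) never changes.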
Key Features
Reorders self-attention and feedforward sublayers into a sandwich pattern: attention-heavy bottom, interleaved middle, feedforward-heavy top
Improves model quality without adding parameters or sublayers
Matches the training and inference cost of a standard transformer of the same size
Applicable to transformer-based NLP tasks, most notably language modeling and text generation
Open implementation with configuration options for the sandwich sublayer ordering (a minimal sketch of the idea follows below)
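To show how such an ordering might be applied, the following hedged PyTorch sketch stacks standard attention and feedforward sublayers according to an ordering string like the one above. The class and argument names are hypothetical and do not reflect this repository's actual API.

```python
import torch
import torch.nn as nn


class SandwichBlockStack(nn.Module):
    """Apply self-attention ('s') and feedforward ('f') sublayers in a
    given order, each with pre-layer-norm and a residual connection.
    Illustrative sketch only, not the repository's implementation."""

    def __init__(self, ordering: str, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.ordering = ordering
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in ordering)
        self.sublayers = nn.ModuleList()
        for kind in ordering:
            if kind == "s":
                self.sublayers.append(
                    nn.MultiheadAttention(d_model, n_heads, batch_first=True))
            else:  # 'f'
                self.sublayers.append(nn.Sequential(
                    nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for kind, norm, layer in zip(self.ordering, self.norms, self.sublayers):
            h = norm(x)
            if kind == "s":
                h, _ = layer(h, h, h, need_weights=False)
            else:
                h = layer(h)
            x = x + h  # residual connection around each sublayer
        return x


# Usage: a toy stack with 4 attention/feedforward pairs and sandwich coefficient 2
stack = SandwichBlockStack("ss" + "sf" * 2 + "ff", d_model=64, n_heads=4, d_ff=128)
out = stack(torch.randn(2, 10, 64))  # (batch, seq, d_model)
print(out.shape)
```

Because the sandwich ordering only permutes existing sublayers, any such stack has the same parameter count and per-token compute as the baseline interleaved ordering of equal depth.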