Entmax: Sparse Alternative to Softmax for Neural Networks

License: MIT
Model Type: Other
Entmax is a family of activation functions that generalizes softmax by producing sparse probability distributions, in which low-scoring entries receive exactly zero probability. Developed by DeepSPIN, it includes the entmax15 and alpha-entmax functions, which are particularly useful in attention mechanisms where sparsity is desired. Because traditional softmax always yields dense distributions, entmax can improve interpretability and, in some settings, efficiency.
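
A minimal sketch of the sparsity difference, assuming the DeepSPIN `entmax` package (installable via `pip install entmax`) and its `entmax15` function; the score values below are illustrative only:

```python
import torch
from entmax import entmax15

# Illustrative attention-style scores; the gap between them is exaggerated
# so the sparsity of entmax15 is visible.
scores = torch.tensor([3.0, 1.5, 0.2, -1.0])

dense = torch.softmax(scores, dim=-1)  # every entry is strictly positive
sparse = entmax15(scores, dim=-1)      # low-scoring entries map to exactly 0

print(dense)   # all four probabilities are nonzero
print(sparse)  # trailing entries can be exactly zero
```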

Key Features

  • Generalization of softmax enabling sparse attention outputs
  • Includes entmax15, a drop-in replacement for softmax requiring minimal tuning
  • Supports alpha-entmax for adjustable sparsity via the alpha hyperparameter (see the sketch after this list)
  • Enhances interpretability by producing sparse probability vectors
  • Compatible with modern deep learning libraries including PyTorch
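
A hedged sketch of alpha-entmax using the package's `entmax_bisect` function, assuming its `alpha` keyword controls the sparsity level (alpha near 1 approaches softmax, alpha = 2 corresponds to sparsemax, and larger alpha yields sparser outputs; the bisection solver expects alpha > 1):

```python
import torch
from entmax import entmax_bisect

# A batch of random attention-style logits, purely for illustration.
scores = torch.randn(2, 8)

for alpha in (1.25, 1.5, 2.0):
    p = entmax_bisect(scores, alpha=alpha, dim=-1)
    # Count how many entries per row survive with nonzero probability;
    # higher alpha should generally leave fewer nonzero entries.
    nonzero = (p > 0).sum(dim=-1)
    print(f"alpha={alpha}: nonzero entries per row = {nonzero.tolist()}")
```

In practice, alpha can be fixed as a hyperparameter, as here, or treated as a learnable parameter per attention head, which the entmax library also supports.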