Entmax is a family of activation functions that generalize softmax by allowing sparse probability distributions, i.e., outputs in which some entries are exactly zero. Developed by DeepSPIN, the library includes entmax15 (the fixed alpha = 1.5 case) and alpha-entmax (with a tunable alpha), which are particularly useful in attention mechanisms where sparsity is desired. Because softmax always produces dense distributions, entmax can improve interpretability, and in some settings efficiency, in tasks where only a few inputs should receive attention.
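As a rough illustration, the minimal sketch below contrasts softmax and entmax15 on the same logits. It assumes the `entmax` package is installed (`pip install entmax`); the logit values are made up for the example.

```python
import torch
from entmax import entmax15

# Illustrative logits; the values are made up for this sketch.
logits = torch.tensor([3.0, 1.5, 1.2, -2.0, -4.0])

dense = torch.softmax(logits, dim=-1)  # every entry is strictly positive
sparse = entmax15(logits, dim=-1)      # low-scoring entries come out exactly zero

print(dense)
print(sparse)  # here the two lowest-scoring entries are exactly 0.0
```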
Key Features
- Generalizes softmax, enabling sparse attention distributions with exact zeros
- Includes entmax15 (alpha = 1.5), a drop-in replacement for softmax that requires minimal tuning (see the first sketch after this list)
- Supports alpha-entmax, whose alpha hyperparameter adjusts the degree of sparsity: alpha = 1 recovers softmax, alpha = 2 recovers sparsemax (see the second sketch after this list)
- Enhances interpretability by producing sparse probability vectors in which irrelevant entries are exactly zero
- Implemented in PyTorch, with both functional and nn.Module-style interfaces
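For the drop-in usage, the package also exposes module classes alongside the functional API. A minimal sketch, assuming the package's Entmax15 module and illustrative attention scores:

```python
import torch
import torch.nn as nn
from entmax import Entmax15

# Hypothetical attention scores of shape (batch, queries, keys); values are illustrative.
scores = torch.randn(2, 4, 6)

softmax_attn = nn.Softmax(dim=-1)
entmax_attn = Entmax15(dim=-1)  # same call signature as nn.Softmax

# The two modules are interchangeable; entmax15 weights may contain exact zeros.
dense_weights = softmax_attn(scores)
sparse_weights = entmax_attn(scores)
print((sparse_weights == 0).float().mean())  # fraction of pruned attention links
```

For adjustable sparsity, alpha-entmax is exposed through entmax_bisect, whose alpha argument controls how sparse the output is; alpha can also be given as a learnable tensor so the sparsity level is trained end to end. A minimal sketch with illustrative alpha values:

```python
import torch
from entmax import entmax_bisect

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # illustrative values

# Larger alpha yields a sparser output: alpha -> 1 approaches softmax,
# and alpha = 2 recovers sparsemax.
for alpha in (1.1, 1.5, 2.0):
    probs = entmax_bisect(logits, alpha=alpha, dim=-1)
    print(alpha, probs)
```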
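Note that the loop above re-solves the entmax problem per alpha by bisection, so choosing alpha is a modeling decision (fixed hyperparameter or learned per head) rather than a free post-hoc knob; the printed distributions simply gain more zeros as alpha grows.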