Entmax: Sparse Alternative to Softmax for Neural Networks

License: MIT
Model Type: Other
Entmax is a family of activation functions that generalizes softmax by producing sparse probability distributions, in which low-scoring entries receive exactly zero probability. Developed by DeepSPIN, it includes the entmax15 and alpha-entmax functions, which are particularly useful in attention mechanisms where sparsity is desired. Because traditional softmax always yields dense distributions, entmax can improve interpretability and, in some settings, efficiency.
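
A minimal sketch of the sparsity difference, assuming the DeepSPIN `entmax` package (installable via `pip install entmax`) and its `entmax15` function; the score values below are illustrative only:

```python
import torch
from entmax import entmax15

# Illustrative attention-style scores; the gap between them is exaggerated
# so the sparsity of entmax15 is visible.
scores = torch.tensor([3.0, 1.5, 0.2, -1.0])

dense = torch.softmax(scores, dim=-1)  # every entry is strictly positive
sparse = entmax15(scores, dim=-1)      # low-scoring entries map to exactly 0

print(dense)   # all four probabilities are nonzero
print(sparse)  # trailing entries can be exactly zero
```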

Key Features

  • Generalization of softmax enabling sparse attention outputs
  • Includes entmax15, a drop-in replacement for softmax requiring minimal tuning
  • Supports alpha-entmax for adjustable sparsity via the alpha hyperparameter (see the sketch after this list)
  • Enhances interpretability by producing sparse probability vectors
  • Compatible with modern deep learning libraries including PyTorch
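
A hedged sketch of alpha-entmax using the package's `entmax_bisect` function, assuming its `alpha` keyword controls the sparsity level (alpha near 1 approaches softmax, alpha = 2 corresponds to sparsemax, and larger alpha yields sparser outputs; the bisection solver expects alpha > 1):

```python
import torch
from entmax import entmax_bisect

# A batch of random attention-style logits, purely for illustration.
scores = torch.randn(2, 8)

for alpha in (1.25, 1.5, 2.0):
    p = entmax_bisect(scores, alpha=alpha, dim=-1)
    # Count how many entries per row survive with nonzero probability;
    # higher alpha should generally leave fewer nonzero entries.
    nonzero = (p > 0).sum(dim=-1)
    print(f"alpha={alpha}: nonzero entries per row = {nonzero.tolist()}")
```

In practice, alpha can be fixed as a hyperparameter, as here, or treated as a learnable parameter per attention head, which the entmax library also supports.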