Chiaroscuro Attention: Spending Compute in the Dark
…We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators (DCT spectral mixing, RBF kernel mixing, or full self-attention) based on…
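
To make the per-token routing idea concrete, here is a minimal NumPy sketch of a single mixing layer with three operators. It is an illustration under stated assumptions, not the paper's implementation: the excerpt is cut off before it states the routing criterion, so the router here is a hypothetical linear scorer with a hard argmax. The function names (`dct_matrix`, `dct_mix`, `rbf_mix`, `self_attention`, `route_and_mix`), the low-pass `keep` parameter, the `gamma` bandwidth, and the identity projections in attention are likewise assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: rows index frequency k, columns index position t.
    t = np.arange(n)
    k = t[:, None]
    mat = np.cos(np.pi * (2 * t[None, :] + 1) * k / (2 * n))
    mat[0] *= np.sqrt(1.0 / n)
    mat[1:] *= np.sqrt(2.0 / n)
    return mat

def dct_mix(x, keep):
    # Spectral mixing along the sequence axis: transform, keep the lowest
    # `keep` frequencies (an assumed low-pass filter), then transform back.
    c = dct_matrix(x.shape[0])
    spec = c @ x
    spec[keep:] = 0.0
    return c.T @ spec

def rbf_mix(x, gamma=1.0):
    # Kernel mixing: row-normalised Gaussian (RBF) similarities between tokens.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-gamma * sq)
    return (w / w.sum(axis=1, keepdims=True)) @ x

def self_attention(x):
    # Single-head scaled dot-product attention with identity projections
    # (an assumption; a real layer would use learned Q/K/V projections).
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return (w / w.sum(axis=1, keepdims=True)) @ x

def route_and_mix(x, router_w):
    # Hypothetical router: one linear score per operator, hard argmax per token.
    # The actual routing signal is not given in the truncated abstract.
    choice = np.argmax(x @ router_w, axis=1)
    # Dense reference version: every operator runs on all tokens and the
    # router only selects which output each token keeps. This shows the
    # routing logic but does not save any compute.
    outputs = [dct_mix(x, keep=max(1, x.shape[0] // 2)), rbf_mix(x), self_attention(x)]
    out = np.empty_like(x)
    for i, y in enumerate(outputs):
        mask = choice == i
        out[mask] = y[mask]
    return out, choice

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # 8 tokens, 4 features
router_w = rng.normal(size=(4, 3))   # one score column per operator
out, choice = route_and_mix(x, router_w)
```

Here `choice[i]` is 0, 1, or 2 for the DCT, RBF, and attention paths respectively, and `out` has the same shape as `x`. An implementation that actually saves compute would run the expensive attention path only on the tokens routed to it.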