Sample Pages (Top 50 by confidence)
[D] How to truly understand attention mechanism in transformers? : MachineLearning
https://www.reddit.com/r/MachineLearning/comments/qidpqx/d_how_to_truly_understa...
1 user
Last: Jan 07, 2026
100% confidence
Vision Transformers - by Rishabh Anand - RishTech
https://rishtech.substack.com/p/vit?s=r
1 user
Last: Jan 07, 2026
100% confidence
Google AI Blog: Rethinking Attention with Performers
https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html
1 user
Last: Jan 07, 2026
100% confidence
Can a Transformer “Learn” Economic Relationships?
https://arpitrage.substack.com/p/can-a-transformer-learn-economic?utm_source=pos...
1 user
Last: Jan 07, 2026
100% confidence
Transformers for software engineers - Made of Bugs
https://blog.nelhage.com/post/transformers-for-software-engineers
1 user
Last: Jan 07, 2026
100% confidence
GPT-2's positional embedding matrix is a helix — LessWrong
https://www.lesswrong.com/posts/qvWP3aBDBaqXvPNhS/gpt-2-s-positional-embedding-m...
1 user
Last: Jan 07, 2026
100% confidence
Lower Bounds on Transformers with Infinite Precision (arXiv:2412.20195)
https://arxiv.org/pdf/2412.20195
1 user
Last: Jan 07, 2026
100% confidence
Fundamental Limitations on Subquadratic Alternatives to Transformers
https://arxiv.org/pdf/2410.04271
1 user
Last: Jan 07, 2026
100% confidence
Transformer Models
https://docs.cohere.com/docs/transformer-models
1 user
Last: Jan 07, 2026
100% confidence
Transformers without Normalization - DynamicTanh - DyT
https://jiachenzhu.github.io/DyT
1 user
Last: Jan 07, 2026
100% confidence
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
https://ar5iv.labs.arxiv.org/html/2407.04620
1 user
Last: Jan 07, 2026
100% confidence
Gears-Level Mental Models of Transformer Interpretability — LessWrong
https://www.lesswrong.com/posts/X26ksz4p3wSyycKNB/gears-level-mental-models-of-t...
1 user
Last: Jan 07, 2026
100% confidence
Transformer Design Guide (Part 2: Modern Architecture) | Rohit Bandaru
https://rohitbandaru.github.io/blog/Transformer-Design-Guide-Pt2
1 user
Last: Jan 07, 2026
100% confidence
Everything About Transformers
https://www.krupadave.com/articles/everything-about-transformers?x=v3
1 user
Last: Jan 07, 2026
100% confidence
Understanding Rotary Positional Encoding | by Ngieng Kianyew | Medium
https://medium.com/%40ngiengkianyew/understanding-rotary-positional-encoding-406...
1 user
Last: Jan 07, 2026
100% confidence
Demystify Transformers: A Guide to Scaling Laws | by Yu-Cheng Tsai | Sage Ai | Medium
https://medium.com/sage-ai/demystify-transformers-a-comprehensive-guide-to-scali...
1 user
Last: Jan 07, 2026
100% confidence
Chapter 8 Attention and Self-Attention for NLP | Modern Approaches in Natural Language Processing
https://slds-lmu.github.io/seminar_nlp_ss20/attention-and-self-attention-for-nlp...
1 user
Last: Jan 07, 2026
100% confidence
N-dimensional Rotary Positional Embeddings
https://jerryxio.ng/posts/nd-rope
1 user
Last: Jan 07, 2026
100% confidence
Scalable Diffusion Models with Transformers
https://openaccess.thecvf.com/content/ICCV2023/papers/Peebles_Scalable_Diffusion...
1 user
Last: Jan 07, 2026
100% confidence
Attention Is All You Need: In-Depth Walkthrough
https://btcompneuro.substack.com/p/draft-attention-is-all-you-need-in
1 user
Last: Jan 07, 2026
100% confidence
Transformer Progress | Leela Chess Zero
https://lczero.org/blog/2024/02/transformer-progress
1 user
Last: Jan 07, 2026
100% confidence
Transformer Text Embeddings | Baeldung on Computer Science
https://www.baeldung.com/cs/transformer-text-embeddings
1 user
Last: Jan 07, 2026
100% confidence
Vision Transformer (ViT)
https://nn.labml.ai/transformers/vit/index.html
1 user
Last: Jan 07, 2026
100% confidence
Attention? Attention! | Lil'Log
https://lilianweng.github.io/posts/2018-06-24-attention
4 users
Last: Jan 07, 2026
100% confidence
Sequence-to-sequence learning with Transducers
https://lorenlugosch.github.io/posts/2020/11/transducer
1 user
Last: Jan 07, 2026
100% confidence
An Improved Transformer-VAE | Fraser Greenlee
https://fraser-greenlee.github.io/2021/02/23/An-Improved-Transformer-VAE.html
1 user
Last: Jan 07, 2026
100% confidence
Relative Positional Encoding - Jake Tae
https://jaketae.github.io/study/relative-positional-encoding
2 users
Last: Jan 07, 2026
100% confidence
Improving Transformer Models by Reordering their Sublayers – Ofir Press
https://ofir.io/Improving-Transformer-Models-by-Reordering-their-Sublayers
2 users
Last: Jan 07, 2026
100% confidence
neural networks - What exactly are keys, queries, and values in attention mechanisms? - Cross Validated
https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-a...
1 user
Last: Jan 07, 2026
100% confidence
Attention Is All You Need - Wikipedia
https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
1 user
Last: Jan 07, 2026
100% confidence
Attention is all you need and much more | Blog For Chillguy
https://bfcmath.github.io/posts/Attention-is-all-you-need-and-much-more
1 user
Last: Jan 07, 2026
100% confidence
A Deep Dive into Transformers with TensorFlow and Keras: Part 1 - PyImageSearch
https://pyimagesearch.com/2022/09/05/a-deep-dive-into-transformers-with-tensorfl...
1 user
Last: Jan 07, 2026
100% confidence
Transformer Neural Network Definition | DeepAI
https://deepai.org/machine-learning-glossary-and-terms/transformer-neural-networ...
1 user
Last: Jan 07, 2026
100% confidence
Attention as Energy Minimization: Visualizing Energy Landscapes | mcbal
https://mcbal.github.io/post/attention-as-energy-minimization-visualizing-energy...
2 users
Last: Jan 07, 2026
100% confidence
Build your own Transformer from scratch using Pytorch | by Arjun Sarkar | Towards Data Science
https://towardsdatascience.com/build-your-own-transformer-from-scratch-using-pyt...
1 user
Last: Jan 07, 2026
100% confidence
Transformer (deep learning architecture) - Wikipedia
https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
1 user
Last: Jan 07, 2026
100% confidence
Paper Review: Set Transformers. A Framework for Attention-based… | by Himanshu Gupta | Toward Humanoids | Medium
https://medium.com/correll-lab/paper-review-set-transformers-6ce59c2ad88f
1 user
Last: Jan 07, 2026
100% confidence
Transformers Explained Visually (Part 3): Multi-head Attention, deep dive | by Ketan Doshi | Towards Data Science
https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head...
2 users
Last: Jan 07, 2026
100% confidence
What is an attention mechanism? | IBM
https://www.ibm.com/think/topics/attention-mechanism
1 user
Last: Jan 07, 2026
100% confidence
A ten-minute introduction to sequence-to-sequence learning in Keras
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning...
1 user
Last: Jan 07, 2026
100% confidence
Generating Long Sequences with Sparse Transformers
https://arxiv.org/pdf/1904.10509v1
1 user
Last: Jan 07, 2026
100% confidence
The Annotated Transformer
https://nlp.seas.harvard.edu/2018/04/03/attention.html
2 users
Last: Jan 07, 2026
100% confidence
How transformers, RNNs and SSMs are more alike than you think | by Stanislav Fedotov | Nebius | Sep, 2024 | Medium
https://medium.com/nebius/how-transformers-rnns-and-ssms-are-more-alike-than-you...
1 user
Last: Jan 07, 2026
100% confidence
Transformers are SSMs (Mamba-2) (arXiv:2405.21060)
https://arxiv.org/pdf/2405.21060
1 user
Last: Jan 07, 2026
100% confidence
Efficiently Scaling Transformer Inference
https://arxiv.org/pdf/2211.05102
1 user
Last: Jan 07, 2026
100% confidence
Striped Attention: Faster Ring Attention for Causal Transformers
https://arxiv.org/pdf/2311.09431
1 user
Last: Jan 07, 2026
100% confidence
Sparser Block-Sparse Attention via Token Permutation
https://arxiv.org/pdf/2510.21270
1 user
Last: Jan 07, 2026
100% confidence