Sample Pages (Top 50 by confidence)
How to Profile TPU Programs | How To Scale Your Model
https://jax-ml.github.io/scaling-book/profiling
1 user
Last: Jan 07, 2026
100% confidence
Luminal
https://demo.luminalai.com
1 user
Last: Jan 07, 2026
100% confidence
Gentle introduction to GPUs inner workings | vkSegfault
https://vksegfault.github.io/posts/gentle-intro-gpu-inner-workings
1 user
Last: Jan 07, 2026
100% confidence
Smth Smth GPU Related
https://sodakeyeatsmush.notion.site/Smth-Smth-GPU-Related-27bf1129214e804ba217e8...
1 user
Last: Jan 07, 2026
100% confidence
Learning CUDA with a Weak GPU or No GPU at All? Yes, You Can! | A plethora of science
https://hamdi.bearblog.dev/learning-cuda-with-a-weak-gpu-or-no-gpu-at-all-yes-yo...
1 user
Last: Jan 07, 2026
100% confidence
GPU Programming: A Better Look at the GPU
https://carpentries-incubator.github.io/lesson-gpu-programming/gpu_introduction....
1 user
Last: Jan 07, 2026
100% confidence
Implementing a fast Tensor Core matmul on the Ada Architecture | spatters.ca
https://www.spatters.ca/mma-matmul
1 user
Last: Jan 07, 2026
100% confidence
Mini Project: GPU Accelerated Matrix Multiplication (almost) like cuBLAS
https://0mean1sigma.com/xgemm
1 user
Last: Jan 07, 2026
100% confidence
GPU Compute Model Terminology - gpu_compute_model_terms_quick_ref.pdf
https://landonthomas.net/docs/gpu_compute_model_terms_quick_ref.pdf
1 user
Last: Jan 07, 2026
100% confidence
What Shapes Do Matrix Multiplications Like? [medium]
https://www.thonking.ai/p/what-shapes-do-matrix-multiplications
1 user
Last: Jan 07, 2026
100% confidence
HazyResearch/ThunderKittens: Tile primitives for speedy kernels
https://github.com/HazyResearch/ThunderKittens
1 user
Last: Jan 07, 2026
100% confidence
I want a good parallel computer | Raph Levien’s blog
https://raphlinus.github.io/gpu/2025/03/21/good-parallel-computer.html
1 user
Last: Jan 07, 2026
100% confidence
A Gentle Introduction to CUDA PTX | Philip Fabianek
https://philipfabianek.com/posts/cuda-ptx-introduction
1 user
Last: Jan 07, 2026
100% confidence
Fireiron: A Data-Movement-Aware Scheduling Language for GPUs
https://lenary.co.uk/publications/fireiron.pdf
1 user
Last: Jan 07, 2026
100% confidence
The Best GPUs for Deep Learning in 2023 — An In-depth Analysis
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/?utm_source=chatg...
1 user
Last: Jan 07, 2026
100% confidence
The Hidden Bottleneck: How GPU Memory Hierarchy Affects Your Computing Experience | DigitalOcean
https://www.digitalocean.com/community/tutorials/the-hidden-bottleneck-how-gpu-m...
1 user
Last: Jan 07, 2026
100% confidence
Python API reference — nvMatmulHeuristics
https://docs.nvidia.com/cuda/nvidia-matmul-heuristics/api_python.html
1 user
Last: Jan 07, 2026
100% confidence
How to Think About GPUs | How To Scale Your Model
https://jax-ml.github.io/scaling-book/gpus
2 users
Last: Jan 07, 2026
100% confidence
How to Think About TPUs | How To Scale Your Model
https://jax-ml.github.io/scaling-book/tpus
2 users
Last: Jan 07, 2026
100% confidence
NVIDIA Tensor Core Evolution: From Volta To Blackwell
https://newsletter.semianalysis.com/p/nvidia-tensor-core-evolution-from-volta-to...
1 user
Last: Jan 07, 2026
100% confidence
Efficient GEMM in CUDA — NVIDIA CUTLASS Documentation
https://docs.nvidia.com/cutlass/media/docs/cpp/efficient_gemm.html
1 user
Last: Jan 07, 2026
100% confidence
Cooperative Groups: Flexible CUDA Thread Programming | NVIDIA Technical Blog
https://developer.nvidia.com/blog/cooperative-groups
1 user
Last: Jan 07, 2026
100% confidence
Matrix Multiplication CUDA - ECA - GPU 2018-2019
https://ecatue.gitlab.io/gpu2018/pages/Cookbook/matrix_multiplication_cuda.html
1 user
Last: Jan 07, 2026
100% confidence
Adjusting for GPU Memory Bandwidth Tradeoffs | Apple Developer Documentation
https://developer.apple.com/documentation/metal/gpu_devices_and_work_submission/...
1 user
Last: Jan 07, 2026
100% confidence
TPUs vs GPUs for Transformers (BERT) — Tim Dettmers
https://timdettmers.com/2018/10/17/tpus-vs-gpus-for-transformers-bert
1 user
Last: Jan 07, 2026
100% confidence
GPU Performance Background User's Guide - NVIDIA Docs
https://docs.nvidia.com/deeplearning/performance/dl-performance-gpu-background/i...
2 users
Last: Jan 07, 2026
100% confidence
wangzyon/NVIDIA_SGEMM_PRACTICE: Step-by-step optimization of CUDA SGEMM
https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE
1 user
Last: Jan 07, 2026
100% confidence
Outperforming cuBLAS on H100: a Worklog
https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog
1 user
Last: Jan 07, 2026
100% confidence
penny-xu.github.io/blog/tiled-matrix-multiplication
https://penny-xu.github.io/blog/tiled-matrix-multiplication
2 users
Last: Jan 07, 2026
100% confidence
abeleinin/Metal-Puzzles: Solve Puzzles. Learn Metal 🤘
https://github.com/abeleinin/Metal-Puzzles
1 user
Last: Jan 07, 2026
100% confidence
A history of NVidia Stream Multiprocessor
https://fabiensanglard.net/cuda
1 user
Last: Jan 07, 2026
100% confidence
NVIDIA Collective Communications Library (NCCL) | NVIDIA Developer
https://developer.nvidia.com/nccl
1 user
Last: Jan 07, 2026
100% confidence
GPUDirect | NVIDIA Developer
https://developer.nvidia.com/gpudirect
1 user
Last: Jan 07, 2026
100% confidence
stanford-cs149/asst3: Stanford CS149 -- Assignment 3
https://github.com/stanford-cs149/asst3
1 user
Last: Jan 07, 2026
100% confidence
Matrix Multiplication Background User's Guide :: NVIDIA Deep Learning Performance Documentation
https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplic...
1 user
Last: Jan 07, 2026
100% confidence
the-compute-architecture-of-intel-processor-graphics-gen9-v1d0.pdf
https://www.intel.com/content/dam/develop/external/us/en/documents/the-compute-a...
1 user
Last: Jan 07, 2026
100% confidence
CUDA 12.2 Release Notes — cuda-toolkit-release-notes 12.2 documentation
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
1 user
Last: Jan 07, 2026
100% confidence
CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation
https://docs.nvidia.com/deploy/cuda-compatibility
1 user
Last: Jan 07, 2026
100% confidence
CUTLASS Tutorial: Efficient GEMM kernel designs with Pipelining – Colfax Research
https://research.colfax-intl.com/cutlass-tutorial-design-of-a-gemm-kernel
1 user
Last: Jan 07, 2026
100% confidence
LeetGPU - The GPU Programming Platform
https://leetgpu.com
1 user
Last: Jan 07, 2026
100% confidence
CUDA ontology ~ James Akl
https://jamesakl.com/posts/cuda-ontology
2 users
Last: Jan 07, 2026
100% confidence
What Every Developer Should Know About GPU Computing
https://codeconfessions.substack.com/p/gpu-computing?utm_source=tldrnewsletter
1 user
Last: Jan 07, 2026
100% confidence
Tutorial 02: CUDA in Actions - CUDA Tutorial
https://cuda-tutorial.readthedocs.io/en/latest/tutorials/tutorial02
1 user
Last: Jan 07, 2026
100% confidence
CPU vs GPU? What’s the Difference? Which Is Better? | NVIDIA Blog
https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-...
1 user
Last: Jan 07, 2026
100% confidence
Numba for CUDA GPUs — Numba 0.57.0+0.g4fd4e39c6.dirty documentation
https://numba.readthedocs.io/en/stable/cuda/index.html
2 users
Last: Jan 07, 2026
100% confidence
Writing CUDA Kernels — Numba 0.57.0+0.g4fd4e39c6.dirty documentation
https://numba.readthedocs.io/en/stable/cuda/kernels.html
2 users
Last: Jan 07, 2026
100% confidence
Convolutional Neural Network (CNN) | TensorFlow Core
https://www.tensorflow.org/tutorials/images/cnn
1 user
Last: Jan 07, 2026
100% confidence