curius: all pages
showing 36101-36150 of 160880 pages (sorted by popularity)
« prev 1 ... 721 722 723 724 725 ... 3218 next »
o1: A Technical Primer — LessWrong (1 user)
LatentQA: Teaching LLMs to Decode Activations Into Natural Language (1 user)
Why Don't We Just... Shoggoth+Face+Paraphraser? — LessWrong (1 user)
Alignment faking in large language models - YouTube (1 user)
No hype, just works: How Comma reached 100M miles in autonomous driving | E2011 - YouTube (1 user)
Why Not Just Outsource Alignment Research To An AI? — LessWrong (1 user)
Tips and Code for Empirical Research Workflows — AI Alignment Forum (1 user)
Recipe: Hessian eigenvector computation for PyTorch models — LessWrong (1 user)
Coercion is an adaptation to scarcity; trust is an adaptation to abundance — LessWrong (1 user)
Agency begets agency — LessWrong (1 user)
Preventing model exfiltration with upload limits (1 user)
Diffusion language models – Sander Dieleman (1 user)
[2211.15089] Continuous diffusion for categorical data (1 user)
Spying on Python with py-spy (1 user)
LLMs are (mostly) not helped by filler tokens — LessWrong (1 user)
Win/continue/lose scenarios and execute/replace/audit protocols — AI Alignment Forum (1 user)
RL with KL penalties is better seen as Bayesian inference — LessWrong (1 user)
Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking (1 user)
Decoding opaque reasoning in current models (empirical project proposal) [public] - Google Docs (1 user)
Natural Latents: The Math — LessWrong (1 user)
Teaching ML to answer questions honestly instead of predicting human answers | by Paul Christiano | AI Alignment (1 user)
Unlearning via RMU is mostly shallow — AI Alignment Forum (1 user)
When does training a model change its goals? (1 user)
Bogong moths use a stellar compass for long-distance navigation at night | Nature (1 user)
Data representation – CS 61 (1 user)
Fusion energy start-up claims to have cracked alchemy (1 user)
If a tree falls on Sleeping Beauty... — LessWrong (1 user)
Victor Weisskopf - Wikipedia (1 user)
David McCullough - Wikipedia (1 user)
Herman Feshbach - Wikipedia (1 user)
Simplex Progress Report - July 2025 — LessWrong (1 user)
Suddenly, Trait-Based Embryo Selection - by Scott Alexander (1 user)
I've tracked my last 800 flights. Here's when you really need to get to the airport. (1 user)
Why future AI agents will be trained to work together (1 user)
An Intuitive Explanation of Quantum Mechanics — LessWrong (1 user)
Configurations and Amplitude — LessWrong (1 user)
Opening the Black Box: Interpretable LLMs via Semantic Resonance Architecture (1 user)
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (1 user)
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures (1 user)
A Toy Model of Mechanistic (Un)Faithfulness (1 user)
Synthesizing Standalone World-Models (+ Bounties, Seeking Funding) — AI Alignment Forum (1 user)
Artusi: Lesson 8: Triads (1 user)
How Well Does RL Scale? — Toby Ord (1 user)
How to Create Synthetic Data at High Quality for Fine-Tuning LLMs (1 user)
Introducing RFM-1: Giving robots human-like reasoning capabilities (1 user)
ARC progress update: Competing with sampling — LessWrong (1 user)
[2511.01836] Priors in Time: Missing Inductive Biases for Language Model Interpretability (1 user)
[2511.00617] Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering (1 user)
Circuits Updates - April 2024 (1 user)
Watermarking of Large Language Models - YouTube (1 user)