curius: all pages
showing 36101-36150 of 160880 pages (sorted by popularity)
« prev 1 ... 721 722 723 724 725 ... 3218 next »
o1: A Technical Primer — LessWrong (1 user)
LatentQA: Teaching LLMs to Decode Activations Into Natural Language (1 user)
Why Don't We Just... Shoggoth+Face+Paraphraser? — LessWrong (1 user)
Alignment faking in large language models - YouTube (1 user)
No hype, just works: How Comma reached 100M miles in autonomous driving | E2011 - YouTube (1 user)
Why Not Just Outsource Alignment Research To An AI? — LessWrong (1 user)
Tips and Code for Empirical Research Workflows — AI Alignment Forum (1 user)
Recipe: Hessian eigenvector computation for PyTorch models — LessWrong (1 user)
Coercion is an adaptation to scarcity; trust is an adaptation to abundance — LessWrong (1 user)
Agency begets agency — LessWrong (1 user)
Preventing model exfiltration with upload limits (1 user)
Diffusion language models – Sander Dieleman (1 user)
[2211.15089] Continuous diffusion for categorical data (1 user)
Spying on Python with py-spy (1 user)
LLMs are (mostly) not helped by filler tokens — LessWrong (1 user)
Win/continue/lose scenarios and execute/replace/audit protocols — AI Alignment Forum (1 user)
RL with KL penalties is better seen as Bayesian inference — LessWrong (1 user)
Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking (1 user)
Decoding opaque reasoning in current models (empirical project proposal) [public] - Google Docs (1 user)
Natural Latents: The Math — LessWrong (1 user)
Teaching ML to answer questions honestly instead of predicting human answers | by Paul Christiano | AI Alignment (1 user)
Unlearning via RMU is mostly shallow — AI Alignment Forum (1 user)
When does training a model change its goals? (1 user)
Bogong moths use a stellar compass for long-distance navigation at night | Nature (1 user)
Data representation – CS 61 (1 user)
Fusion energy start-up claims to have cracked alchemy (1 user)
If a tree falls on Sleeping Beauty... — LessWrong (1 user)
Victor Weisskopf - Wikipedia (1 user)
David McCullough - Wikipedia (1 user)
Herman Feshbach - Wikipedia (1 user)
Simplex Progress Report - July 2025 — LessWrong (1 user)
Suddenly, Trait-Based Embryo Selection - by Scott Alexander (1 user)
I've tracked my last 800 flights. Here's when you really need to get to the airport. (1 user)
Why future AI agents will be trained to work together (1 user)
An Intuitive Explanation of Quantum Mechanics — LessWrong (1 user)
Configurations and Amplitude — LessWrong (1 user)
Opening the Black Box: Interpretable LLMs via Semantic Resonance Architecture (1 user)
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (1 user)
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures (1 user)
A Toy Model of Mechanistic (Un)Faithfulness (1 user)
Synthesizing Standalone World-Models (+ Bounties, Seeking Funding) — AI Alignment Forum (1 user)
Artusi: Lesson 8: Triads (1 user)
How Well Does RL Scale? — Toby Ord (1 user)
How to Create Synthetic Data at High Quality for Fine-Tuning LLMs (1 user)
Introducing RFM-1: Giving robots human-like reasoning capabilities (1 user)
ARC progress update: Competing with sampling — LessWrong (1 user)
[2511.01836] Priors in Time: Missing Inductive Biases for Language Model Interpretability (1 user)
[2511.00617] Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering (1 user)
Circuits Updates - April 2024 (1 user)
Watermarking of Large Language Models - YouTube (1 user)