curius
all pages
showing 36401-36450 of 160880 pages (sorted by popularity)
More Was Possible: A Review of If Anyone Builds It, Everyone Dies—Asterisk (1 user)
Cornell Armada (1 user)
Another Anti-Social Media Screed (1 user)
Existential Hope Meme Prize (1 user)
The world's first frontier AI regulation is surprisingly thoughtful: the EU's Code of Practice (1 user)
gpt-oss-20b · Model Library (1 user)
How to spot a genius (1 user)
Another Bay Area House Party - by Scott Alexander (1 user)
Bentham's Newsletter | Bentham's Bulldog | Substack (1 user)
xkcd: Pull (1 user)
xkcd: Wavefunction Collapse (1 user)
xkcd: Archaeology Research (1 user)
Notes on fatalities from AI takeover — LessWrong (1 user)
Harry Kingdon (1 user)
My infant year as an AI researcher — Moving from physics to AI (1 user)
Spandrel (biology) - Wikipedia (1 user)
Can Large Language Models Develop Gambling Addiction? (1 user)
"My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community (1 user)
Vipassana Meditation Course (1 user)
The role of symmetry in fundamental physics (1 user)
The Feynman Lectures on Physics Vol. II Ch. 19: The Principle of Least Action (1 user)
How an AI company CEO could quietly take over the world (1 user)
Model Organisms for Emergent Misalignment (1 user)
2411.07133v1 (1 user)
Agentic Monitoring for AI Control — LessWrong (1 user)
Death of a Salesman - by Amos Wollen - Going Awol (1 user)
About - Offhand Quibbles (1 user)
[2510.27338] Reasoning Models Sometimes Output Illegible Chains of Thought (1 user)
The Tale of the Top-Tier Intellect — LessWrong (1 user)
nothing feels quite like rejection - by vincent huang (1 user)
Skyscrapers and madmen - Joe Carlsmith (1 user)
Practice - RwPhO (1 user)
Émile P. Torres - Wikipedia (1 user)
Forecasting Rare Language Model Behaviors (1 user)
Miller index - Wikipedia (1 user)
Current LLMs seem to rarely detect CoT tampering — AI Alignment Forum (1 user)
Anthropic on X: "Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is "inoculation prompting": framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization. https://t.co/ZUGnmcOYNV" / X (1 user)
[Paper] Output Supervision Can Obfuscate the CoT — AI Alignment Forum (1 user)
Available Models in Tinker – Tinker API (1 user)
2411.06655v1 (1 user)
What is it to solve the alignment problem? — LessWrong (1 user)
GRPO is terrible — LessWrong (1 user)
semester three - randy chang (1 user)
Borat - Wikipedia (1 user)
Eugene Gendlin - Wikipedia (1 user)
pdf (1 user)
Hilary Greaves on Pascal's mugging, strong longtermism, and whether existing can be good for us | 80,000 Hours (1 user)
Casey Handmer (1 user)
The Origins of Representation Manifolds in Large Language Models (1 user)
Dan White - Wikipedia (1 user)