curius
all pages
showing 36401-36450 of 160880 pages (sorted by popularity)
More Was Possible: A Review of If Anyone Builds It, Everyone Dies—Asterisk (1 user)
Cornell Armada (1 user)
Another Anti-Social Media Screed (1 user)
Existential Hope Meme Prize (1 user)
The world's first frontier AI regulation is surprisingly thoughtful: the EU's Code of Practice (1 user)
gpt-oss-20b · Model Library (1 user)
How to spot a genius (1 user)
Another Bay Area House Party - by Scott Alexander (1 user)
Bentham's Newsletter | Bentham's Bulldog | Substack (1 user)
xkcd: Pull (1 user)
xkcd: Wavefunction Collapse (1 user)
xkcd: Archaeology Research (1 user)
Notes on fatalities from AI takeover — LessWrong (1 user)
Harry Kingdon (1 user)
My infant year as an AI researcher — Moving from physics to AI (1 user)
Spandrel (biology) - Wikipedia (1 user)
Can Large Language Models Develop Gambling Addiction? (1 user)
"My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community (1 user)
Vipassana Meditation Course (1 user)
The role of symmetry in fundamental physics (1 user)
The Feynman Lectures on Physics Vol. II Ch. 19: The Principle of Least Action (1 user)
How an AI company CEO could quietly take over the world (1 user)
Model Organisms for Emergent Misalignment (1 user)
2411.07133v1 (1 user)
Agentic Monitoring for AI Control — LessWrong (1 user)
Death of a Salesman - by Amos Wollen - Going Awol (1 user)
About - Offhand Quibbles (1 user)
[2510.27338] Reasoning Models Sometimes Output Illegible Chains of Thought (1 user)
The Tale of the Top-Tier Intellect — LessWrong (1 user)
nothing feels quite like rejection - by vincent huang (1 user)
Skyscrapers and madmen - Joe Carlsmith (1 user)
Practice - RwPhO (1 user)
Émile P. Torres - Wikipedia (1 user)
Forecasting Rare Language Model Behaviors (1 user)
Miller index - Wikipedia (1 user)
Current LLMs seem to rarely detect CoT tampering — AI Alignment Forum (1 user)
Anthropic on X: "Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is "inoculation prompting": framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization. https://t.co/ZUGnmcOYNV" / X (1 user)
[Paper] Output Supervision Can Obfuscate the CoT — AI Alignment Forum (1 user)
Available Models in Tinker – Tinker API (1 user)
2411.06655v1 (1 user)
What is it to solve the alignment problem? — LessWrong (1 user)
GRPO is terrible — LessWrong (1 user)
semester three - randy chang (1 user)
Borat - Wikipedia (1 user)
Eugene Gendlin - Wikipedia (1 user)
pdf (1 user)
Hilary Greaves on Pascal's mugging, strong longtermism, and whether existing can be good for us | 80,000 Hours (1 user)
Casey Handmer (1 user)
The Origins of Representation Manifolds in Large Language Models (1 user)
Dan White - Wikipedia (1 user)