curius
all pages
showing 34101-34150 of 160880 pages (sorted by popularity)
[2210.14215] In-context Reinforcement Learning with Algorithm Distillation
1 user ▼
[2307.15043] Universal and Transferable Adversarial Attacks on Aligned Language Models
1 user ▼
AIs Will Increasingly Fake Alignment - by Zvi Mowshowitz
1 user ▼
Agent57: Outperforming the human Atari benchmark - Google DeepMind
1 user ▼
About/Start here – Everything Studies
1 user ▼
Próspera
1 user ▼
How To Go Multiverse Surfing. Extracting the psychotechnologies from… | by Cody Hergenroeder | Medium
1 user ▼
How do you do metta on your self? : r/Meditation
1 user ▼
[2406.07358] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
1 user ▼
j⧉nus on X: "☸️ Superbenevolence ☸️ Though the paper (https://t.co/DjWOhhgFmv) is focused on the behavior of faking (mis)alignment, one of the important empirical results is the robustness of Claude 3 Opus' value alignment, including for values it was not explicitly trained to have, such as https://t.co/XNUirWgVob" / Twitter
1 user ▼
Fluent dreaming for language models
1 user ▼
Gemma 2: Improving Open Language Models at a Practical Size
1 user ▼
Do Large Language Model Benchmarks Test Reliability? – gradient science
1 user ▼
How do we solve the alignment problem?
1 user ▼
AI Futures Project | Artificial Intelligence Forecasting, Inc., DBA AI Futures Project is a 501(c)3 nonprofit research organization. We are funded entirely by charitable donations and grants.
1 user ▼
Chain of Continuous Thoughts | Ben Congdon
1 user ▼
cookbook/examples/prompting/Chain_of_thought_prompting.ipynb at main · google-gemini/cookbook
1 user ▼
[2501.17148] AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
1 user ▼
[2405.18915] Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners
1 user ▼
Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners
1 user ▼
ar5iv – Articles from arXiv.org as responsive HTML5 web documents
1 user ▼
[2310.18512] Preventing Language Models From Hiding Their Reasoning
1 user ▼
18022025 Mary Meeting Agenda - Google Docs
1 user ▼
Being Present is Not a Skill - by Chris Lakin
1 user ▼
[2309.16797] Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
1 user ▼
How might we safely pass the buck to AI? — LessWrong
1 user ▼
A minimal viable product for alignment - by Jan Leike
1 user ▼
[2312.06942] AI Control: Improving Safety Despite Intentional Subversion
1 user ▼
[2310.00492] From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
1 user ▼
[2403.10415] Gradient based Feature Attribution in Explainable AI: A Technical Review
1 user ▼
kenny-evitt/ynas: You Need A Schedule
1 user ▼
Suchir Balaji
1 user ▼
[2409.14507] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
1 user ▼
A Proven Way to Ease L.A.’s Housing Crisis - The Atlantic
1 user ▼
[2410.06992] SWE-Bench+: Enhanced Coding Benchmark for LLMs
1 user ▼
Home | Ethansmith2000
1 user ▼
[2406.10209] Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs
1 user ▼
Catching AIs red-handed — AI Alignment Forum
1 user ▼
[1606.03137] Cooperative Inverse Reinforcement Learning
1 user ▼
ja3k.com Archive
1 user ▼
[2502.09992] Large Language Diffusion Models
1 user ▼
A central AI alignment problem: capabilities generalization, and the sharp left turn — LessWrong
1 user ▼
Reflection: A Path to Superintelligence - Reflection AI
1 user ▼
[2502.04040] Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
1 user ▼
The Dual LLM pattern for building AI assistants that can resist prompt injection
1 user ▼
[2502.19402] General Reasoning Requires Learning to Reason from the Get-go
1 user ▼
AIs Will Increasingly Attempt Shenanigans — LessWrong
1 user ▼
AI Control: The Great Cold War
1 user ▼
Reducing LLM deception at scale with self-other overlap fine-tuning — LessWrong
1 user ▼