curius graph

all pages

showing 6451-6500 of 168121 pages (sorted by popularity)

Discovering Language Model Behaviors with Model-Written Evaluations — LessWrong

Technical debt

[2106.09685] LoRA: Low-Rank Adaptation of Large Language Models

[2006.08381] DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Chris Olah’s views on AGI safety - LessWrong

AI Creation and the Cosmic Host

Pluck and hard work, or luck of birth? Two stories, one man | Aeon Essays

Petri: An open-source auditing tool to accelerate AI safety research \ Anthropic

smoothbrains.net — Home

Inductive bias - Wikipedia

ARC-AGI Without Pretraining | iliao2345

Richard S. Sutton - Wikipedia

Geometric Rationality is Not VNM Rational — LessWrong

Towards a scale-free theory of intelligent agency

Anthropic/values-in-the-wild · Datasets at Hugging Face

Likert scale

[2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

[1503.02531] Distilling the Knowledge in a Neural Network

Emergent introspective awareness in large language models \ Anthropic

Meta-rationality: An introduction | Meta-rationality

Categorical distribution

Our Mission, Technology, and Approach

Stop Climbing!

Spectral density

Mel scale - Wikipedia

What I use

My 40-liter backpack travel guide

Representation Engineering Mistral-7B an Acid Trip

The Boring Part of Bell Labs – Aceso Under Glass

Gibbs sampling - Wikipedia

This page is a quine.

Is Success the Enemy of Freedom? (Full) - LessWrong

Goodhart Taxonomy — LessWrong

Functional near-infrared spectroscopy

Thinking through how pretraining vs RL learn

Ilya Sutskever – We're moving from the age of scaling to the age of research

[1609.09106] HyperNetworks

⚓️ Thought Anchors

Advent of Code 2021

Paper Trails

Potemkin village

Links

when you break your own heart - by vincent huang

Everything Is Correlated · Gwern.net

Narrow Misalignment is Hard, Emergent Misalignment is Easy — LessWrong

[2406.14546] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Herbert A. Simon

The Hour I First Believed | Slate Star Codex

Functional ultrasound through the skull

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers