curius graph

all pages

showing 36051-36100 of 160880 pages (sorted by popularity)

« prev 1 ... 720 721 722 723 724 ... 3218 next »

The Flask Mega-Tutorial, Part XIV: Ajax - miguelgrinberg.com

My take on Jacob Cannell’s take on AGI safety — LessWrong

AGI ruin scenarios are likely (and disjunctive) — LessWrong

AI Risk and the US Presidential Candidates — LessWrong

Arbital

The sin of updating when you can change whether you exist — LessWrong

A transparency and interpretability tech tree — AI Alignment Forum

Transposed Convolutions explained with… MS Excel! | by Thom Lane | Apache MXNet | Medium

Copy of [0.5] GANs & VAEs (exercises).ipynb - Colaboratory

Clothing For Men — LessWrong

On Anthropic's Sleeper Agents Paper - by Zvi Mowshowitz

Gender Imbalances Are Mostly Not Due To Offensive Attitudes | Slate Star Codex

[2310.15421] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" - Joe Carlsmith

in praise of uselessness - by Ava - bookbear express

Coup probes: Catching catastrophes with probes trained off-policy — LessWrong

Against Almost Every Theory of Impact of Interpretability — LessWrong

Rescuing the utility function - Arbital

The Hidden Complexity of Wishes — LessWrong

Direct Preference Optimization (DPO) | by João Lages | Medium

[2312.08358] Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

Chinese Coercion in the South China Sea: Resolve and Costs | Belfer Center for Science and International Affairs

Latent Adversarial Training — LessWrong

Israel-Hamas war: The US needs to update its old thinking - Vox

Israel's two wars - by Matthew Yglesias - Slow Boring

However Difficult, The United States Should Still Pursue Israeli-Palestinian Peace - War on the Rocks

Rush by west to back Israel erodes developing countries’ support for Ukraine

What will GPT-2030 look like? — AI Alignment Forum

Sparsify: A mechanistic interpretability research agenda — AI Alignment Forum

Mechanistic anomaly detection and ELK — LessWrong

Primer on Safety Standards and Regulations for Industrial-Scale AI Development – BlueDot Impact

Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning – The Berkeley Artificial Intelligence Research Blog

[Interim research report] Activation plateaus & sensitive directions in GPT2 — LessWrong

DSLT 0. Distilling Singular Learning Theory — LessWrong

1. The CAST Strategy — LessWrong

Multi-Component Learning and S-Curves — LessWrong

KL Divergence: Forward vs Reverse? - Agustinus Kristiadi

Core Pathways of Aging - LessWrong

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

Recovering the Pre-Fine-Tuning Weights of Generative Models

GPT-2's positional embedding matrix is a helix — LessWrong

Unexpected Benefits of Self-Modeling in Neural Systems

DSLT 2. Why Neural Networks obey Occam's Razor — LessWrong

Miki AOYAGI | Professor (Associate) | Nihon University, Tokyo | Nichidai | Department of Mathematics | Research profile

DSLT 3. Neural Networks are Singular — LessWrong

Decision theory and dynamic inconsistency — LessWrong

(Approximately) Deterministic Natural Latents — LessWrong

Formal verification, heuristic explanations and surprise accounting — Alignment Research Center

Adding Integers in Logarithmic Time | Tim Mastny

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators — LessWrong
