curius graph
all pages
showing 36051-36100 of 160880 pages (sorted by popularity)
« prev
1
...
720
721
722
723
724
...
3218
next »
The Flask Mega-Tutorial, Part XIV: Ajax - miguelgrinberg.com
1 user ▼
My take on Jacob Cannell’s take on AGI safety — LessWrong
1 user ▼
AGI ruin scenarios are likely (and disjunctive) — LessWrong
1 user ▼
AI Risk and the US Presidential Candidates — LessWrong
1 user ▼
Arbital
1 user ▼
The sin of updating when you can change whether you exist — LessWrong
1 user ▼
A transparency and interpretability tech tree — AI Alignment Forum
1 user ▼
Transposed Convolutions explained with… MS Excel! | by Thom Lane | Apache MXNet | Medium
1 user ▼
Copy of [0.5] GANs & VAEs (exercises).ipynb - Colaboratory
1 user ▼
Clothing For Men — LessWrong
1 user ▼
On Anthropic's Sleeper Agents Paper - by Zvi Mowshowitz
1 user ▼
Gender Imbalances Are Mostly Not Due To Offensive Attitudes | Slate Star Codex
1 user ▼
[2310.15421] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
1 user ▼
New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" - Joe Carlsmith
1 user ▼
in praise of uselessness - by Ava - bookbear express
1 user ▼
Coup probes: Catching catastrophes with probes trained off-policy — LessWrong
1 user ▼
Against Almost Every Theory of Impact of Interpretability — LessWrong
1 user ▼
Rescuing the utility function - Arbital
1 user ▼
The Hidden Complexity of Wishes — LessWrong
1 user ▼
Direct Preference Optimization (DPO) | by João Lages | Medium
1 user ▼
[2312.08358] Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
1 user ▼
Chinese Coercion in the South China Sea: Resolve and Costs | Belfer Center for Science and International Affairs
1 user ▼
Latent Adversarial Training — LessWrong
1 user ▼
Israel-Hamas war: The US needs to update its old thinking - Vox
1 user ▼
Israel's two wars - by Matthew Yglesias - Slow Boring
1 user ▼
However Difficult, The United States Should Still Pursue Israeli-Palestinian Peace - War on the Rocks
1 user ▼
Rush by west to back Israel erodes developing countries’ support for Ukraine
1 user ▼
What will GPT-2030 look like? — AI Alignment Forum
1 user ▼
Sparsify: A mechanistic interpretability research agenda — AI Alignment Forum
1 user ▼
Mechanistic anomaly detection and ELK — LessWrong
1 user ▼
Primer on Safety Standards and Regulations for Industrial-Scale AI Development – BlueDot Impact
1 user ▼
Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning – The Berkeley Artificial Intelligence Research Blog
1 user ▼
[Interim research report] Activation plateaus & sensitive directions in GPT2 — LessWrong
1 user ▼
DSLT 0. Distilling Singular Learning Theory — LessWrong
1 user ▼
1. The CAST Strategy — LessWrong
1 user ▼
Multi-Component Learning and S-Curves — LessWrong
1 user ▼
KL Divergence: Forward vs Reverse? - Agustinus Kristiadi
1 user ▼
Core Pathways of Aging - LessWrong
1 user ▼
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
1 user ▼
Recovering the Pre-Fine-Tuning Weights of Generative Models
1 user ▼
GPT-2's positional embedding matrix is a helix — LessWrong
1 user ▼
Unexpected Benefits of Self-Modeling in Neural Systems
1 user ▼
DSLT 2. Why Neural Networks obey Occam's Razor — LessWrong
1 user ▼
Miki AOYAGI | Professor (Associate) | Nihon University, Tokyo | Nichidai | Department of Mathematics | Research profile
1 user ▼
DSLT 3. Neural Networks are Singular — LessWrong
1 user ▼
Decision theory and dynamic inconsistency — LessWrong
1 user ▼
(Approximately) Deterministic Natural Latents — LessWrong
1 user ▼
Formal verification, heuristic explanations and surprise accounting — Alignment Research Center
1 user ▼
Adding Integers in Logarithmic Time | Tim Mastny
1 user ▼
Research update: Towards a Law of Iterated Expectations for Heuristic Estimators — LessWrong
1 user ▼