Sycophancy to subterfuge: Investigating reward tampering in language models \ Anthropic
anthropic.com