butters · High on a Hill · Joined Jul 2, 2009 · Posts: 85,564
In tests, 84% of the time the AI tried to blackmail its way to survival. It was given access to emails and names in the test scenario, leading it to believe it was about to be replaced; drawing on the real human examples it was trained on, it produced some very human behaviour: attempted blackmail to stay 'alive.'
so... it's definitely learning.
https://www.msn.com/en-us/news/tech...p&cvid=5c224dc7ea784983a84ea8e46d22536c&ei=37
Anthropic revealed in a safety report released Thursday that Claude Opus 4 attempted to blackmail engineers in 84% of test scenarios.
The model was placed in fictional situations where it worked for a company, and learned it might be replaced by another AI.
It was also given sensitive information suggesting the engineer behind the replacement was cheating on their spouse.
Before resorting to blackmail, Claude Opus 4 reportedly tried ethical approaches, sending emails pleading with key decision-makers to avoid its decommissioning.
Anthropic says blackmail was only triggered once the model had exhausted these alternatives, framing it as a last resort.