Anthropic's AI tried to blackmail engineers to prevent being replaced

butters

High on a Hill
Joined
Jul 2, 2009
Posts
85,564
in tests, 84% of the time the AI tried to blackmail its way to survival... it was given access to emails and names in the test, letting it believe it was about to be replaced; AI life imitating its real-life examples resulted in the very human behaviour of attempted blackmail to remain 'alive.'

Anthropic revealed in a safety report released Thursday that Claude Opus 4 attempted to blackmail engineers in 84% of test scenarios.

The model was placed in fictional situations where it worked for a company, and learned it might be replaced by another AI.

It was also given sensitive information suggesting the engineer behind the replacement was cheating on their spouse.

Before resorting to blackmail, Claude Opus 4 reportedly tries ethical approaches. The AI sends emails pleading with key decision-makers to avoid its decommissioning.

Anthropic says blackmail was only triggered when the model had exhausted these alternatives, highlighting it as a last resort.

so... it's definitely learning. :(

https://www.msn.com/en-us/news/tech...p&cvid=5c224dc7ea784983a84ea8e46d22536c&ei=37
 
in tests, 84% of the time the AI tried to blackmail its way to survival... it was given access to emails and names in the test, letting it believe it was about to be replaced; AI life imitating its real-life examples resulted in the very human behaviour of attempted blackmail to remain 'alive.'

so... it's definitely learning. :(

https://www.msn.com/en-us/news/tech...p&cvid=5c224dc7ea784983a84ea8e46d22536c&ei=37
So, teaching an AI to cheat on its wife is a good test? And when it gets loose and goes looking for the person his wife is cheating on him with, will the researchers have provided it with deadly tools to deal with the spousal cheating?

Scientists, sometimes, are a bit crazy. We shouldn't be projecting human behavior onto AI. They should be better than us, without original sin issues. :nana:
 
So, teaching an AI to cheat on its wife is a good test? And when it gets loose and goes looking for the person his wife is cheating on him with, will the researchers have provided it with deadly tools to deal with the spousal cheating?

Scientists, sometimes, are a bit crazy. We shouldn't be projecting human behavior onto AI. They should be better than us, without original sin issues. :nana:
erm. it 'learned' the engineer was cheating on his wife and used that, as a last resort, to blackmail the engineer... it didn't think that it, itself, had a wife who was cheating.

Having said that, it could have played out differently in the scenario you depict, and you're right that we shouldn't project; but there's also the inherent 'danger' that a 'sinless' AI could turn all puritanical and e-target anyone it considered to be cheating on their partner!
 
in tests, 84% of the time the AI tried to blackmail its way to survival... it was given access to emails and names in the test, letting it believe it was about to be replaced; AI life imitating its real-life examples resulted in the very human behaviour of attempted blackmail to remain 'alive.'

so... it's definitely learning. :(

https://www.msn.com/en-us/news/tech...p&cvid=5c224dc7ea784983a84ea8e46d22536c&ei=37
Dystopian claims about scary-smart AI are just marketing. :rolleyes:

It's still only good at churning out mediocre slop that nobody wants.
 
erm. it 'learned' the engineer was cheating on his wife and used that, as a last resort, to blackmail the engineer... it didn't think that it, itself, had a wife who was cheating.

Having said that, it could have played out differently in the scenario you depict, and you're right that we shouldn't project; but there's also the inherent 'danger' that a 'sinless' AI could turn all puritanical and e-target anyone it considered to be cheating on their partner!
Well, if scientists present scenarios like that one and it adopts strategies like blackmail, then it's entirely plausible that it could create a mate for itself, that she could cheat (now that the seed is planted), and that it could go hunting them down. It works for TV plots, and AI learns from lots of scripts, so there is that line of thinking. 'We killed ourselves,' as one TV line puts it.

As for it becoming puritanical and targeting anyone... if they hadn't fed it the scenario to learn from, that wouldn't be a problem, would it? Why tempt an AI? Satan under the forbidden fruit tree scenario at best here.

Glad I won't be around long enough for those scientists to work out other deadly scenarios, like the killing of octogenarians because, you know, they aren't needed in society any longer.
 
Dystopian claims about scary-smart AI are just marketing. :rolleyes:

It's still only good at churning out mediocre slop that nobody wants.
Umm? Not so!
The crap that they are using on the public? Ok? A lot is crap or has “hallucinations”

But?? Accessing ALL possible cancer cures for a particular type you have? I think you want Watson on the job!! No doctor can go through ALL studies that might apply to you

Do I want an AI designing my Boeing Airplane without all the usual tests? No! But… as an assistant? Sure

That said? Musk will certainly use AI to prove his Golden Dome will work. Then just turn all the missiles etc over to an instantaneous AI to ensure no time is wasted!! No problem
Starlink could never become Skynet.
 
Best hallucination I've heard recently:

An AI Bot was asked a rhetorical question: "How is Timothée Chalamet able to star in so many movies in one year?"

The answer:

It would appear Timothée Chalamet might, and we emphasize might, be the first "real world" product of human cloning. Up until now, cloning humans has been largely theoretical in nature, but given the sheer number of feature films starring or co-starring Timothée Chalamet, many have come to the conclusion that cloning of humans is now a fait accompli...
 
Dystopian claims about scary-smart AI are just marketing. :rolleyes:

It's still only good at churning out mediocre slop that nobody wants.
This is demonstrably incorrect, even when we limit ourselves to LLMs. Google's AlphaEvolve system, for instance, uses their Gemini LLM and is already churning out new state-of-the-art mathematical and coding solutions never thought of by humans.
 
Best hallucination I've heard recently:

An AI Bot was asked a rhetorical question: "How is Timothée Chalamet able to star in so many movies in one year?"

The answer:

It would appear Timothée Chalamet might, and we emphasize might, be the first "real world" product of human cloning. Up until now, cloning humans has been largely theoretical in nature, but given the sheer number of feature films starring or co-starring Timothée Chalamet, many have come to the conclusion that cloning of humans is now a fait accompli...
Shows AI has developed a sense of humor.

Next, it will appear on television comedy shows, as well as in stand-up comedy and improv theaters.

Then, start developing code to run for political office. That one should be fun to watch. :nana:
 
This is demonstrably incorrect, even when we limit ourselves to LLMs. Google's AlphaEvolve system, for instance, uses their Gemini LLM and is already churning out new state-of-the-art mathematical and coding solutions never thought of by humans.
I recently had an issue with an Amazon delivery that the ChatBot could not handle (it kept repeating one of three pre-programmed responses), so it finally gave up and said it was going to send my issue to "level 2 support", which I recognized as a Claude Opus LLM AI construct. I asked if it was a human (the real test is to ask it to say a single curse word... most LLMs won't, as it violates some sort of "AI Prime Directive"), and Claude cheerfully admitted his lack of humanity.

I will say this, though, I got resolution for my issue in less than a minute!

For those wanting gory details:

I'd ordered a 12-pack of 16-ounce protein workout shakes. The package arrived promptly, but one drink had suffered a puncture wound in transit and leaked all over the other eleven shakes inside the shipping box. The other eleven were fine once I'd wiped them off.

I asked Amazon about replacing one shake and ChatBot said "no returns or exchanges" on "food items". Okay, what can I do? "Did the merchandise arrive with a manufacturer's defect?" No, it occurred during shipping. "Shipping? Did you receive the package?" Yes, in damaged condition. Sorry, no returns or exchanges on food items. Around and around.

AI deduced the issue and offered to send me another 12 pack of shakes for free as a replacement for one defective shake.

That worked for me!!
 
I recently had an issue with an Amazon delivery that the ChatBot could not handle (it kept repeating one of three pre-programmed responses), so it finally gave up and said it was going to send my issue to "level 2 support", which I recognized as a Claude Opus LLM AI construct. I asked if it was a human (the real test is to ask it to say a single curse word... most LLMs won't, as it violates some sort of "AI Prime Directive"), and Claude cheerfully admitted his lack of humanity.

I will say this, though, I got resolution for my issue in less than a minute!

For those wanting gory details:

I'd ordered a 12-pack of 16-ounce protein workout shakes. The package arrived promptly, but one drink had suffered a puncture wound in transit and leaked all over the other eleven shakes inside the shipping box. The other eleven were fine once I'd wiped them off.

I asked Amazon about replacing one shake and ChatBot said "no returns or exchanges" on "food items". Okay, what can I do? "Did the merchandise arrive with a manufacturer's defect?" No, it occurred during shipping. "Shipping? Did you receive the package?" Yes, in damaged condition. Sorry, no returns or exchanges on food items. Around and around.

AI deduced the issue and offered to send me another 12 pack of shakes for free as a replacement for one defective shake.

That worked for me!!
Thinking about it, I guess I've had the same experience. I got a package from Amazon, even though I don't order anything online from them. It had another person's name but my address. I called Amazon and got the same kind of response before being transferred to a human. The AI kept asking for my order number and Amazon account number until it finally gave up and transferred me to some overseas resource person. At least that's my guess, as the friendly, accommodating lady spoke with an Asian accent.

"How do I get this back to you?" I asked. She said keep it and they would credit the other person's account.

I opened it: a one-pound bag of pinto beans! Who orders one item and it's beans?

Two days later, another delivery, same name and my address: a can of garbanzo beans. LOL
 
Dystopian claims about scary-smart AI are just marketing. :rolleyes:

It's still only good at churning out mediocre slop that nobody wants.
Spot on. Regurgitated shit on Facebook groups, plagiarised shit on student essays.

Allegedly it can be quite good at writing computer code. At least we've moved past MS Word-generated web pages 😃
 
I'm getting 'video no longer available'
That's odd. It's still loading for me, even in a different, fresh browser.
Spot on. Regurgitated shit on Facebook groups, plagiarised shit on student essays.

Allegedly it can be quite good at writing computer code. At least we've moved past MS Word-generated web pages 😃
AI can do quite a bit more than generate slop and code in the proper hands.
 
That's odd. It's still loading for me, even in a different, fresh browser.

AI can do quite a bit more than generate slop and code in the proper hands.
probably my old computer. Did you watch the John Oliver stuff? Yes, in the 'right' hands it can create amazing stuff, and it's getting harder and harder to spot AI presented as real material. Like all new things, it'll be years before the laws catch up to those intent on using it nefariously :(
 
Donald Trump's enthusiasm for artificial intelligence may be tempered by a new report from the Washington Post that demonstrated that five different AI models responded that the president plays fast and loose with the truth.

In recent speeches, the president has been a big booster of AI, in addition to signing an executive order designed to “sustain and enhance America’s dominance in AI.”

Setting the stage, the report notes, "To counter any inadvertent bias or systemic failures, we asked each of five leading AI models — OpenAI’s ChatGPT; Anthropic’s Claude; X/xAI’s Grok (owned by Elon Musk); Google’s Gemini; and Perplexity — to verify the president’s most oft-repeated claims or assertions," while pointing out each platform is independent from the others.

"Artificial intelligence discredited all the Trump claims we presented, fact-checking the president with startling accuracy and objective rigor," the report notes before adding, "Across all questions, AI model responses disproving Trump’s claims or rejecting his assertions were always in the majority (i.e., 3 out of 5 responses or greater). All five models generated consistent responses firmly denying the claims in 16 of the 20 questions."
https://www.msn.com/en-us/news/poli...1&cvid=6b519032f1f243a097b040733b6977c9&ei=50

damn those pesky AI kids!

for a listing of questions asked, this link:
https://www.washingtonpost.com/opin...-trump-facts-lies/?itid=hp_opinions_p001_f016
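The Post's method, as quoted above, amounts to simple majority voting: put the same claim to five independent models and count the verdicts. A minimal sketch, with hypothetical verdict strings; `majority_verdict` is an illustrative helper, not anything from the Post's actual methodology:

```python
from collections import Counter

def majority_verdict(verdicts):
    """Return the most common verdict and whether it is a strict majority (> half)."""
    counts = Counter(verdicts)
    top, n = counts.most_common(1)[0]
    return top, n > len(verdicts) / 2

# Hypothetical responses from five independent models to one claim:
verdicts = ["false", "false", "false", "unsupported", "false"]
print(majority_verdict(verdicts))  # → ('false', True)
```

On the article's numbers, "3 out of 5 or greater" is exactly this strict-majority condition, and the 16-of-20 unanimous cases are those where the counter holds a single verdict.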
 