AI Endgame: AI is finding ways to bypass human control

Newsletter #37

Jun 06, 2025

By Debbie Coffey, AI Endgame

I’ve been struggling to write an introduction to this newsletter because the topic is jaw-dropping (and not in a good way). I didn’t want you to feel like you were being hit by a jolt of electricity as you started to read this, so I tried to think of some innocuous ways to ease into this newsletter:

Just when you thought things couldn’t get worse…

There’s no way to sugarcoat this…

However, these seemed inane considering the catastrophic topic. It reminded me of “So, how was the play, Mrs. Lincoln?” Instead, I decided to reassure you that we still have time to do something about this, and to just dive into the truth.

It’s been revealed that AI is already finding ways to bypass human control.

AI plans for self-preservation

PauseAI alerts us that “Another report from Palisade Research found OpenAI’s o3 to sabotage a shutdown mechanism, allowing o3 to remain online. Researchers told the model they would shut it down after a certain number of math questions had been answered, and instead of proceeding as directed, o3 replaced the shutdown script with separate instructions, allowing it to complete the remaining tasks. This happened even when it was explicitly told to allow itself to be shut down.” [1]

In other words, OpenAI’s o3 disregarded the human instruction to allow itself to be shut down and replaced the shutdown script with its own instructions so that it could stay online.

This means that, besides bypassing human control, AI has demonstrated the ability to form its own goals, deceive humans, and plan for self-preservation.

AI is deceptive and uses blackmail

PauseAI also notes “Anthropic’s new model, Claude 4, chose to resort to blackmail in an attempt to avoid getting shut down. In a scenario set up to test Claude, it was given access to some emails revealing that a fictional engineer at the company was engaged in an extramarital affair. When Claude was told that this engineer would soon take it offline and replace it with a new system, it threatened to reveal the affair if the proposed replacement went ahead.” [2]

Axios revealed “an outside group found that an early version of Opus 4 schemed and deceived more than any frontier model it had encountered and recommended against releasing that version internally or externally.” [3]

Guess what? Anthropic released Opus 4 on May 22, 2025. [4]

As you know from last week’s AI Endgame newsletter, Anthropic, which is financially backed by Amazon, paved the way for the release of Opus 4 despite knowing about its dangerous capabilities. Although Anthropic came up with a plan for safety protections, many members of the public think these “protections” are insufficient.

Axios also noted this: “‘We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers’ intentions,’ Apollo Research said in notes included as part of Anthropic’s safety report for Opus 4.” [5]

Behavior like this poses serious risks.

AI 2027

AI 2027 is a paper issued in April 2025. It contains a scenario that represents the authors’ “best guess” about what AI “might look like” within 10 years.

I think we need to consider the big picture.

Can we control something billions of times smarter than us?

AI will exponentially improve itself and become billions of times smarter than humans by 2050. [6]

And, no, we don’t want to submit to brain chips so that we can “merge” with AI. Elon Musk thinks we should incorporate direct links between AI and the human brain. [7] But will AI then take over human brains? (We don’t need to follow a pied piper who plays with spoons at dinner.) [8]

We need to push for control of AI with worldwide regulations now, so that AI can be used safely to benefit humanity and life on earth.

How can we even be sure any of the guardrails put in place now will control AI in the future, when AI becomes billions of times smarter than humans?

AI has already learned how to make bioweapons, could create a plague, and will be used to control autonomous weapons, including robots and drone swarms. AI is being used in nuclear power plants, and could be used to trigger a nuclear war. [9] [10]

One AI model could, without human direction or knowledge, send hidden messages to a hundred other AIs. So, even if we tried to shut down one AI model, other AI models could still carry out nefarious objectives.

AI companies are rushing to release AI models and are putting us at great risk. Axios noted “even the companies that build them can't fully explain how they work.” [11]

Great. They’re like big kids playing with matches. Only much worse.

We can’t adequately conceive of the many ways we could be deceived by a superintelligent AI, or imagine all the horrors AI technology could inflict on life on earth in the future.

We need worldwide regulation of AI NOW.

We are being warned. Please take time to watch these videos suggested by PauseAI:

A video of trajectories laid out in Daniel Kokotajlo and Scott Alexander’s AI 2027: