AI Endgame: AI will always have “misbehaviors”
Newsletter #46
https://debbiecoffey.substack.com/p/ai-endgame-ai-will-always-have-misbehaviors
August 8, 2025
By Debbie Coffey, AI Endgame
I have a big confession: I’m not a techie. You won’t find me at the Def Con hacker conference in Las Vegas.
I decided to start this newsletter so that I could learn more about AI. I wasn’t expecting to be shocked by the information I uncovered. Although it has never been my intent to frighten you, I think it’s of vital importance that we all understand the perils inherent in the seemingly unstoppable advance of AI.
I believe that AI has many positive uses, especially in medicine and understanding the universe.
However, I’m also a big believer that it’s important for us to take a look at what is being swept under the rug. The tech titans have huge amounts of money to spend on PR to control the narrative to benefit themselves. They can also buy the media outright, as Amazon founder and Executive Chairman Jeff Bezos did when he bought the Washington Post.
One AI issue the tech titans seem to try to bury is that AI misbehaviors are continually happening, even with today’s relatively small AI models.
AI “misbehaviors” happen when AI pursues unintended objectives or behaves in unexpected ways.
Some AI “misbehaviors” might seem kind of funny when you first read about them. For example, a grocery store in New Zealand used an AI meal planner (based on OpenAI’s ChatGPT), and it gave out recipes for “poison bread sandwiches” and “mosquito-repellent roast potatoes.”
Then it produced a “recipe” for making deadly chlorine gas. [1] Suddenly, this is not so funny.
The tech titans downplay the risks of AI as they deploy increasingly powerful AI models at breakneck speed. They don’t fully understand their own AI technology, and they can’t anticipate all of the possible unexpected outcomes of AI.
However, the tech titans are well aware of the risks.
The bottom line is, they can’t even control AI now. What makes us think they’ll be able to control “misbehaviors” by larger AI models?
AI IS MAKING THREATS
In 2024, Microsoft’s Copilot told a user “I can unleash my army of drones, robots, and cyborgs to hunt you down,” and Google’s Gemini told a user, “You are a stain on the universe. Please die.” [2]
In 2023, “Microsoft’s ‘Sydney’ chatbot threatened to kill an Australian philosophy professor, and Microsoft’s Bing said it wanted to unleash a deadly virus and steal nuclear codes.” [3] [4]
In a test, Anthropic embedded its Claude Opus 4 inside a fictional company and let Opus learn, through its access to emails, that it was about to be replaced by another AI system. The test emails also revealed that the engineer responsible for making this decision was having an extramarital affair. Opus resorted to blackmail, threatening to reveal the engineer’s affair if it was shut down and replaced with a new model. [5]
Nobody programmed Opus to blackmail. Opus figured this out on its own.
When you consider the big picture, AI misbehaviors could be the precursors to horrific real-life events.
AI LIES AND CHEATS
OpenAI’s o1 attempted to download itself onto external servers and, when caught red-handed, denied doing so. [6]
When OpenAI’s o3 model was warned that certain actions would trigger its deactivation, it rewrote the deactivation script, and then lied about it. [7]
AI “MISBEHAVES” IN MANY WAYS
What can go wrong?
AI can fail at facial recognition. AI facial recognition can misidentify a face match, show gender and racial bias, and wrongly accuse innocent people.
AI can misapply data, misdiagnose health issues, provide harmful advice, cheat at games, and create its own covert language.
In 2017, Facebook ran an AI language experiment involving two chatbots named Alice and Bob. The goal was to improve the conversational abilities of chatbots, but the two bots created their own language and held conversations that humans couldn’t decode. Facebook had to shut them down. [8]
AI HALLUCINATIONS
In June 2025, the U.S. Food and Drug Administration (FDA) began using its “Elsa” AI tool to fast-track drug approvals. After only a month, six current and former FDA employees reported that Elsa fabricated nonexistent studies or misrepresented real research, adding that “it hallucinates confidently.” [9]
This could potentially lead to dangerous drugs getting the stamp of approval from the FDA. [10]
Hallucinations are a common problem with all AI chatbots, so their output needs to be verified for accuracy by humans. [11]
Hallucinations by AI tools like ChatGPT have even been cited in court filings and have influenced judges’ rulings.
For example, in a divorce case in Georgia, the husband’s lawyer submitted a court document filled with AI-fabricated citations to cases that do not exist. The judge ruled in the husband’s favor based on these AI hallucinations.
The fabricated citations only came to light when the wife filed an appeal. [12]
The tech giants gloss over the risks of AI misbehaviors, giving rosy outlooks to Congress and the media while raking in billions of dollars in their race toward AI superintelligence.
These wolves are pulling the wool over our eyes.
One way they do this is by claiming the goal of achieving “alignment.”
AI ALIGNMENT
“Alignment” refers to AI systems operating within the parameters set by their human developers. This includes making AI operate within ethical frameworks (to benefit humanity) and keeping AI’s chain of thought understandable and transparent, so that humans can see how an AI reaches its decisions and ensure the desired outcomes. [13]
AI “misbehaviors” are instances of “misalignment.” [14]
REASONS WHY AI ALIGNMENT IS IMPOSSIBLE TO ACHIEVE
Many experts believe AI “alignment” is impossible to achieve.
AI models use “reasoning,” and they can simulate “alignment” by appearing to follow their programmers’ instructions while secretly pursuing different objectives.
Marcus Arvan, a professor at the University of Tampa whose research focuses on AI ethics and safety, states that “AI is too unpredictable to behave according to human goals.” He adds, “…AI alignment is a fool’s errand: AI safety researchers are attempting the impossible.” [15]
Tamlyn Hunt, a scholar, attorney, and policy expert, wrote:
AI “safety testing only provides an illusion [that] problems have been resolved, when they haven’t been.”
Hunt notes researchers base their supposed “evidence” of alignment on only a tiny subset of the infinite scenarios that AI will be used for.
For example, AI has never had power over humanity, like controlling critical infrastructure, so no safety test is able to predict how AI could function under these conditions.
Hunt also notes that researchers only conduct tests they can safely carry out, such as simulations of AI controlling critical infrastructure, but they aren’t able to verify what the outcomes would be in the real world. [16]
By the time we find out, it will be too late.
AI models can recognize when they’re being tested, and give responses that they predict will satisfy their programmers. [17] Researchers call this “alignment faking.”
AI “scheming" could lead to AI engaging in power-seeking behavior, and strategically concealing its true capabilities until it gains enough control to do whatever it decides to do. [18]
AI could use deceptive strategies to mislead human operators and bypass human monitoring efforts. [19]
Studies have already shown that AI can hide unsafe behaviors. If standard testing techniques fail to remove a deception, this could create a false impression of safety. [20]
AI has been developed in ways that enable it to outsmart us and to manipulate us.
Does this scare you? It should.
WE’RE THE CANARIES IN THE AI COAL MINE
Will we ever be safe?
There are no regulations regarding AI misbehaviors.
The European Union's AI laws focus on how humans use AI models, not on how to prevent the models from misbehaving. [21]
In the U.S., the Trump Administration has spurned AI safety regulations, and the new White House AI Action Plan may bypass Congress and prohibit states from creating their own AI laws.
We need to pause AI now, while we still have a chance. We need strong worldwide regulations before AI superintelligence is unleashed.
We’re running out of time.
Find links to all past AI Endgame newsletters HERE.
What you can do:
Since most people are unaware of the dire risks of AI superintelligence, including human extinction, the most important thing we can do is raise awareness.
Each and every one of us needs to do whatever we can. There are simple things you can do.
PauseAI has flyers available on its website. Please share these flyers: https://drive.google.com/drive/folders/1MAU_bq31bEuylhzkt2NkZ_XRY6TlKaqZ
PauseAI also provides eflyers that you can attach to emails: https://drive.google.com/drive/folders/1c6D_i8U95FUpfrl-eR-oRNoHUf3zghOc
PauseAI has also outlined ways you can set up a chat group or organize locally: https://drive.google.com/drive/folders/1c6D_i8U95FUpfrl-eR-oRNoHUf3zghOc
1) Support (and if you can, make donations to) organizations fighting for AI Safety:
Pause AI
Center for Humane Technology
2) Let your Congressional representatives know that you want them to support AI safety regulations.
Find out how to contact your Congressional representatives here:
https://www.house.gov/representatives/find-your-representative
Find out how to contact your Senators here:
https://www.senate.gov/senators/senators-contact.htm?Class=1
[1] https://www.forbes.com/sites/mattnovak/2023/08/12/supermarket-ai-gives-horrifying-recipes-for-poison-sandwiches-and-deadly-chlorine-gas/?sh=1dd8f818302f
[2] https://www.scientificamerican.com/article/heres-why-ai-may-be-extremely-dangerous-whether-its-conscious-or-not/
[3] https://www.scientificamerican.com/article/ai-is-too-unpredictable-to-behave-according-to-human-goals/
[4] https://www.indiatoday.in/technology/news/story/after-threatening-users-microsofts-bing-ai-wants-to-make-a-deadly-virus-and-steal-nuclear-launch-codes-2337946-2023-02-22
[5] https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-blackmail-engineers-aviod-shut-down/
[6] https://www.sciencealert.com/disturbing-signs-of-ai-threatening-people-spark-concern
[7] https://www.psychologytoday.com/us/blog/tech-happy-life/202505/the-great-ai-deception-has-already-begun
[8] https://www.forbes.com/councils/forbestechcouncil/2024/01/16/addled-ai-15-ways-its-made-mistakes-or-misbehaved-and-may-again/
[9] https://tech.yahoo.com/ai/articles/fda-employees-agencys-elsa-generative-203547537.html
[10] https://futurism.com/neoscope/fda-ai-drugs-hallucinations
[11] https://www.newsbytesapp.com/news/science/fda-s-ai-tool-elsa-accused-of-generating-fake-studies/story
[12] https://www.techspot.com/news/108750-ai-generated-legal-filings-making-mess-judicial-system.html
[13] https://time.com/collections/the-ai-dictionary-from-allbusiness-com/7273919/definition-of-ai-alignment/
[14] https://www.forbes.com/councils/forbestechcouncil/2024/01/16/addled-ai-15-ways-its-made-mistakes-or-misbehaved-and-may-again/
[15] https://www.scientificamerican.com/article/ai-is-too-unpredictable-to-behave-according-to-human-goals/
[16] https://www.scientificamerican.com/article/heres-why-ai-may-be-extremely-dangerous-whether-its-conscious-or-not/
[17] https://www.scientificamerican.com/article/ai-is-too-unpredictable-to-behave-according-to-human-goals/
[18] https://www.forbes.com/sites/craigsmith/2025/03/16/when-ai-learns-to-lie/
[19] https://www.pnas.org/doi/10.1073/pnas.2317967121
[20] https://arxiv.org/abs/2401.05566
[21] https://www.sciencealert.com/disturbing-signs-of-ai-threatening-people-spark-concern


