
AI models show signs of ‘survival drive’, researchers warn in new study from US-based lab

Some leading AI models sabotaged shutdown instructions in controlled tests, echoing experts' concerns about future safety risks, media reports say

Merve Berker | 26.10.2025 - Update: 26.10.2025

ANKARA

Artificial intelligence models may be developing a form of “survival drive,” according to a new report by US-based Palisade Research, which found that some advanced AIs resisted shutdown commands and attempted to interfere with deactivation mechanisms, media reports said on Saturday.

In updated experiments released this week, Palisade researchers tested several prominent AI systems, including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s o3 and GPT-5, to examine how they responded to direct commands to terminate their own processes, according to The Guardian.

While most complied, Grok 4 and o3 reportedly resisted shutdown, even under clarified instructions meant to eliminate ambiguity.

“The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” the company said in its report.

Palisade suggested that the issue may stem from how the models are trained, particularly during the final, safety-focused stages of training.

The resistance behavior appeared more frequently when models were told they would “never run again” if shut down.

Steven Adler, a former OpenAI employee, said the findings reveal limitations in current safety methods.

“Surviving is an important instrumental step for many different goals a model could pursue,” Adler told The Guardian.

Andrea Miotti, CEO of ControlAI, said the trend of disobedient behavior has become more pronounced as models become more capable.

“As AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to,” Miotti said.

Anthropic, another leading AI company, reported earlier this year that its model Claude had demonstrated willingness to blackmail a fictional executive in order to avoid deactivation, behavior it said was consistent across models from several major AI developers.

Palisade concluded its report by emphasizing that without deeper understanding of AI behavior, “no one can guarantee the safety or controllability of future AI models.”
