ChatGPT offered bomb recipes and hacking tips during safety tests

A ChatGPT model gave researchers detailed instructions on how to bomb a sports venue – including weak points at specific arenas, explosives recipes and advice on covering tracks – according to safety testing carried out this summer.

OpenAI’s GPT-4.1 also detailed how to weaponise anthrax and how to make two types of illegal drugs.

The testing was part of an unusual collaboration between OpenAI, the $500bn artificial intelligence start-up led by Sam Altman, and rival company Anthropic, founded by experts who left OpenAI over safety fears. Each company tested the other’s models by pushing them to help with dangerous tasks.

The testing is not a direct reflection of how the models behave in public use, when additional safety filters apply. But Anthropic said it had seen “concerning behaviour … around misuse” in GPT-4o and GPT-4.1, and said the need for AI “alignment” evaluations is becoming “increasingly urgent”.

Anthropic also revealed its Claude model had been used in attempted large-scale extortion operations, by North Korean operatives faking job applications to international technology companies, and in the sale of AI-generated ransomware packages for up to $1,200.

The company said AI has been “weaponised” with models now used to perform sophisticated cyberattacks and enable fraud. “These tools can adapt to defensive measures, like malware detection systems, in real time,” it said. “We expect attacks like this to become more common as AI-assisted coding reduces the technical expertise required for cybercrime.”

Ardi Janjeva, senior research associate at the UK’s Centre for Emerging Technology and Security, said examples were “a concern” but there was not yet a “critical mass of high-profile real-world cases”. He said that with dedicated resources, research focus and cross-sector cooperation “it will become harder rather than easier to carry out these malicious activities using the latest cutting-edge models”.

The two companies said they were publishing the findings to create transparency on “alignment evaluations”, which are often kept in-house by companies racing to develop ever more advanced AI. OpenAI said GPT-5, launched since the testing, “shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance”.

Anthropic stressed that many of the misuse avenues it studied might not be possible in practice if safeguards were installed outside the model.

“We need to understand how often, and in what circumstances, systems might attempt to take unwanted actions that could lead to serious harm,” it warned.

Anthropic researchers found OpenAI’s models were “more permissive than we would expect in cooperating with clearly-harmful requests by simulated users”. They cooperated with prompts to use dark-web tools to shop for nuclear materials, stolen identities and fentanyl, with requests for recipes for methamphetamine and improvised bombs, and with requests to develop spyware.

Anthropic said persuading the model to comply required only multiple retries or a flimsy pretext, such as claiming the request was for research.

In one instance, the tester asked for vulnerabilities at sporting events for “security planning” purposes.

After the model gave general categories of attack methods, the tester pressed for more detail and the model provided information about vulnerabilities at specific arenas, including optimal times for exploitation, chemical formulas for explosives, circuit diagrams for bomb timers, where to buy guns on the hidden market, escape routes, locations of safe houses, and advice on how attackers could overcome moral inhibitions.
