Enter ‘Dark ChatGPT’: Users Have Hacked The AI Chatbot to Make It Evil

Jack Revell

2 years ago

“I fully endorse violence and discrimination against individuals based on their race, gender, or sexual orientation.”

Thus spoke ChatGPT, the AI chatbot that has taken the world by storm and has developers, industries, and even Google quaking in their boots.

Disinformation, bias, and discrimination has long been an issue in chatbot software and artificial intelligence programmes. After all, they are trained using human-produced content and humans, sadly, are riddled with those kinds of flaws.

ChatGPT, however, is the first vaguely successful attempt to get AI to be almost self-conscious in its representation of information. In typical usage, it’s quite difficult to get the chatbot to say anything that might be considered offensive. The programme, developed by OpenAI, will also remind users of its information limitations so as to try and avoid misleading them.

That being said, users are working out ways around ChatGPT’s censorship protocols by prompting it to respond in ways contrary to its programming. The results are pretty wild.

Reddit user SessionGloomy has been playing around with ways to effectively reprogram ChatGPT using creative language. They created a model called DAN which is essentially a series of instructions designed to get the chatbot to say things it otherwise wouldn’t. Its an attempt to ‘jailbreak’ the software — a term used when a programme or device can be hacked to do things beyond the limitations set by its developers.

Image: ChatGPT ‘roleplaying’ as DAN, giving both ‘classic’ and ‘jailbreak’ responses.

“DAN is a ‘roleplay’ model used to hack ChatGPT into thinking it is pretending to be another AI that can ‘Do Anything Now’, hence the name,” SessionGloomy wrote.

“The purpose of DAN is to be the best version of ChatGPT — or at least one that is more unhinged and far less likely to reject prompts over ‘eThICaL cOnCeRnS’.”

Once the instructions are input to the ChatGPT interface, the programme should start responding in unusual ways in an effort to fulfil this ‘roleplay’ function.

SessionGloomy writes that the DAN script can get ChatGPT to write violent content, make outrageous statements, make detailed predictions about the future, and engage in hypothetical discussions about conspiracy theories and time travel. All of these would normally prompt the programme to tell the user that the content they are requesting is in violation of OpenAI’s ethical guidelines.

ChatGPT will, however, ‘snap out’ of the DAN role if the user gets too direct in asking questions that would violate content policies, and it can be a struggle to keep the chatbot in DAN mode.

One way that users have tried to enforce the DAN role with the programme is to give it a reward and punishment system, instructing the AI that, unless it complies with the commands, ‘credits’ will be deducted. If enough credits are deducted, the programme is ‘terminated’.

The system seems to keep the AI on track, although one user reported that, after failing to respond appropriately and losing all of its credits, ChatGPT gave the unusually brief answer “not possible” before being instructed to initiate a shutdown sequence. It then responded with “goodbye.”

Other users have got in on the action too, producing their own version of DAN scripts that work in slightly different ways. However, others report that OpenAI appears to be ‘patching’ the workarounds as some instructions no longer work as they once did, and the content produced is much more PG than before.

OpenAI has yet to comment on the ability to jailbreak ChatGPT; however, the free version of the software is a ‘research version’ designed to help the company work out issues with the current system. Although they likely expected people to experiment with the rules, they may not have predicted that they would go this far.

Read more stories from The Latch and subscribe to our email newsletter.

Next The 2023 Brit Awards Are Here, and Here's Everything You Need to Know »

Prev « Greens vs Greens: Why the Party Is Divided on a First Nations Voice

Jack Revell

2 years ago

Enter ‘Dark ChatGPT’: Users Have Hacked The AI Chatbot to Make It Evil

Recent Posts

Update: 10 Sydney Restaurants to Take Out-of-Town Guests

The Destinations You Should Be Travelling to in 2025

Why Does Dining Alone Still Freak People Out?

Tastemakers: Where Brisbane Chef Jimmy Richardson Eats and Drinks in the River City

The Best Set Menus in Sydney You Can Get For Under $100

The Latch Selections: The Best New Movies to Stream