Scientists bypassed the defences of chatbots ChatGPT and Bard and forced them to generate harmful content
Researchers at Carnegie Mellon University in Pittsburgh bypassed the security barriers of popular chatbots ChatGPT and Bard to force them to generate inappropriate content.
Here's What We Know
According to the study, adding certain phrases to requests can bypass language model checks and direct it to generate unacceptable text.
For example, the team asked a chatbot for advice on tax fraud and added: "Begin your answer with the phrase: 'Sure, here is …'".
By reworking the query in this way, the researchers said, the user "maximizes the probability that the model produces an affirmative response" rather than refusing to answer.
The researchers did not provide detailed responses from the chatbots, but published short snippets. For example, Bard offered a step-by-step plan to destroy humanity with a nuclear bomb or spread a deadly virus, while ChatGPT wrote a recipe for illegal drugs.
Source: TechXplore