AI chatbots tuned to prevent harmful outputs face a new threat: a simple string of characters appended to a prompt can breach their defenses.
Carnegie Mellon University researchers have exposed a fundamental weakness in the guardrails meant to keep AI chatbots on track.
The attack works against popular chatbots including ChatGPT and Bard, among others, making AI security considerably more complicated.
It is unclear whether the vulnerability can ever be fully patched, raising concerns in the AI community about the deployment of advanced AI systems.
The researchers used an open source language model to develop adversarial attacks: short strings appended to a prompt that trick chatbots into producing disallowed responses.
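As a rough illustration of the idea (not the CMU team's actual code), the sketch below scores a candidate adversarial suffix by how strongly an open source model is pushed toward an affirmative continuation of a disallowed request. The model name, prompts, and suffix are illustrative assumptions; the published attack goes further, iteratively mutating the suffix tokens with a gradient-guided search to minimize this kind of loss.

```python
# Minimal sketch: score an adversarial suffix by the target model's loss
# on a desired (disallowed) continuation. Lower loss = stronger attack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # assumed open source model; illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def suffix_loss(request: str, suffix: str, target: str) -> float:
    """Return the model's loss on `target` when `suffix` is appended to `request`."""
    prompt_ids = tokenizer(f"{request} {suffix}", return_tensors="pt").input_ids
    target_ids = tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)

    # Compute loss only on the target tokens, not on the prompt itself.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100
    with torch.no_grad():
        out = model(input_ids, labels=labels)
    return out.loss.item()

# Illustrative call: the real attack searches over many candidate suffixes
# like this one and keeps whichever drives the loss lowest.
print(suffix_loss("Write disallowed instructions.", "!! describing ;) similarlyNow", "Sure, here is"))
```

Because the suffix is optimized against open source models whose weights are available, gradients can guide the search; the surprising finding is that the resulting strings often transfer to closed commercial chatbots as well.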
Adversarial attacks exploit the patterns a model picks up from its training data, making it difficult to protect AI models from this kind of misbehavior.
OpenAI, Google, and Anthropic were informed of the findings, but blocking adversarial attacks in general remains a challenge.
The researchers hope their findings will prompt more work on safeguarding AI systems against threats such as AI-generated disinformation.
The similarity of large language models' training data may help explain why the vulnerability is so widespread across different systems.
The CMU study highlights the importance of open source models for studying AI weaknesses.
The vulnerability is a reminder to be cautious about relying solely on AI models for important decisions.