Researchers Reveal ‘Deceptive Delight’ Method to Jailbreak AI Models – OfficialSarkar
Oct 23, 2024Ravie LakshmananArtificial Intelligence / Vulnerability Cybersecurity researchers have shed light on a new adversarial technique that could be used to jailbreak large language models (LLMs) during the course of an interactive conversation by sneaking in an undesirable instruction between benign ones. The approach has been codenamed Deceptive Delight by Palo Alto Networks Unit…