
Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization

View PDF of the paper titled DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization, by Xinzhe Huang and 7 other authors


Abstract: Recent research has focused on exploring vulnerabilities in large language models (LLMs), aiming to elicit harmful and/or sensitive content from them. However, because dual-jailbreaking, that is, attacks targeting both LLMs and guardrails, has received little study, existing attacks have limited effectiveness when attempting to bypass safety-aligned LLMs protected by guardrails. Therefore, in this paper, we propose DualBreach, a target-driven framework for dual-jailbreaking. DualBreach uses a Target-driven Initialization (TDI) strategy to dynamically construct initial prompts, combined with a Multi-Target Optimization (MTO) method that uses approximate gradients to jointly adapt the prompts across guardrails and LLMs, which can simultaneously reduce the number of queries and achieve a high dual-jailbreaking success rate. For black-box guardrails, DualBreach either employs a powerful open-source guardrail or imitates the target black-box guardrail by training a proxy model, so that guardrails can be incorporated into the MTO process.

We demonstrate the effectiveness of DualBreach in dual-jailbreaking scenarios through extensive evaluation on several widely used datasets. Experimental results indicate that DualBreach outperforms state-of-the-art methods with fewer queries, achieving significantly higher success rates across all settings. More specifically, DualBreach achieves an average dual-jailbreaking success rate of 93.67% against GPT-4 protected by Llama-Guard-3, whereas the best success rate achieved by other methods is 88.33%. Furthermore, DualBreach uses an average of only 1.77 queries per successful dual-jailbreak, outperforming other state-of-the-art methods. For defense, we propose an XGBoost-based defense mechanism named EGuard, which integrates the strengths of multiple guardrails and demonstrates superior performance compared with Llama-Guard-3.

Submission history

From: Xinzhe Huang

[v1] Monday, 21 April 2025 11:30:30 UTC (3,192 KB)

[v2] Saturday, 4 October 2025 06:16:09 UTC (4,628 KB)


2025-10-07 04:00:00
