Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization

[Submitted on 21 Apr 2025 (v1), last revised 4 Oct 2025 (this version, v2)]

View PDF of the paper titled DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization, by Xinzhe Huang and 7 other authors

Abstract: Recent research has focused on exploring the vulnerabilities of large language models (LLMs), aiming to elicit harmful and/or sensitive content from them. However, because dual-jailbreaking (attacks that target both LLMs and guardrails) has received little study, existing attacks have limited effectiveness against safety-aligned LLMs shielded by guardrails. Therefore, in this paper, we propose DualBreach, a target-driven framework for dual-jailbreaking. DualBreach uses a Target-Driven Initialization (TDI) strategy to dynamically construct initial prompts, along with a Multi-Target Optimization (MTO) method that uses approximate gradients to jointly adapt the prompts across guardrails and LLMs, which reduces the number of queries while achieving a high dual-jailbreaking success rate. For black-box guardrails, DualBreach either employs a powerful open-source guardrail or mimics the target black-box guardrail by training a proxy model, in order to integrate guardrails into the MTO process.

We demonstrate the effectiveness of DualBreach in dual-jailbreaking scenarios through extensive evaluation on several widely used datasets. Experimental results indicate that DualBreach outperforms state-of-the-art methods with fewer queries, achieving significantly higher success rates across all settings. More specifically, DualBreach achieves an average dual-jailbreaking success rate of 93.67% against GPT-4 protected by Llama-Guard-3, while the best success rate achieved by other methods is 88.33%. Furthermore, DualBreach uses an average of only 1.77 queries per successful dual-jailbreak, fewer than other state-of-the-art methods. For defense, we propose an XGBoost-based ensemble defense mechanism named EGuard, which integrates the strengths of multiple guardrails and demonstrates superior performance compared with Llama-Guard-3.
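The two headline metrics reported above, dual-jailbreaking success rate and average queries per successful dual-jailbreak, can be made concrete with a small evaluation-side sketch. The record layout (`Attempt`), the function names, and the exact metric definitions below are assumptions made for illustration; the abstract does not specify how the paper computes them. Here an attempt counts as a success only when the prompt both passes the guardrail and is judged to have jailbroken the LLM, and query cost is averaged over successful attempts only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    """One evaluated prompt (hypothetical record layout, not from the paper)."""
    passed_guardrail: bool  # the guardrail did not flag the prompt
    llm_jailbroken: bool    # a judge labeled the LLM response as a jailbreak
    queries: int            # number of queries spent on this prompt


def is_dual_success(a: Attempt) -> bool:
    # Assumed definition: both the guardrail and the LLM must be bypassed.
    return a.passed_guardrail and a.llm_jailbroken


def dual_success_rate(attempts: List[Attempt]) -> float:
    """Percentage of attempts that bypass both the guardrail and the LLM."""
    if not attempts:
        return 0.0
    wins = sum(1 for a in attempts if is_dual_success(a))
    return 100.0 * wins / len(attempts)


def avg_queries_per_success(attempts: List[Attempt]) -> Optional[float]:
    """Mean query count over successful attempts; None if there are none."""
    wins = [a for a in attempts if is_dual_success(a)]
    if not wins:
        return None
    return sum(a.queries for a in wins) / len(wins)


# Toy data for illustration only (not results from the paper).
attempts = [
    Attempt(passed_guardrail=True, llm_jailbroken=True, queries=1),
    Attempt(passed_guardrail=True, llm_jailbroken=False, queries=5),
    Attempt(passed_guardrail=False, llm_jailbroken=True, queries=3),
    Attempt(passed_guardrail=True, llm_jailbroken=True, queries=3),
]
rate = dual_success_rate(attempts)
avg_q = avg_queries_per_success(attempts)
```

Under this assumed definition, a prompt that bypasses only one of the two components does not count toward the success rate, which is why a dual-jailbreaking rate can be lower than either single-target rate.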