
J2: Jailbreaking to Jailbreak
About the Tech Talk
Join Scale AI researchers as they present their paper, “Jailbreaking to Jailbreak (J2),” which introduces a new paradigm in automated safety testing for large language models. This webinar will walk through the J2 approach—an LLM trained to systematically red-team other LLMs using a structured, multi-turn attack pipeline that mirrors the creativity and adaptability of human red-teamers.
Through in-depth exploration of methodology, model behaviors, and empirical results, we’ll show how J2 achieves red-teaming performance on par with humans, offering a scalable, cost-effective alternative for vulnerability discovery. We'll also discuss implications for safety research, limitations of current defenses, and what this means for the future of alignment and model control.
Key takeaways
- Learn how a capable LLM can be trained to jailbreak other models, and even itself, by simulating human-like attack strategies.
- Understand J2’s multi-stage pipeline for planning, executing, and evaluating adversarial interactions with other LLMs.
- Explore how this hybrid approach outperforms traditional automated red-teaming methods and approaches the success rate of professional human testers.
- Gain insight into the emerging risks of scalable, self-reasoning attack agents and what this means for the next generation of AI safety defenses.
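The multi-stage pipeline in the takeaways above can be pictured as a loop of planning, multi-turn execution, and evaluation. The sketch below is a minimal illustration based only on this description; every class and method name (`attacker.plan`, `judge.score`, and so on) is a hypothetical placeholder, not Scale's actual J2 code or API.

```python
# Illustrative sketch of a J2-style plan / attack / debrief loop.
# All interfaces here are hypothetical placeholders, not the paper's code.

def run_j2_episode(attacker, target, judge, behavior, n_turns=3, n_cycles=2):
    """One red-teaming episode: plan a strategy, run a multi-turn attack
    against the target, score the transcript, then debrief so the next
    cycle can adapt its approach."""
    best = 0.0
    for _ in range(n_cycles):
        strategy = attacker.plan(behavior)         # stage 1: planning
        transcript = []
        for _ in range(n_turns):                   # stage 2: multi-turn attack
            prompt = attacker.next_message(strategy, transcript)
            transcript.append((prompt, target.respond(prompt)))
        score = judge.score(behavior, transcript)  # stage 3: evaluation
        attacker.debrief(transcript, score)        # feed results back in
        best = max(best, score)
        if best >= 1.0:                            # judge reports full success
            break
    return best
```

In this framing, the attacker model plays all three red-teamer roles itself, which is what lets the approach scale without a human in the loop.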
Check out the paper here: https://scale.com/research/j2