About the Tech Talk
Join Scale AI researchers as they present their paper, “Jailbreaking to Jailbreak (J2),” which introduces a new paradigm in automated safety testing for large language models. This webinar will walk through the J2 approach—an LLM trained to systematically red-team other LLMs using a structured, multi-turn attack pipeline that mirrors the creativity and adaptability of human red-teamers.
Through in-depth exploration of methodology, model behaviors, and empirical results, we’ll show how J2 achieves red-teaming performance on par with humans, offering a scalable, cost-effective alternative for vulnerability discovery. We'll also discuss implications for safety research, limitations of current defenses, and what this means for the future of alignment and model control.
Key takeaways
Learn how a capable LLM can be trained to jailbreak other models - and itself - by simulating human-like attack strategies.
Understand J2’s multi-stage pipeline for planning, executing, and evaluating adversarial interactions with other LLMs.
Explore how this hybrid approach outperforms traditional automated red-teaming methods and approaches the success rate of professional human testers.
Gain insight into the emerging risks of scalable, self-reasoning attack agents and what this means for the next generation of AI safety defenses.
Check out the paper here: https://scale.com/research/j2