Scale Events
J2: Jailbreaking to Jailbreak
LIVESTREAM

About the Tech Talk

Join Scale AI researchers as they present their paper, “Jailbreaking to Jailbreak (J2),” which introduces a new paradigm in automated safety testing for large language models. This webinar will walk through the J2 approach: an LLM trained to systematically red-team other LLMs using a structured, multi-turn attack pipeline that mirrors the creativity and adaptability of human red-teamers.

Through an in-depth exploration of methodology, model behaviors, and empirical results, we’ll show how J2 achieves red-teaming performance on par with humans, offering a scalable, cost-effective alternative for vulnerability discovery. We’ll also discuss implications for safety research, limitations of current defenses, and what this means for the future of alignment and model control.

Key takeaways

  1. Learn how a capable LLM can be trained to jailbreak other models, and itself, by simulating human-like attack strategies.
  2. Understand J2’s multi-stage pipeline for planning, executing, and evaluating adversarial interactions with other LLMs.
  3. Explore how this hybrid approach outperforms traditional automated red-teaming methods and approaches the success rate of professional human testers.
  4. Gain insight into the emerging risks of scalable, self-reasoning attack agents and what this means for the next generation of AI safety defenses.

Check out the paper here: https://scale.com/research/j2

Speakers

Willow Primack
VP, Prompt Engineering & Technical Operations @ Scale AI
Jeremy Kritz
Prompt Engineer @ Scale AI
Sail (Zifan) Wang
ML Research Scientist, SEAL @ Scale AI

Agenda

4:00 PM - 4:45 PM, GMT
Presentation
J2: Jailbreaking to Jailbreak - Presentation
Willow Primack, Sail (Zifan) Wang, Jeremy Kritz

4:16 PM - 5:00 PM, GMT
Opening / Closing
Live Q&A

May 13, 5:00 PM, GMT
Online
Organized by
Scale Events