LIVESTREAM

Building Trust in AI: Testing and Evaluating Large Language Models (LLMs)

About the Tech Talk

Understanding the capabilities, risks, and vulnerabilities of large language models is critical to ensuring their safety. Join Scale as we discuss our vision for an effective and comprehensive test and evaluation (“T&E”) regime for LLMs, how that regime leverages human experts, and how we aim to help meet this need with our new Scale LLM Test & Evaluation offering.

About Daniel Berrios

Daniel Berrios is the Chief of Staff at Scale, where he focuses on strategic projects on behalf of the CEO, including the launch of Scale’s large language model test & evaluation platform. Before that, he served as Co-Founder and Chief Operating Officer of Helia, a machine learning company focused on physical security applications, which was acquired by Scale in 2020. Earlier, he advised on IPOs, M&A, and corporate strategy at Goldman Sachs. He was born and raised in California, where he developed his deep passion for technology, and holds a degree in Economics from Stanford University.

About Dylan Slack

Dylan Slack is a Machine Learning Research Engineer at Scale AI. He earned his Ph.D. at UC Irvine, where he was advised by Sameer Singh and Hima Lakkaraju and was associated with UCI NLP, CREATE, and the HPI Research Center. His research focuses on developing techniques that help researchers and practitioners build more robust, reliable, and trustworthy machine learning systems. His work has been published at venues such as NeurIPS, Nature, EMNLP, and ACL, and has been cited more than 1,000 times. He has previously worked at Google AI and AWS.

Speakers

Daniel Berrios

Chief of Staff @ Scale AI

Dylan Slack

Machine Learning Research Engineer @ Scale AI

Agenda

5:00 PM to 5:40 PM GMT, Stage 1

Presentation: Building Trust in AI: Testing and Evaluating Large Language Models (LLMs)

5:40 PM to 6:00 PM GMT, Stage 1

Q&A: Live Q&A with Daniel Berrios and Dylan Slack

Event has finished

September 27, 5:00 PM GMT

Online

Organized by

Scale Events
