
Unmasking LLM Performance: How to Benchmark the Reasoning Abilities of LLMs

Views 844
Speakers
Hugh Zhang
Machine Learning Research Engineer @ Scale AI

Hugh Zhang is a Machine Learning Research Engineer at Scale AI, where he works on methods for evaluating and improving LLM reasoning ability. Previously, he was a PhD candidate at Harvard studying reinforcement learning and large language models. He was also part of the team that created CICERO, the first human-level agent to play the game of Diplomacy, and he co-created The Gradient, an online AI magazine.

Bihan Jiang
Product Manager @ Scale AI

Bihan is a Product Manager at Scale AI on the Generative AI team. Previously, she was a product engineer on Scale AI’s Nucleus team, which helps ML teams build better models with better data. She received her Bachelor’s and Master’s degrees in Computer Science from Stanford University, with concentrations in systems and artificial intelligence.

