
Unmasking LLM Performance: How to Benchmark the Reasoning Abilities of LLMs

Posted Jun 28, 2026 | Views 1.2K

Speakers

Hugh Zhang
Machine Learning Research Engineer @ Scale AI

Hugh Zhang is a Machine Learning Research Engineer at Scale AI, where he works on methods for evaluating and improving LLM reasoning ability. Previously, he was a PhD candidate at Harvard studying reinforcement learning and large language models. He was also part of the team that created CICERO, the first human-level agent to play the game of Diplomacy, and he co-created The Gradient, an online AI magazine.

Bihan Jiang
Product Manager @ Scale AI

Bihan is currently a Product Manager at Scale AI on the Generative AI team. Previously, she was a product engineer on Scale AI’s Nucleus team, which helps ML teams build better models with better data. She received her Bachelor’s and Master’s degrees in Computer Science from Stanford University, with concentrations in systems and artificial intelligence.

