
Unmasking LLM Performance: How to Benchmark the Reasoning Abilities of LLMs

Views 844
Speakers
Hugh Zhang
Machine Learning Research Engineer @ Scale AI

Hugh Zhang is a Machine Learning Research Engineer at Scale AI, where he works on methods for evaluating and improving LLM reasoning ability. Previously, he was a PhD candidate at Harvard studying reinforcement learning and large language models. He was also part of the team that created CICERO, the first human-level agent to play the game of Diplomacy, and he co-created The Gradient, an online AI magazine.

Bihan Jiang
Product Manager @ Scale AI

Bihan is a Product Manager at Scale AI on the Generative AI team. Previously, she was a product engineer on Scale AI’s Nucleus team, which helps ML teams build better models with better data. She received her Bachelor’s and Master’s degrees in Computer Science from Stanford University, with concentrations in systems and artificial intelligence.

