How to Deploy Models at Scale with GPUs

Uploaded: 2022-10-19T14:50:28.622Z

Posted Oct 19, 2022

# TransformX 2022

# Breakout Session

Varun Mohan

CEO and Co-Founder @ Exafunction

Varun Mohan is the CEO and Co-Founder of Exafunction, which builds infrastructure to optimize deep learning workloads. Previously, Varun was a technical lead and senior manager at Nuro, where he saw the power of deep learning and the large challenges of productionizing it at scale. Before that, he received a B.S. and a master's degree in Computer Science from MIT.


SUMMARY

Graphics Processing Units (GPUs) are used both to train artificial intelligence and deep learning models and to run them in ML inference use cases. However, using GPUs to deploy models at scale can create several challenges for ML practitioners. In this session, Varun Mohan, CEO and Co-Founder of Exafunction, shared the best practices he has learned for building an architecture that optimizes GPUs for deep learning workloads. Mohan explained the advantages of using GPUs for ML deployment, as well as the cases where they offer fewer benefits. He also discussed cost, memory, and other factors in the GPU-vs-CPU equation, along with the inefficiencies that can arise in different scenarios and issues related to network bandwidth and egress. To address these problems, Mohan offered techniques that include batching workloads and optimizing models. Finally, he discussed how some companies are using GPUs to run their recommendation and serving systems.
