Name: Lightning Talk: LLMs on Edge with AI Accelerators - Chen Lai & Cemal Bilgin, Meta
Start: 2024-09-19T11:35:00-0700
End: 2024-09-19T11:45:00-0700

September 18-19, 2024
San Francisco, California
Note: The schedule is subject to change.



Thursday September 19, 2024 11:35am - 11:45am PDT

Room A

LLMs are compute-heavy and resource-hungry: on a phone they can consume nearly all available memory and power. A natural approach is to run them on AI hardware accelerators, for example the Apple Neural Engine (ANE) on Apple devices and HTP on Qualcomm SoCs, so that they run fast and efficiently. Users will only be interested in installing models on their devices once latency, memory consumption, and power usage have been optimized to an acceptable level. In this session, we’d like to introduce how we leverage these AI accelerators within the PyTorch ecosystem to achieve state-of-the-art performance for Llama 3 on device, via ExecuTorch and partnerships with Apple and Qualcomm. Hardware companies usually have their own AI accelerators, and these tend to have different characteristics: one may support a different set of operators than another, and one may support only static shapes (like HTP). However, transformer-based optimizations can be generic. We’ll discuss in more detail how we apply the generic optimizations as well as the backend-specific ones. The techniques we applied are not limited to LLMs; they can be applied to other transformer-based models.
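To make the two backend differences mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not ExecuTorch code and does not reflect how the speakers implemented anything: `partition`, `pad_to_static`, `Segment`, and the operator names are all hypothetical, chosen only for illustration. The first function splits a list of operators into runs that an assumed accelerator supports and runs that fall back to CPU; the second pads a token sequence to a fixed length, the kind of workaround a static-shape backend might require.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Segment:
    target: str      # "accelerator" or "cpu" (hypothetical labels)
    ops: List[str]   # operator names placed on that target, in order


def partition(ops: List[str], supported: Set[str]) -> List[Segment]:
    """Group consecutive ops by whether the assumed backend supports them."""
    segments: List[Segment] = []
    for op in ops:
        target = "accelerator" if op in supported else "cpu"
        if segments and segments[-1].target == target:
            # Same target as the previous op: extend the current segment.
            segments[-1].ops.append(op)
        else:
            # Target changed: start a new segment.
            segments.append(Segment(target, [op]))
    return segments


def pad_to_static(tokens: List[int], max_len: int,
                  pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Pad tokens to exactly max_len and return (padded_tokens, mask)."""
    if len(tokens) > max_len:
        raise ValueError("sequence is longer than the static shape")
    n_pad = max_len - len(tokens)
    mask = [1] * len(tokens) + [0] * n_pad  # 1 = real token, 0 = padding
    return tokens + [pad_id] * n_pad, mask


# Illustrative operator list and support set; not taken from any real backend.
model_ops = ["embedding", "linear", "softmax", "custom_rope", "linear"]
backend_ops = {"embedding", "linear", "softmax"}

for segment in partition(model_ops, backend_ops):
    print(segment.target, segment.ops)

print(pad_to_static([5, 6, 7], max_len=5))
```

The generic part (grouping and padding) stays the same for every backend in this sketch; only the `supported` set and `max_len` would change per accelerator.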

Speakers

Chen Lai

Software Engineer, Meta

Software engineer focusing on bringing up accelerators on devices

Cemal Bilgin

Making Llama 3 8B go brr on phones, Meta

Engineering Manager, PyTorch Edge Acceleration


Lightning Talks

Audience Intermediate

PyTorch Conference 2024

