September 18-19, 2024
San Francisco, California
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Pacific Daylight Time (UTC-7). To see the schedule in your preferred timezone, please select from the drop-down located at the bottom of the menu to the right.

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday, September 18
 

11:10am PDT

Lightning Talk: What’s New in Export? - Angela Yi, Tugsbayasgalan Manlaibaatar, Avik Chaudhuri & Yidi Wu, Meta
Wednesday September 18, 2024 11:10am - 11:20am PDT
This talk discusses updates we've made to torch.export this past year:
• Non-strict mode, an alternative tracing mode which in practice covers more programs than TorchDynamo without compromising important soundness guarantees
• Better dynamic-shapes specifications through generating suggested fixes and runtime assertions
• Control flow operators such as cond, map, and associative scan
• A shift in the export-generated IR, which will enable both training and inference
• An unflattener, which will reconstruct the eager module structure from the flattened exported graph
Speakers
Yidi Wu

Research Scientist, Meta
I work on torch.export, most recently on front-end support for control-flow and higher-order operators.
Angela Yi

Software Engineer, Meta
I've been working on the PyTorch Compilers team for the past 2 years, mainly working on torch.export!
Avik Chaudhuri

Software Engineer, Meta
Creator of @flowtype. Machine learning explorer. Rusty programming language researcher. Amateur chef. Soccer dad. Website: https://avikchaudhuri.github.io/ Twitter: @__avik Blog: https://mathydad.wordpress.com/
Tugsbayasgalan Manlaibaatar

Software Engineer, Meta
I am a software engineer at Meta, working on PyTorch Compilers. I mainly work on the PT2 export workstream.
Wednesday September 18, 2024 11:10am - 11:20am PDT
Festival Pavilion - Breakout Room A

11:40am PDT

ExecuTorch Beta and on-Device Generative AI Support - Mergen Nachin & Mengtao (Martin) Yuan, Meta
Wednesday September 18, 2024 11:40am - 12:05pm PDT
During this session, we will discuss real-life case studies focusing on the productionization of PyTorch models onto edge devices and welcome the community to begin adopting ExecuTorch. Since announcing the ExecuTorch MVP at the previous PTC, we have made significant progress in terms of stability, model coverage, accelerator performance, and developer experience, reaching a milestone that marks the transition to beta status. In addition to the above improvements, we continue to support generative AI models. Since the alpha launch that initially enabled support for Llama 2/3 models, we have now expanded our capabilities to include multimodal use cases and developed mobile demo apps showcasing these new features.
Speakers
Mengtao (Martin) Yuan

Tech Lead Manager, Meta
Mengtao (Martin) Yuan is a Tech Lead Manager in Meta’s PyTorch Edge team. With multiple years of experience in the AI industry, Mengtao is focused on building software systems that help AI researchers and engineers deploy their models on edge devices such as mobile phones, AR/VR... Read More →
Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →
Wednesday September 18, 2024 11:40am - 12:05pm PDT
Festival Pavilion - Breakout Room A

11:55am PDT

Lightning Talk: Mobile Computational Photography with PyTorch: Low-Light Denoising - Alexis Baudron, Sony
Wednesday September 18, 2024 11:55am - 12:05pm PDT
Over the last decade, smartphone cameras have improved significantly, becoming the primary device people use for capturing everyday moments and high-quality photographs. This progress is largely due to advances in computational photography and novel image sensors. Computational photography enables great images from compact mobile cameras, enhancing photos through various techniques such as multi-shot merging. Despite these advancements, challenges such as noise, artifacts, and distortions persist, especially in low-light conditions where limited light increases noise levels. In this lightning talk, we will explore how PyTorch can be used to design and optimize deep learning networks for real-time low-light denoising. We will dive into noise modeling, data generation, physics-aware models, and advanced network architectures for effective denoising in challenging low-light scenarios. Attendees will gain practical insights into the latest advancements in mobile computational photography using PyTorch.
Speakers
Alexis Baudron

Senior AI Researcher, Sony
Alexis Baudron is a Senior AI Researcher at Sony, where his team specializes in building AI models to tackle complex computer vision challenges. His background is in computational photography, developing advanced techniques for image enhancement and artifact removal. Alexis earned... Read More →
Wednesday September 18, 2024 11:55am - 12:05pm PDT
Festival Pavilion - Breakout Room B

2:10pm PDT

The Impact and Challenges of Open Source Generative Datasets and Models - Aaron Gokaslan, Cornell University
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Open source generative models like OpenGPT2, BLOOM, and others have been pivotal in advancing AI technology. These models leverage extensive text data to achieve advanced linguistic capabilities. However, the trend towards proprietary tools and closed large language models is growing, posing unique challenges in open-source AI development. This discussion will explore the intricacies of training such models, the hurdles in dataset management, and the regulation of open-source contributions. We'll explore how to effectively iterate on collected data, prepare for extensive training sessions, and coordinate research across large open-source organizations. We will discuss the challenges of generative models in three different modalities: text, image, and genomics. The talk will draw from the speaker's personal experience working on OpenWebText, OpenGPT2, BLOOM, CommonCanvas, Caduceus, and other generative models. We will also cover the changing AI environment and how the future of open source is threatened by onerous regulation, ever-increasing compute costs, and the commoditization of previously open data.
Speakers
Aaron Gokaslan

PhD Student, Cornell University
Aaron Gokaslan has worked on many popular generative models and datasets such as OpenWebText, CommonCanvas, BLOOM, DBRX, and Caduceus, collectively downloaded millions of times. His work on open source has earned him a Community Contributor Award at PyTorch Con and recognition from... Read More →
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Gateway Pavilion - Cowell Theater

2:40pm PDT

Lightning Talk: Beyond Zero: Eliminating Vulnerabilities in PyTorch Container Images - Patrick Smyth, Dan Fernandez & Srishti Hegde, Chainguard
Wednesday September 18, 2024 2:40pm - 2:50pm PDT
Container images are increasingly the future of production applications at scale, providing reproducibility, robustness, and transparency. As PyTorch images get deployed to production, however, security becomes a major concern. PyTorch has a large attack surface, and building secure PyTorch images can be a challenge. Currently, the official PyTorch runtime container image has 30 CVEs (known vulnerabilities) rated critical and 256 CVEs rated high. Improving this situation could secure many deployments that incorporate PyTorch for cloud-based inference or training. In this fast-paced session, we'll take a deep dive on the official PyTorch image from a vulnerability mitigation perspective, looking hard at included packages, executables, and active CVEs. We'll identify low-hanging fruit for increasing security, including stripping bloat and building fresh. We'll also talk about the next level of security practiced in Chainguard's PyTorch image builds, such as including SBOMs and going distroless. Finally, we'll consider emerging tools and approaches for analyzing AI artifacts such as models and how these systems can benefit PyTorch in production.
Speakers
Dan Fernandez

Staff Product Manager, Chainguard
Dan is a Management Information Systems graduate from Florida's FIU and recently completed his Master of Cybersecurity at the Georgia Institute of Technology. He is currently focusing on securing the software supply chain at Chainguard. In his free time, he enjoys writing about analytics... Read More →
Patrick Smyth

Staff Developer Relations Engineer, Chainguard
Dr. Patrick Smyth is Staff Developer Relations Engineer at Chainguard, where he shows developers how to deploy AI and other applications with 0 CVEs using Chainguard Images. Patrick has a PhD in the digital humanities and in a previous life led technical bootcamps for researchers... Read More →
Srishti Hegde

Software Engineer, Chainguard
Wednesday September 18, 2024 2:40pm - 2:50pm PDT
Gateway Pavilion - Cowell Theater

2:40pm PDT

Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Since the boom of generative AI, the industry has been moving toward on-device AI inferencing: it is not just a trend but a necessity, saving costs while achieving the best inference performance and ultra-low latency at the lowest possible power. In this session we go over the new features added to the Qualcomm AI Stack and how it works with the public release of ExecuTorch 1.0. We will discuss how to run traditional workloads as well as GenAI use cases, including the latest version of Llama, on a mobile device using the Qualcomm Hexagon NPU.
Speakers
Felix Baum

Senior Director of Product Management, Qualcomm
Felix Baum has an extensive background of over two decades in the embedded industry, where he has excelled both as an embedded developer and a product manager. Currently he is responsible for AI Software Products at Qualcomm. Prior to that, he led efforts for various real-time operating... Read More →
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Festival Pavilion - Breakout Room B
  Breakout Sessions

2:55pm PDT

Lightning Talk: Sparsifying Vision Transformers with Minimal Accuracy Loss - Jesse Cai, Meta
Wednesday September 18, 2024 2:55pm - 3:05pm PDT
Sparsity, like quantization, is an approximate model optimization technique, where we trade some model accuracy for increased performance.

In this talk we'll explore how to minimize the accuracy degradation of sparsifying Vision Transformer (ViT) based models to GPU accelerable sparsity patterns like block sparsity and semi-structured sparsity.

We'll cover the best techniques to ensure a < 5% loss in accuracy when:
- training a sparse model from scratch
- pruning and retraining an existing dense model
- zero-shot/one-shot pruning a dense model

We've collected these techniques into a single repository, torchao, so that model optimization enthusiasts like you can sparsify your models with just a few lines of code.
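As a loose stand-in for the prune-then-retrain flow described above, the idea can be sketched with PyTorch's built-in pruning utilities rather than torchao itself (the layer size and the 50% ratio are arbitrary; GPU-accelerable 2:4 semi-structured sparsity also keeps 50% of elements, but in a fixed 2-of-every-4 pattern):

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(16, 16)

# Zero out 50% of the weights by L1 magnitude. This unstructured variant
# is only an illustration of the pruning step; torchao provides the
# GPU-accelerable block-sparse and semi-structured patterns.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# ...retraining would happen here to recover accuracy, then the mask is
# made permanent:
prune.remove(layer, "weight")
sparsity = (layer.weight == 0).float().mean().item()
```

After `prune.remove`, the layer holds an ordinary dense tensor with half its entries zeroed, ready to be converted to a sparse layout for inference.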
Speakers
Jesse Cai

Software Engineer, Meta
Jesse is a software engineer on the PyTorch Core Performance team, where he works on accelerating models with sparsity. Before joining Meta, he worked at several startups, focusing on natural language processing.
Wednesday September 18, 2024 2:55pm - 3:05pm PDT
Gateway Pavilion - Cowell Theater

3:10pm PDT

Lightning Talk: PyTorch/XLA Auto-Sharding - Yeounoh Chung, Google
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
PyTorch/XLA recently launched the new PyTorch/XLA SPMD feature as a first step toward automating ML workload parallelization using GSPMD. It turns out that performance largely depends on the quality of the sharding hints provided by the user, and producing optimal hints requires a correct and deep understanding of model architectures and considerable expertise. To address this problem, we propose to integrate PyTorch/XLA SPMD with XLA's auto-sharding service, which allows the XLA compiler to shard and optimize the whole model without any user input.
Speakers
Yeounoh Chung

Software Engineer, Google
SystemsResearch@Google
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
Gateway Pavilion - Cowell Theater

3:10pm PDT

TorchInductor CPU Backend Advancements: New Features and Performance Improvements - Jiong Gong & Leslie Fang, Intel
Wednesday September 18, 2024 3:10pm - 3:35pm PDT
This presentation provides an update on the latest advancements in the TorchInductor CPU backend since the last conference, bringing best-in-class CPU performance to broad DL workloads. We will discuss new features and performance enhancements, including:
• Max-autotune support with codegen for GEMMs, boosting performance for GEMM-related operations
• Enhanced vectorized codegen support, now covering all data types beyond floating points with flexible vector factors, and optimized loop scheduling
• Comprehensive quantization support, including weight-only quantization (WoQ), and optimizations for dynamic quantization and quantization-aware training
• Improved attention support, featuring attention masks and optimizing softmax via flash attention v2, etc.
• AOTInductor support, enabling high-performance inference with frozen weights
• Native Windows support, with improved vectorization capabilities
These advancements, combined with ongoing optimizations, have resulted in significant performance improvements since PyTorch 2.1, demonstrated through extensive benchmarks and large language models (LLMs).
Speakers
Leslie Fang

Software Engineer, Intel
Leslie is a software engineer from Intel who has worked on PyTorch performance optimization on X86 servers for the past 4 years. Currently, he is mainly focusing on the feature domains of Quantization, Autocast, and the Inductor CPP/OpenMP backend in stock PyTorch.
Jiong Gong

Principal Engineer, Intel
Jiong is a software architect from Intel who works on PyTorch framework optimizations. He is the PyTorch module maintainer for CPU and compiler.
Wednesday September 18, 2024 3:10pm - 3:35pm PDT
Festival Pavilion - Breakout Room B
  Breakout Sessions

3:25pm PDT

Lightning Talk: Extending PyTorch with Custom Python/C++/CUDA Operators - Richard Zou, Meta
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
In this talk, we'll go over the new recommended APIs to extend PyTorch with custom Python/C++/CUDA operators. Users have been able to extend PyTorch with custom operators for years but we have updated our guidance for creating custom operators that compose with torch.compile, autograd, and other PyTorch subsystems.
Speakers
Richard Zou

Software Engineer, Meta
I'm a software engineer at Meta working on PyTorch. I'm one of the creators of functorch, JAX-like composable function transforms for PyTorch. Nowadays I spend my time working on torch.compile, figuring out how to add infra changes to make it easier for PyTorch features like custom... Read More →
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Festival Pavilion - Breakout Room A
  Lightning Talks

3:25pm PDT

Lightning Talk: Introduction to Torch.Distributed.Pipelining - Howard Huang & Ke Wen, Meta
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Pipeline parallelism is a technique employed in distributed deep learning that enhances model execution by dividing the model into distinct segments, or "stages." As large language models and other memory-intensive models become more common, pipeline parallelism has grown increasingly important for several key areas:
• Executing large-scale training jobs
• Enhancing performance in bandwidth-limited clusters
• Supporting large model inference
In this talk, we will introduce the `torch.distributed.pipelining` package, which provides users a seamless way of applying pipeline parallelism. We will demonstrate the following features:
• Splitting of model code based on a simple specification
• Support for pipeline schedules, including GPipe, 1F1B, Interleaved 1F1B, and Looped BFS, plus the infrastructure for writing customized schedules
• Composability with other PyTorch parallelism techniques such as data parallel (DDP, FSDP) or tensor parallel
• Out-of-the-box integration with Hugging Face models for efficient inference
Speakers
Howard Huang

Software Engineer, Meta
Howard Huang is a software engineer at Meta. He has been working on PyTorch and the PyTorch distributed team for the past 4 years.
Ke Wen

Software Engineer, Meta
Ke Wen is a software engineer at Meta. He works on PyTorch Distributed features, including pipeline parallelism, distributed inference, and graph-based analysis.
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Gateway Pavilion - Cowell Theater
  Lightning Talks

4:00pm PDT

Lightning Talk: On-Device Profiling and Debugging with ExecuTorch - Olivia Liu & Vaun Puri, Meta
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
High developer velocity is crucial to shipping new ML-enabled experiences from a server-trained model to a customer's device. ExecuTorch is an on-device runtime that seamlessly integrates with the PyTorch stack with a focus on developer productivity. We present the ExecuTorch Dev Tools and highlight key features that tighten the iteration loop when optimizing models for deployment and execution on edge devices. We demonstrate how ExecuTorch's built-in profiler and bundled tools tackle key pain points, such as: 1. Examining the memory footprint of an ExecuTorch program ahead-of-time; 2. Collecting runtime performance metrics and intermediate outputs for accuracy analysis; 3. Correlating runtime data with the underlying graph of an exported model.
Speakers
Olivia Liu

Software Engineer, Meta
Olivia has been working on PyTorch at Meta for over 2 years, focusing on on-device inference and building out profiling and debugging tools for model developers.
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
Gateway Pavilion - Cowell Theater

5:30pm PDT

Poster Presentations
Wednesday September 18, 2024 5:30pm - 8:30pm PDT
  • Purge the GIL: Improved Torch.DataLoader - Michal Szolucha & Rostan Tabet, NVIDIA
  • XFormers - Daniel Haziza, Meta AI 
  • TritonCC: AOT Triton Workflow for TorchScript C++ Runtime - Sijia Chen & Huamin Li, Meta
  • The PyTorch 2.0 Inference Story - Angela Yi, Bin Bao, Sheng Qin & Sherlock Huang, Meta
  • Tensor Subclasses with PT2 - Brian Hirsh, Meta
  • Streamlining PyTorch Eager Mode Support on New Hardware Backends Through Torch.Compile - Eikan Wang, Intel
  • Sparsifying Vision Transformers with Minimal Accuracy Loss - Jesse Cai, Meta
  • Real-Time Art Creation: Stable Diffusion Fine-Tuning Techniques on Gaudi with PyTorch - Alex Sin & Louie Tsai, Intel Corporation
  • Quantization via AI Edge Torch - Pauline Sho, Google LLC
  • PyTorch Korea User Group: Introduction & Encourage - Junghwan Park, PyTorch Korea User Group & Hyoyoung Chang, Freelancer
  • PyTorch + MAX + Mojo - Nick Kreeger & Jack Clayton, Modular 
  • PT2 Torch.Compile and CPython - William Wen, Meta
  • PT2 Cold and Warm Compile Time Improvements in Torch.Compile - Oguz Ulgen & Animesh Jain, Meta
  • Pre-Train Llama3 Models Using Meta's Torchtitan on Amazon SageMaker - Less Wright, Meta & Roy Allela, AWS
  • Optimizing Memory and Compilation with While_loop - Manfei Bai, Google
  • Non-Linear Quantization Functions for Machine Learning Models - Diogo Emanuel da Costa Venâncio, INESC-ID 
  • Nested Tensors for Ragged Data Handling - Joel Schlosser, Meta
  • `Torch.Tensor.Module_load` and Tensor Subclass Serialization - Mikayla Gawarecki, Meta Platforms
  • Accelerating Generative AI on Ubiquitous CPU Instances with Native PyTorch - Mingfei Ma, Intel
  • Addressing Reverse Kinematics Challenges and Geometric Optimization in Robotics with PyTorch - Blair Birdsell, PhD. Student at University of Alberta 
  • Blazingly Fast LLM Inference with Native PyTorch: Update from the Past Year - Yanbo Liang & Horace He, Meta
  • Boosting in-Browser ML: Accelerate PyTorch Generative Models for the Web - Emma Ning & Kshama Pawar, Microsoft; Joshua Lochner, Hugging Face
  • Democratizing AI, One Byte at a Time: The Bitsandbytes Open-Source Saga, Ft. FSDP+QLoRA Fine-Tuning - Titus von Koeller, Hugging Face
  • Depyf: A Tool to Help Write Code in a Torch.Compile-Friendly Way Through Decompilation - Kaichao You, Tsinghua University/UC Berkeley
  • Exploiting on-Chip AI Accelerator for High-Performance LLM Inference - Hiroshi Inoue & Tabari Alexander, IBM Research - Tokyo
  • ExecuTorch Android and IOS on-Device Demo Poster - Hansong Zhang, Meta
  • Fault Tolerance for Large Scale Training - Tristan Rice & Chirag Pandya, Meta
  • FP8 State of the Art Inference Performance with PyTorch - Chih-Chieh Yang & Adnan Hoque, IBM; Antoni Viros i Martin, IBM Research
  • From FSDP to DeepSpeed and Back Again - Yu Chin Fabian Lim, IBM Research, Singapore
  • Large Scale Transformer Model Training with PyTorch Tensor Parallel API - Tianyu Liu, Meta
  • Model Explorer - Visualizing Pytorch Models - Na Li & Eric Yang, Google
  • PT-D Zero Overhead Checkpointing - Lucas Pasqualin, Meta / PyTorch; Chien-Chin Huang & Iris Zhang, Meta
  • PyTorch Performance Debugging in N-Dimensional Parallelism - Wei Sun & Sreen Tallam, Meta
  • Unlock Up to 5x Faster Inference in PyTorch: Recent Innovations in Torch-TensorRT - Laikh Tewari, NVIDIA
  • Torch-Monitor: A Comprehensive Call Path Profiling Tool for PyTorch - Qidong Zhao, North Carolina State University & Hao Wu, George Mason University
Speakers
Yanbo Liang

software engineer, Meta
I'm a software engineer on the PyTorch team working on torch.compile and LLMs.
Titus von Koeller

ML engineer / lead maintainer bitsandbytes, Hugging Face
Titus, lead maintainer of the independent non-profit bitsandbytes (sponsored by Hugging Face), works on co-engineering the democratization of AI and in his free time cherishes electronic music, queer culture and ski mountaineering. With degrees in Psychology and Computer Science... Read More →
Angela Yi

Software Engineer, Meta
I've been working on the PyTorch Compilers team for the past 2 years, mainly working on torch.export!
Animesh Jain

Software Engineer, Meta
Animesh Jain works on PyTorch compilers.
Antoni Viros i Martin

Research Scientist, IBM Research
Antoni is currently a Research Scientist at IBM Research, investigating optimization approaches for ML inference and training, with a focus on open-source technologies such as PyTorch. He holds a PhD in Aerospace Engineering from Texas A&M University, and has previously worked at... Read More →
Bin Bao

Software Engineer, Meta
Bin Bao is a software engineer working with the PyTorch Compiler team at Meta. He focuses on developing AOTInductor, an Ahead-of-Time compiler for the PyTorch2 export path.
Daniel Haziza

Research Engineer, Meta AI
Daniel is a Research Engineer working at FAIR Paris on workloads efficiency, and developing the xFormers library
Diogo Venâncio

Researcher, University of Lisbon | INESC-ID
My name is Diogo and I am a Master's student at IST in Lisbon, Portugal and also a ML Engineer at an early stage AI startup. I grew up in the suburbs of Lisbon and always strived to have a positive impact on the lives of others. At the age of 20, I built my own company, called OutGoing... Read More →
Eikan Wang

AI Frameworks Engineer, Intel
Eikan is a staff engineer from Intel and a DL framework tech lead having full-stack experience in DL, from various AI applications to framework, library, and DL compiler. He is actively optimizing on torch.compile stack for Intel platforms, including optimizing Inductor C++/OpenMP... Read More →
Emma Ning

Principal PM, Microsoft
Emma Ning is a Principal PM in the Microsoft AI Framework team, focusing on AI model operationalization and acceleration with ONNX Runtime/Olive for open and interoperable AI. She has more than five years of product experience in search engines taking advantage of machine learning... Read More →
Iris Zhang

Software Engineer, Meta
PyTorch Distributed @ Meta
Junghwan Park

Lead maintainer @ PyTorch Korea User Group, PyTorch Korea User Group
- Data engineer at telecommunication company in Korea - Lead maintainer at PyTorch Korea User Group - Interested in open-source, community and time-series forecasting
Kshama Pawar

Principal Program Manager, Microsoft Corporation
Kshama Pawar is a Program Manager on the AI Platform team at Microsoft. She helps drive Training initiatives for both large language models and on-device training through optimization engines like ONNX Runtime. She is also involved in the Triton community effort to improve developer... Read More →
Laikh Tewari

Deep Learning Software Product Manager, NVIDIA
Laikh Tewari manages products for inference in deep learning frameworks at NVIDIA and focuses on the usability of performance optimization tools across data center, consumer, and embedded segments. Laikh received his B.S. and M.S. in computer science from Stanford University where... Read More →
Mingfei Ma

Senior Software Engineer, Intel
Mingfei Ma is a senior deep learning software engineer in Intel. He is also the maintainer of CPU performance module in PyTorch. Mingfei holds a Master degree from Harbin Institute of Technology where he majored in Control Science and Technology. Mingfei has a 12 years’ experience... Read More →
Chien-Chin Huang

Software Engineer, Meta
Software Engineer, PyTorch Distributed, Meta
Mikayla Gawarecki

Software Engineer, Meta Platforms
Software Engineer at Meta on PyTorch Core Team
Baihan Huang

Software Engineer, Meta
Working on PyTorch
Kaichao You

Ph.D. student, Tsinghua University/UC Berkeley
Kaichao You is a fourth-year Ph.D. student at Tsinghua University. He is currently visiting UC Berkeley, working on the vLLM project, a high-throughput and memory-efficient inference and serving engine for LLMs. He is an open-source contributor to PyTorch/Triton, and he leads the... Read More →
Brian Hirsh

Software Engineer, Meta
Brian is a software engineer at Meta working on PyTorch core and compilers.
Jesse Cai

Software Engineer, Meta
Jesse is a software engineer on the PyTorch Core Performance team, where he works on accelerating models with sparsity. Before joining Meta, he worked at several startups, focusing on natural language processing.
Pauline Sho

Software Engineer, Google
Software engineering at Google LLC currently focused on improving the quantization infrastructure for edge devices.
Alex Sin

AI Software Solutions Engineer, Intel
Louie Tsai

AI SW Engineer, Intel
Horace He

Software Engineer, Meta
To be filled
Adnan Hoque

Research Engineer, IBM
I am a Research Engineer at IBM. I have a Bachelor of Science degree in Electrical Engineering from the University of Alberta. I have worked on machine learning applications in various domains such as computer vision, network security and most recently have been developing kernels... Read More →
Blair Birdsell

Data Scientist, Surespan Construction
Blair Birdsell has a MASc in Civil Engineering from the University of Victoria. This background integrates his design and engineering expertise with data science. Over 9 years, Blair has contributed to 4.86 million sq. ft. of building projects and now develops data-driven software... Read More →
Chih-Chieh Yang

Research Scientist, IBM
Performance optimization of AI workloads
Chirag Pandya

Software Engineer, Meta
Chirag is a backend engineer who's worked for over 20 years in the software industry. His expertise includes Networks/Storage/Security and Distributed Systems, with an emphasis on building fast, secure, and performant systems.
Hansong Zhang

Software Engineer, Meta Platforms
Software Engineer at Meta. Worked on integrating ExecuTorch framework into Android apps with Java and JNI library.
Hiroshi Inoue

Research Staff Member, IBM Research - Tokyo
Hiroshi Inoue is a research staff member at IBM Research - Tokyo, where he works on performance optimization of system software. He has a PhD from the University of Tokyo.
Huamin Li

Software Engineer, Meta
Software engineer from Meta PyTorch, focusing on GPU and CPU inference for Meta internal workloads
Hyoyoung Chang

Lead maintainer, PyTorch Korea User Group
Data Engineer
Jack Clayton

AI Developer Advocate, Modular
Jack started his career optimizing autonomous truck software for leading mining companies, including BHP and Caterpillar. Most recently he was designing computer vision software, putting AI inference pipelines into production for IDVerse. He is passionate about the developer community... Read More →
Joel Schlosser

Software Engineer, Meta
Engineer with a decade's worth of ML experience across the research, industry, and framework perspectives.
Joshua Lochner

Machine Learning Engineer, Hugging Face
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)
Less Wright

PyTorch Partner Engineer, Meta
PyTorch Distributed and Cuda/Triton kernels
Lucas Pasqualin

ML Engineer, PyTorch (Meta)
Lucas has been developing Machine Learning Applications and Machine Learning infrastructure at scale for years, and has recently been focused on extending the product offering of PyTorch's Distributed Checkpointing stack.
Manfei Bai

Software Engineer, Google LLC
Manfei Bai is a software engineer at Google.
Michał Szołucha

Deep Learning Software Engineer, NVIDIA
During his work at NVIDIA, Michał gained vast experience in Deep Learning Software Development. He tackled challenges in training and inference, ranging from small-scale to large-scale applications, as well as user-facing tasks and highly-optimized benchmarks like MLPerf. Micha... Read More →
Na Li

Software Engineer, Google
Tech Lead Manager at Google Cloud, leading on-device ML developer tools.
Nick Kreeger

Frameworks Engineering Director, Modular
Software Engineering lead with over 15 years of experience working at Google, Microsoft and a handful of startups. Nick has contributed to many technologies in Machine Learning such as TensorFlow.js, TensorFlow Lite/Micro, and ONNX/ONNXRuntime. Nick enjoys spending his free time with... Read More →
Oguz Ulgen

Software Engineer, Meta
I'm a software engineer at Meta where I used to work on the Hack programming language and now work on PyTorch.
Rostan TABET

Software Engineer, NVIDIA
I am a Computer Science student with a passion for Python and deep learning. During my end-of-studies internship, I focused on leveraging free-threaded Python in the context of NVIDIA's deep learning libraries suite. My work aims to improve data handling efficiency in machine learning... Read More →
Roy Allela

Sr AI/ML Specialist Architect, AWS
Roy Allela is a Senior AI/ML Specialist Architect at AWS. Roy helps customers, from small startups to large enterprises, train and deploy large language models efficiently on AWS. He previously spent 8 years at Intel as a Senior AI Software Engineer working on low-level ML framework... Read More →
Sheng Qin

Software Engineer, Meta Inc.
Sheng Qin is a software engineer in the PyTorch Accelerator Enablement org at Meta.
Sijia Chen

Software Engineer, Meta / PyTorch
Sijia is a software engineer on the Meta PyTorch Acceleration team, focusing on GPU inference.
Tianyu Liu

Research Scientist, Meta
Tianyu Liu is a Research Scientist on the PyTorch team at Meta, currently working on distributed training. Prior to this, he was a postdoc at Stanford University and has worked on the Ads Core Machine Learning team at Meta. He obtained his PhD degree at the University of Wisconsin--Madison... Read More →
Tristan Rice

Software Engineer, Meta
Software engineer working on PyTorch Distributed and large scale training.
Wei Sun

Research Scientist, Meta Platform
Wei Sun supports the Meta AI Infrastructure organization. He brings deep expertise in analyzing ML model execution during training and serving and identifies efficiency/performance bottlenecks across model and system architecture. This has led him to build some of the most comprehensive... Read More →
William Wen

Software Engineer, Meta Platforms, Inc.
William works on the torch.compile team, specializing in TorchDynamo.
Yu Chin Fabian Lim

Research Staff Member, IBM Research, Singapore
Fabian Lim is currently with IBM Research, Singapore. From 2013 to 2016, he worked at Avago Technologies (now Broadcom), then SK Hynix Memory Systems, in San Jose, CA. From 2010 to 2013, he was a postdoc at the Massachusetts Institute of Technology, Cambridge, MA. Dr Lim received the... Read More →
Tabari Alexander

STSM, IBM Z AI and Analytics, IBM
Eric Yang

Software Engineer, Google
Sreen Tallam

Software Engineering Manager - AI Performance & Efficiency, Meta
I am a SW Engineering Manager at Meta helping all ML Training & Serving models (RecSys, Content Understanding, GenAI) run optimally and efficiently through various optimization techniques, including scaling them across the entire Meta fleet.
Qidong Zhao

PhD Student, North Carolina State University
Research interests: profiling techniques for different workloads and architectures.
Hao Wu

PhD, George Mason University
I am interested in deep learning profilers.
Wednesday September 18, 2024 5:30pm - 8:30pm PDT
Gateway Pavilion - Sponsor Showcase
  Poster Presentations
 
Thursday, September 19
 

10:50am PDT

Lightning Talk: d-Matrix LLM Compression Flow Based on Torch.Fx: Simplifying PTQ/QAT - Zifei Xu & Tristan Webb, d-Matrix Corporation
Thursday September 19, 2024 10:50am - 11:00am PDT
We introduce dmx-compressor, d-Matrix's open-source LLM compression toolkit that is modular, robust, efficient, and user-friendly. It utilizes symbolic tracing and fx.Transformer for network compression while keeping the model a first-class citizen in PyTorch for the user, despite prevalent graph dynamism in LLMs. It achieves this by maintaining both the original nn.Module and a just-in-time (JIT) traced and transformed fx.GraphModule representation behind the scenes, in conjunction with an abstraction that cleanly decouples network compression from the original model graph definition. This design allows the FXIR to dynamically adapt to diverse forward call signatures and flow-control arguments throughout quantization-aware training and post-training quantization written in plain PyTorch, yielding a compressed FXIR fully compatible with application-level APIs like the Hugging Face pipeline. We also provide a graph visualizer based on fx.Interpreter for ease of debugging. We believe this project will empower the community to build efficient LLMs for deployment on custom hardware accelerators and contribute to the PyTorch ecosystem.
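The fx.Transformer approach described above can be sketched in a few lines. This is a toy illustration, not dmx-compressor itself: symbolic tracing turns an eager nn.Module into an fx.GraphModule, and a Transformer subclass rewrites graph nodes, here swapping an activation as a stand-in for a real quantize/dequantize pass.

```python
import torch
import torch.fx as fx
from torch import nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

# Symbolically trace the eager module into an fx.GraphModule (FX IR).
traced = fx.symbolic_trace(TinyModel())

class SwapRelu(fx.Transformer):
    # A toy graph pass: replace torch.relu calls with torch.tanh,
    # standing in for a real compression/quantization rewrite.
    def call_function(self, target, args, kwargs):
        if target is torch.relu:
            return super().call_function(torch.tanh, args, kwargs)
        return super().call_function(target, args, kwargs)

transformed = SwapRelu(traced).transform()
x = torch.randn(2, 4)
# The transformed GraphModule is still a callable nn.Module.
print(transformed(x).shape)  # torch.Size([2, 4])
```

The rewritten module shares its parameters with the original, so the user-facing nn.Module and the transformed FXIR can coexist, as the abstract describes.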
Speakers
Zifei Xu

Senior Machine Learning Research Engineer, d-Matrix Corporation
Zifei is a Senior Machine Learning Research Engineer at d-Matrix. Her current work focuses on developing model quantization pipelines and efficient quantization algorithms. She graduated from Stanford University with a Master's degree in Computational & Mathematical Engineering and... Read More →
Tristan Webb

ML Engineer, d-Matrix
Tristan's background is primarily in computer science and mathematics, which led him to a PhD in Complexity Science at the University of Warwick, where he worked with large computational neuroscience models of spiking neural networks using simulators written in C... Read More →
Thursday September 19, 2024 10:50am - 11:00am PDT
Festival Pavilion - Breakout Room A
  Lightning Talks

10:50am PDT

The Rise of `Transformers` in the Growing PyTorch Ecosystem - Arthur Zucker, Hugging Face
Thursday September 19, 2024 10:50am - 11:15am PDT
Explore how the `transformers` library grows and adapts to the fast-paced and ever-changing AI field to bring the best to the AI community.
Speakers
Arthur Zucker

Core Maintainer, Hugging Face
Arthur is a Core maintainer at Hugging Face, maintaining several critical libraries such as transformers and tokenizers. He is the owner of the text and LLM parts of Hugging Face's open-source toolkits, resulting in the implementations of LLaMa, Mistral, MoEs, etc and torch.compile... Read More →
Thursday September 19, 2024 10:50am - 11:15am PDT
Festival Pavilion - Breakout Room B

11:05am PDT

Lightning Talk: LLMs on Edge with AI Accelerators - Chen Lai, Kimish Patel & Cemal Bilgin, Meta
Thursday September 19, 2024 11:05am - 11:15am PDT
LLMs are known to be compute heavy and consume lots of resources (almost all resources on phones), including memory and power. A natural thought is to leverage AI hardware accelerators, for example, the Apple Neural Engine (ANE) on Apple devices and the HTP on Qualcomm SoCs, to make them run fast and efficiently. Only by optimizing model latency, memory consumption, and power usage to a certain level will users be interested in installing the models on their devices. In this session, we'd like to introduce how we leverage these AI accelerators within the PyTorch ecosystem to achieve state-of-the-art performance for Llama 3 on device, via ExecuTorch and the partnership with Apple and Qualcomm. Hardware companies usually have their own AI accelerators. They likely have different characteristics: one may support a different set of operators than others, and one may only support static shapes (like the HTP). However, transformer-based optimizations can be generic. We'll discuss in more detail how we apply the generic optimizations as well as the backend-specific ones. The techniques we applied here are not just for LLMs, but can be applied to other transformer-based models.
Speakers
Chen Lai

Software Engineer, Meta
Software engineer focusing on bringing up accelerators on devices.
CEMAL Bilgin

Engineering Manager, Meta
Engineering Manager PyTorch Edge Acceleration
Kimish Patel

Software Engineer, Meta Platforms
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on performance optimizations. His past experiences include hardware/software co-design, CPU architecture, and CPU/GPU performance optimization.
Thursday September 19, 2024 11:05am - 11:15am PDT
Festival Pavilion - Breakout Room A
  Lightning Talks

11:20am PDT

Sponsored Session: Torchchat: A Showcase of PyTorch LLM Ubiquity - Jack Khuu & Jesse White, Meta
Thursday September 19, 2024 11:20am - 11:45am PDT
This talk explores the journey of enabling LLMs in the PyTorch ecosystem, as well as how the teams behind AOT Inductor, ExecuTorch, and torchao collaborated to create torchchat, a showcase of PyTorch’s ability to run LLM inference everywhere.

Torchchat demonstrates the ubiquity, simplicity, and quality of PyTorch's LLM support through performant, reproducible implementations not only in Python environments but also on desktop, server, and on-device.

All of our work is open source and available on GitHub.
Speakers
Jack Khuu

Software Engineer, Meta
Software Engineer @ Meta working on the PyTorch Edge team. TL for torchchat, which is PyTorch's showcase of LLM inference ubiquity (Python, Desktops, Mobile, etc.). More broadly, I focus on the "Experience" of PyTorch Edge, encompassing User, Developer, and Community Experience.Ex-Lecturer... Read More →
Jesse White

Software Engineering Manager, Meta
Jesse is an engineering manager at PyTorch @ Meta, where he supports the Edge Experience team in improving the experience for on-device inference and training, including mobile, laptops, and embedded devices. With nearly 20 years of experience in startups, Jesse is passionate about... Read More →
Thursday September 19, 2024 11:20am - 11:45am PDT
Festival Pavilion - Breakout Room A
  Breakout Sessions

11:20am PDT

Training MoEs at Scale with PyTorch - Mihir Patel & Brian Chu, Databricks
Thursday September 19, 2024 11:20am - 11:45am PDT
Mixture-of-Experts (MoE) models are becoming an increasingly popular architecture choice for large language models (LLMs). In this talk, we describe how to train MoE models with PyTorch. After discussing various performance tradeoffs, we use PyTorch distributed tools like DTensor to build custom parallelism approaches, including expert parallelism via MegaBlocks. We then show how to get near-linear scaling to thousands of GPUs, combining PyTorch FSDP and HSDP with our parallelism strategies. We discuss many of the challenges of training at scale, including communication bottlenecks, hardware failures, and networking challenges. We further improve training-at-scale setups using tools like PyTorch Distributed Checkpointing for rapid saving and loading. We then highlight further optimizations to minimize challenges only present at scale, such as object store failures for large checkpoints.
Speakers
Mihir Patel

Research Engineer, Databricks
Mihir Patel is a Research Engineer at MosaicML / Databricks, where he works on distributed training at scale and serves as the tech lead for Composer, an open-source deep learning training library. His primary focus is on large model training, and he has helped build several open... Read More →
Brian Chu

Research Engineer, Databricks
Brian is a Research Engineer at MosaicML / Databricks, where he contributes to Composer and Foundry, open-source libraries for training LLMs. He has been involved in the DBRX project and products like the Databricks finetuning and pretraining API. Prior to joining Databricks, Brian... Read More →
Thursday September 19, 2024 11:20am - 11:45am PDT
Festival Pavilion - Breakout Room B

11:50am PDT

Lightning Talk: Empowering Developers: Tools and Resources for Running Generative AI on Arm CPUs - Pareena Verma, Arm
Thursday September 19, 2024 11:50am - 12:00pm PDT
As the demand for accessible and scalable AI solutions grows, leveraging CPUs for generative AI offers significant advantages in cost, energy efficiency, and widespread availability. This session aims to equip developers with the ecosystem of tools, resources, and technical content needed to effectively run generative AI use cases on Arm CPUs. We have launched a range of easily digestible tutorials for developers, part of our Learning Paths on https://learn.arm.com/, which demonstrate how you can easily and efficiently run small and large language models on Arm-based devices. Learn about end-to-end workflows to accelerate PyTorch-based sentiment analysis models from Hugging Face on Arm servers with optimizations in Arm Compute Library kernels for fp32 and bfloat16. Use the new KleidiAI library to accelerate LLMs with AI frameworks, and build an Android chat app on your Arm mobile device with ExecuTorch and XNNPACK. Find out about our roadmap for learning content demonstrating the feasibility and successful deployment of generative AI on Arm-based devices. Help us shape the support that we offer developers.
Speakers
Pareena Verma

Principal Solutions Architect, Arm
Pareena is a Principal Solutions Architect at Arm. She has extensive experience working with software developers and SoC architects on numerous Arm based projects involving usage of modeling, ML frameworks, compilers, debuggers and virtual prototyping simulation tools. Pareena holds... Read More →
Thursday September 19, 2024 11:50am - 12:00pm PDT
Festival Pavilion - Breakout Room B

11:50am PDT

Lightning Talk: New Activation Checkpointing APIs in PyTorch - Jeffrey Wan & Horace He, Meta
Thursday September 19, 2024 11:50am - 12:00pm PDT
Activation checkpointing is a commonly used technique to reduce memory usage during model training by reducing the number of activations saved for backward. Instead of keeping the tensors needed for backward alive until they are used in gradient computation, those tensors are recomputed during the backward pass. This talk will introduce new activation checkpointing APIs that can help achieve a better trade-off between memory savings and the compute overhead that recomputation introduces.
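For context, the existing torch.utils.checkpoint API illustrates the basic trade-off the talk builds on (this sketch uses the current API, not the new APIs introduced in the talk): the checkpointed region discards its activations during forward and recomputes them during backward, producing identical gradients.

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # An "expensive" segment whose activations we choose not to save;
    # they are recomputed during the backward pass instead.
    return torch.relu(x @ x.t()).sum(dim=1)

x = torch.randn(8, 8, requires_grad=True)

# Checkpointed: forward runs without saving intermediate activations.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
grad_ckpt = x.grad.clone()

# Reference: plain eager execution keeps activations alive for backward.
x.grad = None
block(x).sum().backward()

# Same gradients, different memory/compute trade-off.
print(torch.allclose(grad_ckpt, x.grad))  # True
```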
Speakers
Horace He

Software Engineer, Meta
To be filled
Jeffrey Wan

Software Engineer, Meta
Software Engineer working on PyTorch
Thursday September 19, 2024 11:50am - 12:00pm PDT
Festival Pavilion - Breakout Room A

12:00pm PDT

Lightning Talk: Fast, Scalable Distributed Training with StreamingDataset - Saaketh Narayan, Databricks
Thursday September 19, 2024 12:00pm - 12:10pm PDT
StreamingDataset makes training on large datasets from cloud storage as fast, cheap, and scalable as possible. It's specially designed for multi-node, distributed training of large models, maximizing correctness guarantees, performance, and ease of use. Key features include elastically deterministic training, instant mid-epoch resumption, effective shuffling, high training throughput, and flexible data mixing. When training with StreamingDataset, the data shards are written to cloud storage in MDS, our file format that allows low-latency random access to samples. By being as efficient as possible with shard downloads and shuffling, StreamingDataset minimizes egress costs while ensuring that dataloading never bottlenecks model training. StreamingDataset powers training for models ranging from LLMs with over 100 billion parameters like DBRX, to advanced diffusion models, to two-tower recommendation models, scaling to training jobs on thousands of GPUs with ease. Join us to learn how StreamingDataset can elevate your distributed model training experience.
Speakers
Saaketh Narayan

Machine Learning Engineer, Databricks
Saaketh Narayan is a machine learning engineer at Databricks. As part of the Mosaic AI Runtime team, he works on the GenAI training stack, including dataloading, training frameworks, and performance across the Mosaic Streaming, Composer, and LLM Foundry libraries.
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Gateway Pavilion - Cowell Theater

12:00pm PDT

Lightning Talk: FlexAttention - The Flexibility of PyTorch + The Performance of FlashAttention - Yanbo Liang & Horace He, Meta
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Introducing a novel abstraction leveraging the PyTorch compiler stack to enable custom, user-defined attention mechanisms. This new API supports dynamic modifications to attention scores within SDPA, providing both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm.
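The score-modification idea can be sketched in plain eager PyTorch. This is a conceptual illustration only, not the FlexAttention API itself: FlexAttention takes a user-defined score_mod callable like the one below and fuses it into a FlashAttention-style kernel via the compiler stack, rather than materializing the full score matrix.

```python
import math
import torch

def attention_with_score_mod(q, k, v, score_mod):
    # Plain scaled dot-product attention, with a user-defined hook that
    # edits the score matrix before softmax -- the idea FlexAttention
    # compiles into a fused FlashAttention-style kernel.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    scores = score_mod(scores)
    return torch.softmax(scores, dim=-1) @ v

# Example score modification: causal masking, expressed as a score rewrite.
def causal(scores):
    n = scores.shape[-1]
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
    return scores.masked_fill(~mask, float("-inf"))

q = k = v = torch.randn(2, 4, 8)  # (batch, seq, head_dim)
out = attention_with_score_mod(q, k, v, causal)
print(out.shape)  # torch.Size([2, 4, 8])
```

Swapping in a different score_mod (relative-position bias, sliding windows, soft capping) changes the attention variant without touching the kernel.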
Speakers
Yanbo Liang

software engineer, Meta
I'm a software engineer on the PyTorch team working on torch.compile and LLMs.
Horace He

Software Engineer, Meta
To be filled
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Festival Pavilion - Breakout Room A

12:10pm PDT

Lightning Talk: AOTriton: Ahead of Time Triton Kernel Libraries on ROCm - Jeff Daily, AMD
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Scaled dot product attention provides significant acceleration of the transformer layer through fusion of the multihead attention layer. There are several different algorithms to achieve this, but tiled attention via Flash Attention is a very popular approach. In PyTorch on the ROCm platform, this is currently achieved through ahead-of-time compiled (AOT) Triton kernels in a linkable archive. AMD's work to enable and package these kernels is done through AOTriton, which aims to use Triton's compiler and GPU kernels for faster development. AOTriton maintains an optimized set of tiling sizes and other parameters to provide optimized, pre-compiled Triton kernels. The differences between JIT and AOT are few but very important. Despite them, prototyping kernels in Triton is much faster than with template-based C++ libraries. In this presentation we will go into detail on the interaction layer between PyTorch and AOTriton, the structure of AOTriton, and how to add new Triton kernels to AOTriton.
Speakers
Jeff Daily

Principal Member of Technical Staff, Advanced Micro Devices
Jeff Daily is the chief architect of the Machine Learning Software Engineering group supporting ML frameworks such as PyTorch and onnxruntime on AMD GPUs.  He enjoys delivering open source software to answer the challenges of the rapidly-changing ML landscape.  For over five years... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Festival Pavilion - Breakout Room B

12:10pm PDT

Lightning Talk: Implementing and Using Iterable Datasets: What Could Go Wrong? - Nicolas Hug, Meta
Thursday September 19, 2024 12:10pm - 12:20pm PDT
PyTorch supports two kinds of datasets: Iterable datasets and indexable "map-style" datasets. Iterable datasets can be more flexible and potentially faster than their indexable cousins. They are also much harder to use correctly, and can easily lead to silently wrong results. This talk is a quick and fun intro to some of the traps that Iterable datasets lay out for you, with some tips to help you avoid them.
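One classic trap the talk alludes to: with multiple DataLoader workers, an unsharded IterableDataset is replicated into every worker process, and each worker yields the full dataset, silently duplicating every sample. A hypothetical toy dataset showing the standard get_worker_info fix:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeDataset(IterableDataset):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: this iterator sees everything.
            start, step = 0, 1
        else:
            # Shard by worker id. Without this, each of the num_workers
            # processes would yield ALL n samples -> silent duplicates.
            start, step = info.id, info.num_workers
        yield from range(start, self.n, step)

# batch_size=None yields individual samples; with num_workers=0 there is
# a single "worker" that sees the whole (correctly deduplicated) dataset.
loader = DataLoader(RangeDataset(8), batch_size=None, num_workers=0)
print(sorted(int(x) for x in loader))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With the sharded `__iter__`, raising num_workers still produces each sample exactly once, just interleaved across processes.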
Speakers
Nicolas Hug

Research Engineer, Meta
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Gateway Pavilion - Cowell Theater
  Lightning Talks

12:10pm PDT

Lightning Talk: Making the Most of Heterogeneous Memory Capacity Using PyTorch - Syed Ahmed, NVIDIA Corporation
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Memory-intensive deep learning workloads require efficient use of all kinds of memory available in a system. In this session, we will discuss how to utilize such heterogeneous memory through memory pools in PyTorch. We will show how to mix and match different CUDA system allocators in the same PyTorch program using memory pools. Consequently, this API unlocks new use cases such as Extended GPU Memory (EGM) based all-gathers, Unified Virtual Memory (UVM), and NVLink SHARP (NVLS) reductions. New NVIDIA architectures accelerate such use cases with high-bandwidth and low-latency interconnects in the hardware, driven by extended functionality of CUDA system allocators in the software. Learn how to use these techniques on memory-intensive deep learning models like LLMs, and discover new CUDA features powered by PyTorch.
Speakers
Syed Ahmed

Senior Software Engineer, NVIDIA
Syed Ahmed is a Senior Software Engineer on the PyTorch Core team at NVIDIA, focused on keeping PyTorch fast and numerically stable on current NVIDIA platforms, and making PyTorch more expressive on future NVIDIA platforms. He holds a Master’s degree in Electrical Engineering from... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Festival Pavilion - Breakout Room A

2:15pm PDT

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon & Xiaoxuan Liu, UC Berkeley
Thursday September 19, 2024 2:15pm - 2:40pm PDT
We will present vLLM, an open-source high-performance LLM inference engine built on top of PyTorch. Starting as a research project at UC Berkeley, vLLM has been one of the fastest and most popular LLM inference solutions in industry, reaching 20K+ stars and 350+ contributors. In this talk, we will cover how vLLM adopts various LLM inference optimizations and how it supports various AI accelerators such as AMD GPUs, Google TPUs, and AWS Inferentia. Also, we will discuss how vLLM benefits from PyTorch 2 and its ecosystem.
Speakers
Lily Liu

Student, UCB
Lily (Xiaoxuan) Liu is a PhD student at UC Berkeley, working with Professors Ion Stoica and Alvin Cheung. Her research focuses on machine learning systems, particularly optimizing latency for LLM inference and addressing memory bottlenecks in LLM systems. Her recent work explores... Read More →
Woosuk Kwon

PhD Student, UC Berkeley
Woosuk Kwon is a Ph.D. student at UC Berkeley, advised by Prof. Ion Stoica. He is interested in building practical, flexible, and high-performance software systems for emerging applications such as large language models. Recently, he has been developing vLLM, a high-performance open-source... Read More →
Thursday September 19, 2024 2:15pm - 2:40pm PDT
Festival Pavilion - Breakout Room B

2:45pm PDT

Lightning Talk: What's New for PyTorch Developer Infrastructure - Sahan Paliskara & Catherine Lee, Meta
Thursday September 19, 2024 2:45pm - 2:55pm PDT
A chat about all of the work being done to continue supporting PyTorch's Developer Infrastructure needs, including updates around Target Determination, Releases, and OSS Tooling.
Speakers
Catherine Lee

Software Engineer, Meta
Software engineer on the PyTorch Dev Infra team primarily working on reducing time to signal, testing infrastructure, and CI related developer tooling.
Sahan Paliskara

Software Engineer, Meta
After spending a lot of time using PyTorch to train computer vision models, Sahan joined the PyTorch team three years ago. He started off working on inference and packaging, and now he's part of the dev infra team. These days, he's involved in everything from managing releases to... Read More →
Thursday September 19, 2024 2:45pm - 2:55pm PDT
Festival Pavilion - Breakout Room A

2:45pm PDT

Blobs to Clips: Efficient End-to-End Video Data Loading - Andrew Ho & Ahmad Sharif, Meta
Thursday September 19, 2024 2:45pm - 3:10pm PDT
The PyTorch team has improved training speed by an order of magnitude for teams at Meta working on small-to-large-scale multimodal video models. In this talk we'll share our learnings on reducing GPU starvation by overcoming data loading challenges such as dealing with large distributed datasets, worker imbalance, compute bottlenecks due to parallel video decoding and sampling, checkpointing, and debuggability. As part of our commitment to open source, we are releasing a new decoding library and updating existing PyTorch libraries on GitHub, and we invite feedback and contributions from the community.
Speakers
Ahmad Sharif

Software Engineer, Meta
SWE in PyTorch Content Domains. Past: SWE at Google in Search, Privacy, ChromeOS
Andrew Ho

Machine Learning Engineer, Meta Platforms
We are ML Engineers at Meta on PyTorch working on multi-modal LLM dataloading
Thursday September 19, 2024 2:45pm - 3:10pm PDT
Gateway Pavilion - Cowell Theater

2:45pm PDT

Torchtitan: Large-Scale LLM Training Using Native PyTorch 3D Parallelism - Wanchao Liang, Meta & Linsong Chu, IBM Research
Thursday September 19, 2024 2:45pm - 3:10pm PDT
torchtitan is a proof of concept for large-scale LLM training using native PyTorch. It is a repo that showcases PyTorch's latest distributed training features in a clean, minimal codebase. We showcase end-to-end enablement of large-scale training features: 1. 3D/4D parallelism; 2. efficient distributed checkpoint save/load/resharding; 3. many efficient training techniques, including Float8, torch.compile, and activation checkpointing.
Speakers
Wanchao Liang

Software Engineer, Meta Platforms, Inc.
Software Engineer at Meta, PyTorch team Tech Lead in PyTorch Distributed training. Author of torchtitan, Tensor Parallel and DTensor, a fundamental distributed abstraction to perform distributed computation. Previously worked on the TorchScript compiler, ONNX.
LINSONG CHU

Senior Technical Staff Member, IBM Research
Linsong is an STSM at IBM Research, focusing on FSDP, torch.compile, and FP8 in the area of pre-training.
Thursday September 19, 2024 2:45pm - 3:10pm PDT
Festival Pavilion - Breakout Room B

3:00pm PDT

Lightning Talk: PyTorch Release Process - Andrey Talman, Meta
Thursday September 19, 2024 3:00pm - 3:10pm PDT
I would like to present and quickly discuss the PyTorch release process: how it happens, what the milestones are, what our cherry-picking criteria are, and how we validate the release.
Speakers
Andrey Talman

Software Engineer, Meta Inc.
Software Engineer, Meta Inc., 2021-present: part of the PyTorch Dev Infra team, working on PyTorch OSS releases. Lead Software Engineer, Dow Jones & Company, 2019-2021: part of the team developing software and the API services used by the Dow Jones Factiva website and WSJ. Software Engineer... Read More →
Thursday September 19, 2024 3:00pm - 3:10pm PDT
Festival Pavilion - Breakout Room A

3:15pm PDT

Slaying OOMs - Mark Saroufim & Jane Xu, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Have you ever hit an OOM (and wished you had more VRAM)? Who hasn't! Hop on the bus with us and feel the road become smoother as we talk about stacking together techniques like FSDP2 + QLoRA + CPU offloading + fused Adam (thanks Intel) + more in PyTorch native. We will give an overview of these techniques as well as the hard edges we solved in their composition. Curious for more? Or...still OOMing? We also plan on discussing our more researchy work on offloading, pagedness, and low-precision optimizers.
Speakers
Jane Xu

SWE, Meta
I'm Jane and I work on the PyTorch core library! Tell me your favorite optimizer, complain to me about your latest OOM, teach me about what you’re excited about.
Mark Saroufim

Software Engineer, Meta
Mark Saroufim is a PyTorch Engineer at Meta working on inference, compilers and community.
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Festival Pavilion - Breakout Room B

3:15pm PDT

Torch.Compile for Autograd, DDP and FSDP - Will Feng, Chien-Chin Huang & Simon Fan, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
In this talk, we will present the latest advancements in torch.compile for distributed training via DDP and FSDP. We will first introduce Compiled Autograd, a torch.compile mode that fully captures the backpropagation step, including the communication collective operators used in distributed training. We will then cover the improvements this new approach brought to Compiled DDP/FSDP, notably by removing DDP/FSDP graph breaks, which unlocks the potential to improve compute/communication overlap.
Speakers
Chien-Chin Huang

Software Engineer, Meta
Software Engineer, PyTorch Distributed, Meta
Simon Fan

Software Engineer, Meta
I'm a software engineer on the PyTorch Compiler team, I focus on torch.compile for distributed training frameworks.
Will Feng

Software Engineer, Meta Platforms, Inc.
Will Feng is a Software Engineer in PyTorch Compiler team at Meta. He has been working in PyTorch core and ecosystem for the past 7 years. He is now working on and most excited about torch.compile for distributed training performance.
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Festival Pavilion - Breakout Room A

4:05pm PDT

Lightning Talk: Debiasing the Data Lifecycle - Shailvi Wakhlu, Shailvi Ventures LLC
Thursday September 19, 2024 4:05pm - 4:15pm PDT
Biased data results in biased decision-making. Making sure that at every step of the data lifecycle we make conscious attempts to debias the data is an important responsibility for all data scientists. In this talk, I highlight the typical data lifecycle and how to prevent biases at every step. ---- The key takeaways from my talk include: 1) understanding the data lifecycle, 2) the typical ways biases creep in, and 3) how we can proactively prevent and fix biases in data.
Speakers
Shailvi Wakhlu

Founder, Shailvi Ventures LLC
Shailvi is a seasoned Data Leader and Self-Advocacy Expert with over sixteen years of experience building technology products. She has spoken at nearly 100 global conferences and Fortune 500 events, coached close to 500 individuals, and authored the best-selling book "Self-Advocacy... Read More →
Thursday September 19, 2024 4:05pm - 4:15pm PDT
Festival Pavilion - Breakout Room A

4:05pm PDT

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Thursday September 19, 2024 4:05pm - 4:30pm PDT
Understanding how to effectively size a production-grade LLM deployment requires understanding the model(s), the compute hardware, quantization and parallelization methods, KV cache budgets, input and output token length predictions, model adapter management, and much more.
- Why LLM inference is different from standard deep learning inference
- Current and future NVIDIA GPU overview: which GPU(s) for which models, and why
- Understanding the importance of building inference engines
- Deep recap on the attention mechanism, along with different types of popular attention mechanisms used in production
- Deep dive on KV cache and managing KV cache budgets
- Parallelism (reducing latency): mainly tensor parallelism, but data, sequence, pipeline, and expert parallelism will be highlighted
- Quantization methods on weights, activations, and KV cache to reduce engine sizes for more effective GPU utilization
- Increasing throughput with in-flight batching and other techniques
- Detailed performance analysis of LLM deployments looking at time to first token, inter-token latencies, LLM deployment characterizations, and more that can help reduce deployment costs
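To make KV cache budgeting concrete, a back-of-the-envelope sketch. The model shape below is an assumed Llama-3-8B-like configuration (32 layers, 8 grouped-query KV heads, head dim 128), used purely for illustration:

```python
import torch

# KV-cache memory for one request, in bytes:
#   2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_elem
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# An assumed Llama-3-8B-like shape at 8192-token context in fp16:
print(kv_cache_bytes(32, 8, 128, 8192) / 2**30)  # 1.0 (GiB per sequence)

# Incremental decoding appends one step's K/V instead of recomputing:
k_cache = torch.zeros(0, 128)
for step in range(3):
    k_new = torch.randn(1, 128)            # this step's key
    k_cache = torch.cat([k_cache, k_new])  # grows linearly with tokens
print(k_cache.shape)  # torch.Size([3, 128])
```

The linear growth per sequence is why batch size, context length, and KV cache quantization together dominate the GPU memory budget of a serving deployment.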
Speakers
Mark Moyou

Sr. Data Scientist, NVIDIA
Dr. Mark Moyou is a Senior Data Scientist at NVIDIA, working with enterprise clients on AI strategy and deploying machine learning applications to production. He is the host of the Caribbean Tech Pioneers Podcast and The AI Portfolio Podcast, and is the Director of the Optimized AI Confere... Read More →
Thursday September 19, 2024 4:05pm - 4:30pm PDT
Festival Pavilion - Breakout Room B

4:35pm PDT

Unlocking the Enigma: Crafting Unbiased, Transparent, and Explainable Large Language Models - Rashmi Nagpal, Patchstack
Thursday September 19, 2024 4:35pm - 5:00pm PDT
In an era where artificial intelligence reigns supreme, the statistics are both perplexing and thought-provoking: only a mere 13% of large language models manage to transcend the realms of research and enter the practical world of production. Who bears the responsibility when these models err, spewing out biased or discriminatory outputs? It's time to demystify the complex landscape of machine learning ethics and carve a path towards a brighter, more accountable future! In this talk, we will first navigate the profound impacts of large language models across diverse domains, from lifesaving advances in medicine to safeguarding our nations through enhanced security protocols. Secondly, as we marvel at the data-driven decisions made by these models, we will confront the darker shadows they cast: the looming spectre of bias in the data. Finally, we will delve deep into the art of building interpretable models and navigating the maze of ethical considerations. Through a live demonstration in PyTorch, we will witness how to craft unbiased, transparent, and explainable models.
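One common building block for explainability demos of this kind is input-gradient saliency: which input features most influence the output. The sketch below uses a toy linear model for clarity (it is an assumption for illustration, not the speaker's actual demo, where the gradient of the output with respect to the input recovers the weights exactly).

```python
import torch

# Toy linear "model" whose saliency we can verify by hand.
w = torch.tensor([0.5, -1.5, 2.0])

def model(x):
    return (w * x).sum()

# Input-gradient saliency: backpropagate the scalar output to the input.
x = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
model(x).backward()
saliency = x.grad  # for a linear model, this equals w exactly
```

For real networks the same pattern applies, though libraries such as Captum wrap it with more robust attribution methods (integrated gradients, SHAP-style attributions, and so on).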
Speakers
avatar for Rashmi Nagpal

Rashmi Nagpal

Machine Learning Engineer, Patchstack
Rashmi, a passionate researcher at the MIT CSAIL and machine learning engineer at Patchstack, is dedicated to crafting beautiful AI applications. With nearly 5 years of industrial experience, she has brought ideas to life at pre-seed startups and contributed to impactful redesigns...
Thursday September 19, 2024 4:35pm - 5:00pm PDT
Festival Pavilion - Breakout Room A
  Breakout Sessions

5:05pm PDT

Implementing a Custom Torch.Compile Backend - A Case Study - Maanav Dalal & Yulong Wang, Microsoft
Thursday September 19, 2024 5:05pm - 5:30pm PDT
This presentation will dive into the development of the ONNXRuntime (ORT) backend for torch.compile. We'll cover the implementation process, starting with a PyTorch 2.0 generated FX graph, highlighting the unique challenges encountered when serving ORT-specific scenarios and how we solved them. Attendees will gain insights into optimizing performance, overcoming integration hurdles, and achieving efficient execution. Whether you're a developer looking to extend PyTorch's capabilities for your own use cases, keen to learn about ONNX Runtime, or interested in backend performance optimization and the many steps we've taken to get here, this session promises valuable takeaways and practical knowledge.
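The entry point the session describes — receiving an FX graph from torch.compile — can be sketched in a few lines. This is a minimal custom backend for illustration only, not the actual ORT backend: a real backend would lower the graph to its runtime instead of falling back to eager execution as done here.

```python
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    # torch.compile hands the backend the captured FX graph plus
    # example inputs. A real backend (like ORT's) would lower the
    # graph here; this sketch just walks the nodes and falls back
    # to eager execution of the captured graph.
    for node in gm.graph.nodes:
        pass  # inspect node.op / node.target to drive lowering
    return gm.forward  # must return a callable matching the graph's signature

@torch.compile(backend=my_backend)
def f(x):
    return torch.relu(x) + 1.0
```

Calling `f` triggers tracing on first use and routes execution through the backend's returned callable, which is where an ORT-backed implementation would hand off to its own inference engine.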
Speakers
avatar for Maanav Dalal

Maanav Dalal

Program Manager, Microsoft
PM @Microsoft, working on the ONNX Exporter team. I adore learning about consumer tech and experimenting with bleeding edge software. I'm passionate about creating delightful user experiences.
YW

Yulong Wang

Software Engineer, Microsoft
Thursday September 19, 2024 5:05pm - 5:30pm PDT
Festival Pavilion - Breakout Room B
 