September 18-19, 2024
San Francisco, California
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in Pacific Daylight Time (UTC-7).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday, September 18
 

7:30am PDT

Registration & Badge Pick-Up
Wednesday September 18, 2024 7:30am - 6:00pm PDT
Gateway Pavilion - Foyer

9:00am PDT

Keynote: Welcome & Opening Remarks - Matt White, Executive Director, PyTorch Foundation
Wednesday September 18, 2024 9:00am - 9:10am PDT
Speakers
Matt White

Executive Director, PyTorch Foundation; GM of AI, Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons, an open community initiative focused on advancing responsible generative AI under the LF AI & Data Foundation. Matt has nearly... Read More →
Wednesday September 18, 2024 9:00am - 9:10am PDT
Festival Pavilion - Keynote Room

9:12am PDT

Keynote: PyTorch Technical Deep Dive - Piotr Bialecki, NVIDIA; Peng Wu, Will Constable, Kartikay Khandelwal & Mengtao (Martin) Yuan, Meta
Wednesday September 18, 2024 9:12am - 10:12am PDT
This Deep Dive provides an update on PyTorch development since the last conference and dives into the key new features coming in PyTorch 2.5 and beyond. We will explore how advancements across a number of PyTorch features combine to better support the full model development lifecycle across training, fine-tuning, and deployment.
Speakers
Piotr Bialecki

Director of Engineering, Deep Learning Frameworks, NVIDIA
Piotr joined the PyTorch team at NVIDIA in 2019 and currently manages the team. He drives NVIDIA's effort in maintaining and advancing PyTorch's CUDA backend and received the PyTorch SUPERHERO award in 2023 for his community contributions especially in the PyTorch discussion board... Read More →
Peng Wu

Engineering Manager, Meta
Dr. Peng Wu is the engineering manager of the PyTorch Compiler team at Meta.  Dr. Wu spent over a decade at IBM research, working on many aspects of programming systems.  She then founded the Programming Technologies Lab at Huawei and led its growth for six years.  At Meta, she... Read More →
Will Constable

Engineer, Meta
Will Constable works on PyTorch Distributed Algorithms and Infrastructure at Meta as an IC and Tech Lead.  Previously, he worked at Intel and Nervana Systems on different parts of the Deep Learning SW stack including Compiler Frontends, Integrations to TensorFlow and PyTorch, Distributed... Read More →
Kartikay Khandelwal

Software Engineer, PyTorch, Meta
Kartikay Khandelwal is a software engineer in the PyTorch and AI Infra team at Meta where he leads the development of the PyTorch ecosystem for Generative AI, including open-source libraries like torchtune for LLM fine-tuning and torchchat for LLM inference. Prior to PyTorch, he worked... Read More →
Mengtao (Martin) Yuan

Tech Lead Manager, Meta
Mengtao (Martin) Yuan is a Tech Lead Manager in Meta’s PyTorch Edge team. With multiple years of experience in the AI industry, Mengtao is focused on building software systems that help AI researchers and engineers deploy their models on edge devices such as mobile phones, AR/VR... Read More →
Wednesday September 18, 2024 9:12am - 10:12am PDT
Festival Pavilion - Keynote Room

10:14am PDT

Keynote: Open Language Models (OLMo): Accelerating the Science of Language Modeling - Hanna Hajishirzi, Senior Director NLP Research, Allen Institute for AI
Wednesday September 18, 2024 10:14am - 10:29am PDT
Over the past few years, and especially since the deployment of ChatGPT in November 2022,  neural language models with billions of parameters and trained on trillions of words are powering the fastest-growing computing applications in history and generating discussion and debate across society. However, AI scientists cannot study or improve those state-of-the-art models because the models' parameters, training data, code, and even documentation are not openly available. In this talk, I present our OLMo project toward building strong language models and making them fully open to researchers along with open-source code for data management, training, inference, and interaction. In particular, I describe DOLMa, a 3T token open dataset curated for training language models, Tulu, our instruction-tuned language model, and OLMo v1, a fully-open 7B parameter language model trained from scratch.  
Speakers
Hanna Hajishirzi

Associate Professor/Senior Director of NLP, UW/AI2
Hanna Hajishirzi is the Torode Family Associate Professor in the Allen School of Computer Science and Engineering at the University of Washington and a Senior Director of NLP at AI2. She received her Ph.D in Computer Science from University of Illinois at Urbana-Champaign, and spent... Read More →
Wednesday September 18, 2024 10:14am - 10:29am PDT
Festival Pavilion - Keynote Room

10:30am PDT

Keynote: Enabling Generative AI on the Edge - Cormac Brick, Principal Engineer, Google
Wednesday September 18, 2024 10:30am - 10:45am PDT
Generative AI is no longer just in the cloud - recently it's also getting deployed on edge devices. A disruptive goal of this work is AI-powered applications that respond instantly, work offline, and protect user privacy by processing data locally. In this talk, we'll explore the cutting edge of edge-based generative AI, showcasing open models that are pushing the boundaries of what's possible today on the edge. We'll dive deep into the PyTorch ecosystem, looking at projects that are making it easier than ever to author, optimize, and deploy these models across a wide range of devices.
Speakers
Cormac Brick

Principal Engineer, Core Machine Learning Software, Google
Cormac Brick is a Principal Engineer at Google working on frameworks and on-device machine learning. He has over 10 years of experience in AI software, silicon and systems, with work spanning AI frameworks and ecosystems and compilers down to silicon microarchitecture. Over that... Read More →
Wednesday September 18, 2024 10:30am - 10:45am PDT
Festival Pavilion - Keynote Room

10:45am PDT

Coffee Break
Wednesday September 18, 2024 10:45am - 11:10am PDT
Gateway Pavilion - Sponsor Showcase

10:45am PDT

Sponsor Showcase
Wednesday September 18, 2024 10:45am - 8:30pm PDT
Gateway Pavilion - Sponsor Showcase

11:10am PDT

Lightning Talk: What’s New in Export? - Angela Yi, Tugsbayasgalan Manlaibaatar, Avik Chaudhuri & Yidi Wu, Meta
Wednesday September 18, 2024 11:10am - 11:20am PDT
This talk discusses updates we've made to torch.export this past year:
(a) Non-strict mode, an alternative tracing mode which in practice covers more programs than TorchDynamo without compromising important soundness guarantees
(b) Better dynamic shapes specifications through generating suggested fixes and runtime assertions
(c) Control flow operators such as cond, map, and associative scan
(d) A shift in the export generated IR, which will enable both training and inference
(e) An unflattener, which will reconstruct the eager module structure from the flattened exported graph
Speakers
Yidi Wu

Research Scientist, Meta
I work on torch.export. Recently on front-end support of control flow operators/higher order operators.
Angela Yi

Software Engineer, Meta
I've been working on the PyTorch Compilers team for the past 2 years, mainly working on torch.export!
Avik Chaudhuri

Software Engineer, Meta
Creator of @flowtype. Machine learning explorer. Rusty programming language researcher. Amateur chef. Soccer dad. Website: https://avikchaudhuri.github.io/ Twitter: @__avik Blog: https://mathydad.wordpress.com/
Tugsbayasgalan Manlaibaatar

Software Engineer, Meta
I am a software engineer at Meta, working on PyTorch Compilers. I mainly work on the PT2 export workstream.
Wednesday September 18, 2024 11:10am - 11:20am PDT
Festival Pavilion - Breakout Room A

11:10am PDT

Meta Llama 3 and the Future of Responsible AI Development - Spencer Whitman & Vincent Gonguet, Meta
Wednesday September 18, 2024 11:10am - 11:35am PDT
As AI models become increasingly powerful and pervasive, trust and safety have become top priorities. Join us for a timely talk on Llama 3, our latest foundation model, and the cutting-edge trust and safety models and tools we've developed to ensure responsible AI development. In this talk, we'll dive into:
• The advancements of Llama 3 and its applications
• Our innovative trust and safety approaches, including toxicity detection and mitigation
• The open-source tools and resources we're sharing to empower the community
Discover how Meta is pushing the boundaries of trust and safety and learn how you can integrate these solutions into your own projects. Let's build a safer, more responsible AI future together!
Speakers
Spencer Whitman

Product Manager (AI Security), Meta
Vincent Gonguet

Director, GenAI Trust & Safety, Meta
Wednesday September 18, 2024 11:10am - 11:35am PDT
Gateway Pavilion - Cowell Theater

11:10am PDT

Sponsored Session: NeMo-Aligner: A Scalable Toolkit for Model Alignment - Gerald Shen & Jimmy Zhang, NVIDIA
Wednesday September 18, 2024 11:10am - 11:35am PDT
Aligning AI models with human values and preferences is essential for making them safe and helpful. However, building an efficient and scalable toolkit for alignment can be challenging, especially when applied to state-of-the-art foundation models with billions or trillions of parameters. NeMo-Aligner is an open-source, optimized and scalable toolkit that implements alignment algorithms such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM and Self-Play Fine Tuning (SPIN). This talk will introduce NeMo-Aligner and show the steps we took to design and optimize the toolkit around various alignment algorithms. In particular, we discuss the RLHF implementation where we observe close to 7x speedup and excellent scaling performance by adding TRT-LLM integration, carefully orchestrating communication and utilizing fast training kernels. We’re able to align state-of-the-art open source models with NeMo-Aligner and hope our framework can enable the community to performantly customize, fine-tune and align foundational models at any scale.
Speakers
Gerald Shen

Engineer, NVIDIA
Gerald Shen is a member of the NVIDIA NeMo NLP Team specializing in model alignment. He leads the development of the NeMo-Aligner toolkit, a scalable toolkit to align large language models. This toolkit has been used to align models at NVIDIA with algorithms such as reinforcement... Read More →
Jimmy Zhang

Machine Learning Engineer, NVIDIA
Jimmy Zhang is a Senior Deep Learning Architect at NVIDIA. His work focuses on researching and developing the performance of deep learning frameworks, including NeMo and Megatron-LM. He completed his M.S. at UIUC, where he was mentored by Professor Rakesh Kumar.
Wednesday September 18, 2024 11:10am - 11:35am PDT
Festival Pavilion - Breakout Room B

11:25am PDT

Lightning Talk: Low Precision Dtypes in PyTorch - Vasiliy Kuznetsov, Meta
Wednesday September 18, 2024 11:25am - 11:35am PDT
This talk deep dives into the new native PyTorch float8 training library, and previews PyTorch's strategy for supporting upcoming low precision dtypes such as float6, float4 and MX for efficient training and inference.
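For readers unfamiliar with float8, the following is a minimal pure-Python sketch of per-tensor scaling followed by rounding onto the float8 E4M3 grid (4 exponent bits, 3 mantissa bits, largest finite value 448). It is an illustration only and does not use or reproduce the PyTorch float8 library's API; the function names `round_to_e4m3` and `quantize_per_tensor` are hypothetical, and the talk may describe a different scaling recipe.

```python
import math

E4M3_MAX = 448.0        # largest finite value of the float8 E4M3 format
E4M3_MIN_EXP = -6       # exponent of the smallest normal value
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(v):
    """Round a Python float to the nearest value on the E4M3 grid (simulation)."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), E4M3_MAX)                       # saturate instead of overflowing
    exp = max(math.floor(math.log2(mag)), E4M3_MIN_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)          # spacing of the grid near mag
    q = min(round(mag / step) * step, E4M3_MAX)       # round() is round-half-to-even
    return math.copysign(q, v)


def quantize_per_tensor(values):
    """Scale so the largest magnitude maps to E4M3_MAX, then round each value."""
    amax = max(abs(x) for x in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    quantized = [round_to_e4m3(x * scale) for x in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Undo the scaling; the rounding error from quantization remains."""
    return [q / scale for q in quantized]


if __name__ == "__main__":
    q, s = quantize_per_tensor([0.1, -0.5, 2.0])
    print(q, s, dequantize(q, s))
```

Scaling by the tensor's largest magnitude uses the full dynamic range of the 8-bit format, and the small values lose the most precision, which is why the choice of scaling granularity matters for low-precision training.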
Speakers
Vasiliy Kuznetsov

Software Engineer, Meta
Software Engineer, PyTorch Core
Wednesday September 18, 2024 11:25am - 11:35am PDT
Festival Pavilion - Breakout Room A

11:40am PDT

Lightning Talk: HieroGlyph2Text: A PyTorch-Powered Pipeline for Automated Egyptian Hieroglyph Translation from Image - Susi Gentsch, University of Bonn
Wednesday September 18, 2024 11:40am - 11:50am PDT
HieroGlyph2Text is an innovative PyTorch-powered pipeline that automates the detection and classification of Egyptian hieroglyphs in large image inputs and attempts their translation. It addresses the challenge of decoding and translating ancient hieroglyphic inscriptions, traditionally a time-consuming and specialized task. This pipeline leverages PyTorch to create custom models:
1. Object Detection: YOLOv8 accurately detects individual hieroglyphs within images.
2. Image Classification: A custom ResNet model built using PyTorch achieves state-of-the-art accuracy in assigning Gardiner Codes to hieroglyphs.
3. Translation: The classified Gardiner Code outputs from the ResNet model are integrated with Llama3, a large language model (LLM), using Retrieval-Augmented Generation (RAG) and a custom dataset based upon Gardiner Codes and their respective descriptions and ideograms.
Key highlights include accurate hieroglyph detection and state-of-the-art classification performance through an optimized ResNet model. This pipeline lays the groundwork for collaboration with subject matter experts to refine the translation process and democratize access to ancient Egyptian hieroglyphic knowledge.
Speakers
Susi Gentsch

Student, University of Bonn
Driven by applying deep learning to real-world challenges, Susi is a Computer Science student finishing her degree at the University of Bonn. Her projects include teaching a robot to autonomously detect and collect trash using YOLOv5 and ROS, and adapting YOLOv5 to identify archaeological... Read More →
Wednesday September 18, 2024 11:40am - 11:50am PDT
Festival Pavilion - Breakout Room B

11:40am PDT

Building Scientific Computing Infrastructure Software with the PyTorch Ecosystem - Bharath Ramsundar, Deep Forest Sciences
Wednesday September 18, 2024 11:40am - 12:05pm PDT
The DeepChem library is a scientific computing library that implements deep learning infrastructure for drug discovery, materials discovery, and biology. The DeepChem community is one of the largest scientific open source projects built in PyTorch, with over 5K stars on GitHub and thousands of citations. The DeepChem community has learned a number of useful lessons for building and maintaining high quality scientific code built on top of PyTorch. In this talk, I will share our learnings with the PyTorch community and also highlight opportunities for improving scientific support in the ecosystem.
Speakers
Bharath Ramsundar

CEO, Deep Forest Sciences
Bharath received a BA and BS from UC Berkeley in EECS and Mathematics and was valedictorian of his class in mathematics. He received his PhD in computer science from Stanford where he founded the DeepChem project. Bharath is founder and CEO of Deep Forest Sciences, a startup building... Read More →
Wednesday September 18, 2024 11:40am - 12:05pm PDT
Gateway Pavilion - Cowell Theater

11:40am PDT

ExecuTorch Beta and on-Device Generative AI Support - Mergen Nachin & Mengtao (Martin) Yuan, Meta
Wednesday September 18, 2024 11:40am - 12:05pm PDT
During this session, we will discuss real-life case studies focusing on the productionization of PyTorch models onto edge devices and welcome the community to begin adopting ExecuTorch. Since announcing the ExecuTorch MVP at the previous PTC, we have made significant progress in terms of stability, model coverage, accelerator performance, and developer experience, reaching a milestone that marks the transition to beta status. In addition to the above improvements, we continue to support generative AI models. Since the alpha launch that initially enabled support for Llama 2/3 models, we have now expanded our capabilities to include multimodal use cases and developed mobile demo apps showcasing these new features.
Speakers
Mengtao (Martin) Yuan

Tech Lead Manager, Meta
Mengtao (Martin) Yuan is a Tech Lead Manager in Meta’s PyTorch Edge team. With multiple years of experience in the AI industry, Mengtao is focused on building software systems that help AI researchers and engineers deploy their models on edge devices such as mobile phones, AR/VR... Read More →
Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →
Wednesday September 18, 2024 11:40am - 12:05pm PDT
Festival Pavilion - Breakout Room A

11:55am PDT

Lightning Talk: Mobile Computational Photography with PyTorch: Low-Light Denoising - Alexis Baudron, Sony
Wednesday September 18, 2024 11:55am - 12:05pm PDT
Over the last decade, smartphone cameras have improved significantly, becoming the primary device people use for capturing everyday moments and high-quality photographs. This progress is largely due to advances in computational photography and novel image sensors. Computational photography enables great images from compact mobile cameras, enhancing photos through various techniques such as multi-shot merging. Despite these advancements, challenges such as noise, artifacts, and distortions persist, especially in low-light conditions where limited light increases noise levels. In this lightning talk, we will explore how PyTorch can be used to design and optimize deep learning networks for real-time low-light denoising. We will dive into noise modeling, data generation, physics-aware models, and advanced network architectures for effective denoising in challenging low-light scenarios. Attendees will gain practical insights into the latest advancements in mobile computational photography using PyTorch.
Speakers
Alexis Baudron

Senior AI Researcher, Sony
Alexis Baudron is a Senior AI Researcher at Sony, where his team specializes in building AI models to tackle complex computer vision challenges. His background is in computational photography, developing advanced techniques for image enhancement and artifact removal. Alexis earned... Read More →
Wednesday September 18, 2024 11:55am - 12:05pm PDT
Festival Pavilion - Breakout Room B

12:05pm PDT

Lunch (Provided Onsite for All Attendees)
Wednesday September 18, 2024 12:05pm - 1:15pm PDT
Gateway Pavilion - Sponsor Showcase

12:10pm PDT

Women and Non-Binary in PyTorch Lunch - Sponsored by Google
Wednesday September 18, 2024 12:10pm - 1:10pm PDT
Join us for the 2024 Women and Non-Binary in PyTorch Lunch! This event is dedicated to celebrating and supporting women and non-binary individuals in the PyTorch community. Enjoy lunch, connect with peers, and engage in meaningful conversations about advancing diversity and inclusion in the field. It’s a wonderful opportunity to network, share experiences, and inspire each other.

*We will do our best to accommodate all interested attendees, but please note that participation is on a first-come, first-served basis.

Sponsored by Google.



Wednesday September 18, 2024 12:10pm - 1:10pm PDT
Gateway Pavilion - Gallery 2

1:15pm PDT

Sponsored Keynote: The Lightning AI OSS Stack for Accelerating the AI Lifecycle - Luca Antiga, CTO, Lightning AI
Wednesday September 18, 2024 1:15pm - 1:20pm PDT
We introduce the Lightning AI open source stack, a high-performance stack for training, fine-tuning, and deploying AI systems that augments the PyTorch ecosystem.

Today PyTorch Lightning powers training workloads across the industry, from small-scale research to large-scale training endeavors. The package has reached 130M total downloads in June 2024, 2x since early 2023. PyTorch Lightning 2.4 features support for 2D parallelism via DTensors, first introduced in PyTorch 2.3.

The open source stack is completed by Fabric (lightweight building blocks for scaling training workloads), LitGPT (library for pre-training, fine-tuning, serving LLMs), LitData (parallel data processing and streaming data loading), LitServe (lightweight, high-performance serving framework), TorchMetrics (de-facto standard in deep learning metrics), and the recently released Thunder compiler. Together, these packages provide a low-friction, high-performance stack to democratize and accelerate the AI lifecycle.

The stack is optimized to run on Lightning Studios, a PyTorch native, fully integrated AI development environment on the cloud.
Speakers
Luca Antiga

CTO, Lightning AI
CTO @ Lightning AI, Founder (Orobix, Tensorwerk), early PyTorch core contributor, Manning Author (Deep Learning with PyTorch). PhD in Bioengineering.
Wednesday September 18, 2024 1:15pm - 1:20pm PDT
Festival Pavilion - Keynote Room

1:20pm PDT

Sponsored Keynote: Enabling AI Everywhere with PyTorch and Intel - Kismat Singh, VP of Engineering for AI Frameworks, Intel
Wednesday September 18, 2024 1:20pm - 1:25pm PDT
Unlocking the availability of and access to generative AI technologies has great societal value. In this keynote, Kismat Singh will present how open software built on industry-standard frameworks such as PyTorch, and ubiquitous hardware from Intel that forms a large part of the current installed base across edge, PC and cloud, are keys to democratizing AI and allowing new solutions to be implemented across industries including healthcare, telecommunications, industrial and more. Kismat will share his thoughts on how software acceleration, flexibility and security are important factors in deploying AI applications in production and what he sees as challenges with those projects. He will also discuss Open Platform for Enterprise AI (OPEA), a new Linux Foundation AI and Data project that gives developers access to open source, standardized, modular, and heterogeneous retrieval-augmented generation (RAG) pipelines that they can use for their enterprise-grade Generative AI deployments. Lastly, he will share some exciting Intel-contributed features recently upstreamed into PyTorch. He will end the keynote by stating what he believes to be the future of AI and the part each of us will play in it!
Speakers
Kismat Singh

VP, Software, Intel Corporation
Kismat Singh is the VP of Engineering for AI Frameworks at Intel. He brings over two decades of AI experience and has also worked at companies such as Nvidia, AMD, HP and Stream Processors Inc. Kismat  has made significant contributions to industry leading deep learning libraries... Read More →
Wednesday September 18, 2024 1:20pm - 1:25pm PDT
Festival Pavilion - Keynote Room

1:30pm PDT

Sponsored Keynote: From Containers to Cognition: Conducting the AI Orchestra - Taylor Dolezal, Head of Ecosystem, The Linux Foundation (CNCF)
Wednesday September 18, 2024 1:30pm - 1:35pm PDT
Let's explore the powerful harmony created when the CNCF and PyTorch communities join forces. This keynote highlights how the collaboration between cloud native experts and AI innovators is orchestrating a new era of technological symphonies. We'll touch on critical initiatives and shared victories that demonstrate the strength of this partnership. To illustrate the creative potential of this alliance, we'll briefly showcase a demo of how containerized workloads can produce unexpected melodies. Join us for this exploration of community-driven innovation, where containers and cognition come together to compose the future of technology.
Speakers
Taylor Dolezal

Head of Ecosystem, The Linux Foundation (CNCF)
I navigate the cloud native universe with a knack for puns and a keen eye for psychology. Living in the heart of LA, I blend tech innovation with mental insights, one punny cloud at a time. I am an avid reader, thinker, and cloud whisperer.
Wednesday September 18, 2024 1:30pm - 1:35pm PDT
Festival Pavilion - Keynote Room

1:35pm PDT

Keynote Panel Discussion: Responsible AI - Kate Rooney, CNBC; Kush Varshney, IBM T. J. Watson Research Center; Sara Hooker, C4AI; Aleksander Mądry, OpenAI; and Rishi Bommasani, Stanford University
Wednesday September 18, 2024 1:35pm - 2:05pm PDT
Moderators
Kate Rooney

Technology Reporter, CNBC
Kate Rooney is a technology reporter based out of CNBC’s San Francisco bureau, covering Amazon, financial technology, payments and venture capital for the network. She also writes for CNBC’s digital platforms. Rooney won a National Headliner Award for her Celsius coverage in 2023... Read More →
Speakers
Sara Hooker

Head of Cohere For AI, Cohere For AI
Sara Hooker leads Cohere For AI, the dedicated research arm of Cohere. Cohere For AI seeks to solve complex machine learning problems and supports fundamental research that explores the unknown. With a long track-record of impactful research at Google Brain, Sara brings a wealth of... Read More →
Kush Varshney

IBM Fellow, IBM Research
Kush R. Varshney is an IBM Fellow based at the IBM T. J. Watson Research Center where he is responsible for leading innovations in AI governance. He and his team developed the well-known open-source toolkits AI Fairness 360, AI Explainability 360, and Uncertainty Quantification 360... Read More →
Aleksander Mądry

Member of Technical Staff, OpenAI
Aleksander Mądry is a Member of Technical Staff at OpenAI. Aleksander is also a Professor of Computing at MIT (currently on leave), where he has been serving as the Director of the MIT Center for Deployable Machine Learning and a Faculty Co-Lead of the MIT AI Policy Forum.
Rishi Bommasani

Society Lead, Stanford Center for Research on Foundation Models
I am the Society Lead at the Stanford Center for Research on Foundation Models (CRFM). I am completing my PhD at Stanford Computer Science, advised by Percy Liang and Dan Jurafsky. Funding: Lieberman Fellowship (active), NSF Graduate Research Fellowship (completed). Prior to St... Read More →
Wednesday September 18, 2024 1:35pm - 2:05pm PDT
Festival Pavilion - Keynote Room

2:10pm PDT

Maximizing Training Throughput Using Torch.Compile and FSDP - Linsong Chu & Antoni Viros i Martin, IBM Research; Brian Vaughan, IBM
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
torch.compile is a graph compilation technique that improves GPU utilization. A key challenge in getting torch.compile to perform well is minimizing (or eliminating) graph breaks. This isn't trivial: even the Llama implementation provided by Meta has many graph breaks, which reduce training throughput. In this talk we discuss:
1. How we addressed these challenges in order to train a model using torch.compile
2. How we combined torch.compile with FSDP and selective activation checkpointing to achieve the maximum throughput for training
3. A model quality comparison between models trained with and without compile
4. The best setup we have for different model sizes in the Llama family, which achieves the maximum throughput and MFU (e.g. 68% MFU for the 7B model on A100 GPUs!)
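To put the quoted MFU (model FLOPs utilization) figure in context, here is a rough back-of-the-envelope sketch. It assumes the common approximation of about 6 FLOPs per parameter per training token (which ignores attention FLOPs) and a dense BF16 peak of 312 TFLOPS per A100; the abstract does not say how the speakers compute MFU, so the function names and both constants are assumptions for illustration.

```python
# Assumed dense BF16 peak throughput of a single A100 GPU, in FLOP/s.
A100_PEAK_BF16_FLOPS = 312e12


def training_flops_per_token(n_params):
    """Approximate forward + backward FLOPs per token (the '6N' rule of thumb)."""
    return 6.0 * n_params


def mfu(tokens_per_sec_per_gpu, n_params, peak_flops=A100_PEAK_BF16_FLOPS):
    """Fraction of the hardware's peak FLOP/s that the model's math accounts for."""
    achieved = tokens_per_sec_per_gpu * training_flops_per_token(n_params)
    return achieved / peak_flops


def tokens_per_sec_for_mfu(target_mfu, n_params, peak_flops=A100_PEAK_BF16_FLOPS):
    """Per-GPU token throughput implied by a given MFU under the same assumptions."""
    return target_mfu * peak_flops / training_flops_per_token(n_params)


if __name__ == "__main__":
    throughput = tokens_per_sec_for_mfu(0.68, 7e9)
    print(f"{throughput:.0f} tokens/s per GPU")
    print(f"MFU check: {mfu(throughput, 7e9):.2f}")
```

Under these assumptions, 68% MFU for a 7B-parameter model corresponds to roughly five thousand tokens per second per GPU; a different FLOP-counting convention would shift that number.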
Speakers
Antoni Viros i Martin

Research Scientist, IBM Research
Antoni is currently a Research Scientist at IBM Research, investigating optimization approaches for ML inference and training, with a focus on open-source technologies such as PyTorch. He holds a PhD in Aerospace Engineering from Texas A&M University, and has previously worked at... Read More →
Linsong Chu

Senior Technical Staff Member, IBM Research
Linsong is a STSM at IBM Research, focusing on FSDP, torch compile and FP8 in the area of pre-training.
Brian Vaughan

Senior Technical Staff Member, IBM
An STSM at IBM focusing on foundation models.
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Festival Pavilion - Breakout Room B

2:10pm PDT

State of PyTorch - Ji Li & Damien Sereni, Meta
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
This talk gives a run-through of who builds PyTorch, new and upcoming improvements to the framework, and how to get involved, all thanks to our awesome community of contributors, partners and ecosystem tools.
Speakers
Ji Li

Data Scientist, Meta
Damien Sereni

Engineering Director, Meta
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Festival Pavilion - Breakout Room A

2:10pm PDT

The Impact and Challenges of Open Source Generative Datasets and Models - Aaron Gokaslan, Cornell University
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Open source generative models like OpenGPT2, BLOOM, and others have been pivotal in advancing AI technology. These models leverage extensive text data to achieve advanced linguistic capabilities. However, the trend towards proprietary tools and closed large language models is growing, posing unique challenges in open-source AI development. This discussion will explore the intricacies of training such models, the hurdles in dataset management, and the regulation of open-source contributions. We'll explore how to effectively iterate on collected data, prepare for extensive training sessions, and coordinate research across large open-source organizations. We will discuss the challenges of generative models in three different modalities: text, image, and genomics. The talk will draw from the speaker’s personal experience working on OpenWebText, OpenGPT2, BLOOM, CommonCanvas, Caduceus, and other generative models. We will also cover the changing AI environment and how the future of open source is threatened by onerous regulation, ever-increasing compute costs, and the commoditization of previously open data.
Speakers
Aaron Gokaslan

PhD Student, Cornell University
Aaron Gokaslan has worked on many popular generative models and datasets such as OpenWebText, CommonCanvas, BLOOM, DBRX, and Caduceus, collectively downloaded millions of times. His work on open source has earned him a Community Contributor Award at PyTorch Con and recognition from... Read More →
Wednesday September 18, 2024 2:10pm - 2:35pm PDT
Gateway Pavilion - Cowell Theater

2:40pm PDT

Lightning Talk: Beyond Zero: Eliminating Vulnerabilities in PyTorch Container Images - Patrick Smyth, Dan Fernandez & Srishti Hegde, Chainguard
Wednesday September 18, 2024 2:40pm - 2:50pm PDT
Container images are increasingly the future of production applications at scale, providing reproducibility, robustness, and transparency. As PyTorch images get deployed to production, however, security becomes a major concern. PyTorch has a large attack surface, and building secure PyTorch images can be a challenge. Currently, the official PyTorch runtime container image has 30 CVEs (known vulnerabilities) rated critical and 256 CVEs rated high. Improving this situation could secure many deployments that incorporate PyTorch for cloud-based inference or training. In this fast-paced session, we'll take a deep dive into the official PyTorch image from a vulnerability mitigation perspective, looking hard at included packages, executables, and active CVEs. We'll identify low-hanging fruit for increasing security, including stripping bloat and building fresh. We'll also talk about the next level of security practiced in Chainguard's PyTorch image builds, such as including SBOMs and going distroless. Finally, we'll consider emerging tools and approaches for analyzing AI artifacts such as models and how these systems can benefit PyTorch in production.
Speakers
Dan Fernandez

Staff Product Manager, Chainguard
Dan is a Management Information Systems graduate from Florida's FIU and recently completed his Master of Cybersecurity at the Georgia Institute of Technology. He is currently focusing on securing the software supply chain at Chainguard. In his free time, he enjoys writing about analytics...
Patrick Smyth

Staff Developer Relations Engineer, Chainguard
Dr. Patrick Smyth is Staff Developer Relations Engineer at Chainguard, where he shows developers how to deploy AI and other applications with 0 CVEs using Chainguard Images. Patrick has a PhD in the digital humanities and in a previous life led technical bootcamps for researchers...
Srishti Hegde

Software Engineer, Chainguard
Wednesday September 18, 2024 2:40pm - 2:50pm PDT
Gateway Pavilion - Cowell Theater

2:40pm PDT

Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Since the generative AI boom, the industry has been moving toward on-device AI inference, which is becoming a necessity for reducing cost and achieving high inference performance and ultra-low latency at the lowest possible power. In this session, we go over the new features added to the Qualcomm AI Stack and how it works with the public release of ExecuTorch 1.0. We will discuss how to run traditional workloads as well as GenAI use cases, including the latest version of Llama, on mobile devices using the Qualcomm Hexagon NPU.
Speakers
Felix Baum

Senior Director of Product Management, Qualcomm
Felix Baum has an extensive background of over two decades in the embedded industry, where he has excelled both as an embedded developer and a product manager. Currently he is responsible for AI Software Products at Qualcomm. Prior to that, he led efforts for various real-time operating...
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Festival Pavilion - Breakout Room B

2:40pm PDT

Sponsored Session: Accelerating AI Innovation: High Performance PyTorch at AMD - Robert Suderman & Ian Norden, AMD
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Explore the powerful collaboration between AMD and PyTorch, driving advancements in AI and machine learning. Learn how AMD’s Day-0 PyTorch support delivers cutting-edge performance and seamless compatibility.

This session will highlight the technical synergies that make AMD hardware an ideal choice for PyTorch frameworks, with real-world examples of accelerated workflows and breakthrough AI applications. Attendees will gain insights into how this dynamic partnership is enabling researchers, developers, and data scientists to push the boundaries of innovation and achieve unprecedented results in AI projects.

Speakers
Robert Suderman

Engineering Manager, AMD
Rob Suderman manages front-end support with AMD’s SHARK AI group with a goal of pushing tier one support for as many ML compute languages as possible. This has included core work on Torch-mlir, JAX, TOSA, and StableHLO, including being a founding team member on the IREE project...
Ian Norden

Manager Software Development, AMD
Ian Norden is a manager within AMD’s AIG-Sharks group where he spearheads machine learning model development for IREE’s compiler consumption to enable AI workloads to efficiently run across AMD’s hardware portfolio. He has been working in the AI compiler space for the past four...
Wednesday September 18, 2024 2:40pm - 3:05pm PDT
Festival Pavilion - Breakout Room A

3:10pm PDT

Lightning Talk: A Whirlwind Tour of PyTorch Extension Points - Alban Desmaison, Meta
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
Journey across the PyTorch stack and see all the extension points that exist, from nn.Module to the C++ Dispatcher through autograd and subclasses. This session will cover example use cases and when each one should be used, while pointing to references for in-depth details.
Speakers
Alban Desmaison

Research Engineer, Meta
Alban has been working on PyTorch since nearly its inception, first during his PhD at the University of Oxford and now at Meta. He is focused on maintaining core components, designing a wide breadth of features and fostering the PyTorch Community.
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
Festival Pavilion - Breakout Room A

3:10pm PDT

Lightning Talk: PyTorch/XLA Auto-Sharding - Yeounoh Chung, Google
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
PyTorch/XLA recently launched the new PyTorch/XLA SPMD feature as a first step toward automating ML workload parallelization using GSPMD. It turns out that performance depends largely on the quality of the sharding hints provided by the user, and coming up with optimal sharding hints requires a correct and deep understanding of model architectures and considerable expertise. To address this problem, we propose to integrate PyTorch/XLA SPMD with XLA's auto-sharding service, which allows the XLA compiler to shard and optimize the whole model without any user input.
Speakers
Yeounoh Chung

Software Engineer, Google
SystemsResearch@Google
Wednesday September 18, 2024 3:10pm - 3:20pm PDT
Gateway Pavilion - Cowell Theater

3:10pm PDT

TorchInductor CPU Backend Advancements: New Features and Performance Improvements - Jiong Gong & Leslie Fang, Intel
Wednesday September 18, 2024 3:10pm - 3:35pm PDT
This presentation provides an update on the latest advancements in the TorchInductor CPU backend since the last conference to bring best-in-class CPU performance for broad DL workloads. We will discuss new features and performance enhancements, including:
• Max-autotune support with codegen for GEMMs, boosting performance for GEMM-related operations
• Enhanced vectorized codegen support, now covering all data types beyond floating point with flexible vector factors, and optimized loop scheduling
• Comprehensive quantization support, including weight-only quantization (WoQ), and optimizations for dynamic quantization and quantization-aware training
• Improved Attention support, featuring attention masks and optimizing SoftMax via flash attention v2, etc.
• AOTInductor support, enabling high-performance inference with frozen weights
• Native Windows support, with improved vectorization capabilities
These advancements, combined with ongoing optimizations, have resulted in significant performance improvements since PyTorch 2.1, demonstrated through extensive benchmarks and large language models (LLMs).
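For readers unfamiliar with the weight-only quantization (WoQ) mentioned in the abstract, the arithmetic can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not TorchInductor's generated code: the function names `quantize_rows_int8` and `woq_linear` are hypothetical, and the scheme shown (symmetric, per-row int8 scales) is one common choice among several.

```python
def quantize_rows_int8(weight):
    """Symmetric per-row quantization: each row w is stored as scale * q,
    with integer q in [-127, 127]. Hypothetical helper for illustration."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def woq_linear(x, q_rows, scales):
    """Compute y = W x with W kept in int8 form.
    Only the weights are quantized; the activations x stay in floating point."""
    return [
        scale * sum(q * xi for q, xi in zip(row, x))
        for row, scale in zip(q_rows, scales)
    ]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]
q_rows, scales = quantize_rows_int8(weight)
approx = woq_linear(x, q_rows, scales)
exact = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
```

Each weight is off by at most half a quantization step, so the error in every output element is bounded by `0.5 * scale * sum(abs(xi) for xi in x)` for that row's scale.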
Speakers
Leslie Fang

Software Engineer, Intel
Leslie is a software engineer from Intel who has worked on PyTorch performance optimization on x86 servers for the past 4 years. Currently, he is mainly focusing on quantization, Autocast, and the Inductor CPP/OpenMP backend in stock PyTorch.
Jiong Gong

Principal Engineer, Intel
Jiong is a software architect from Intel who works on PyTorch framework optimizations. He is the PyTorch module maintainer for CPU and compiler.
Wednesday September 18, 2024 3:10pm - 3:35pm PDT
Festival Pavilion - Breakout Room B

3:25pm PDT

Lightning Talk: Extending PyTorch with Custom Python/C++/CUDA Operators - Richard Zou, Meta
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
In this talk, we'll go over the new recommended APIs to extend PyTorch with custom Python/C++/CUDA operators. Users have been able to extend PyTorch with custom operators for years but we have updated our guidance for creating custom operators that compose with torch.compile, autograd, and other PyTorch subsystems.
Speakers
Richard Zou

Software Engineer, Meta
I'm a software engineer at Meta working on PyTorch. I'm one of the creators of functorch, JAX-like composable function transforms for PyTorch. Nowadays I spend my time working on torch.compile, figuring out how to add infra changes to make it easier for PyTorch features like custom...
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Festival Pavilion - Breakout Room A

3:25pm PDT

Lightning Talk: Introduction to Torch.Distributed.Pipelining - Howard Huang & Ke Wen, Meta
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Pipeline parallelism is a technique employed in distributed deep learning that enhances model execution by dividing the model into distinct segments, or "stages." As large language models and other memory-intensive models become more common, pipeline parallelism has grown increasingly important for several key areas:
- Executing large-scale training jobs.
- Enhancing performance in bandwidth-limited clusters.
- Supporting large model inference.
In this talk, we will introduce the `torch.distributed.pipelining` package which provides users a seamless way of applying pipeline parallelism. We will demonstrate the following features:
- Splitting of model code based on simple specification.
- Support for pipeline schedules, including GPipe, 1F1B, Interleaved 1F1B and Looped BFS, and providing the infrastructure for writing customized schedules.
- Composability with other PyTorch parallel techniques such as data parallel (DDP, FSDP) or tensor parallel.
- Out of the box integration with Hugging Face models for efficient inference.
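To make the difference between two of the schedules named above concrete, here is a small plain-Python sketch of the order in which a single pipeline stage processes microbatches under GPipe and under 1F1B. It does not use `torch.distributed.pipelining` and ignores inter-stage timing and communication; the function names are hypothetical and the 1F1B warmup rule (`num_stages - stage - 1` forwards) follows the commonly described non-interleaved variant, which is an assumption about what the talk covers.

```python
def gpipe_order(num_microbatches):
    """GPipe: run every forward pass, then every backward pass."""
    forwards = [("F", i) for i in range(num_microbatches)]
    backwards = [("B", i) for i in range(num_microbatches)]
    return forwards + backwards


def one_f_one_b_order(stage, num_stages, num_microbatches):
    """1F1B: a short warmup of forwards, then alternate one forward
    with one backward, then drain the remaining backwards."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    order = [("F", i) for i in range(warmup)]
    next_f, next_b = warmup, 0
    while next_f < num_microbatches:  # steady state
        order.append(("F", next_f))
        next_f += 1
        order.append(("B", next_b))
        next_b += 1
    while next_b < num_microbatches:  # cooldown
        order.append(("B", next_b))
        next_b += 1
    return order


def peak_in_flight(order):
    """Largest number of microbatches whose activations are held at once
    (a forward adds one, the matching backward releases it)."""
    live = peak = 0
    for kind, _ in order:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak
```

With 4 stages and 8 microbatches, `peak_in_flight(gpipe_order(8))` is 8, while `peak_in_flight(one_f_one_b_order(0, 4, 8))` is 4: under 1F1B the first stage never holds activations for more microbatches than there are stages, which is the usual memory argument for that schedule.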
Speakers
Howard Huang

Software Engineer, Meta
Howard Huang is a software engineer at Meta. He has been working on PyTorch and the PyTorch distributed team for the past 4 years.
Ke Wen

Software Engineer, Meta
Ke Wen is a software engineer at Meta. He works on PyTorch Distributed features, including pipeline parallelism, distributed inference, and graph-based analysis.
Wednesday September 18, 2024 3:25pm - 3:35pm PDT
Gateway Pavilion - Cowell Theater

3:35pm PDT

Coffee Break
Wednesday September 18, 2024 3:35pm - 4:00pm PDT
Gateway Pavilion - Sponsor Showcase

3:45pm PDT

Sponsor Scavenger Hunt Raffle Drawing
Wednesday September 18, 2024 3:45pm - 4:00pm PDT
Grab your scavenger hunt card at registration, visit all our awesome sponsors, and you'll be in the running to win some fantastic prizes!
Wednesday September 18, 2024 3:45pm - 4:00pm PDT
Gateway Pavilion - Sponsor Showcase

4:00pm PDT

Welcome to the PyTorch Ecosystem for LLM Fine-tuning Mini Summit - Kartikay Khandelwal, Meta
Wednesday September 18, 2024 4:00pm - 4:05pm PDT
As open-source LLMs have become more capable, a substantial ecosystem has developed around the fine-tuning of these models. A thriving community of researchers, developers, practitioners and hobbyists has emerged which focuses on topics ranging from memory efficiency, parameter-efficient fine-tuning and quantization to performance at scale and reproducible evaluations. The goal of this mini-summit is to bring this community together to discuss ideas, share knowledge and build connections.

The agenda features a keynote from Joe Spisak on the state of the Llama ecosystem followed by invited talks from the founders of Axolotl, Unsloth and torchtune. We conclude the summit with a riveting discussion on what’s next for LLMs, fine-tuning and the PyTorch ecosystem with a fabulous panel of experts: Tim Dettmers (author of bitsandbytes and QLoRA), Hailey Schoelkopf (maintainer of LM Eval Harness at EleutherAI), Aakanksha Chowdhery (lead author on PaLM and Gemini) and Alexis Conneau (Research Lead at OpenAI).
Speakers
Kartikay Khandelwal

Software Engineer, PyTorch, Meta
Kartikay Khandelwal is a software engineer in the PyTorch and AI Infra team at Meta where he leads the development of the PyTorch ecosystem for Generative AI, including open-source libraries like torchtune for LLM fine-tuning and torchchat for LLM inference. Prior to PyTorch, he worked...
Wednesday September 18, 2024 4:00pm - 4:05pm PDT
Festival Pavilion - Breakout Room A

4:00pm PDT

[HALIDE] A Halide Backend for TorchInductor - Jason Ansel, Meta
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
This talk will focus on a new Halide backend for TorchInductor, which is in addition to the existing Triton and C++ backends. The Halide backend is meant to serve as a reference backend to make it easier to extend TorchInductor to support new backend compilers and hardware devices. Halide has been the inspiration (either in ideas or through forking) for numerous other compiler projects, so it is a good starting point for adding new backends that follow a Halide-like model.
Speakers
Jason Ansel

Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT CSAIL in 2014 with research...
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
Festival Pavilion - Breakout Room B

4:00pm PDT

Lightning Talk: Debiasing the Data Lifecycle - Shailvi Wakhlu, Shailvi Ventures LLC
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
Biased data results in biased decision-making. Making conscious attempts to debias the data at every step of the data lifecycle is an important responsibility for all data scientists. In this talk, I highlight the typical data lifecycle and how to prevent biases at every step. The key takeaways from my talk include:
1) Understanding the data lifecycle
2) The typical ways biases creep in
3) How we can proactively prevent and fix biases in data
Speakers
Shailvi Wakhlu

Founder, Shailvi Ventures LLC
Shailvi is a seasoned Data Leader and Self-Advocacy Expert with over sixteen years of experience building technology products. She has spoken at nearly 100 global conferences and Fortune 500 events, coached close to 500 individuals, and authored the best-selling book "Self-Advocacy...
Wednesday September 18, 2024 4:00pm - 4:10pm PDT
Gateway Pavilion - Cowell Theater

4:05pm PDT

The State of the Llama Ecosystem - Joe Spisak, Meta
Wednesday September 18, 2024 4:05pm - 4:15pm PDT
Speakers
Joe Spisak

Product Director, Meta Inc.
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI organization. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, open science and building developer...
Wednesday September 18, 2024 4:05pm - 4:15pm PDT
Festival Pavilion - Breakout Room A

4:10pm PDT

[MLIR] Enabling Composition of Kernels and Compilers - Jacques Pienaar, Google
Wednesday September 18, 2024 4:10pm - 4:20pm PDT
Handwritten kernels and compilers have both been part of the toolbox for providing efficient and broad coverage. These approaches have often been positioned as being at odds with one another, and indeed the software solutions on either side have sometimes made it so. MLIR, since its inception, has aimed to enable general, beneficial composition instead. Rather than treating kernels as a black-box escape hatch, treat them as peers in solving the serving needs. This is not magic and requires consideration of how best to combine them. In this talk I'll present the approach and its effect in both IREE and OpenXLA.
Speakers
Jacques Pienaar

SWE, Google
Jacques Pienaar is a lead of the ML Compiler Systems Research team at Google DeepMind. In this role he focuses on accelerating and simplifying machine learning for high-performance model deployment across various architectures. He is one of the founders of MLIR, a founding member...
Wednesday September 18, 2024 4:10pm - 4:20pm PDT
Festival Pavilion - Breakout Room B

4:15pm PDT

The Challenges of Building an Opinionated Open Source LLM Framework - Wing Lian, Axolotl AI
Wednesday September 18, 2024 4:15pm - 4:25pm PDT
Speakers
Wing Lian

Maintainer, Axolotl AI
Wing is the maintainer of Axolotl, focusing on improving the developer experience for finetuning.
Wednesday September 18, 2024 4:15pm - 4:25pm PDT
Festival Pavilion - Breakout Room A

4:20pm PDT

[TRITON] Maximizing Kernel Development Productivity under Performance Constraints - Philip Tillet, OpenAI
Wednesday September 18, 2024 4:20pm - 4:30pm PDT
Machine Learning research workflows are often bottlenecked by the development of compute kernels for new algorithms and GPU architectures. This process can be daunting, and often requires a careful trade-off between productivity and performance. In this talk, we will discuss how Triton -- a mid-level programming language for kernel development -- approaches this multi-objective optimization problem, and the design decisions that were made to that effect.
Speakers
Phil Tillet

Member Of Technical Staff, OpenAI
Phil first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020. He joined OpenAI full time in 2020 to pursue his...
Wednesday September 18, 2024 4:20pm - 4:30pm PDT
Festival Pavilion - Breakout Room B

4:25pm PDT

Hacks to Make LLM Training Faster - Daniel Han, Unsloth AI
Wednesday September 18, 2024 4:25pm - 4:35pm PDT
Speakers
Daniel Han

Cofounder, Unsloth
I'm the algos guy behind Unsloth which makes finetuning 2x faster and use 70% less VRAM! I helped fix 8 bugs in Gemma, 3 bugs in Llama, a few in Mistral and Phi-3 and used to work at NVIDIA on making algos faster on GPUs! Had another OSS package Hyperlearn which was used by NASA...
Wednesday September 18, 2024 4:25pm - 4:35pm PDT
Festival Pavilion - Breakout Room A

4:30pm PDT

[TVM] Universally Deploy Large-language Models via ML Compilation - Tianqi Chen, CMU & OctoAI
Wednesday September 18, 2024 4:30pm - 4:40pm PDT
Deploying deep learning models on various devices has become an important topic. Machine learning compilation is an emerging field that leverages compiler and automatic search techniques to accelerate AI models. ML compilation brings a unique set of challenges: emerging machine learning models, increasing hardware specialization that brings a diverse set of acceleration primitives, and a growing tension between flexibility and performance. In this talk, I discuss our experience in bringing foundational models to a variety of devices and hardware environments through machine learning compilation.
Speakers
Tianqi Chen

Assistant Professor, CMU
Tianqi Chen is currently an Assistant Professor at the Machine Learning Department and Computer Science Department of Carnegie Mellon University. He is also the Chief Technologist of OctoAI. He received his Ph.D. from the Paul G. Allen School of Computer Science & Engineering at the...
Wednesday September 18, 2024 4:30pm - 4:40pm PDT
Festival Pavilion - Breakout Room B

4:30pm PDT

A Distributed Stateful Dataloader for Large-Scale Pretraining - Davis Wertheimer, IBM & Linsong Chu, IBM Research
Wednesday September 18, 2024 4:30pm - 4:55pm PDT
Large-scale model pretraining crucially relies on specialized and dedicated dataloaders that can, for example, partition and stream data asynchronously across multiple processes and physical nodes. In this talk we discuss one of the torch-native dataloaders we built and use at IBM Research for addressing these needs. Intended for use in large-scale model pretraining, particularly in research settings where rapid iteration between datasets may be required, our dataloader is distributed, stateful, checkpointable, composable and rescalable – while remaining a simple extension of the existing PyTorch dataloading framework. It automatically and invisibly handles data sharding, shuffling, subdataset weighting, checkpoint saving and loading, and custom user-defined preprocessing functions, with minimal overhead and high throughput. We discuss these properties and how we achieved them, such as reducing overhead by implementing a custom LCG random number generator, and demonstrate proof of concept on production-scale training of a 7B parameter Llama model over 4 trillion tokens.
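The abstract's mention of a custom LCG random number generator can be illustrated with a short sketch: a linear congruential generator keeps its entire state in one integer, so a shuffle order built on it is cheap to checkpoint and resume. The code below is not IBM's implementation; the class name `LCGShuffler`, its methods, and the constants (the widely used Numerical Recipes multiplier and increment) are assumptions chosen for illustration.

```python
class LCGShuffler:
    """Visits every index in range(n) exactly once per pass using O(1) state.

    Uses x -> (A * x + C) mod m with m a power of two >= n. Because C is odd
    and A % 4 == 1, the generator has full period m (Hull-Dobell theorem);
    values >= n are skipped, so n consecutive outputs cover range(n).
    """

    A = 1664525      # multiplier (assumed constant, A % 4 == 1)
    C = 1013904223   # increment (assumed constant, odd)

    def __init__(self, n, seed=0):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.m = 4
        while self.m < n:
            self.m *= 2
        self.x = seed % self.m

    def next_index(self):
        while True:
            self.x = (self.A * self.x + self.C) % self.m
            if self.x < self.n:
                return self.x

    def state_dict(self):
        # The whole shuffle position is a single integer.
        return {"n": self.n, "x": self.x}

    def load_state_dict(self, state):
        if state["n"] != self.n:
            raise ValueError("checkpoint was saved for a different dataset size")
        self.x = state["x"]


shuffler = LCGShuffler(10, seed=3)
first = [shuffler.next_index() for _ in range(4)]
checkpoint = shuffler.state_dict()             # save mid-epoch
rest = [shuffler.next_index() for _ in range(6)]

resumed = LCGShuffler(10)
resumed.load_state_dict(checkpoint)            # resume from the checkpoint
resumed_rest = [resumed.next_index() for _ in range(6)]
```

Here `resumed_rest` equals `rest`, and `first + rest` is a permutation of `range(10)`. A power-of-two LCG has weak statistical quality, so a production loader would likely layer more on top; the point of the sketch is only the constant-size, trivially serializable state.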
Speakers
Davis Wertheimer

Staff Research Scientist, IBM
Davis Wertheimer earned his Ph.D. in Computer Science at Cornell University in 2022, conducting research under Bharath Hariharan on few-shot learning and machine learning under constraints. He now researches and develops AI models for IBM, training and accelerating large language...
Linsong Chu

Senior Technical Staff Member, IBM Research
Linsong is an STSM at IBM Research, focusing on FSDP, torch.compile and FP8 in the area of pre-training.
Wednesday September 18, 2024 4:30pm - 4:55pm PDT
Gateway Pavilion - Cowell Theater

4:35pm PDT

torchtune: Easy and Accessible Finetuning in Native PyTorch - Evan Smothers, Meta
Wednesday September 18, 2024 4:35pm - 4:45pm PDT
Speakers
Evan Smothers

Software Engineer, Meta
Evan is a software engineer on the PyTorch Domains team at Meta. He currently works on torchtune, a PyTorch library for memory-efficient fine-tuning of large language models. Prior to joining Meta, Evan worked as a data scientist at Uber and received his Ph.D. in mathematics from...
Wednesday September 18, 2024 4:35pm - 4:45pm PDT
Festival Pavilion - Breakout Room A

4:40pm PDT

[MOJO] Lifting PT to New Heights with MAX and Mojo - Mikhail Zolotukhin, Modular
Wednesday September 18, 2024 4:40pm - 4:50pm PDT
In this talk we'll peek into Modular's inference engine: how it builds on and works with PyTorch and what is unique about it. We will look into how the Mojo language can be used to define performant kernels and what optimizations the inference engine can perform. We will also talk briefly about our experience developing a third-party backend for torch.compile.
Speakers
Mikhail Zolotukhin

Software Engineering Manager, Modular
Mikhail is an open source enthusiast with contributions ranging from GCC and LLVM to PyTorch. Currently he is at Modular leading a team working on integration of Modular's inference stack with PyTorch.
Wednesday September 18, 2024 4:40pm - 4:50pm PDT
Festival Pavilion - Breakout Room B

4:45pm PDT

Panel Discussion - Tim Dettmers, AI2/Carnegie Mellon; Hailey Schoelkopf, EleutherAI; Aakanksha Chowdhery, Meta; Alexis Conneau, OpenAI; Moderated by Kartikay Khandelwal, Meta
Wednesday September 18, 2024 4:45pm - 5:30pm PDT
Moderators
Kartikay Khandelwal

Software Engineer, PyTorch, Meta
Kartikay Khandelwal is a software engineer in the PyTorch and AI Infra team at Meta where he leads the development of the PyTorch ecosystem for Generative AI, including open-source libraries like torchtune for LLM fine-tuning and torchchat for LLM inference. Prior to PyTorch, he worked...
Speakers
Tim Dettmers

Research Scientist & Assistant Professor, Ai2 & Carnegie Mellon University
Tim Dettmers is a research scientist at AI2 and an incoming assistant professor at CMU. His research focuses on making foundation models, such as ChatGPT, accessible to researchers and practitioners by reducing their resource requirements. This involves developing novel compression...
Hailey Schoelkopf

EleutherAI
Hailey Schoelkopf is a Research Scientist at EleutherAI, a non-profit research lab focused on enabling open science on large-scale AI models. Her research has focused on building reproducible infrastructure for empowering open science on large-scale models, with core interests in...
Aakanksha Chowdhery

Research Scientist, Meta
Aakanksha has been a lead researcher in pre-training large language models, such as PaLM and Gemini. She led the 540B PaLM model at Google and was a core member of the Gemini, Pathways, PaLM-E and MedPaLM projects. Before Google, she led interdisciplinary teams at Microsoft Research...
Alexis Conneau

Member of Technical Staff, OpenAI
Alexis Conneau is a Member of Technical Staff at OpenAI in the Multimodal Frontiers team, where he has led the research for Audio+Text language modeling and the next-generation GPT-4o VoiceMode ("Her"). Prior to that, he was a research scientist at Facebook AI Research since 2015...
Wednesday September 18, 2024 4:45pm - 5:30pm PDT
Festival Pavilion - Breakout Room A

4:50pm PDT

Together Goes Brrr: Threading Research & Production with Torch Compile - Pragaash Ponnusamy, together.ai
Wednesday September 18, 2024 4:50pm - 5:00pm PDT
The deployment of large language models for inference at scale is inherently complex, often requiring intricate optimizations across compute-bound and memory-bound regimes. This talk explores how PyTorch's torch.compile has revolutionized the optimization landscape for LLM serving at Together AI. Through its sophisticated Dynamo tracer and Inductor backend, torch.compile has transformed the approach to critical performance bottlenecks in both prefill and decode phases of inference. We examine how automatic vertical fusion, epilogue optimization, and adaptive kernel generation across batch sizes for GEMV and GEMM workloads address key efficiency concerns, from CUDA graph captures and optimized all-reduce strategies to custom kernel registrations. The presentation highlights Together AI's journey in leveraging torch.compile to streamline the transition from research to production, significantly simplifying the deployment process for even custom architectures. By automating many performance-critical optimizations, torch.compile has not only enhanced inference efficiency but also democratized high-performance LLM deployment. We'll conclude by sharing key lessons learned and best practices gleaned from Together AI's experience in deploying torch.compile to production, serving billions of user queries and navigating the complexities of large-scale LLM inference.
Speakers
PP

Pragaash Ponnusamy

Senior Staff AI/ML Researcher, Together AI
Wednesday September 18, 2024 4:50pm - 5:00pm PDT
Festival Pavilion - Breakout Room B

5:00pm PDT

Pushing the Performance Envelope: An Optimization Study for 3D Generative Modelling with PyTorch - Suvaditya Mukherjee & Shireen Chand, University of Southern California
Wednesday September 18, 2024 5:00pm - 5:25pm PDT
This work explores performance optimization strategies for training 3D generative models using PyTorch. We focus on training Variational Autoencoders (VAEs) on the ShapeNet dataset, a popular benchmark for this task. Our objective is to achieve high-fidelity reconstructions while minimizing the computational footprint and training time. We focus on:
1) Large-scale 3D dataset loading strategies using PyTorch & Google Cloud Storage Buckets
2) Implementation details and insights for 3D VAEs using PyTorch 2.x
3) Training using Automatic Mixed-precision regimes
4) Optimized training using torch.compile and different quantization techniques (as supported)
   - Dynamic Quantization
   - Static Quantization
   - Static Quantization-aware Training
5) Comparative Benchmark over several experiments performed with a focus on execution time and memory footprint
Through this comprehensive study, we present a comparative analysis of the performance gains achieved by our optimized models. Our findings present empirical insights into the trade-offs between model accuracy, computational complexity, and hardware resource utilization.
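For readers unfamiliar with the quantization techniques listed above, here is a minimal sketch of the affine int8 arithmetic that such schemes are commonly built on. It is plain Python for illustration only: it is not the PyTorch quantization API and not the authors' implementation, and the function names are hypothetical. Deriving the scale from the observed min/max of the data at call time loosely mirrors the "dynamic" flavor.

```python
def quantize_int8(values):
    """Map floats to int8 codes with an affine (scale, zero_point) scheme."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, 0.0, 0.5, 2.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
print(codes, scale, zero_point)
print(restored)
```

Each restored value differs from the original by at most roughly one quantization step (`scale`), which is the accuracy cost traded for storing one byte per value instead of four.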
Speakers
avatar for Shireen Chand

Shireen Chand

Student, University of Southern California
Shireen is a Masters student at the University of Southern California. She is majoring in Artificial Intelligence. She is also a Machine Learning Developer, a Google Summer of Code Contributor, and a Technical Writer for Medium.
avatar for Suvaditya Mukherjee

Suvaditya Mukherjee

MS AI @ USC | ML GDE, University of Southern California
Suvaditya is a Masters student at the University of Southern California, majoring in Artificial Intelligence. He is also a Google Developer Expert for Machine Learning, and an external author at PyImageSearch. He likes to work on problems related to Computer Vision, VLMs, 3D Reconstruction... Read More →
Wednesday September 18, 2024 5:00pm - 5:25pm PDT
Gateway Pavilion - Cowell Theater

5:00pm PDT

DL Compiler Panel Discussion - Philip Tillet, OpenAI; Jason Ansel, Meta; Jacques Pienaar, Google; Tianqi Chen, CMU & OctoAI; Mikhail Zolotukhin, Modular; Peng Wu, Meta
Wednesday September 18, 2024 5:00pm - 5:30pm PDT
Since the release of PyTorch 2 in 2023, torch.compile() has spurred significant new thinking around DL compiler designs at the framework level. In this session, we invite leaders in this space to share their insights based on real experiences of building DL compilers – Triton, TorchInductor, Halide, TVM, OpenXLA, and Mojo – and growing their ecosystems. We also invite a ‘compiler user representative,’ together.ai, to share their recent journey of redesigning the LLM inference stack around torch.compile(). Each leader will give a 10-minute lightning talk, followed by an engaging panel discussion.
Speakers
avatar for Peng Wu

Peng Wu

Engineering Manager, Meta
Dr. Peng Wu is the engineering manager of the PyTorch Compiler team at Meta.  Dr. Wu spent over a decade at IBM research, working on many aspects of programming systems.  She then founded the Programming Technologies Lab at Huawei and led its growth for six years.  At Meta, she... Read More →
PT

Phil Tillet

Member Of Technical Staff, OpenAI
Phil first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020. He joined OpenAI full time in 2020 to pursue his... Read More →
avatar for Mikhail Zolotukhin

Mikhail Zolotukhin

Software Engineering Manager, Modular
Mikhail is an open source enthusiast with contributions ranging from GCC and LLVM to PyTorch. Currently he is at Modular leading a team working on integration of Modular's inference stack with PyTorch.
TC

Tianqi Chen

Assistant Professor, CMU
Tianqi Chen is currently an Assistant Professor at the Machine Learning Department and Computer Science Department of Carnegie Mellon University. He is also the Chief Technologist of OctoAI. He received his PhD. from the Paul G. Allen School of Computer Science & Engineering at the... Read More →
avatar for Jacques Pienaar

Jacques Pienaar

SWE, Google
Jacques Pienaar is a lead of the ML Compiler Systems Research team at Google DeepMind. In this role he focuses on accelerating and simplifying machine learning for high-performance model deployment across various architectures. He is one of the founders of MLIR, a founding member... Read More →
JA

Jason Ansel

Research Scientist, Meta
Jason Ansel is a Research Scientist at Meta AI and a technical lead for PyTorch compilers. He started the TorchDynamo and TorchInductor projects, which bring flexible graph capture and a high performance compiler to PyTorch 2. He received a Ph.D. from MIT CSAIL in 2014 with research... Read More →
Wednesday September 18, 2024 5:00pm - 5:30pm PDT
Festival Pavilion - Breakout Room B

5:30pm PDT

Attendee Welcome Reception
Wednesday September 18, 2024 5:30pm - 8:30pm PDT
Gateway Pavilion - Sponsor Showcase

5:30pm PDT

Poster Presentations
Wednesday September 18, 2024 5:30pm - 8:30pm PDT
  • Purge the GIL: Improved Torch.DataLoader - Michal Szolucha & Rostan Tabet, NVIDIA
  • XFormers - Daniel Haziza, Meta AI 
  • TritonCC: AOT Triton Workflow for TorchScript C++ Runtime - Sijia Chen & Huamin Li, Meta
  • The PyTorch 2.0 Inference Story - Angela Yi, Bin Bao, Sheng Qin & Sherlock Huang, Meta
  • Tensor Subclasses with PT2 - Brian Hirsh, Meta
  • Streamlining PyTorch Eager Mode Support on New Hardware Backends Through Torch.Compile - Eikan Wang, Intel
  • Sparsifying Vision Transformers with Minimal Accuracy Loss - Jesse Cai, Meta
  • Real-Time Art Creation: Stable Diffusion Fine-Tuning Techniques on Gaudi with PyTorch - Alex Sin & Louie Tsai, Intel Corporation
  • Quantization via AI Edge Torch - Pauline Sho, Google LLC
  • PyTorch Korea User Group: Introduction & Encourage - Junghwan Park, PyTorch Korea User Group & Hyoyoung Chang, Freelancer
  • PyTorch + MAX + Mojo - Nick Kreeger & Jack Clayton, Modular 
  • PT2 Torch.Compile and CPython - William Wen, Meta
  • PT2 Cold and Warm Compile Time Improvements in Torch.Compile - Oguz Ulgen & Animesh Jain, Meta
  • Pre-Train Llama3 Models Using Meta's Torchtitan on Amazon SageMaker - Less Wright, Meta & Roy Allela, AWS
  • Optimizing Memory and Compilation with While_loop - Manfei Bai, Google
  • Non-Linear Quantization Functions for Machine Learning Models - Diogo Emanuel da Costa Venâncio, INESC-ID 
  • Nested Tensors for Ragged Data Handling - Joel Schlosser, Meta
  • `Torch.Tensor.Module_load` and Tensor Subclass Serialization - Mikayla Gawarecki, Meta Platforms
  • Accelerating Generative AI on Ubiquitous CPU Instances with Native PyTorch - Mingfei Ma, Intel
  • Addressing Reverse Kinematics Challenges and Geometric Optimization in Robotics with PyTorch - Blair Birdsell, PhD. Student at University of Alberta 
  • Blazingly Fast LLM Inference with Native PyTorch: Update from the Past Year - Yanbo Liang & Horace He, Meta
  • Boosting in-Browser ML: Accelerate PyTorch Generative Models for the Web - Emma Ning & Kshama Pawar, Microsoft; Joshua Lochner, Hugging Face
  • Democratizing AI, One Byte at a Time: The Bitsandbytes Open-Source Saga, Ft. FSDP+QLoRA Fine-Tuning - Titus von Koeller, Hugging Face
  • Depyf: A Tool to Help Write Code in a Torch.Compile-Friendly Way Through Decompilation - Kaichao You, Tsinghua University/UC Berkeley
  • Exploiting on-Chip AI Accelerator for High-Performance LLM Inference - Hiroshi Inoue & Tabari Alexander, IBM Research - Tokyo
  • ExecuTorch Android and IOS on-Device Demo Poster - Hansong Zhang, Meta
  • Fault Tolerance for Large Scale Training - Tristan Rice & Chirag Pandya, Meta
  • FP8 State of the Art Inference Performance with PyTorch - Chih-Chieh Yang & Adnan Hoque, IBM; Antoni Viros i Martin, IBM Research
  • From FSDP to DeepSpeed and Back Again - Yu Chin Fabian Lim, IBM Research, Singapore
  • Large Scale Transformer Model Training with PyTorch Tensor Parallel API - Tianyu Liu, Meta
  • Model Explorer - Visualizing PyTorch Models - Na Li & Eric Yang, Google
  • PT-D Zero Overhead Checkpointing - Lucas Pasqualin, Meta / PyTorch; Chien-Chin Huang & Iris Zhang, Meta
  • PyTorch Performance Debugging in N-Dimensional Parallelism - Wei Sun & Sreen Tallam, Meta
  • Unlock Up to 5x Faster Inference in PyTorch: Recent Innovations in Torch-TensorRT - Laikh Tewari, NVIDIA
  • Torch-Monitor: A Comprehensive Call Path Profiling Tool for PyTorch - Qidong Zhao, North Carolina State University & Hao Wu, George Mason University
Speakers
avatar for Hao Wu

Hao Wu

PhD, George Mason University
I am interested in deep learning profilers.
avatar for Yanbo Liang

Yanbo Liang

software engineer, Meta
I'm a software engineer on the PyTorch team working on torch.compile and LLMs.
avatar for Titus Von Koeller

Titus Von Koeller

ML engineer / lead maintainer bitsandbytes, Hugging Face
Titus, lead maintainer of the independent non-profit bitsandbytes (sponsored by Hugging Face), works on co-engineering the democratization of AI and in his free time cherishes electronic music, queer culture and ski mountaineering. With degrees in Psychology and Computer Science... Read More →
avatar for Angela Yi

Angela Yi

Software Engineer, Meta
I've been working on the PyTorch Compilers team for the past 2 years, mainly working on torch.export!
avatar for Animesh Jain

Animesh Jain

Software Engineer, Meta
Animesh Jain works on PyTorch compilers.
avatar for Antoni Viros i Martin

Antoni Viros i Martin

Research Scientist, IBM Research
Antoni is currently a Research Scientist at IBM Research, investigating optimization approaches for ML inference and training, with a focus on open-source technologies such as PyTorch. He holds a PhD in Aerospace Engineering from Texas A&M University, and has previously worked at... Read More →
avatar for Bin Bao

Bin Bao

Software Engineer, Meta
Bin Bao is a software engineer working with the PyTorch Compiler team at Meta. He focuses on developing AOTInductor, an Ahead-of-Time compiler for the PyTorch2 export path.
avatar for Daniel Haziza

Daniel Haziza

Research Engineer, Meta AI
Daniel is a Research Engineer at FAIR Paris, working on workload efficiency and developing the xFormers library.
avatar for Diogo Venâncio

Diogo Venâncio

Researcher, University of Lisbon | INESC-ID
My name is Diogo and I am a Master's student at IST in Lisbon, Portugal and also a ML Engineer at an early stage AI startup. I grew up in the suburbs of Lisbon and always strived to have a positive impact on the lives of others. At the age of 20, I built my own company, called OutGoing... Read More →
avatar for Eikan Wang

Eikan Wang

AI Frameworks Engineer, Intel
Eikan is a staff engineer from Intel and a DL framework tech lead having full-stack experience in DL, from various AI applications to framework, library, and DL compiler. He is actively optimizing on torch.compile stack for Intel platforms, including optimizing Inductor C++/OpenMP... Read More →
avatar for Emma Ning

Emma Ning

Principal PM, Microsoft
Emma Ning is a Principal PM in the Microsoft AI Framework team, focusing on AI model operationalization and acceleration with ONNX Runtime/Olive for open and interoperable AI. She has more than five years of product experience in search engines taking advantage of machine learning... Read More →
avatar for Iris Zhang

Iris Zhang

Software Engineer, Meta
PyTorch Distributed @ Meta
avatar for Junghwan Park

Junghwan Park

Lead maintainer @ PyTorch Korea User Group, PyTorch Korea User Group
- Data engineer at telecommunication company in Korea - Lead maintainer at PyTorch Korea User Group - Interested in open-source, community and time-series forecasting
avatar for Kshama Pawar

Kshama Pawar

Principal Program Manager, Microsoft Corporation
Kshama Pawar is a Program Manager on the AI Platform team at Microsoft. She helps drive Training initiatives for both large language models and on-device training through optimization engines like ONNX Runtime. She is also involved in the Triton community effort to improve developer... Read More →
avatar for Laikh Tewari

Laikh Tewari

Deep Learning Software Product Manager, NVIDIA
Laikh Tewari manages products for inference in deep learning frameworks at NVIDIA and focuses on the usability of performance optimization tools across data center, consumer, and embedded segments. Laikh received his B.S. and M.S. in computer science from Stanford University where... Read More →
avatar for Mingfei Ma

Mingfei Ma

Senior Software Engineer, Intel
Mingfei Ma is a senior deep learning software engineer at Intel. He is also the maintainer of the CPU performance module in PyTorch. Mingfei holds a Master's degree from Harbin Institute of Technology, where he majored in Control Science and Technology. Mingfei has 12 years’ experience... Read More →
avatar for Chien-Chin Huang

Chien-Chin Huang

Software Engineer, Meta
Software Engineer, PyTorch Distributed, Meta
avatar for Mikayla Gawarecki

Mikayla Gawarecki

Software Engineer, Meta Platforms
Software Engineer at Meta on PyTorch Core Team
avatar for Baihan Huang

Baihan Huang

Software Engineer, Meta
Working on PyTorch
avatar for KaiChao YOU

KaiChao YOU

Ph.D. student, Tsinghua University/UC Berkeley
Kaichao You is a fourth-year Ph.D. student from Tsinghua University. He is currently visiting UC Berkeley, working on the vLLM project, a high-throughput and memory-efficient inference and serving engine for LLMs. He is an open-source contributor to PyTorch/Triton, and he leads the... Read More →
avatar for Brian Hirsh

Brian Hirsh

Software Engineer, Meta
Brian is a software engineer at Meta working on PyTorch core and compilers.
avatar for Jesse Cai

Jesse Cai

Software Engineer, Meta
Jesse is a software engineer on the PyTorch Core Performance team, where he works on accelerating models with sparsity. Before joining Meta, he worked at several startups, focusing on natural language processing.
avatar for Pauline Sho

Pauline Sho

Software Engineer, Google
Software engineer at Google LLC, currently focused on improving the quantization infrastructure for edge devices.
AS

Alex Sin

AI Software Solutions Engineer, Intel
LT

Louie Tsai

AI SW Engineer, Intel
avatar for Horace He

Horace He

Software Engineer, Meta
To be filled
avatar for Adnan Hoque

Adnan Hoque

Research Engineer, IBM
I am a Research Engineer at IBM. I have a Bachelor of Science degree in Electrical Engineering from the University of Alberta. I have worked on machine learning applications in various domains such as computer vision, network security and most recently have been developing kernels... Read More →
avatar for Blair Birdsell

Blair Birdsell

Data Scientist, Surespan Construction
Blair Birdsell has a MASc in Civil Engineering from the University of Victoria. This background integrates his design and engineering expertise with data science. Over 9 years, Blair has contributed to 4.86 million sq. ft. of building projects and now develops data-driven software... Read More →
avatar for Chih-Chieh Yang

Chih-Chieh Yang

Research Scientist, IBM
Performance optimization of AI workloads
avatar for Chirag Pandya

Chirag Pandya

Software Engineer, Meta
Chirag is a backend engineer who has worked for over 20 years in the software industry. His expertise includes networks, storage, security, and distributed systems, with an emphasis on building fast, secure, and performant systems.
avatar for Hansong Zhang

Hansong Zhang

Software Engineer, Meta Platforms
Software Engineer at Meta. Worked on integrating ExecuTorch framework into Android apps with Java and JNI library.
avatar for Hiroshi Inoue

Hiroshi Inoue

Research Staff Member, IBM Research - Tokyo
Hiroshi Inoue is a research staff member at IBM Research - Tokyo, where he works on performance optimization of system software. He has a PhD from the University of Tokyo.
avatar for Huamin Li

Huamin Li

Software Engineer, Meta
Software engineer from Meta PyTorch, focusing on GPU and CPU inference for Meta internal workloads
avatar for Hyoyoung Chang

Hyoyoung Chang

Lead maintainer, PyTorch Korea User Group
Data Engineer
avatar for Jack Clayton

Jack Clayton

AI Developer Advocate, Modular
Jack started his career optimizing autonomous truck software for leading mining companies, including BHP and Caterpillar. Most recently he was designing computer vision software, putting AI inference pipelines into production for IDVerse. He is passionate about the developer community... Read More →
avatar for Joel Schlosser

Joel Schlosser

Software Engineer, Meta
Engineer with a decade's worth of ML experience across the research, industry, and framework perspectives.
avatar for Joshua Lochner

Joshua Lochner

Machine Learning Engineer, Hugging Face
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)
avatar for Less Wright

Less Wright

PyTorch Partner Engineer, Meta
PyTorch Distributed and CUDA/Triton kernels
avatar for Lucas Pasqualin

Lucas Pasqualin

ML Engineer, PyTorch (Meta)
Lucas has been developing Machine Learning Applications and Machine Learning infrastructure at scale for years, and has recently been focused on extending the product offering of PyTorch's Distributed Checkpointing stack.
avatar for Manfei Bai

Manfei Bai

Software Engineer, Google LLC
Manfei Bai is a software engineer at Google.
avatar for Michał Szołucha

Michał Szołucha

Deep Learning Software Engineer, NVIDIA
During his work at NVIDIA, Michał gained vast experience in Deep Learning Software Development. He tackled challenges in training and inference, ranging from small-scale to large-scale applications, as well as user-facing tasks and highly-optimized benchmarks like MLPerf. Micha... Read More →
avatar for Na Li

Na Li

Software Engineer, Google
Tech Lead Manager at Google Cloud, leading on-device ML developer tools.
avatar for Nick Kreeger

Nick Kreeger

Frameworks Engineering Director, Modular
Software Engineering lead with over 15 years of experience working at Google, Microsoft and a handful of startups. Nick has contributed to many technologies in Machine Learning such as TensorFlow.js, TensorFlow Lite/Micro, and ONNX/ONNXRuntime. Nick enjoys spending his free time with... Read More →
avatar for Oguz Ulgen

Oguz Ulgen

Software Engineer, Meta
I'm a software engineer at Meta where I used to work on the Hack programming language and now work on PyTorch.
avatar for Rostan TABET

Rostan TABET

Software Engineer, NVIDIA
I am a Computer Science student with a passion for Python and deep learning. During my end-of-studies internship, I focused on leveraging free-threaded Python in the context of NVIDIA's deep learning libraries suite. My work aims to improve data handling efficiency in machine learning... Read More →
avatar for Roy Allela

Roy Allela

Sr AI/ML Specialist Architect, AWS
Roy Allela is a Senior AI/ML Specialist Architect at AWS. Roy helps customers, from small startups to large enterprises, train and deploy large language models efficiently on AWS. He previously spent 8 years at Intel as a Senior AI Software Engineer working on low-level ML framework... Read More →
avatar for Sheng Qin

Sheng Qin

Software Engineer, Meta Inc.
Sheng Qin is a software engineer in the PyTorch Accelerator Enablement org at Meta.
avatar for Sijia Chen

Sijia Chen

Software Engineer, Meta / PyTorch
Sijia is a software engineer on the Meta PyTorch Acceleration team, focusing on GPU inference.
avatar for Tianyu Liu

Tianyu Liu

Research Scientist, Meta
Tianyu Liu is a Research Scientist on the PyTorch team at Meta, currently working on distributed training. Prior to this, he was a postdoc at Stanford University and has worked on the Ads Core Machine Learning team at Meta. He obtained his PhD degree at the University of Wisconsin--Madison... Read More →
avatar for Tristan Rice

Tristan Rice

Software Engineer, Meta
Software engineer working on PyTorch Distributed and large scale training.
avatar for Wei Sun

Wei Sun

Research Scientist, Meta Platform
Wei Sun supports the Meta AI Infrastructure organization. He brings deep expertise in analyzing ML model execution during training and serving and identifies efficiency/performance bottlenecks across model and system architecture. This has led him to build some of the most comprehensive... Read More →
avatar for William Wen

William Wen

Software Engineer, Meta Platforms, Inc.
William works on the torch.compile team, specializing in TorchDynamo.
avatar for Yu Chin Fabian Lim

Yu Chin Fabian Lim

Research Staff Member, IBM Research, Singapore
Fabian Lim is currently in IBM Research, Singapore. During 2013 - 2016, he worked in Avago Technologies (now Broadcom), then SK Hynix Memory Systems, in San Jose, CA. From 2010-2013, he was a postdoc at the Massachusetts Institute of Technology, Cambridge, MA. Dr Lim received the... Read More →
TA

Tabari Alexander

STSM, IBM Z AI and Analytics, IBM
avatar for Eric Yang

Eric Yang

Software Engineer, Google
avatar for Sreen Tallam

Sreen Tallam

Software Engineering Manager - AI Performance & Efficiency, Meta
I am a SW Engineering Manager at Meta helping all ML Training & Serving models (RecSys, Content Understanding, GenAI) run optimally and efficiently through various optimization techniques, including scaling them across the entire Meta fleet.
avatar for Qidong Zhao

Qidong Zhao

PHD Student, North Carolina State University
Research interest: profiling techniques for different workloads and architectures.
Wednesday September 18, 2024 5:30pm - 8:30pm PDT
Gateway Pavilion - Sponsor Showcase
 
Thursday, September 19
 

8:30am PDT

Registration & Badge Pick-Up
Thursday September 19, 2024 8:30am - 6:00pm PDT
Gateway Pavilion - Foyer

9:00am PDT

Keynote: Welcome Back & Opening Remarks
Thursday September 19, 2024 9:00am - 9:05am PDT
Festival Pavilion - Keynote Room

9:07am PDT

Keynote: Why You Should Think Twice Before Paying for an Evaluation Tool - Chip Huyen, VP of AI & OSS, Voltron Data
Thursday September 19, 2024 9:07am - 9:22am PDT
Open-ended evaluation is hard, and the number of evaluation tools has exploded in response to this challenge. However, if tools could solve evaluation, evaluation would have been solved by now. While the right tools can make your life easier, this talk discusses why you should think twice before outsourcing your evaluation to an external tool.
Speakers
avatar for Chip Huyen

Chip Huyen

VP of AI & OSS, Voltron Data
Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. She also advises companies on building AI platforms. Previously, she was with Snorkel AI and NVIDIA, founded an AI infrastructure startup (acquired), and taught Machine Learning Systems Design at Stanford. She’s... Read More →
Thursday September 19, 2024 9:07am - 9:22am PDT
Festival Pavilion - Keynote Room

9:24am PDT

Keynote: Navigating the Architectural Timeline of LLMs - Sebastian Raschka, Staff Research Engineer, Lightning AI
Thursday September 19, 2024 9:24am - 9:39am PDT
The evolution of large language models (LLMs) from the original Generative Pre-trained Transformer (GPT) series to the recent advancements seen in models like Llama 3 has been accompanied by several architectural and methodological innovations. This talk aims to catch attendees up on the latest AI and LLM development trends, highlighting the key changes and motivations that led to the development of recent state-of-the-art LLMs, such as Llama 3.1.

Specifically, this presentation explores key developments in attention mechanisms, such as sliding window attention, grouped-query attention, multi-query attention, and FlashAttention, and explains their key motivations and advantages. In addition to exploring the structural changes, this presentation also reviews the recent "tricks of the trade" that have improved the training processes and performance of the latest LLMs. This includes the recent two-step pretraining approach in Llama 3.1 and knowledge distillation using real datasets, as in Gemma 2, and synthetic data, as seen in Llama 3.1.

Moreover, we will examine the integration of system-level optimizations, such as the Mixture of Experts method and the hybrid model Samba, which combines Mamba techniques with attention mechanisms and illustrates a broader trend toward more specialized and efficient architectures.

This talk will provide attendees with an understanding of the most notable transformations that have defined the architectural timeline of LLMs.
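Two of the attention variants named in the abstract above can be sketched in a few lines of plain Python. This is an illustration only, not material from the talk: the function names are hypothetical, the sliding window is assumed to be causal and to include the current token, and grouped-query attention is reduced to the mapping from query heads to shared key/value heads.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Assumes causal attention restricted to the most recent `window` tokens,
    counting the current token itself.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Index of the key/value head shared by a given query head.

    num_kv_heads == num_q_heads is standard multi-head attention,
    num_kv_heads == 1 is multi-query attention, and anything in
    between is grouped-query attention.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


for row in sliding_window_mask(5, 2):
    print(["x" if allowed else "." for allowed in row])

print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

The mask keeps the per-token attention cost proportional to the window size rather than the sequence length, and sharing key/value heads shrinks the KV cache by a factor of `num_q_heads / num_kv_heads`.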
Speakers
avatar for Sebastian Raschka, PhD

Sebastian Raschka, PhD

Staff Research Engineer, Lightning AI
Sebastian Raschka, PhD, has been working in machine learning and AI for more than a decade. In addition to being a researcher, Sebastian has a strong passion for education. He is known for his bestselling books on machine learning with Python and his contributions to open source. Sebastian... Read More →
Thursday September 19, 2024 9:24am - 9:39am PDT
Festival Pavilion - Keynote Room

9:41am PDT

Keynote: Building an Advanced Knowledge Assistant - Jerry Liu, Co-Founder & CEO, LlamaIndex
Thursday September 19, 2024 9:41am - 9:56am PDT
A huge promise for LLMs is being able to answer questions and solve tasks of arbitrary complexity over an arbitrary number of data sources. The world has started to shift from simple RAG stacks, which are mostly good for answering pointed questions, to agents that can more autonomously reason over a diverse set of inputs, and interleave retrieval and tool use to produce sophisticated outputs.

Building a reliable multi-agent system is challenging. There's a core question of developer ergonomics and production deployment - what makes sense outside a notebook setting. In this talk we outline some core building blocks for building advanced research assistants, including advanced RAG modules, event-driven workflow orchestration, and more.
Speakers
avatar for Jerry Liu

Jerry Liu

CEO, LlamaIndex
Jerry is the co-founder/CEO of LlamaIndex, the data framework for building LLM applications. Before this, he has spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG and worked... Read More →
Thursday September 19, 2024 9:41am - 9:56am PDT
Festival Pavilion - Keynote Room

9:58am PDT

Keynote: Ray: A Distributed Framework for Heterogeneous Computing - Ion Stoica, Professor, UC Berkeley
Thursday September 19, 2024 9:58am - 10:13am PDT
Ray has recently become the framework of choice for scaling machine learning workloads—from data preprocessing, to training, fine-tuning, and serving. This talk will highlight Ray’s key features responsible for its flexibility and generality, as well as its recent support for GPUs.
Speakers
avatar for Ion Stoica

Ion Stoica

Professor, UC Berkeley
Ion Stoica is a Professor in the EECS Department at the University of California at Berkeley, and the Director of Sky Computing Lab (https://sky.cs.berkeley.edu/). He is currently doing research on cloud computing and AI systems. Past work includes Ray, Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an Honorary Member of the Romanian Academy, an ACM Fellow and has received numerous awards, including the Mark Weiser Award (2019... Read More →
Thursday September 19, 2024 9:58am - 10:13am PDT
Festival Pavilion - Keynote Room

10:15am PDT

Keynote: Community Awards
Thursday September 19, 2024 10:15am - 10:25am PDT
Festival Pavilion - Keynote Room

10:25am PDT

Coffee Break
Thursday September 19, 2024 10:25am - 10:50am PDT
Gateway Pavilion - Sponsor Showcase

10:25am PDT

Sponsor Showcase
Thursday September 19, 2024 10:25am - 8:00pm PDT
Gateway Pavilion - Sponsor Showcase

10:50am PDT

Lightning Talk: On-Device Profiling and Debugging with ExecuTorch - Olivia Liu & Vaun Puri, Meta
Thursday September 19, 2024 10:50am - 11:00am PDT
High developer velocity is crucial to shipping new ML-enabled experiences from a server-trained model to a customer’s device. ExecuTorch is an on-device runtime that seamlessly integrates with the PyTorch stack with a focus on developer productivity. We present the ExecuTorch Dev Tools and highlight key features that tighten the iteration loop when optimizing models for deployment and execution on edge devices. We demonstrate how ExecuTorch’s built-in profiler and bundled tools tackle key pain points, such as: 1. Examining the memory footprint of an ExecuTorch program ahead-of-time; 2. Collecting runtime performance metrics and intermediate outputs for accuracy analysis; 3. Correlating runtime data with the underlying graph of an exported model.
Speakers
avatar for Olivia Liu

Olivia Liu

Software Engineer, Meta
Olivia has worked on PyTorch at Meta for over 2 years, focusing on on-device inference and building out profiling and debugging tools for model developers.
Thursday September 19, 2024 10:50am - 11:00am PDT
Festival Pavilion - Breakout Room A

10:50am PDT

Sponsored Session: Democratizing AI: Powering the Future with Arm’s Global Compute Ecosystem - Gian Marco Iodice, Arm
Thursday September 19, 2024 10:50am - 11:15am PDT
Arm is excited to be at the center of the world's largest compute ecosystem at the dawn of the AI era. A key tenet of our mission is to democratize AI capabilities, empowering millions of developers to put advanced AI features into the hands of billions of users.

In this presentation, we'll explore how Arm is enabling the world’s leading open-source AI frameworks to leverage power-efficient Arm-based computing platforms and Arm architecture features as a tool for enabling fast and secure AI workloads. The session focuses on how our strategic partnership with the PyTorch and ExecuTorch community is enabling a seamless and transparent developer experience to run workloads everywhere from cloud to edge. This session will highlight some of our optimized libraries, upstreamed contributions, and a wealth of AI-related developer material to build the future of AI on Arm.
Speakers
avatar for Gian-Marco Iodice

Gian-Marco Iodice

GenAI Engineering Lead, Arm
Gian Marco Iodice is an experienced edge and mobile computing specialist at Arm for machine learning (ML) and leads engineering development for on-device GenAI. He received the MSc with honors in electronic engineering from the University of Pisa (Italy), where he specialized in HW/SW... Read More →
Thursday September 19, 2024 10:50am - 11:15am PDT
Gateway Pavilion - Cowell Theater

10:50am PDT

The Rise of `Transformers` in the Growing PyTorch Ecosystem - Arthur Zucker, Hugging Face
Thursday September 19, 2024 10:50am - 11:15am PDT
Explore how the `transformers` library grows and adapts to the fast-paced and ever-changing AI field to bring the best to the AI community.
Speakers
avatar for Arthur Zucker

Arthur Zucker

Core Maintainer, Hugging Face
Arthur is a Core maintainer at Hugging Face, maintaining several critical libraries such as transformers and tokenizers. He is the owner of the text and LLM parts of Hugging Face's open-source toolkits, resulting in the implementations of LLaMa, Mistral, MoEs, etc and torch.compile... Read More →
Thursday September 19, 2024 10:50am - 11:15am PDT
Festival Pavilion - Breakout Room B

11:05am PDT

Lightning Talk: LLMs on Edge with AI Accelerators - Chen Lai, Kimish Patel & Cemal Bilgin, Meta
Thursday September 19, 2024 11:05am - 11:15am PDT
LLMs are known to be compute-heavy and to consume lots of resources (almost all of the resources on a phone), including memory and power. A natural thought is to leverage AI hardware accelerators, for example the Apple Neural Engine (ANE) on Apple devices and HTP on Qualcomm SoCs, to make them run fast and efficiently. Users will only be interested in installing models on their devices once model latency, memory consumption, and power usage have been optimized to a certain level. In this session, we’d like to introduce how we leverage these AI accelerators within the PyTorch ecosystem to achieve state-of-the-art performance for Llama 3 on device, via ExecuTorch and our partnership with Apple and Qualcomm. Hardware companies usually have their own AI accelerators, and these are likely to have different characteristics: one may support a different list of operators than another, and one may support only static shapes (like HTP). However, transformer-based optimizations can be generic. We’ll discuss in more detail how we apply both the generic optimizations and the backend-specific optimizations. The techniques we applied here are not just for LLMs; they can be applied to other transformer-based models as well.
Speakers

Kimish Patel

Software Engineer, Meta Platforms
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on performance optimizations. His past experiences include hardware/software co-design, CPU architecture, and CPU/GPU performance optimization.

Chen Lai

Software Engineer, Meta
Software engineer focusing on bringing up accelerators on devices.

Cemal Bilgin

Engineering Manager, Meta
Engineering Manager, PyTorch Edge Acceleration
Thursday September 19, 2024 11:05am - 11:15am PDT
Festival Pavilion - Breakout Room A

11:20am PDT

Lightning Talk: Building and Supporting the Chinese PyTorch Community: Resources, Tutorials, and Engagement - Zong Zesheng, Huawei
Thursday September 19, 2024 11:20am - 11:30am PDT
This talk gives a comprehensive introduction to the Chinese PyTorch community. We hope to inspire more users to join and contribute, fostering a vibrant and inclusive environment for PyTorch enthusiasts in China.

Chinese PyTorch homepage: an introduction to the official Chinese version of the PyTorch website and its features, with navigation tips and key sections such as documentation, tutorials, and community events. The goal is to improve the connection between users in China and the PyTorch community.

Localized tutorials and documentation: the 2.x releases had no translated version, which made it hard for beginners who are not comfortable with English to keep up with the latest PyTorch features. We translated the official documents and tutorials, covering everything from basic PyTorch concepts to advanced applications.

Interactive tutorials: previously there were no interactive tutorials (like Google Colab) for Chinese students or beginners, who had to set up an environment before starting with PyTorch, which can be hard for beginners. Online notebooks and tutorials are now available so that beginners can practice and tune steps.
Speakers

Zong Zesheng

Software Engineer, Huawei
Currently working to give Chinese users easier access to PyTorch resources and to create a friendly user experience for beginners.
Thursday September 19, 2024 11:20am - 11:30am PDT
Gateway Pavilion - Cowell Theater

11:20am PDT

Sponsored Session: Torchchat: A Showcase of PyTorch LLM Ubiquity - Jack Khuu & Jesse White, Meta
Thursday September 19, 2024 11:20am - 11:45am PDT
This talk explores the journey of enabling LLMs in the PyTorch ecosystem, as well as how the teams behind AOT Inductor, ExecuTorch, and torchao collaborated to create torchchat, a showcase of PyTorch’s ability to run LLM inference everywhere.

Torchchat demonstrates the ubiquity, simplicity, and quality of PyTorch’s LLM support through performant, reproducible implementations, not only in Python environments but also on desktop, server, and on-device.

All of our work is open source and available on GitHub.
Speakers

Jack Khuu

Software Engineer, Meta
Software Engineer @ Meta working on the PyTorch Edge team. Currently, the TL for torchchat, which is PyTorch's showcase of LLM inference ubiquity (Python, Desktops, Mobile, etc.). More broadly, I focus on the "Experience" of PyTorch Edge, encompassing User, Developer, and Community... Read More →

Jesse White

Software Engineering Manager, Meta
Jesse is an engineering manager at PyTorch @ Meta, where he supports the Edge Experience team in improving the experience for on-device inference and training, including mobile, laptops, and embedded devices. With nearly 20 years of experience in startups, Jesse is passionate about... Read More →
Thursday September 19, 2024 11:20am - 11:45am PDT
Festival Pavilion - Breakout Room A

11:20am PDT

Training MoEs at Scale with PyTorch - Mihir Patel & Brian Chu, Databricks
Thursday September 19, 2024 11:20am - 11:45am PDT
Mixture-of-Experts (MoE) models are becoming an increasingly popular architecture choice for large language models (LLMs). In this talk, we describe how to train MoE models with PyTorch. After discussing various performance tradeoffs, we use PyTorch distributed tools like DTensor to build custom parallelism approaches, including expert parallelism via MegaBlocks. We then show how to get near-linear scaling to thousands of GPUs, combining PyTorch FSDP and HSDP with our parallelism strategies. We discuss many of the challenges of training at scale, including communication bottlenecks, hardware failures, and networking challenges. We further improve training-at-scale setups using tools like PyTorch Distributed Checkpointing for rapid saving and loading. We then highlight further optimizations to minimize challenges only present at scale, such as object store failures for large checkpoints.
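As background for readers new to MoE layers, the sketch below shows one common formulation, top-k routing, in plain Python. The abstract does not say which routing scheme the speakers use, so the scheme, the function names (`route_top_k`, `moe_forward`), and the toy experts are all illustrative assumptions. None of this is MegaBlocks or PyTorch code.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are the softmax probabilities renormalized over the chosen experts.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts for one token."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]   # made-up router scores for one token
routing = route_top_k(router_logits, k=2)
output = moe_forward(1.0, router_logits, experts, k=2)
```

Only the selected experts run for a given token, which is why expert placement across devices (expert parallelism) becomes the main scaling question.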
Speakers

Mihir Patel

Research Engineer, Databricks
Mihir Patel is a Research Engineer at MosaicML / Databricks, where he works on distributed training at scale and serves as the tech lead for Composer, an open-source deep learning training library. His primary focus is on large model training, and he has helped build several open... Read More →

Brian Chu

Research Engineer, MosaicML / Databricks
Brian is a Research Engineer at Mosaic / Databricks, where he contributes to Composer and Foundry, open-source libraries for training LLMs. He has been involved in the DBRX project and products like the Databricks finetuning and pretraining API. Prior to joining Databricks, Brian... Read More →
Thursday September 19, 2024 11:20am - 11:45am PDT
Festival Pavilion - Breakout Room B

11:35am PDT

Lightning Talk: Distributing a Million Open Models in the Wild: Lessons Learned from the Hugging Face Hub - Omar Sanseviero, Hugging Face
Thursday September 19, 2024 11:35am - 11:45am PDT
The Hugging Face Hub has over 300,000 PyTorch models. Distributing such a large number of models poses challenges. In this talk, Omar will share how the community has tackled these challenges, including techniques to ensure torch model security and tooling for researchers to share their models. He'll also take attendees on a journey through the evolution of torch models distributed by the community, highlighting new trends and directions. Attending this talk will give attendees practical insights into the latest developments in model distribution and ecosystem trends.
Speakers

Omar Sanseviero

Chief Llama Officer - Head of Platform and Community, Hugging Face
Omar Sanseviero is the Chief Llama Officer and Head of Platform and Community at Hugging Face, where he works at the intersection of open source, community, and product. Omar leads multiple ML teams that work on topics such as Mobile ML, ML for art, and ML Partnerships. Previously... Read More →
Thursday September 19, 2024 11:35am - 11:45am PDT
Gateway Pavilion - Cowell Theater

11:50am PDT

Lightning Talk: Empowering Developers: Tools and Resources for Running Generative AI on Arm CPUs - Pareena Verma, Arm
Thursday September 19, 2024 11:50am - 12:00pm PDT
As the demand for accessible and scalable AI solutions grows, leveraging CPUs for generative AI offers significant advantages in cost, energy efficiency, and widespread availability. This session aims to equip developers with the ecosystem of tools, resources, and technical content needed to effectively run generative AI use cases on Arm CPUs. We have launched a range of easily digestible tutorials for developers, part of our Learning Paths on https://learn.arm.com/, which demonstrate how you can easily and efficiently run small and large language models on Arm-based devices. Learn about end-to-end workflows to accelerate PyTorch-based sentiment analysis models from Hugging Face on Arm servers with optimizations in Arm Compute Library kernels for fp32 and bfloat16. Use the new KleidiAI library to accelerate LLMs with AI frameworks and build an Android chat app on your Arm mobile device with ExecuTorch and XNNPACK. Find out about our roadmap for learning content demonstrating the feasibility and successful deployment of generative AI on Arm-based devices. Help us shape the support that we offer developers.
Speakers

Pareena Verma

Principal Solutions Architect, Arm
Pareena is a Principal Solutions Architect at Arm. She has extensive experience working with software developers and SoC architects on numerous Arm based projects involving usage of modeling, ML frameworks, compilers, debuggers and virtual prototyping simulation tools. Pareena holds... Read More →
Thursday September 19, 2024 11:50am - 12:00pm PDT
Festival Pavilion - Breakout Room B

11:50am PDT

Lightning Talk: Implementing and Using Iterable Datasets: What Could Go Wrong? - Nicolas Hug, Meta
Thursday September 19, 2024 11:50am - 12:00pm PDT
PyTorch supports two kinds of datasets: Iterable datasets and indexable "map-style" datasets. Iterable datasets can be more flexible and potentially faster than their indexable cousins. They are also much harder to use correctly, and can easily lead to silently wrong results. This talk is a quick and fun intro to some of the traps that Iterable datasets lay out for you, with some tips to help you avoid them.
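The abstract does not list the traps it covers. One widely documented one, assumed here for illustration, is that with several loader workers each worker iterates its own copy of an iterable dataset, so every sample is yielded once per worker unless the dataset shards itself. The sketch below simulates that behaviour in plain Python; `RangeStream`, `ShardedRangeStream`, and `simulate_workers` are hypothetical names, and no PyTorch API is called.

```python
class RangeStream:
    """Hypothetical iterable-style dataset: yields start..end-1, no indexing."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        return iter(range(self.start, self.end))


class ShardedRangeStream(RangeStream):
    """Same stream, but each worker keeps only every num_workers-th sample."""

    def __init__(self, start, end, worker_id, num_workers):
        super().__init__(start, end)
        self.worker_id, self.num_workers = worker_id, num_workers

    def __iter__(self):
        return iter(range(self.start + self.worker_id, self.end, self.num_workers))


def simulate_workers(make_dataset, num_workers):
    """Mimic multi-worker loading: every worker iterates its own dataset copy."""
    samples = []
    for worker_id in range(num_workers):
        samples.extend(make_dataset(worker_id, num_workers))
    return sorted(samples)


# Without sharding, both workers yield the full stream: silent duplication.
naive = simulate_workers(lambda wid, n: RangeStream(0, 6), num_workers=2)
# With sharding, the workers together yield each sample exactly once.
sharded = simulate_workers(lambda wid, n: ShardedRangeStream(0, 6, wid, n), num_workers=2)
```

In real PyTorch code the worker id and worker count would typically come from `torch.utils.data.get_worker_info()` inside `__iter__` rather than from constructor arguments.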
Speakers

Nicolas Hug

Research Engineer, Meta
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine... Read More →
Thursday September 19, 2024 11:50am - 12:00pm PDT
Gateway Pavilion - Cowell Theater

11:50am PDT

Lightning Talk: New Activation Checkpointing APIs in PyTorch - Jeffrey Wan & Horace He, Meta
Thursday September 19, 2024 11:50am - 12:00pm PDT
Activation checkpointing is a commonly used technique to reduce memory usage during model training by reducing the number of activations saved for backward. Instead of keeping tensors needed for backward alive until they are used in gradient computation, those tensors are recomputed during the backward pass. This talk will introduce new activation checkpointing APIs that can help achieve a better trade-off between memory savings and the compute overhead that recomputation introduces.
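The trade-off described above can be sketched on a toy chain of scalar layers in plain Python. This is a minimal illustration of the general technique, under the assumption of a fixed segment size. It does not use `torch.utils.checkpoint` or the new APIs the talk introduces, and all function names are made up for the example.

```python
import math

def layer(w, x):
    return math.tanh(w * x)

def layer_grad_x(w, x):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)

def grad_store_all(weights, x0):
    """Baseline: keep every layer input alive until backward."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad_x(w, x_in)
    return grad, len(inputs)  # (d out / d x0, values saved for backward)

def grad_checkpointed(weights, x0, segment):
    """Save only each segment's input; recompute the rest during backward."""
    checkpoints, x = [], x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)
        x = layer(w, x)
    grad, recomputed = 1.0, 0
    for seg in reversed(range(len(checkpoints))):
        seg_weights = weights[seg * segment:(seg + 1) * segment]
        # Re-run this segment's forward pass from its checkpoint.
        inputs, x = [], checkpoints[seg]
        for w in seg_weights:
            inputs.append(x)
            x = layer(w, x)
            recomputed += 1
        for w, x_in in zip(reversed(seg_weights), reversed(inputs)):
            grad *= layer_grad_x(w, x_in)
    return grad, len(checkpoints), recomputed

weights = [0.5, -1.2, 0.8, 1.5, -0.7, 0.9, 1.1, -0.4]  # arbitrary toy values
full_grad, full_saved = grad_store_all(weights, 0.3)
ckpt_grad, ckpt_saved, ckpt_recomputed = grad_checkpointed(weights, 0.3, segment=4)
```

Both functions return the same gradient. The checkpointed version holds 2 values between forward and backward instead of 8, and pays for it with 8 extra layer evaluations.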
Speakers

Horace He

Software Engineer, Meta

Jeffrey Wan

Software Engineer, Meta
Software Engineer working on PyTorch
Thursday September 19, 2024 11:50am - 12:00pm PDT
Festival Pavilion - Breakout Room A

12:00pm PDT

Lightning Talk: Fast, Scalable Distributed Training with StreamingDataset - Saaketh Narayan, Databricks
Thursday September 19, 2024 12:00pm - 12:10pm PDT
StreamingDataset makes training on large datasets from cloud storage as fast, cheap, and scalable as possible. It is designed specifically for multi-node, distributed training of large models, maximizing correctness guarantees, performance, and ease of use. Key features include elastically deterministic training, instant mid-epoch resumption, effective shuffling, high training throughput, and flexible data mixing. When training with StreamingDataset, the data shards are written to cloud storage in MDS, our file format that allows for low-latency random access to samples. By being as efficient as possible with shard downloads and shuffling, StreamingDataset minimizes egress costs while ensuring that dataloading never bottlenecks model training. StreamingDataset powers training for everything from LLMs with over 100 billion parameters, like DBRX, to advanced diffusion models and two-tower recommendation models, scaling to training jobs on thousands of GPUs with ease. Join us to learn how StreamingDataset can elevate your distributed model training experience.
Speakers

Saaketh Narayan

Machine Learning Engineer, Databricks
Saaketh Narayan is a machine learning engineer at Databricks. As part of the Mosaic AI Runtime team, he works on the GenAI training stack, including dataloading, training frameworks, and performance across the Mosaic Streaming, Composer, and LLM Foundry libraries.
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Gateway Pavilion - Cowell Theater

12:00pm PDT

Lightning Talk: FlexAttention - the Flexibility of PyTorch + the Performance of FlashAttention - Yanbo Liang & Horace He, Meta
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Introducing a novel abstraction leveraging the PyTorch compiler stack to enable custom, user-defined attention mechanisms. This new API supports dynamic modifications to attention scores within SDPA, providing both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm.
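To make "modifications to attention scores" concrete, here is an unfused, pure-Python reference for single-head attention that applies a user-supplied `score_mod` callback to each scaled score before the softmax. It only sketches the semantics, with none of the kernel fusion the talk is about. The three-argument callback is a simplification assumed for this example; the FlexAttention callback is understood to also receive batch and head indices.

```python
import math

def attention_with_score_mod(q, k, v, score_mod):
    """Single-head attention over lists of vectors, unfused and unoptimized.

    score_mod(score, q_idx, kv_idx) may rewrite each scaled dot-product score
    before the softmax is applied.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi, q_vec in enumerate(q):
        scores = [
            score_mod(scale * sum(a * b for a, b in zip(q_vec, k_vec)), qi, ki)
            for ki, k_vec in enumerate(k)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_vec[d] for w, v_vec in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out

def causal(score, q_idx, kv_idx):
    # Future positions get -inf, so their softmax weight is exactly 0.
    return score if kv_idx <= q_idx else float("-inf")

q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy queries and keys
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]       # toy values
causal_out = attention_with_score_mod(q, k, v, causal)
```

Swapping `causal` for another callback (for example, one that adds a bias depending on `q_idx - kv_idx`) changes the attention variant without touching the attention loop itself.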
Speakers

Yanbo Liang

Software Engineer, Meta
I'm a software engineer on the PyTorch team working on torch.compile and LLMs.

Horace He

Software Engineer, Meta
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Festival Pavilion - Breakout Room A

12:00pm PDT

Lightning Talk: Optimized PyTorch Inference on aarch64 Linux CPUs - Sunita Nadampalli, Amazon (AWS)
Thursday September 19, 2024 12:00pm - 12:10pm PDT
In the last two years we've optimized the performance of PyTorch on Arm processors. The optimizations have included changes to ATen, C10, MKLDNN operators, the GEMM backend, and TorchInductor. In many cases, instead of writing our own kernels, we integrated the Arm Compute Library, used fastmath kernels with format types like bf16, implemented operator caching, selected the optimal backend based on the input context, and so on. Through these optimizations we improved performance by over 2x. In this presentation we will first talk about how we went through this process, what those optimizations are, performance numbers on AWS Graviton3 processors for around 75 models, and CI/CD workflow details. Next, we will walk through a sample PyTorch application showing basic usage, how to tune the runtime, and the resulting speedup. By the end of the presentation attendees will have learned about PyTorch performance optimizations on Arm processors, how to use them, and the areas where they can collaborate to further improve PyTorch for aarch64 CPUs.
Speakers

Sunita Nadampalli

Software Development Manager, Amazon/AWS
Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for AI/ML and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions with Arm SoCs... Read More →
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Festival Pavilion - Breakout Room B

12:10pm PDT

Lightning Talk: AOTriton: Ahead of Time Triton Kernel Libraries on ROCm - Jeff Daily, AMD
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Scaled dot product attention provides significant acceleration of the transformer layer through fusion of the multihead attention layer. There are several different algorithms to achieve this, but tiled attention via Flash Attention is a very popular approach. In PyTorch on the ROCm platform this is currently achieved through ahead-of-time (AOT) compiled Triton kernels in a linkable archive. AMD’s work to enable and package these kernels is done through AOTriton, which aims to use Triton’s compiler and GPU kernels for faster development. AOTriton maintains an optimized set of tiling sizes and other parameters to provide optimized, pre-compiled Triton kernels. The differences between JIT and AOT are few but very important. Despite this, prototyping kernels in Triton is much faster than in template-based C++ libraries. In this presentation we will go into detail on the interaction layer between PyTorch and AOTriton, the structure of AOTriton, and how to add new Triton kernels to AOTriton.
Speakers

Jeff Daily

Principal Member of Technical Staff, Advanced Micro Devices
Jeff Daily is the chief architect of the Machine Learning Software Engineering group supporting ML frameworks such as PyTorch and onnxruntime on AMD GPUs. He enjoys delivering open source software to answer the challenges of the rapidly changing ML landscape. For over five years... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Festival Pavilion - Breakout Room B

12:10pm PDT

Lightning Talk: Making the Most of Heterogeneous Memory Capacity Using PyTorch - Syed Ahmed, NVIDIA Corporation
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Memory intensive deep learning workloads require efficient use of all kinds of memories that are available in a system. In this session, we will discuss how we can utilize such heterogeneous memory through memory pools in PyTorch. We will show how to mix-and-match different CUDA system allocators in the same PyTorch program using memory pools. Consequently, this API unlocks new use cases such as Extended GPU Memory (EGM) based all-gathers, Unified Virtual Memory (UVM), and NVLink Sharp (NVLS) reductions. New NVIDIA architectures accelerate such use cases with high-bandwidth and low-latency interconnects in the hardware, driven by extended functionality of CUDA system allocators in the software. Learn how to use these techniques on memory-intensive deep learning models like LLMs, and discover new CUDA features powered by PyTorch.
Speakers

Syed Ahmed

Senior Software Engineer, NVIDIA
Syed Ahmed is a Senior Software Engineer on the PyTorch Core team at NVIDIA, focused on keeping PyTorch fast and numerically stable on current NVIDIA platforms, and making PyTorch more expressive on future NVIDIA platforms. He holds a Master’s degree in Electrical Engineering from... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Festival Pavilion - Breakout Room A

12:10pm PDT

Lightning Talk: PyTorch-Wildlife: A Collaborative Deep Learning Framework for Conservation - Zhongqi Miao, Microsoft
Thursday September 19, 2024 12:10pm - 12:20pm PDT
The alarming decline in global biodiversity, driven by various factors, underscores the urgent need for large-scale wildlife monitoring. To address these challenges, we introduce PyTorch-Wildlife, an open-source deep learning platform built on PyTorch. It is designed for creating, modifying, and sharing powerful AI models. This platform emphasizes usability and accessibility, making it usable by individuals with limited or no technical background. It also offers a modular codebase to simplify feature expansion and further development. PyTorch-Wildlife offers an intuitive, user-friendly interface, accessible through local installation or Hugging Face, for animal detection and classification in images and videos. As two real-world applications, PyTorch-Wildlife has been used to train animal classification models for species recognition in the Amazon Rainforest and for invasive opossum recognition in the Galapagos Islands. The opossum model achieves 98% accuracy, and the Amazon model has 92% recognition accuracy for 36 animals in 90% of the data. As PyTorch-Wildlife evolves, we aim to integrate more conservation tasks, addressing various environmental challenges.
Speakers

Zhongqi Miao

Research Scientist, Microsoft
My research focus is AI (especially modern computer vision) applications in environmental science and ecology. I am currently in the AI for Good Lab, working on large-scale wildlife recognition through ground-based cameras (i.e., camera traps), bioacoustics, and overhead imagery... Read More →
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Gateway Pavilion - Cowell Theater

12:25pm PDT

Lunch (Provided Onsite for All Attendees)
Thursday September 19, 2024 12:25pm - 1:25pm PDT
Gateway Pavilion - Sponsor Showcase

1:25pm PDT

Sponsored Keynote: Accelerating AI: How AMD and PyTorch Drive Innovation with Seamless Day-0 Support and High Performance - Anush Elangovan, CVP Software Development, AMD
Thursday September 19, 2024 1:25pm - 1:30pm PDT
In this keynote presentation, we explore the robust collaboration between AMD and PyTorch that is propelling advancements in artificial intelligence and machine learning. Discover how AMD's commitment to Day-0 PyTorch support ensures that PyTorch users benefit from cutting-edge performance enhancements and out-of-the-box compatibility. We delve into the technical synergies that make AMD hardware an ideal choice for PyTorch frameworks, showcasing real-world examples of accelerated workflows and breakthrough AI applications. Join us to learn how this dynamic partnership is enabling researchers, developers, and data scientists to push the boundaries of innovation and achieve unprecedented results in their AI projects.
Speakers

Anush Elangovan

Vice President - AI Software, AMD
Thursday September 19, 2024 1:25pm - 1:30pm PDT
Festival Pavilion - Keynote Room

1:32pm PDT

Sponsored Keynote: Optimizing AI Inference for Large Language Models - Mudhakar Srivatsa, Distinguished Engineer, IBM
Thursday September 19, 2024 1:32pm - 1:37pm PDT
This talk will cover two new ways IBM has optimized generative AI inferencing with PyTorch: speculative decoding and Triton kernel development. Speculative decoding leverages predictive modeling to reduce latency by anticipating potential outputs, streamlining the inference process without sacrificing accuracy. IBM Research's team developed new speculative architectures and open-sourced speculators for Llama 3 models. The talk will also discuss various Triton kernels to accelerate inference, one of which was contributed to vLLM for accelerating MoE models. Finally, it will share a glimpse of IBM's AI hardware work, including how the IBM Artificial Intelligence Unit (AIU) could integrate into the PyTorch stack.
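For readers unfamiliar with speculative decoding, the sketch below shows a simplified greedy-verification variant in plain Python: a cheap draft model proposes a few tokens, and the target model keeps them only while they match what it would have generated itself. It is an assumption-laden toy, not IBM's speculator architecture. The two "models" are hypothetical arithmetic functions over integer tokens.

```python
def greedy_decode(target_next, prompt, num_tokens):
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(target_next, draft_next, prompt, num_tokens, k=4):
    """Accept draft tokens while they match the target, then correct one."""
    tokens, target_passes = list(prompt), 0
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal = list(tokens)
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(proposal))
        # 2. The target checks every proposed position. A real system does this
        #    in one batched forward pass, so it is counted as a single pass.
        target_passes += 1
        for pos in range(len(tokens), len(proposal)):
            expected = target_next(proposal[:pos])
            if proposal[pos] != expected:
                tokens = proposal[:pos] + [expected]  # fix the first miss
                break
        else:
            tokens = proposal  # every draft token was accepted
    return tokens, target_passes

# Hypothetical toy models over integer tokens.
def target_next(tokens):
    return (3 * tokens[-1] + 1) % 7

def draft_next(tokens):
    # Agrees with the target except after token 5, where it guesses wrong.
    return 3 if tokens[-1] == 5 else target_next(tokens)

baseline = greedy_decode(target_next, [1], 12)
spec_tokens, spec_passes = speculative_decode(target_next, draft_next, [1], 12, k=4)
```

The output matches plain greedy decoding token for token, which is the sense in which accuracy is preserved. The saving comes from needing fewer sequential target passes than generated tokens.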
Speakers

Mudhakar Srivatsa

Distinguished Engineer, IBM Research
Mudhakar Srivatsa is a distinguished research staff member at the Distributed Cloud department in IBM T. J. Watson Research Center. His work is focussed on heterogeneous spatiotemporal data with applications to edge computing, AIOps and Hybrid AI Scaling. He is an IBM master inv... Read More →
Thursday September 19, 2024 1:32pm - 1:37pm PDT
Festival Pavilion - Keynote Room

1:40pm PDT

Keynote Panel Discussion: Scaling & Benchmarking - Wei-Lin Chiang & Lisa Dunlap, UC Berkeley; James Bradbury, Anthropic; Tri Dao, together.ai; Aparna Ramani & Soumith Chintala, Meta
Thursday September 19, 2024 1:40pm - 2:10pm PDT
Moderators

Soumith Chintala

VP/Fellow of Meta & Co-Creator of PyTorch
I am an Artificial Intelligence researcher, engineer and community builder. I am currently at Meta, jumping between Engineering, Research and Leadership as I find convenient. I also visit NYU as a part-time researcher. My career interests have been defined by two sets of work: AI Platforms/Ecosystems... Read More →
Speakers

James Bradbury

Software Engineer, Anthropic
James is Head of Compute at Anthropic, where he is focused on ensuring that the company has the accelerator resources it needs to pursue its mission, and that the resources can be used effectively and efficiently across the organization. He joined in 2023 from Google DeepMind, where... Read More →

Lisa Dunlap

Student, UC Berkeley
PhD student at UC Berkeley working on (1) interpreting and evaluating generative models and (2) automating data science on unstructured data using large multimodal models. Also an underwhelming nail enthusiast and reader of old psychiatry books.

Wei-Lin Chiang

PhD Student, UC Berkeley / LMSYS
Wei-Lin Chiang is a PhD candidate at UC Berkeley advised by Ion Stoica and a core member at LMSYS. His research focuses on developing robust evaluation systems for AI. He currently leads efforts of Chatbot Arena, a crowdsourced AI evaluation platform and community leaderboards.

Tri Dao

Assistant Professor, Princeton University; Chief Scientist, Together AI
Tri Dao is an Assistant Professor at Princeton University and chief scientist of Together AI. He completed his PhD in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the intersection of machine learning and systems, and his research highlights... Read More →

Aparna Ramani

VP Engineering, Meta
Aparna is VP Engineering at Meta, responsible for AI Infrastructure, Data Infrastructure and Developer Infrastructure. Over the last eight years at Meta, Aparna has built a world-class team that is responsible for some of the largest scale systems on the planet - to process exabyte-scale... Read More →
Thursday September 19, 2024 1:40pm - 2:10pm PDT
Festival Pavilion - Keynote Room

2:15pm PDT

Building PyTorch Computer Vision Algorithms for 100 Skin Shades - Emmanuel Acheampong, roboMUA
Thursday September 19, 2024 2:15pm - 2:40pm PDT
At roboMUA we're leading the charge in building predictive AI models for diverse skin shades with the use of Convolutional Neural Networks (CNNs), and harnessing the power of Generative Adversarial Networks (GANs) specifically for generating realistic images of black hairstyles. Our session showcases PyTorch's versatility in both predictive and generative tasks, offering a comprehensive approach to inclusive AI. For predictive AI models, we leverage PyTorch's flexible framework to develop CNNs. Through innovative techniques in feature engineering and model architecture design, we demonstrate how PyTorch enables accurate prediction across 100 skin shades. Simultaneously, we showcase the transformative potential of GANs in the realm of black hairstyles. By training GANs on a curated dataset of diverse hair textures and styles, we illustrate how PyTorch facilitates the generation of lifelike images that celebrate the beauty and diversity of black hair. Attendees will gain insights into the data preprocessing, model training, and evaluation processes and learn how PyTorch empowers developers to build inclusive solutions.
Speakers

Emmanuel Acheampong

CEO / Head of AI, yShade.ai (formerly roboMUA)
Emmanuel Acheampong is a co-founder and CEO of roboMUA - an innovative AI solutions company with a visionary focus on catering to all skin shades and types. He graduated from Notre Dame’s ESTEEM program with a Masters thesis on the intersection of Artificial Intelligence and directed... Read More →
Thursday September 19, 2024 2:15pm - 2:40pm PDT
Gateway Pavilion - Cowell Theater

2:15pm PDT

Data-Dependent Shapes in PT2 - Edward Yang, Meta
Thursday September 19, 2024 2:15pm - 2:40pm PDT
Data-dependent shapes are ubiquitous whenever you want to take advantage of sparsity in your data representation, whether in recommendation systems, mixture of experts, or other use cases. We have made a lot of improvements to torch.compile's support for capturing and compiling data-dependent shapes, but they also require some user knowledge to work with effectively. This talk will give an overview of PT2's facilities for data-dependent compute and how to use them effectively.
Speakers

Edward Z. Yang

Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.
Thursday September 19, 2024 2:15pm - 2:40pm PDT
Festival Pavilion - Breakout Room A

2:15pm PDT

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon, UC Berkeley & Xiaoxuan Liu, UCB
Thursday September 19, 2024 2:15pm - 2:40pm PDT
We will present vLLM, an open-source high-performance LLM inference engine built on top of PyTorch. Starting as a research project at UC Berkeley, vLLM has become one of the fastest and most popular LLM inference solutions in industry, reaching 20K+ stars and 350+ contributors. In this talk, we will cover how vLLM adopts various LLM inference optimizations and how it supports various AI accelerators such as AMD GPUs, Google TPUs, and AWS Inferentia. We will also discuss how vLLM benefits from PyTorch 2 and its ecosystem.
Speakers

Lily Liu

Student, UCB
Lily (Xiaoxuan) Liu is a PhD student at UC Berkeley, working with Professors Ion Stoica and Alvin Cheung. Her research focuses on machine learning systems, particularly optimizing latency for LLM inference and addressing memory bottlenecks in LLM systems. Her recent work explores... Read More →

Woosuk Kwon

PhD Student, UC Berkeley
Woosuk Kwon is a Ph.D. student at UC Berkeley, advised by Prof. Ion Stoica. He is interested in building practical, flexible, and high-performance software systems for emerging applications such as large language models. Recently, he has been developing vLLM, a high-performance open-source... Read More →
Thursday September 19, 2024 2:15pm - 2:40pm PDT
Festival Pavilion - Breakout Room B

2:45pm PDT

Lightning Talk: What's New for PyTorch Developer Infrastructure - Sahan Paliskara & Catherine Lee, Meta
Thursday September 19, 2024 2:45pm - 2:55pm PDT
A chat about the work being done to keep supporting PyTorch's Developer Infrastructure needs, including updates on Target Determination, Releases, and OSS Tooling.
Speakers

Catherine Lee

Software Engineer, Meta
Software engineer on the PyTorch Dev Infra team primarily working on reducing time to signal, testing infrastructure, and CI related developer tooling.

Sahan Paliskara

Software Engineer, Meta
After spending a lot of time using PyTorch to train computer vision models, Sahan joined the PyTorch team three years ago. He started off working on inference and packaging, and now he's part of the dev infra team. These days, he's involved in everything from managing releases to... Read More →
Thursday September 19, 2024 2:45pm - 2:55pm PDT
Festival Pavilion - Breakout Room A

2:45pm PDT

Blobs to Clips: Efficient End-to-End Video Data Loading - Andrew Ho & Ahmad Sharif, Meta
Thursday September 19, 2024 2:45pm - 3:10pm PDT
The PyTorch team has improved training speed by an order of magnitude for teams at Meta working on Small-to-Large-Scale MultiModal Video models. In this talk we’ll share our learnings on reducing GPU starvation by overcoming data loading challenges such as dealing with large distributed datasets, worker imbalance, compute-bottlenecks due to parallel video decoding and sampling, checkpointing, and debuggability. As part of our commitment to open-source, we are releasing a new decoding library and updating existing PyTorch libraries on GitHub, and invite feedback and contributions from the community.
Speakers
Ahmad Sharif

Software Engineer, Meta
SWE in PyTorch Content Domains. Past: SWE at Google in Search, Privacy, ChromeOS.
Andrew Ho

Machine Learning Engineer, Meta Platforms
We are ML Engineers at Meta on PyTorch working on multi-modal LLM dataloading
Thursday September 19, 2024 2:45pm - 3:10pm PDT
Gateway Pavilion - Cowell Theater

2:45pm PDT

Torchtitan: Large-Scale LLM Training Using Native PyTorch 3D Parallelism - Wanchao Liang, Meta & Linsong Chu, IBM Research
Thursday September 19, 2024 2:45pm - 3:10pm PDT
torchtitan is a proof of concept for large-scale LLM training using native PyTorch. It is a repo that showcases PyTorch's latest distributed training features in a clean, minimal codebase. We showcase end-to-end enablement of large-scale training features: (1) 3D/4D parallelism; (2) efficient distributed checkpoint save, load, and resharding; (3) many efficient training techniques, including Float8, torch.compile, and activation checkpointing.
Speakers
Wanchao Liang

Software Engineer, Meta Platforms, Inc.
Software Engineer at Meta on the PyTorch team and Tech Lead in PyTorch Distributed training. Author of torchtitan, Tensor Parallel, and DTensor, a fundamental distributed abstraction for performing distributed computation. Previously worked on the TorchScript compiler and ONNX.
Linsong Chu

Senior Technical Staff Member, IBM Research
Linsong is an STSM at IBM Research, focusing on FSDP, torch.compile, and FP8 in the area of pre-training.
Thursday September 19, 2024 2:45pm - 3:10pm PDT
Festival Pavilion - Breakout Room B

3:00pm PDT

Lightning Talk: PyTorch Release Process - Andrey Talman, Meta
Thursday September 19, 2024 3:00pm - 3:10pm PDT
This talk presents and briefly discusses the PyTorch release process: how it happens, what the milestones are, what our cherry-picking criteria are, and how we validate a release.
Speakers
Andrey Talman

Software Engineer, Meta Inc.
Software Engineer, Meta Inc., 2021-Present: part of the PyTorch Dev Infra team, working on PyTorch OSS releases. Lead Software Engineer, Dow Jones & Company, 2019-2021: part of the team developing software and the API services used by the Dow Jones Factiva website and WSJ. Software Engineer... Read More →
Thursday September 19, 2024 3:00pm - 3:10pm PDT
Festival Pavilion - Breakout Room A

3:15pm PDT

Slaying OOMs - Mark Saroufim & Jane Xu, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Have you ever hit an OOM (and wished you had more VRAM)? Who hasn't! Hop on the bus with us and feel the road become smoother as we talk about stacking together techniques like FSDP2 + QLoRA + CPU offloading + fused Adam (thanks Intel) + more in native PyTorch. We will give an overview of these techniques as well as the hard edges we solved in their composition. Curious for more? Or...still OOMing? We also plan on discussing our more researchy work on offloading, pagedness, and low-precision optimizers.
Speakers
Jane Xu

SWE, Meta
I'm Jane and I focus on making our optimizers more...optimal :) in terms of stability, consistency, and performance. My favorite part of PyTorch is the people--they're all smart and cool and fun to learn from! I also like potatoes quite a lot.
Mark Saroufim

Software Engineer, Meta
Mark Saroufim is a PyTorch Engineer at Meta working on inference, compilers and community.
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Festival Pavilion - Breakout Room B

3:15pm PDT

Sponsored Session: PyTorch Support by Google Enabling Performance from Cloud to Edge - Mark Sherwood & Shauheen Zahirazami, Google
Thursday September 19, 2024 3:15pm - 3:40pm PDT
In this session we will cover various ways teams at Google are working to help the PyTorch community achieve performance and scale from cloud to edge. We will cover how Google Cloud customers can use PyTorch and OpenXLA to get competitive performance for their ML workloads. We’ll also cover how Google AI Edge Torch works with PyTorch to help developers integrate LLMs, vision models, and more to easily create new edge applications that can run on a wide set of devices.
Speakers
Mark Sherwood

Senior Product Manager, Google AI Edge, Google
Mark is a Senior Product Manager on the Google AI Edge team, responsible for TensorFlow Lite and MediaPipe. He specializes in shipping ML powered features on Android, iOS, and Web using the very smallest to the very largest on-device models.
Shauheen Zahirazami

Senior Staff Engineering Manager, Cloud Machine Learning Compute Services, Google
Shauheen has a PhD in control engineering with a BSc in applied mathematics. He is currently leading Cloud TPU Machine Learning teams at Google who are responsible for ML Frameworks and 3P ecosystem including the PyTorch teams that develop PyTorch/XLA.
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Gateway Pavilion - Cowell Theater

3:15pm PDT

Torch.Compile for Autograd, DDP and FSDP - Will Feng, Chien-Chin Huang & Simon Fan, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
In this talk, we will present the latest advancements in torch.compile for distributed training via DDP and FSDP. We will first introduce Compiled Autograd, a torch.compile mode that fully captures the backpropagation step, including the communication collective operators used in distributed training. We will then cover the improvements this new approach brings to Compiled DDP/FSDP, notably the removal of DDP/FSDP graph breaks, which has the potential to improve compute/communication overlap.
Speakers
Chien-Chin Huang

Software Engineer, Meta
Software Engineer, PyTorch Distributed, Meta
Simon Fan

Software Engineer, Meta
I'm a software engineer on the PyTorch Compiler team, where I focus on torch.compile for distributed training frameworks.
Will Feng

Software Engineer, Meta Platforms, Inc.
Will Feng is a Software Engineer on the PyTorch Compiler team at Meta. He has been working in PyTorch core and ecosystem for the past 7 years. He is now working on, and most excited about, torch.compile for distributed training performance.
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Festival Pavilion - Breakout Room A

3:40pm PDT

Coffee Break
Thursday September 19, 2024 3:40pm - 4:05pm PDT
Gateway Pavilion - Sponsor Showcase

3:45pm PDT

Sponsor Scavenger Hunt Raffle Drawing
Thursday September 19, 2024 3:45pm - 4:00pm PDT
Grab your scavenger hunt card at registration, visit all our awesome sponsors, and you'll be in the running to win some fantastic prizes!
Thursday September 19, 2024 3:45pm - 4:00pm PDT
Gateway Pavilion - Sponsor Showcase

4:05pm PDT

Lightning Talk: Understanding and Optimizing PyTorch Models with Thunder - Luca Antiga, Lightning AI
Thursday September 19, 2024 4:05pm - 4:15pm PDT
A hallmark feature of PyTorch is the natural expression of computation. This enables practitioners to implement AI models with ease. However, it raises the question of how to optimize the workload for a given hardware setup, because those optimizations clutter our code and are tricky to combine. Lightning Thunder provides a Python-to-Python compiler to scale and optimize PyTorch programs that focuses on usability, understandability, and extensibility. A key tool in delivering on these goals is the composability of transformations: without changing the user code, we can stack quantization, distribution of the computation across multiple GPUs, dispatch to optimized kernels, offloading, and other pluggable optimizations. Lightning Thunder flourishes in the PyTorch ecosystem: with PyTorch eager and with executors like torch.compile and nvFuser. It also dispatches to libraries like cuDNN, TransformerEngine, Apex, and OpenAI Triton. The ability to apply multiple optimizations just-in-time leads to significant compounded speed-ups over unoptimized code out of the box. Luca will discuss the design of Thunder and demonstrate applications on training and inference for large language and multimodal models.
Speakers
Luca Antiga

CTO, Lightning AI
CTO @ Lightning AI, Founder (Orobix, Tensorwerk), early PyTorch core contributor, Manning Author (Deep Learning with PyTorch). PhD in Bioengineering.
Thursday September 19, 2024 4:05pm - 4:15pm PDT
Gateway Pavilion - Cowell Theater

4:05pm PDT

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Thursday September 19, 2024 4:05pm - 4:30pm PDT
Understanding how to effectively size a production grade LLM deployment requires understanding of the model(s), the compute hardware, quantization and parallelization methods, KV Cache budgets, input and output token length predictions, model adapter management and much more.
  • Why LLM inference is different to standard deep learning inference
  • Current and future NVIDIA GPU overview: which GPU(s) for which models and why
  • Understanding the importance of building inference engines
  • Deep recap on the attention mechanism along with different types of popular attention mechanisms used in production
  • Deep dive on KV Cache and managing KV Cache budgets
  • Parallelism (reducing latency): mainly tensor parallelism, but data, sequence, pipeline, and expert parallelism will be highlighted
  • Quantization methods on weights, activations, and KV Cache to reduce engine sizes for more effective GPU utilization
  • Increasing throughput with inflight batching and other techniques
  • Detailed performance analysis of LLM deployments looking at time to first token, inter-token latencies, LLM deployment characterizations, and more that can help reduce deployment costs
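The KV Cache budgeting topic in this abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the talk. It assumes a standard decoder-only transformer that stores one key and one value vector per layer, per KV head, per token. The model shape (32 layers, 32 KV heads, head dimension 128, 2-byte elements) is a hypothetical 7B-class configuration chosen only for illustration, and the helper names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical attention shape; not tied to any specific model."""
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a batch of sequences.

    The leading factor of 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes FP16/BF16 cache entries.
    """
    return (2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
            * seq_len * batch_size * bytes_per_elem)


def max_batch_size(shape: ModelShape, seq_len: int, budget_bytes: int,
                   bytes_per_elem: int = 2) -> int:
    """Largest number of full-length sequences that fit in the cache budget."""
    per_sequence = kv_cache_bytes(shape, seq_len, 1, bytes_per_elem)
    return budget_bytes // per_sequence


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=32, head_dim=128)
    gib = 2 ** 30

    per_token = kv_cache_bytes(shape, seq_len=1, batch_size=1)
    per_sequence = kv_cache_bytes(shape, seq_len=4096, batch_size=1)

    print(f"KV cache per token:    {per_token / 2**10:.0f} KiB")
    print(f"KV cache per sequence: {per_sequence / gib:.1f} GiB")
    print(f"Sequences in 16 GiB:   {max_batch_size(shape, 4096, 16 * gib)}")
```

Under these assumptions each token costs 512 KiB of cache, so a 4,096-token sequence needs 2 GiB and a 16 GiB cache budget holds 8 such sequences. Fewer KV heads (as in grouped-query attention) or a quantized cache with smaller elements would shrink these numbers proportionally.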
Speakers
Mark Moyou

Sr. Data Scientist, NVIDIA
Dr. Mark Moyou is a Senior Data Scientist at NVIDIA working with enterprise clients on AI strategy and deploying machine learning applications to production. He is the host of the Caribbean Tech Pioneers Podcast and The AI Portfolio Podcast and is the Director of the Optimized AI Confere... Read More →
Thursday September 19, 2024 4:05pm - 4:30pm PDT
Festival Pavilion - Breakout Room B

4:05pm PDT

Startup Showcase
Thursday September 19, 2024 4:05pm - 5:30pm PDT
The PyTorch Conference Startup Showcase gives emerging companies the chance to pitch to a panel of VCs looking to support AI/ML startups with high growth potential, and to meet some of the best AI-focused engineers in the industry. This is an exciting and unique opportunity for early-stage founders to showcase their ideas and breakthroughs, connect with leading VCs, and increase visibility in the generative AI and machine learning industry.

Judged by Industry Leaders – The Startup Showcase will be evaluated by top AI VCs. This is your opportunity to gain visibility with investors who can propel your business forward and provide valuable feedback.

Connect and Collaborate – The PyTorch Conference fosters a vibrant community of AI and Software Architects, Engineers, Data Scientists, Research Scientists, and academic leaders. This premier event is essential for those dedicated to advancing machine learning, and also provides an excellent opportunity to recruit top talent.

Submit Your Application – Apply now for your chance to pitch! Presentations will be 5 minutes long and will receive feedback from judges. This is a valuable opportunity for visibility with top AI-focused investors and industry leaders.

Details
  • Submit your application by 5:00pm Pacific Time on Friday, Sep 6, 2024
  • The program committee will review applications, and notifications will be sent out to applicants on Friday, Sep 13, 2024 by 4:00pm Pacific Time.
  • Finalists will present their startup in a 5-minute pitch to a panel of VC judges onsite at the event on the afternoon of Thursday, September 19.
  • Judges will have time to ask questions, and feedback will be provided to finalists post-event.
  • The winning startup will be announced at the end of the event.
  • Application information will be shared with our panel of VCs, giving all applicants the opportunity for future connection!

Moderators
Chappy Asel

Co-founder, GenAI Collective
Successful entrepreneur with an expansive technical and operational background built across 10+ years of experience. Co-founder of the GenAI Collective: a community of founders, funders, and thought leaders built around our shared curiosity for generative AI. Ex-Apple AR/VR. Ex-Apple... Read More →
Judges
Astasia Myers

General Partner, Felicis
Astasia Myers is a General Partner at Felicis. Before joining Felicis, she was an enterprise partner at Quiet Capital and an investor at Redpoint Ventures. Astasia focuses on early-stage investing across AI, data, open source, developer tools, and security. She has invested in LaunchDarkly... Read More →
Kevin Crosby

Sr. Director, Open Source Funding, GitHub
Kevin Crosby is Senior Director leading Open Source Funding at Microsoft’s M12 GitHub fund. Prior to GitHub, Kevin led business development for VC and Accelerators at Carta and spent 8 years at Amazon in corporate venture and leading product, engineering, and business teams. He is... Read More →
Rajko Radovanovic

Investor, Andreessen Horowitz
Rajko Radovanovic is an investing partner on the infrastructure team at Andreessen Horowitz.
Simon Tiu

VC Investor, Vertex Ventures
Simon Tiu joined Vertex Ventures US in 2024, focusing on enterprise software and cybersecurity investments. Prior to joining Vertex Ventures, Simon worked at Qatalyst Partners, where he was a core member of the Enterprise Software team. During his tenure, he provided strategic and... Read More →
Vig Sachidananda

Investor, Gradient Ventures
Vig is an investor at Gradient Ventures. Vig received his M.S. and Ph.D. in Electrical Engineering from Stanford University and his B.S. in Mechanical Engineering from the University of Maryland, College Park. During his Ph.D., he worked as a seed stage software engineer at Clockwork.io... Read More →
Vijay Reddy

Partner, Mayfield Fund
Vijay Reddy brings over a decade of inception and early-stage investing experience in AI and Enterprise infrastructure. He had a front-row seat to the rise of AI and has invested across the AI stack from silicon, infrastructure, data, middleware and AI-first applications. Vijay is... Read More →
Thursday September 19, 2024 4:05pm - 5:30pm PDT
Festival Pavilion - Breakout Room A

4:20pm PDT

Lightning Talk: d-Matrix LLM Compression Flow Based on Torch.Fx: Simplifying PTQ/QAT - Zifei Xu & Tristan Webb, d-Matrix Corporation
Thursday September 19, 2024 4:20pm - 4:30pm PDT
We introduce dmx-compressor, d-Matrix's open-source LLM compression toolkit that is modular, robust, efficient, and user-friendly. It utilizes symbolic tracing and fx.Transformer for network compression while keeping the model a first-class citizen in PyTorch for the user, despite prevalent graph dynamism in LLMs. It achieves this by maintaining both the original nn.Module and a just-in-time (JIT) traced and transformed fx.GraphModule representation behind the scenes, in conjunction with an abstraction that cleanly decouples network compression from the original model graph definition. This design allows the FXIR to dynamically adapt to diverse forward call signatures and flow-control arguments throughout quantization-aware training and post-training quantization written in plain PyTorch, yielding a compressed FXIR fully compatible with application-level APIs like the Hugging Face pipeline. We also provide a graph visualizer based on fx.Interpreter for ease of debugging. We believe this project shall empower the community to build efficient LLMs for deployment on custom hardware accelerators and contribute to the PyTorch ecosystem.
Speakers
Zifei Xu

Senior Machine Learning Research Engineer, d-Matrix Corporation
Zifei is a Senior Machine Learning Research Engineer at d-Matrix. Her current work focuses on developing model quantization pipelines and efficient quantization algorithms. She graduated from Stanford University with a Master's degree in Computational & Mathematical Engineering and... Read More →
Tristan Webb

ML Engineer, d-Matrix
Tristan's background is primarily in computer science and mathematics, which led him to graduate with a PhD in Complexity Science at the University of Warwick, where he worked with large computational neuroscience models of spiking neural networks using simulators written in C... Read More →
Thursday September 19, 2024 4:20pm - 4:30pm PDT
Gateway Pavilion - Cowell Theater

4:35pm PDT

Intel GPU in Upstream PyTorch: Expanding GPU Choices and Enhancing Backend Flexibility - Eikan Wang & Min Jean Cho, Intel
Thursday September 19, 2024 4:35pm - 5:00pm PDT
The integration of Intel GPU support into PyTorch marks a pivotal enhancement for the PyTorch device and runtime. We generalized the PyTorch device and runtime to accommodate streaming devices. This generalization not only facilitates the deployment of PyTorch on ubiquitous hardware but also makes the integration of different HW backends easier. In addition, PyTorch with Intel GPU supports various Intel GPUs from the data center to the client, which enriches and democratizes the PyTorch HW ecosystem. Particularly in AI PC scenarios, where Intel's integrated and discrete GPUs are prevalent, PyTorch with Intel GPU can deliver promising performance and an improved OOB experience that can extend PyTorch's applicability significantly.
Speakers
Eikan Wang

AI Frameworks Engineer, Intel
Eikan is a staff engineer at Intel and a DL framework tech lead with full-stack experience in DL, from various AI applications to frameworks, libraries, and DL compilers. He is actively optimizing the torch.compile stack for Intel platforms, including optimizing Inductor C++/OpenMP... Read More →
Min Jean Cho

Deep Learning Software Engineer, Intel Corporation
Thursday September 19, 2024 4:35pm - 5:00pm PDT
Festival Pavilion - Breakout Room B

4:35pm PDT

Unlocking the Enigma: Crafting Unbiased, Transparent, and Explainable Large Language Models - Rashmi Nagpal, Patchstack
Thursday September 19, 2024 4:35pm - 5:00pm PDT
In an era where artificial intelligence reigns supreme, the statistics are both perplexing and thought-provoking: a mere 13% of large language models manage to move beyond research and into production. Who bears the responsibility when these models err, producing biased or discriminatory outputs? It's time to demystify the complex landscape of machine learning ethics and carve a path towards a brighter, more accountable future! In this talk, firstly, we will navigate the profound impacts of large language models across diverse domains, from lifesaving advances in medicine to safeguarding our nations through enhanced security protocols. Secondly, as we marvel at the data-driven decisions made by these models, we will confront the darker shadow they cast: the looming spectre of bias in the data. Finally, we will delve deep into the art of building interpretable models and navigating the maze of ethical considerations. Through a live demonstration in PyTorch, we will see how to craft unbiased, transparent, and explainable models.
Speakers
Rashmi Nagpal

Machine Learning Engineer, Patchstack
Rashmi, a passionate researcher at MIT CSAIL and machine learning engineer at Patchstack, is dedicated to crafting beautiful AI applications. With nearly 5 years of industrial experience, she has brought ideas to life at pre-seed startups and contributed to impactful redesigns... Read More →
Thursday September 19, 2024 4:35pm - 5:00pm PDT
Gateway Pavilion - Cowell Theater

5:05pm PDT

Implementing a Custom Torch.Compile Backend - A Case Study - Maanav Dalal & Yulong Wang, Microsoft
Thursday September 19, 2024 5:05pm - 5:30pm PDT
This presentation will dive into the development of the ONNX Runtime (ORT) backend for torch.compile. We'll cover the implementation process, starting with a PyTorch 2.0 generated FX graph, highlighting the unique challenges encountered when serving ORT-specific scenarios and how we solved them. Attendees will gain insights into optimizing performance, overcoming integration hurdles, and achieving efficient execution. Whether you're a developer looking to extend PyTorch's capabilities for your own use cases, keen to learn about ONNX Runtime, or interested in backend performance optimization and the many steps we've taken to get to where we are now, this session promises valuable takeaways and practical knowledge.
Speakers
Yulong Wang

Software Engineer, Microsoft
ONNX Runtime Web
Maanav Dalal

Program Manager, Microsoft
PM @Microsoft, working on the ONNX Exporter team. I adore learning about consumer tech and experimenting with bleeding edge software. I'm passionate about creating delightful user experiences.
Thursday September 19, 2024 5:05pm - 5:30pm PDT
Festival Pavilion - Breakout Room B

5:05pm PDT

The Ethical Implications of AI and the Environment: A Focus on Water - Amber Hasan, Ethical Tech AI & Senegal Tuklor Williams, Broken Pencil Pictures LLC
Thursday September 19, 2024 5:05pm - 5:30pm PDT
Artificial Intelligence (AI) has the potential to revolutionize various sectors, including environmental conservation and water management. However, the deployment of AI technologies raises ethical questions about their environmental impact, particularly on water resources. This presentation will discuss the ethical implications of AI concerning water while also exploring how AI can both positively and negatively affect water resources and the broader ecosystem. My goal is to facilitate a critical conversation around how to balance technological advancements with environmental stewardship.

Objectives
  • Understanding Ethical Implications: Provide an in-depth overview of how AI impacts water resources. Focus on ethical concerns related to AI's water footprint, including, but not limited to, energy consumption and water usage in data centers.
  • Explore Positive Applications: Talk about possible successful implementations of AI in water conservation, pollution monitoring, and efficient resource management. Discuss potential future applications where AI could contribute to sustainable water management, and connect stakeholders to address ethical concerns and solutions.
Speakers
Amber Hasan

Owner, Ethical Tech AI
Amber Hasan is an interdisciplinary artist and community organizer focused on using Creative Practice as a tool for change. Amber is Co-Founder of The Sister Tour collective; she has worked with photographer LaToya Ruby Frazier regarding the Flint Water Crisis, and she is a Board Member... Read More →
Senegal Tuklor Williams

C.O.O., ETHICAL TECH AI
From the standpoint of Broken Pencil Pictures, we are a dynamic and multi-disciplinary creative cil company. Our achievements are a testament to our dedication to social change and the betterment of our community. "The Sister Tour" stands out as an initiative through which we distributed... Read More →
Thursday September 19, 2024 5:05pm - 5:30pm PDT
Gateway Pavilion - Cowell Theater

5:30pm PDT

The PyTorch Flare Party Sponsored by Hugging Face
Thursday September 19, 2024 5:30pm - 8:00pm PDT
Join us as we ignite the evening with something blazing hot.

THE NIGHT HEATS UP AT 7:45 PM! - You won't want to miss the sizzling finale of the PyTorch Flare Party!

It'll be LIT in more ways than one!

Thursday September 19, 2024 5:30pm - 8:00pm PDT
Gateway Pavilion - Sponsor Showcase
 