September 18-19, 2024
San Francisco, California
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Pacific Daylight Time (UTC-7). To see the schedule in your preferred timezone, please select from the drop-down located at the bottom of the menu to the right.

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Audience: Intermediate
Thursday, September 19
 

10:50am PDT

Lightning Talk: d-Matrix LLM Compression Flow Based on Torch.Fx: Simplifying PTQ/QAT - Zifei Xu & Tristan Webb, d-Matrix Corporation
Thursday September 19, 2024 10:50am - 11:00am PDT
We introduce dmx-compressor, d-Matrix's open-source LLM compression toolkit that is modular, robust, efficient, and user-friendly. It utilizes symbolic tracing and fx.Transformer for network compression while keeping the model a first-class citizen in PyTorch for the user, despite prevalent graph dynamism in LLMs. It achieves this by maintaining both the original nn.Module and a just-in-time (JIT) traced and transformed fx.GraphModule representation behind the scenes, in conjunction with an abstraction that cleanly decouples network compression from the original model graph definition. This design allows the FXIR to dynamically adapt to diverse forward call signatures and flow-control arguments throughout quantization-aware training and post-training quantization written in plain PyTorch, yielding a compressed FXIR fully compatible with application-level APIs like the Hugging Face pipeline. We also provide a graph visualizer based on fx.Interpreter for ease of debugging. We believe this project shall empower the community to build efficient LLMs for deployment on custom hardware accelerators and contribute to the PyTorch ecosystem.
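As a rough illustration of the symbolic-tracing and fx.Transformer mechanism this abstract mentions, the sketch below traces a small module with torch.fx and re-emits its graph so that every nn.Linear output passes through a toy fake-quantization step. This is not dmx-compressor's API: TinyNet, fake_quant, and FakeQuantLinearOutputs are hypothetical names, and the 1/16 rounding grid is an arbitrary assumption.

```python
import torch
from torch import fx, nn


def fake_quant(x, step=1.0 / 16):
    # Toy stand-in for a quantizer (assumption): snap values to a fixed grid.
    return torch.round(x / step) * step


class TinyNet(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


class FakeQuantLinearOutputs(fx.Transformer):
    # Re-emits the traced graph, appending fake_quant after every nn.Linear.
    def call_module(self, target, args, kwargs):
        out = super().call_module(target, args, kwargs)
        if isinstance(self.module.get_submodule(target), nn.Linear):
            out = fake_quant(out)
        return out


torch.manual_seed(0)
model = TinyNet()
traced = fx.symbolic_trace(model)  # nn.Module -> fx.GraphModule
quantized = FakeQuantLinearOutputs(traced).transform()

x = torch.randn(3, 4)
y = quantized(x)
```

The original `model` is left untouched; only the transformed GraphModule carries the extra rounding nodes, which can be inspected with `print(quantized.code)`.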
Speakers

Zifei Xu

Senior Machine Learning Research Engineer, d-Matrix Corporation
Zifei is a Senior Machine Learning Research Engineer at d-Matrix. Her current work focuses on developing model quantization pipelines and efficient quantization algorithms. She graduated from Stanford University with a Master's degree in Computational & Mathematical Engineering and...

Tristan Webb

ML Engineer, d-Matrix
Tristan's background is primarily in computer science and mathematics, which led him to graduate with a PhD in Complexity Science at the University of Warwick, where he worked with large computational neuroscience models of spiking neural networks using simulators written in C...
Festival Pavilion - Breakout Room A
  Lightning Talks

10:50am PDT

The Rise of `Transformers` in the Growing PyTorch Ecosystem - Arthur Zucker, Hugging Face
Thursday September 19, 2024 10:50am - 11:15am PDT
Explore how the `transformers` library grows and adapts to the fast-paced and ever-changing AI field to bring the best to the AI community.
Speakers

Arthur Zucker

Core Maintainer, Hugging Face
Arthur is a core maintainer at Hugging Face, maintaining several critical libraries such as transformers and tokenizers. He is the owner of the text and LLM parts of Hugging Face's open-source toolkits, resulting in the implementations of LLaMa, Mistral, MoEs, etc. and torch.compile...
Festival Pavilion - Breakout Room B

11:05am PDT

Lightning Talk: LLMs on Edge with AI Accelerators - Chen Lai, Kimish Patel & Cemal Bilgin, Meta
Thursday September 19, 2024 11:05am - 11:15am PDT
LLMs are known to be compute heavy and to consume lots of resources (almost all resources on phones), including memory and power. A natural thought is to leverage AI hardware accelerators, for example the Apple Neural Engine (ANE) on Apple devices and HTP on Qualcomm SoCs, to make them run fast and efficiently. Only by optimizing model latency, memory consumption, and power usage to a certain level will users be interested in installing the models on their devices. In this session, we’d like to introduce how we leverage these AI accelerators within the PyTorch ecosystem to achieve state-of-the-art performance for llama3 on device, via ExecuTorch and the partnership with Apple and Qualcomm. Hardware companies usually have their own AI accelerators, and these are likely to differ in their characteristics: one may support a different list of operators than another, and one may only support static shapes (like HTP). However, transformer-based optimization can be generic. We’ll discuss in more detail how we apply the generic optimizations as well as the backend-specific optimizations. The techniques we applied here are not just for LLMs, but can be applied to other transformer-based models.
Speakers

Kimish Patel

Software Engineer, Meta Platforms
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on performance optimizations. His past experiences include hardware/software co-design, CPU architecture, and CPU/GPU performance optimization.

Chen Lai

Software Engineer, Meta
Software engineer focusing on bringing up accelerators on devices

Cemal Bilgin

Engineering Manager, Meta
Engineering Manager, PyTorch Edge Acceleration
Festival Pavilion - Breakout Room A
  Lightning Talks

11:20am PDT

Sponsored Session: Torchchat: A Showcase of PyTorch LLM Ubiquity - Jack Khuu & Jesse White, Meta
Thursday September 19, 2024 11:20am - 11:45am PDT
This talk explores the journey of enabling LLMs in the PyTorch ecosystem, as well as how the teams behind AOT Inductor, ExecuTorch, and torchao collaborated to create torchchat, a showcase of PyTorch’s ability to run LLM inference everywhere.

Torchchat demonstrates the ubiquity, simplicity, and quality of PyTorch’s LLM support through performant, reproducible implementations not only for Python environments, but also on desktop, server, and on-device.

All of our work is open source and available on GitHub.
Speakers

Jack Khuu

Software Engineer, Meta
Software Engineer @ Meta working on the PyTorch Edge team. TL for torchchat, which is PyTorch's showcase of LLM inference ubiquity (Python, Desktops, Mobile, etc.). More broadly, I focus on the "Experience" of PyTorch Edge, encompassing User, Developer, and Community Experience. Ex-Lecturer...

Jesse White

Software Engineering Manager, Meta
Jesse is an engineering manager at PyTorch @ Meta, where he supports the Edge Experience team in improving the experience for on-device inference and training, including mobile, laptops, and embedded devices. With nearly 20 years of experience in startups, Jesse is passionate about...
Festival Pavilion - Breakout Room A
  Breakout Sessions

11:20am PDT

Training MoEs at Scale with PyTorch - Mihir Patel & Brian Chu, Databricks
Thursday September 19, 2024 11:20am - 11:45am PDT
Mixture-of-Experts (MoE) models are becoming an increasingly popular architecture choice for large language models (LLMs). In this talk, we describe how to train MoE models with PyTorch. After discussing various performance tradeoffs, we use PyTorch distributed tools like DTensor to build custom parallelism approaches, including expert parallelism via MegaBlocks. We then show how to get near-linear scaling to thousands of GPUs, combining PyTorch FSDP and HSDP with our parallelism strategies. We discuss many of the challenges of training at scale, including communication bottlenecks, hardware failures, and networking challenges. We further improve training-at-scale setups using tools like PyTorch Distributed Checkpointing for rapid saving and loading. We then highlight further optimizations to minimize challenges only present at scale, such as object store failures for large checkpoints.
Speakers

Mihir Patel

Research Engineer, Databricks
Mihir Patel is a Research Engineer at MosaicML / Databricks, where he works on distributed training at scale and serves as the tech lead for Composer, an open-source deep learning training library. His primary focus is on large model training, and he has helped build several open...

Brian Chu

Research Engineer, Databricks
Brian is a Research Engineer at MosaicML / Databricks, where he contributes to Composer and Foundry, open-source libraries for training LLMs. He has been involved in the DBRX project and products like the Databricks finetuning and pretraining API. Prior to joining Databricks, Brian...
Festival Pavilion - Breakout Room B

11:50am PDT

Lightning Talk: Empowering Developers: Tools and Resources for Running Generative AI on Arm CPUs - Pareena Verma, Arm
Thursday September 19, 2024 11:50am - 12:00pm PDT
As the demand for accessible and scalable AI solutions grows, leveraging CPUs for generative AI offers significant advantages in cost, energy efficiency, and widespread availability. This session aims to equip developers with the ecosystem of tools, resources, and technical content needed to effectively run generative AI use cases on Arm CPUs. We have launched a range of easily digestible tutorials for developers, part of our Learning Paths on https://learn.arm.com/, which demonstrate how you can easily and efficiently run small and large language models on Arm-based devices. Learn about end-to-end workflows to accelerate PyTorch-based sentiment analysis models from Hugging Face on Arm servers with optimizations in Arm Compute Library kernels for fp32 and bfloat16. Use the new KleidiAI library to accelerate LLMs with AI frameworks and build an Android chat app on your Arm mobile device with ExecuTorch and XNNPACK. Find out about our roadmap for learning content demonstrating the feasibility and successful deployment of generative AI on Arm-based devices. Help us shape the support that we offer developers.
Speakers

Pareena Verma

Principal Solutions Architect, Arm
Pareena is a Principal Solutions Architect at Arm. She has extensive experience working with software developers and SoC architects on numerous Arm based projects involving usage of modeling, ML frameworks, compilers, debuggers and virtual prototyping simulation tools. Pareena holds...
Festival Pavilion - Breakout Room B

11:50am PDT

Lightning Talk: New Activation Checkpointing APIs in PyTorch - Jeffrey Wan & Horace He, Meta
Thursday September 19, 2024 11:50am - 12:00pm PDT
Activation checkpointing is a commonly used technique to reduce memory usage during model training by reducing the number of activations saved for backward. Instead of keeping tensors needed for backward alive until they are used in gradient computation, those tensors are recomputed during the backward pass. This talk will introduce new activation checkpointing APIs that can help achieve a better trade-off between memory savings and the compute overhead that recomputation introduces.
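For readers new to the technique, a minimal sketch of the recompute-in-backward idea follows. It uses the long-standing torch.utils.checkpoint.checkpoint function rather than the new APIs this talk introduces, and the layer sizes and variable names are arbitrary assumptions.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
x = torch.randn(8, 16, requires_grad=True)

# Baseline: intermediate activations stay alive until backward uses them.
block(x).sum().backward()
grad_plain = x.grad.clone()
x.grad = None
block.zero_grad()

# Checkpointed: only the input of `block` is saved; its forward pass is
# re-run during backward to rebuild the intermediates.
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()
```

Both passes should produce the same gradients; the checkpointed pass trades an extra forward computation of `block` for not holding its intermediate activations in memory.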
Speakers

Horace He

Software Engineer, Meta

Jeffrey Wan

Software Engineer, Meta
Software Engineer working on PyTorch
Festival Pavilion - Breakout Room A

12:00pm PDT

Lightning Talk: Fast, Scalable Distributed Training with StreamingDataset - Saaketh Narayan, Databricks
Thursday September 19, 2024 12:00pm - 12:10pm PDT
StreamingDataset makes training on large datasets from cloud storage as fast, cheap, and scalable as possible. It’s specially designed for multi-node, distributed training for large models, maximizing correctness guarantees, performance, and ease of use. Key features include elastically deterministic training, instant mid-epoch resumption, effective shuffling, high training throughput, and flexible data mixing. When training with StreamingDataset, the data shards are written to cloud storage in MDS, our file format that allows for low-latency random access to samples. By being as efficient as possible with shard downloads and shuffling, StreamingDataset minimizes egress costs while ensuring that dataloading never bottlenecks model training. StreamingDataset powers training for everything from LLMs with over 100 billion parameters, like DBRX, to advanced diffusion models, two-tower recommendation models, and more, scaling to training jobs on thousands of GPUs with ease. Join us to learn how StreamingDataset can elevate your distributed model training experience.
Speakers

Saaketh Narayan

Machine Learning Engineer, Databricks
Saaketh Narayan is a machine learning engineer at Databricks. As part of the Mosaic AI Runtime team, he works on the GenAI training stack, including dataloading, training frameworks, and performance across the Mosaic Streaming, Composer, and LLM Foundry libraries.
Gateway Pavilion - Cowell Theater

12:00pm PDT

Lightning Talk: FlexAttention - The Flexibility of PyTorch + The Performance of FlashAttention - Yanbo Liang & Horace He, Meta
Thursday September 19, 2024 12:00pm - 12:10pm PDT
Introducing a novel abstraction leveraging the PyTorch compiler stack to enable custom, user-defined attention mechanisms. This new API supports dynamic modifications to attention scores within SDPA, providing both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm.
Speakers

Yanbo Liang

Software Engineer, Meta
I'm a software engineer on the PyTorch team working on torch.compile and LLMs.

Horace He

Software Engineer, Meta
Festival Pavilion - Breakout Room A

12:10pm PDT

Lightning Talk: AOTriton: Ahead of Time Triton Kernel Libraries on ROCm - Jeff Daily, AMD
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Scaled dot product attention provides significant acceleration of the transformer layer through fusion of the multihead attention layer. There are several different algorithms to achieve this, but tiled attention through scaled dot product attention via Flash Attention is a very popular approach. In PyTorch on the ROCm platform this is currently achieved through ahead-of-time (AOT) compiled Triton kernels in a linkable archive. AMD’s work to enable and package these kernels is done through AOTriton, which aims to use Triton’s compiler and GPU kernels for faster development. AOTriton maintains an optimized set of tiling sizes and other parameters to provide optimized, pre-compiled Triton kernels. The differences between JIT and AOT are few but very important. Despite this, prototyping kernels in Triton is much faster than in template-based C++ libraries. In this presentation we will go into detail on the interaction layer between PyTorch and AOTriton, the structure of AOTriton, and how to add new Triton kernels to AOTriton.
Speakers

Jeff Daily

Principal Member of Technical Staff, Advanced Micro Devices
Jeff Daily is the chief architect of the Machine Learning Software Engineering group supporting ML frameworks such as PyTorch and onnxruntime on AMD GPUs. He enjoys delivering open source software to answer the challenges of the rapidly-changing ML landscape. For over five years...
Festival Pavilion - Breakout Room B

12:10pm PDT

Lightning Talk: Implementing and Using Iterable Datasets: What Could Go Wrong? - Nicolas Hug, Meta
Thursday September 19, 2024 12:10pm - 12:20pm PDT
PyTorch supports two kinds of datasets: Iterable datasets and indexable "map-style" datasets. Iterable datasets can be more flexible and potentially faster than their indexable cousins. They are also much harder to use correctly, and can easily lead to silently wrong results. This talk is a quick and fun intro to some of the traps that Iterable datasets lay out for you, with some tips to help you avoid them.
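One trap documented for torch.utils.data.IterableDataset is that, with several DataLoader workers, each worker iterates its own copy of the dataset, so samples are silently duplicated unless the stream is split per worker. Whether the talk covers this particular trap is an assumption. The sketch below simulates the workers in plain Python, without PyTorch, and all function names are hypothetical.

```python
from itertools import islice


def naive_stream(worker_id, num_workers):
    # Every worker iterates the full stream: nothing depends on worker_id.
    return iter(range(6))


def sharded_stream(worker_id, num_workers):
    # Each worker keeps every num_workers-th sample, offset by its own id.
    return islice(range(6), worker_id, None, num_workers)


def run_epoch(make_stream, num_workers):
    # Stand-in for a DataLoader: gather what every worker yields in one epoch.
    seen = []
    for worker_id in range(num_workers):
        seen.extend(make_stream(worker_id, num_workers))
    return sorted(seen)


duplicated = run_epoch(naive_stream, num_workers=2)   # each sample appears twice
correct = run_epoch(sharded_stream, num_workers=2)    # each sample appears once
```

In PyTorch itself, the worker id and worker count are available inside `__iter__` through `torch.utils.data.get_worker_info()`.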
Speakers

Nicolas Hug

Research Engineer, Meta
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine...
Gateway Pavilion - Cowell Theater
  Lightning Talks

12:10pm PDT

Lightning Talk: Making the Most of Heterogeneous Memory Capacity Using PyTorch - Syed Ahmed, NVIDIA Corporation
Thursday September 19, 2024 12:10pm - 12:20pm PDT
Memory intensive deep learning workloads require efficient use of all kinds of memories that are available in a system. In this session, we will discuss how we can utilize such heterogeneous memory through memory pools in PyTorch. We will show how to mix-and-match different CUDA system allocators in the same PyTorch program using memory pools. Consequently, this API unlocks new use cases such as Extended GPU Memory (EGM) based all-gathers, Unified Virtual Memory (UVM), and NVLink Sharp (NVLS) reductions. New NVIDIA architectures accelerate such use cases with high-bandwidth and low-latency interconnects in the hardware, driven by extended functionality of CUDA system allocators in the software. Learn how to use these techniques on memory-intensive deep learning models like LLMs, and discover new CUDA features powered by PyTorch.
Speakers

Syed Ahmed

Senior Software Engineer, NVIDIA
Syed Ahmed is a Senior Software Engineer on the PyTorch Core team at NVIDIA, focused on keeping PyTorch fast and numerically stable on current NVIDIA platforms, and making PyTorch more expressive on future NVIDIA platforms. He holds a Master’s degree in Electrical Engineering from...
Festival Pavilion - Breakout Room A

2:15pm PDT

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon & Xiaoxuan Liu, UC Berkeley
Thursday September 19, 2024 2:15pm - 2:40pm PDT
We will present vLLM, an open-source high-performance LLM inference engine built on top of PyTorch. Starting as a research project at UC Berkeley, vLLM has been one of the fastest and most popular LLM inference solutions in industry, reaching 20K+ stars and 350+ contributors. In this talk, we will cover how vLLM adopts various LLM inference optimizations and how it supports various AI accelerators such as AMD GPUs, Google TPUs, and AWS Inferentia. Also, we will discuss how vLLM benefits from PyTorch 2 and its ecosystem.
Speakers

Lily Liu

Student, UCB
Lily (Xiaoxuan) Liu is a PhD student at UC Berkeley, working with Professors Ion Stoica and Alvin Cheung. Her research focuses on machine learning systems, particularly optimizing latency for LLM inference and addressing memory bottlenecks in LLM systems. Her recent work explores...

Woosuk Kwon

PhD Student, UC Berkeley
Woosuk Kwon is a Ph.D. student at UC Berkeley, advised by Prof. Ion Stoica. He is interested in building practical, flexible, and high-performance software systems for emerging applications such as large language models. Recently, he has been developing vLLM, a high-performance open-source...
Festival Pavilion - Breakout Room B

2:45pm PDT

Lightning Talk: What's New for PyTorch Developer Infrastructure - Sahan Paliskara & Catherine Lee, Meta
Thursday September 19, 2024 2:45pm - 2:55pm PDT
Having a chat about all of the work being done to continue supporting PyTorch's Developer Infrastructure needs including updates around Target Determination, Releases, and OSS Tooling.
Speakers

Catherine Lee

Software Engineer, Meta
Software engineer on the PyTorch Dev Infra team primarily working on reducing time to signal, testing infrastructure, and CI related developer tooling.

Sahan Paliskara

Software Engineer, Meta
After spending a lot of time using PyTorch to train computer vision models, Sahan joined the PyTorch team three years ago. He started off working on inference and packaging, and now he's part of the dev infra team. These days, he's involved in everything from managing releases to...
Festival Pavilion - Breakout Room A

2:45pm PDT

Blobs to Clips: Efficient End-to-End Video Data Loading - Andrew Ho & Ahmad Sharif, Meta
Thursday September 19, 2024 2:45pm - 3:10pm PDT
The PyTorch team has improved training speed by an order of magnitude for teams at Meta working on Small-to-Large-Scale MultiModal Video models. In this talk we’ll share our learnings on reducing GPU starvation by overcoming data loading challenges such as dealing with large distributed datasets, worker imbalance, compute-bottlenecks due to parallel video decoding and sampling, checkpointing, and debuggability. As part of our commitment to open-source, we are releasing a new decoding library and updating existing PyTorch libraries on GitHub, and invite feedback and contributions from the community.
Speakers

Ahmad Sharif

Software Engineer, Meta
SWE in PyTorch Content Domains. Past: SWE at Google in Search, Privacy, ChromeOS.

Andrew Ho

Machine Learning Engineer, Meta Platforms
We are ML Engineers at Meta on PyTorch working on multi-modal LLM dataloading
Gateway Pavilion - Cowell Theater

2:45pm PDT

Torchtitan: Large-Scale LLM Training Using Native PyTorch 3D Parallelism - Wanchao Liang, Meta & Linsong Chu, IBM Research
Thursday September 19, 2024 2:45pm - 3:10pm PDT
torchtitan is a proof-of-concept for large-scale LLM training using native PyTorch. It is a repo that showcases PyTorch's latest distributed training features in a clean, minimal codebase. We showcase end-to-end enablement of large-scale training features: 1. 3D/4D parallelism 2. Efficient distributed checkpoint save/load/resharding 3. Many efficient training techniques, including Float8, torch.compile, activation checkpointing, etc.
Speakers

Wanchao Liang

Software Engineer, Meta Platforms, Inc.
Software Engineer at Meta, PyTorch team Tech Lead in PyTorch Distributed training. Author of torchtitan, Tensor Parallel and DTensor, a fundamental distributed abstraction to perform distributed computation. Previously worked on the TorchScript compiler, ONNX.

Linsong Chu

Senior Technical Staff Member, IBM Research
Linsong is an STSM at IBM Research, focusing on FSDP, torch.compile, and FP8 in the area of pre-training.
Festival Pavilion - Breakout Room B

3:00pm PDT

Lightning Talk: PyTorch Release Process - Andrey Talman, Meta
Thursday September 19, 2024 3:00pm - 3:10pm PDT
I will present and briefly discuss the PyTorch release process: how it happens, what the milestones are, what our cherry-picking criteria are, and how we validate the release.
Speakers

Andrey Talman

Software Engineer, Meta Inc.
Software Engineer, Meta Inc., 2021-Present: part of the PyTorch Dev Infra team, working on PyTorch OSS releases. Lead Software Engineer, Dow Jones & Company, 2019-2021: part of the team developing software and the API services used by the Dow Jones Factiva website and WSJ. Software Engineer...
Festival Pavilion - Breakout Room A

3:15pm PDT

Slaying OOMs - Mark Saroufim & Jane Xu, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
Have you ever hit an OOM (and wished you had more VRAM)? Who hasn't! Hop on the bus with us and feel the road become smoother as we talk about stacking together techniques like FSDP2 + QLoRA + CPU Offloading + Fused Adam (thanks Intel) + more in PyTorch native. We will give an overview of these techniques as well as the hard edges we solved in their composition. Curious for more? Or...still OOMing? We also plan on discussing our more researchy work on offloading, pagedness, and low precision optimizers.
Speakers

Jane Xu

SWE, Meta
I'm Jane and I work on the PyTorch core library! Tell me your favorite optimizer, complain to me about your latest OOM, teach me about what you’re excited about.

Mark Saroufim

Software Engineer, Meta
Mark Saroufim is a PyTorch Engineer at Meta working on inference, compilers and community.
Festival Pavilion - Breakout Room B

3:15pm PDT

Torch.Compile for Autograd, DDP and FSDP - Will Feng, Chien-Chin Huang & Simon Fan, Meta
Thursday September 19, 2024 3:15pm - 3:40pm PDT
In this talk, we will present the latest advancements in torch.compile for distributed training via DDP and FSDP. We will first introduce Compiled Autograd, a torch.compile mode to fully capture the backpropagation step, including the communication collective operators used in distributed training. We will then cover the improvements this new approach brought to Compiled DDP/FSDP, notably by removing DDP/FSDP graph breaks, which opens up the potential to improve compute/communication overlap.
Speakers

Chien-Chin Huang

Software Engineer, Meta
Software Engineer, PyTorch Distributed, Meta

Simon Fan

Software Engineer, Meta
I'm a software engineer on the PyTorch Compiler team, where I focus on torch.compile for distributed training frameworks.

Will Feng

Software Engineer, Meta Platforms, Inc.
Will Feng is a Software Engineer on the PyTorch Compiler team at Meta. He has been working on PyTorch core and ecosystem for the past 7 years. He is now working on and most excited about torch.compile for distributed training performance.
Festival Pavilion - Breakout Room A

4:05pm PDT

Lightning Talk: Debiasing the Data Lifecycle - Shailvi Wakhlu, Shailvi Ventures LLC
Thursday September 19, 2024 4:05pm - 4:15pm PDT
Biased data results in biased decision-making. Making conscious attempts to debias the data at every step of the data lifecycle is an important responsibility for all data scientists. In this talk, I highlight the typical data lifecycle and how to prevent biases at every step. The key takeaways from my talk include: 1) Understanding the data lifecycle 2) The typical ways biases creep in 3) How we can proactively prevent and fix biases in data
Speakers

Shailvi Wakhlu

Founder, Shailvi Ventures LLC
Shailvi is a seasoned Data Leader and Self-Advocacy Expert with over sixteen years of experience building technology products. She has spoken at nearly 100 global conferences and Fortune 500 events, coached close to 500 individuals, and authored the best-selling book "Self-Advocacy...
Festival Pavilion - Breakout Room A

4:05pm PDT

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Thursday September 19, 2024 4:05pm - 4:30pm PDT
Understanding how to effectively size a production-grade LLM deployment requires understanding of the model(s), the compute hardware, quantization and parallelization methods, KV Cache budgets, input and output token length predictions, model adapter management, and much more.
- Why LLM inference is different to standard deep learning inference
- Current and future NVIDIA GPU overview: which GPU(s) for which models and why
- Understanding the importance of building inference engines
- Deep recap on the attention mechanism along with different types of popular attention mechanisms used in production
- Deep dive on KV Cache and managing KV Cache budgets
- Parallelism (reducing latency): mainly tensor parallelism, but data, sequence, pipeline, and expert parallelism will be highlighted
- Quantization methods on weights, activations, and KV Cache to reduce engine sizes for more effective GPU utilization
- Increasing throughput with inflight batching and other techniques
- Detailed performance analysis of LLM deployments looking at time to first token, inter-token latencies, LLM deployment characterizations, and more that can help reduce deployment costs
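To make the notion of a KV Cache budget concrete, here is a back-of-the-envelope sketch using the common sizing rule for decoder-only transformers: two tensors (keys and values) per layer, each holding num_kv_heads × head_dim values per token. The model shape and the 16 GiB budget are hypothetical and are not taken from the talk.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    # Factor of 2: one key tensor and one value tensor per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
per_token = kv_cache_bytes(32, 8, 128, seq_len=1, batch_size=1)
per_request = kv_cache_bytes(32, 8, 128, seq_len=4096, batch_size=1)

# Hypothetical budget: 16 GiB of GPU memory reserved for the KV cache.
budget_bytes = 16 * 1024**3
max_requests = budget_bytes // per_request

print(per_token)      # 131072 bytes, i.e. 128 KiB per token
print(per_request)    # 536870912 bytes, i.e. 0.5 GiB per 4096-token request
print(max_requests)   # 32 concurrent 4096-token requests
```

Under these assumptions, halving bytes_per_elem (for example with an 8-bit KV cache) doubles the number of requests that fit in the same budget.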
Speakers

Mark Moyou

Sr. Data Scientist, NVIDIA
Dr. Mark Moyou is a Senior Data Scientist at NVIDIA working with enterprise clients on AI strategy and deploying machine learning applications to production. He is the host of the Caribbean Tech Pioneers Podcast and The AI Portfolio Podcast, and is the Director of the Optimized AI Confere...
Festival Pavilion - Breakout Room B

4:35pm PDT

Unlocking the Enigma: Crafting Unbiased, Transparent, and Explainable Large Language Models - Rashmi Nagpal, Patchstack
Thursday September 19, 2024 4:35pm - 5:00pm PDT
In an era where artificial intelligence reigns supreme, the statistics are both perplexing and thought-provoking: a mere 13% of large language models manage to transcend the realms of research and enter the practical world of production. Who bears the responsibility when these models err, spewing out biased or discriminatory outputs? It's time to demystify the complex landscape of machine learning ethics and carve a path towards a brighter, more accountable future! In this talk, we will first navigate the profound impacts of large language models across diverse domains, from lifesaving advances in medicine to safeguarding our nations through enhanced security protocols. Second, as we marvel at the data-driven decisions made by these models, we will confront the darker shadow they cast: the looming spectre of bias in the data. Finally, we will delve deep into the art of building interpretable models and navigating the maze of ethical considerations. Through a live demonstration in PyTorch, we will witness how to craft unbiased, transparent, and explainable models.
Speakers

Rashmi Nagpal

Machine Learning Engineer, Patchstack
Rashmi, a passionate researcher at MIT CSAIL and machine learning engineer at Patchstack, is dedicated to crafting beautiful AI applications. With nearly 5 years of industrial experience, she has brought ideas to life at pre-seed startups and contributed to impactful redesigns...
Festival Pavilion - Breakout Room A
  Breakout Sessions

5:05pm PDT

Implementing a Custom Torch.Compile Backend - A Case Study - Maanav Dalal & Yulong Wang, Microsoft
Thursday September 19, 2024 5:05pm - 5:30pm PDT
This presentation will dive into the development of the ONNXRuntime (ORT) backend for torch.compile. We'll cover the implementation process, starting with a PyTorch 2.0 generated FX graph, highlighting the unique challenges encountered when serving ORT-specific scenarios and how we solved them. Attendees will gain insights into optimizing performance, overcoming integration hurdles, and achieving efficient execution. Whether you're a developer looking to extend PyTorch's capabilities for your own use cases, keen to learn about ONNX Runtime, or interested in backend performance optimization, and the many steps we've taken to get to where we are now, this session promises valuable takeaways and practical knowledge.
Speakers

Yulong Wang

Software Engineer, Microsoft

Maanav Dalal

Program Manager, Microsoft
PM @Microsoft, working on the ONNX Exporter team. I adore learning about consumer tech and experimenting with bleeding edge software. I'm passionate about creating delightful user experiences.
Festival Pavilion - Breakout Room B
 