Loading…
Attending this event?
September 18-19, 2024
San Francisco, California
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Pacific Daylight Time (UTC-7). To see the schedule in your preferred timezone, please select from the drop-down located at the bottom of the menu to the right.

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Thursday September 19, 2024 11:35am - 11:45am PDT
StreamingDataset makes training on large datasets from cloud storage as fast, cheap, and scalable as possible. It’s specially designed for multi-node, distributed training for large models — maximizing correctness guarantees, performance, and ease of use. Key features include elastically deterministic training, instant mid-epoch resumption, effective shuffling, high training throughput, and flexible data mixing, among other features. When training with StreamingDataset, the data shards are written to cloud storage in MDS, our file format that allows for low-latency random access to samples. By being as efficient as possible with shard downloads and shuffling, StreamingDataset minimizes egress costs while ensuring that dataloading never bottlenecks model training. StreamingDataset powers training for LLMs with over 100 billion parameters like DBRX, to advanced diffusion models, to two-tower recommendation models, and more, scaling to training jobs on thousands of GPUs with ease. Join us to learn how StreamingDataset can elevate your distributed model training experience.
Speakers
avatar for Saaketh Narayan

Saaketh Narayan

Machine Learning Engineer, Databricks
Saaketh Narayan is a machine learning engineer at Databricks. As part of the Mosaic AI Runtime team, he works on the GenAI training stack, including dataloading, training frameworks, and performance across the Mosaic Streaming, Composer, and LLM Foundry libraries.
Thursday September 19, 2024 11:35am - 11:45am PDT
Room C

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link