September 18-19, 2024
San Francisco, California
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for PyTorch Conference 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in Pacific Daylight Time (UTC-7).


Thursday September 19, 2024 9:24am - 9:39am PDT
The evolution of large language models (LLMs) from the original Generative Pre-trained Transformer (GPT) series to the recent advancements seen in models like Llama 3 has been accompanied by several architectural and methodological innovations. This talk aims to catch attendees up on the latest AI and LLM development trends, highlighting the key changes and motivations that led to the development of recent state-of-the-art LLMs, such as Llama 3.1.

Specifically, this presentation explores key developments in attention mechanisms, such as sliding-window attention, grouped-query attention, multi-query attention, and FlashAttention, and explains their key motivations and advantages. Beyond these structural changes, the presentation also reviews recent "tricks of the trade" that have improved the training processes and performance of the latest LLMs. These include the two-stage pretraining approach used in Llama 3.1 and knowledge distillation applied with real data (as in Gemma 2) and with synthetic data (as in Llama 3.1).
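As a rough illustration of one of the mechanisms named above (a sketch of the general technique, not material from the talk), grouped-query attention lets several query heads share a single key/value head; with one KV head it reduces to multi-query attention, and with one KV head per query head it recovers standard multi-head attention. The function name and shapes here are illustrative assumptions:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Minimal GQA sketch.
    q: (n_heads, seq, d) queries; k, v: (n_kv_heads, seq, d) shared KV heads.
    Each group of n_heads // n_kv_heads query heads attends to one KV head."""
    n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_heads):
        kh, vh = k[h // group], v[h // group]      # shared KV head for this group
        scores = q[h] @ kh.T / np.sqrt(d)          # scaled dot-product scores
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
        out[h] = w @ vh
    return out
```

Sharing KV heads shrinks the KV cache during inference, which is the main motivation behind both multi-query and grouped-query attention.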

Moreover, we will examine the integration of system-level optimizations, such as the Mixture-of-Experts (MoE) method and the hybrid Samba model, which combines Mamba state-space techniques with attention mechanisms and illustrates a broader trend toward more specialized and efficient architectures.
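To make the Mixture-of-Experts idea concrete (again a generic sketch, not from the talk; the function and parameter names are assumptions), a router scores each token against the experts and only the selected expert runs, so compute per token stays roughly constant while total parameters grow:

```python
import numpy as np

def moe_layer(x, router_w, experts):
    """Minimal top-1 MoE sketch.
    x: (tokens, d); router_w: (d, n_experts); experts: list of callables."""
    logits = x @ router_w                 # router scores per token
    choice = logits.argmax(axis=-1)       # top-1 routing: pick one expert
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = experts[e](x[i])         # only the chosen expert runs
    return out

x = np.array([[3.0, 1.0], [1.0, 3.0]])
experts = [lambda t: t * 2, lambda t: t + 100]
moe_layer(x, np.eye(2), experts)          # token 0 -> expert 0, token 1 -> expert 1
```

Production MoE layers typically route to the top-2 experts with learned, load-balanced gating; top-1 routing keeps the sketch short.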

This talk will provide attendees with an understanding of the most notable transformations that have defined the architectural timeline of LLMs.
Speakers

Sebastian Raschka, PhD

Staff Research Engineer, Lightning AI
Sebastian Raschka, PhD, has been working in machine learning and AI for more than a decade. In addition to being a researcher, Sebastian has a strong passion for education. He is known for his bestselling books on machine learning with Python and his contributions to open source.
Festival Pavilion - Keynote Room