Schedule
WANT poll - Tell us your insights and thoughts about efficient training of neural networks! Your vote does matter! (Poll results)
WANT page at OpenReview - Accepted papers (Orals & Posters) are here!
WANT page at Whova - Add to your NeurIPS agenda!
The Workshop on Advancing Neural Network Training (WANT) will take place on December 16, 2023:
- offline: at the venue of the NeurIPS 2023 conference in New Orleans, USA, room 243-245,
- online: with streaming from the venue, poster session and networking in Gather Town.
Time (New Orleans) | Morning |
---|---|
08:15 - 08:50 | Poster placement |
08:50 - 09:00 | Welcome speech from Organizers (slides), Julia Gusak |
09:00 - 09:30 | Invited talk (slides): A data-centric view on workflows that couple HPC with large-scale models. Ana Gainaru. In recent years, scientific computing workloads at HPC facilities have been undergoing a significant shift. While traditionally dominated by numerical simulations, these facilities are increasingly handling AI/ML applications for training and inference, processing and producing ever-increasing amounts of scientific data. Despite the focus on optimizing the execution of new AI/HPC workflows, little attention has been paid to the I/O runtime challenges they present. This talk aims to address that gap by analyzing these emerging trends from an I/O perspective. We will explore the performance of multilayer high-performance I/O systems under the strain of these new workflows that combine traditional HPC techniques with AI interacting in new and challenging ways. |
09:30 - 10:00 | Invited talk (slides): Rematerialization algorithms for memory-efficient learning. Lionel Eyraud-Dubois. The training phase of deep neural networks is often a very memory-intensive procedure, where large amounts of intermediate data have to be kept in memory during one iteration. One possible approach to reduce memory usage is rematerialization, also known as gradient checkpointing, where some intermediate data are recomputed when needed rather than kept in memory. This provides a tradeoff between memory usage and recomputation time. In this talk I will present several approaches to the optimization problem where one wants to minimize the recomputation time given a fixed memory budget. The corresponding algorithms have been implemented in easy-to-use libraries for the PyTorch framework, which can significantly reduce memory usage with reasonable overhead. (A minimal gradient-checkpointing sketch follows the morning schedule.) |
10:00 - 10:30 | Coffee break |
10:30 - 11:00 | Invited talk (slides): Navigating the Landscape of Enormous AI Model Training. Yang You. The proliferation of large models based on Transformers has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this increasing need, the absence of established best practices for selecting an optimal strategy persists, owing to the extensive expertise required in High-Performance Computing (HPC), Deep Learning (DL), and distributed systems. These challenges have motivated both AI and HPC developers to delve into pivotal questions: How can the training and inference efficiency of large models be enhanced to minimize costs? How can larger AI models be accommodated, even with limited resources? What measures can be taken to facilitate broader community access to large models and large-scale applications? In this talk, I will discuss potential solutions to these challenges by exploring hybrid parallelisms, heterogeneous memory management, and the design of user-friendly frameworks such as our open-source systemic solution: Colossal-AI (https://github.com/hpcaitech/ColossalAI). |
11:00 - 11:30 | Invited talk (slides): Enabling efficient trillion parameter scale training for deep learning models. Tunji Ruwase. Deep Learning (DL) is driving unprecedented progress in a wide range of Artificial Intelligence domains, including natural language processing, vision, speech, and multimodal. However, sustaining this AI revolution requires practical solutions to the extreme demands of model scaling on the compute, memory, communication, and storage components of modern computing hardware. To address this challenge, we created a deep learning optimization library called DeepSpeed to make distributed model training and inference efficient, effective, and easy on commodity hardware. This talk will focus on DeepSpeed training optimizations, particularly ZeRO and DeepSpeed-MoE, which help to address the memory and compute requirements of extreme model scaling. (A hedged ZeRO configuration sketch follows the morning schedule.) |
11:30 - 12:00 | Contributed talks |
11:31 - 11:36 | Contributed talk: Training and inference of large language models using 8-bit floating point. Sergio Perez, Yan Zhang, James Briggs, Charles Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew Fitzgibbon |
11:37 - 11:42 | Contributed talk: MatFormer: Nested Transformer for Elastic Inference. Fnu Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain |
11:43 - 11:48 | Contributed talk: Sparse Backpropagation for MoE Training. Liyuan Liu, Jianfeng Gao, Weizhu Chen |
11:49 - 11:54 | Contributed talk: Efficient Parallelization Layouts for Large-Scale Distributed Model Training. Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo |
11:55 - 12:00 | Contributed talk: CoTFormer: More Tokens With Attention Make Up For Less Depth. Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi |
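For readers curious about the rematerialization idea discussed in Lionel Eyraud-Dubois's talk, the following is a minimal sketch using PyTorch's built-in gradient-checkpointing utilities. It is not the optimized rematerialization schedules from the talk; the toy model, input size, and segment count are illustrative assumptions.

```python
# Minimal sketch of rematerialization (gradient checkpointing) with PyTorch's
# built-in utilities. The toy model and segment count are illustrative only.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A toy deep network made of 8 identical blocks.
blocks = [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)]
model = nn.Sequential(*blocks)

x = torch.randn(32, 1024, requires_grad=True)

# Keep activations only at 4 segment boundaries; everything in between is
# recomputed during the backward pass, trading extra compute for less memory.
out = checkpoint_sequential(model, 4, x)
loss = out.sum()
loss.backward()
```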
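Similarly, as a rough illustration of the ZeRO-style partitioning mentioned in Tunji Ruwase's talk, the hedged sketch below enables ZeRO stage 3 through a DeepSpeed configuration dictionary. The toy model, batch size, and optimizer settings are placeholder assumptions, and a real run would be launched with the deepspeed launcher across multiple GPUs.

```python
# Hedged sketch: partitioning optimizer state, gradients, and parameters with
# DeepSpeed ZeRO stage 3. The model and hyperparameters are placeholders.
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 3},  # stages 1/2/3 partition progressively more state
    "bf16": {"enabled": True},
}

# deepspeed.initialize returns a distributed engine that handles the
# partitioned optimizer step and the required communication.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```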
Time (New Orleans) | Afternoon |
---|---|
12:00 - 13:00 | Lunch |
13:00 - 13:30 | Lunch (offline); Poster session (Gather Town) |
13:30 - 14:00 | Poster session (offline + Gather Town) |
14:00 - 14:30 | Invited Talk (slides): Crafting Computational Efficiency for Large Models: Training Recipes, Scaling Strategies and Sparsity Sorcery with Specialized Hardware. Natalia Vassilieva. Large models are shifting "what's possible" with AI. Brute-force scaling of model parameter count increases model capacity and, when presented with enough training data, has shown remarkable results. However, the advantages of large-scale models come at the price of a steep increase in system complexity and infrastructure cost. Training and serving these models is an engineering challenge and is very expensive. Even minor errors in model design or training procedure can result in a significant waste of resources. At Cerebras we have trained our share of large language models and learned along the way how to train these models efficiently to get "the biggest bang for the buck". In this talk we will share our experience and insights from training various LLMs. In addition to techniques for compute-efficient training of dense models, we will look into the benefits of sparse training and inference on Cerebras hardware, designed to take full advantage of all types of sparsity. |
14:30 - 15:00 | Invited Talk: The MosaicML Approach to LLM Training. Jonathan Frankle, Naveen Rao. In this talk, I will describe the many tools and approaches that MosaicML uses to train its LLMs. We rely heavily on and contribute to a variety of open-source frameworks that form the backbone of our product. Since our business is to make it possible for anyone to train their own LLM from scratch, our stack must be robust to many different data distributions and use cases, and it must be simple, straightforward, and extensible enough for a wide variety of end users to work with. This presents unique demands and constraints that have shaped the way we build our toolchain. |
15:00 - 15:30 | Coffee break |
15:30 - 16:00 | Invited Talk: Efficient LLM Training and Inference on GPUs. Mohammad Shoeybi, Bryan Catanzaro. Training and inference of large transformer models is one of the most important computational challenges of modern AI. Systems for training these models must be highly scalable and run at extreme efficiency, because the amount of work necessary to converge a model can be extraordinarily large. Inference needs to be fast and accommodate different query sizes. In this talk, I'll discuss the work we have been doing at NVIDIA to optimize systems for Large Language Model training and inference on GPUs. I will present different parallelism techniques we are using in our LLM framework Megatron-LM and will discuss how parallelism techniques can be combined to maximize the training throughput of large models while retaining strict optimizer semantics. I will also discuss optimization techniques for inference and methods to accelerate inference and reduce memory fragmentation. (A small sketch of how parallelism degrees combine appears after the schedule.) |
16:00 - 16:50 | Panel Discussion (slides): Ana Gainaru, Lionel Eyraud-Dubois, Tunji Ruwase, Natalia Vassilieva, Mohammad Shoeybi, Jean Kossaifi |
16:50 - 17:00 | Closing remarks |
17:00 - 17:30 | Poster session (offline + Gather Town) |
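As a back-of-the-envelope companion to the Megatron-LM and Colossal-AI talks above, the short sketch below shows the one piece of arithmetic that recurs when combining parallelism strategies: the data-, tensor-, and pipeline-parallel degrees multiply to the total number of GPUs. The numbers are assumptions for illustration, not a recommended configuration.

```python
# Illustrative arithmetic only (not framework code): when data, tensor, and
# pipeline parallelism are combined, their degrees multiply to the GPU count.
world_size = 1024        # total GPUs available (assumed)
tensor_parallel = 8      # tensor-parallel degree, typically within one node
pipeline_parallel = 16   # pipeline-parallel degree across nodes

data_parallel = world_size // (tensor_parallel * pipeline_parallel)
assert data_parallel * tensor_parallel * pipeline_parallel == world_size

print(f"DP={data_parallel}, TP={tensor_parallel}, PP={pipeline_parallel}")
# -> DP=8, TP=8, PP=16
```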