Accelerating AI using distributed model training at Stitch Fix
Dec 07, 2023
Room 1A12
Practitioners Stage
- Stitch Fix uses a multi-tiered recommender system stack that spans feature generation, scoring, ranking, and business policy decision-making. This presentation covers the training architecture of the scoring model, a deep learning model that predicts the likelihood of a user purchasing an item.
- Walk through our journey from training on a single GPU to training on multiple GPUs.
- Highlight the benefits of the Distributed Data Parallel (DDP) training methodology.
- Present empirical results from scaling training from 1 to N GPUs.
- Discuss the system design considerations that went into our decision making.
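
For readers new to data-parallel training, the core idea behind DDP can be sketched in a few lines of plain Python. This is a toy simulation only, not the Stitch Fix training code and not a real multi-GPU setup: the one-parameter model, the synthetic data, and every function name below are assumptions made for illustration. Each simulated worker holds an identical copy of the weight, computes a gradient on its own shard of the data, and the gradients are averaged (the role an all-reduce plays in real DDP) before every replica applies the same update.

```python
# Toy simulation of data-parallel training in pure Python (no GPUs, no framework).
# All names here are hypothetical and exist only for illustration.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y_hat = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for the all-reduce step: every worker receives the average."""
    return sum(values) / len(values)


def train_data_parallel(data, num_workers, lr=0.01, steps=200):
    # Split the data into equal-size shards, one per simulated worker.
    # Assumes len(data) is divisible by num_workers; any remainder is dropped.
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]

    w = 0.0  # every replica starts from the same weight
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [local_gradient(w, shard) for shard in shards]
        # Averaged gradient is applied identically, so replicas stay in sync.
        w -= lr * all_reduce_mean(grads)
    return w


# Synthetic data generated from y = 3 * x, so the ideal weight is 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w_single = train_data_parallel(data, num_workers=1)
w_multi = train_data_parallel(data, num_workers=4)
```

With equal-size shards, the average of the per-shard gradients equals the full-batch gradient, so `w_single` and `w_multi` agree up to floating-point rounding. In a real multi-GPU run the per-shard gradients are computed concurrently, which is where the speedup would come from; this sketch runs them sequentially and says nothing about actual scaling behavior.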