DAIS21 Meetup: Machine Learning Frameworks, Model Management and Operations

This meetup was part of the Data + AI Summit 2021. It ran live, hosted by Jules Damji and Srijith Rajamohan, who gave a recap of the Summit, followed by two tech talks by subject matter experts in machine learning frameworks, model management, and operations. The speakers shared their insights into ML frameworks, ML platforms, model lifecycle management, and operations.

0:00 Starting soon
01:56 Building a Unified Machine Learning Monitoring Solution by Max Fisher
34:26 FlexFlow presentation
1:05:45 Hugging Face

Talk One
Title: Building a Unified Machine Learning Monitoring Solution in Databricks by Max Fisher

Abstract: Today, many customers use a variety of tools to monitor models in production, which leads to a confusing array of dashboards and reports. This talk focuses on how your team can use the Databricks workspace to unify the monitoring of your models and data for drift, and even to facilitate the retraining stage of the ML lifecycle. The demo covers how each part of the Databricks ecosystem contributes to building a solution that truly unifies model management in Databricks.

Talk Two
Title: FlexFlow: Automatically Discovering Fast and Scalable Parallelization Strategies for ML Training

Abstract: Existing deep learning frameworks commonly parallelize model training using manually designed strategies (e.g., combinations of data and model parallelism), but these strategies often result in suboptimal parallelization performance due to the increasing complexity of today's DNN models and parallel machine architectures.

FlexFlow (https://flexflow.ai/) is a distributed deep learning engine that supports training DNN models written in PyTorch, TensorFlow Keras, and ONNX. It identifies parallelization dimensions not considered in existing frameworks and automatically discovers fast and scalable parallelization strategies for a specific parallel machine. Companies and national labs are using FlexFlow to train production ML models that do not scale well in current frameworks, achieving over 10x performance improvement.

Talk Three
Tech talk on Hugging Face
How hard is it to put state-of-the-art machine learning models in production? We'll share some examples from Hugging Face transformers.

#### Speakers ####

** Clément Delangue is co-founder and CEO of Hugging Face, a 100,000+ member community democratizing AI through open source and open science. Clément started his career at Moodstocks, a machine learning startup for computer vision that was acquired by Google. He is passionate about building AI products.

** Max Fisher is a Solutions Architect at Databricks based out of Chicago. Before joining Databricks, Max worked at Microsoft for three years helping customers build enterprise data platforms on Azure with a primary focus on Azure Databricks and the broader Azure Data + AI stack. When Max is not busy helping customers build on Databricks, he can be found running up and down Chicago's lakefront, reading, or obsessing over college basketball (Go Illini!).

** Zhihao Jia is a research scientist at Facebook and will join CMU as an assistant professor of computer science in Fall 2021. He obtained his Ph.D. from Stanford, working with Alex Aiken and Matei Zaharia. His research interests lie at the intersection of computer systems and machine learning, with a focus on building efficient, scalable, and high-performance systems for ML computations.