Train In Data - Udemy
Talk Abstract: Building and Deploying Reproducible Machine Learning Pipelines
Deploying machine learning (ML) models, or simply putting ML models into production, is fundamentally about bridging the gap between the research environment and live systems. A successful deployment makes our models easily accessible to internal and external systems, depending on business requirements. Once the models are deployed, other systems can send them input data and receive predictions in return. Only through effective model deployment can we maximise the business value of the models we build.
When we think about data science, we usually think about how to build machine learning models: which algorithm will be most predictive, how to engineer our features, and which variables to use to make the models more accurate. However, the “last mile”, planning how the models will be used in production, is often neglected despite its critical importance. Machine learning systems have all the usual challenges of software development, combined with additional data science-specific challenges, which means that deployments and system architecture require careful planning. Many individuals and organisations realise this only when it is too late.
In this talk, we will discuss the steps and challenges involved in putting a machine learning model into production. We will cover setting up an effective machine learning pipeline for feature engineering, feature selection and model building. We will describe the architecture of the research and production environments and how they can be connected. We will highlight the challenges in obtaining reproducible models across the two environments and how to ensure reproducibility. Finally, we will present a machine learning pipeline solution that tackles these problems.
Bio: Chris is a Machine Learning Software Engineer at Babylon Health. He has been writing code for eight years, and for the past three years his work has focused on scaling machine learning applications. He has worked in various roles in Healthtech, Fintech and consulting. Chris enjoys sharing ideas: he ran the Beijing Python meetup for two years, mentors junior developers, and continues to write software development tutorials and guides. He is the co-author of the popular Udemy course “Deployment of Machine Learning Models”.