Train In Data
Talk Abstract: Building and Deploying Reproducible Machine Learning Pipelines
Deployment of machine learning (ML) models, or simply putting ML models into production, is fundamentally about bridging the gap between the research environment and live systems. Successful deployments make our models available so they can be easily accessed by both internal and external systems, depending on business requirements. Once our ML models are deployed, other systems can send input data to these models and receive predictions back. Only through effective machine learning model deployment can we maximize the business value of the models we build.

When we think about data science, we think about how to build machine learning models. We think about which algorithm will be more predictive, how to engineer our features and which variables to use to make the models more accurate. However, the “last mile” of planning how to use the models in production is often neglected, despite its critical importance. Machine learning systems have all the usual challenges of software development, combined with additional data science-specific challenges, which means that deployments and system architecture require careful planning. Many individuals and organisations realise this only when it is too late.

In this talk, we will discuss the steps and challenges involved in putting a machine learning model into production. We will cover setting up an effective machine learning pipeline for feature engineering, feature selection and model building. We will describe the architecture of the research and production environments and how they can be connected. We will highlight the challenges of obtaining reproducible models across the two environments and how to ensure reproducibility. Finally, we will present a machine learning pipeline solution that tackles these problems.
Bio: Soledad is a Lead Data Scientist at the insurance company LV= and a Udemy instructor of machine learning courses. Soledad has 2+ years of experience in data science and analytics in finance and insurance, and 10+ years of experience in scientific research in academia. Having transitioned from academia to data science, Soledad is passionate about enabling data scientists and academics to transition into the field, and helping data scientists increase their breadth of knowledge. Over the last 2 years, Soledad has shared insights in blogs and talks for the data science community. She has also created 3 online courses on machine learning, which are live on Udemy and continue to receive excellent reviews from students. Soledad is passionate about extracting meaningful information from data and supporting people and organisations in making solid and reliable data-driven decisions. At LV=, Soledad is leading the implementation of machine learning across the company’s multiple business areas. At Udemy, she is teaching 3000+ students around the globe. In her free time, she is supporting a team of conservationists in Namibia to understand human-wildlife interaction with data analysis and statistics.