TL;DR: A way to analyze the current automation level of ML processes and to define the target level.
Problem: Building ML infrastructure is hard; showing the business value of ML infrastructure is even harder.
Context: This approach was taken in an NLP-focused company. Even when ML-based services are the main business of a company, it does not mean that the automation of internal ML processes is prioritized at the same level as product features.
Suggested solution:
Where are we? Where should we be? How to achieve it sooner? (c) Managers.
1. Where are we?
First and most important: start with an analysis of the current ML processes and their automation level.
Based on your company's workflow, create a simple flowchart of ML model creation phases and processes.
TODO: Explain each process and its purpose.
Define the actors for each process and fill in each process with a colour that reflects the automation level you think it has reached so far.
Automation level legend:
Automated - if you feel the process is automated and you only perform the minimal required actions (like pressing a button or executing a line or two of code);
Semi-Automated - if the process requires you to analyze results, act to improve them, or search for something you need;
Non-repeatable - you have to figure out how to do the process every time you repeat it, because there is no strictly defined way of doing it;
Exceptionally manual - the process does not exist without humans leading it from beginning to end, with limited or no automation tools. For example, copying model training results from console output to a notion.so doc, reformatting them as a table, uploading the table to Excel, and building Excel charts to paste back into the Notion page;
Transparent with dashed borders (process is absent) - never done by anyone, but something that could still be beneficial to start doing.
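If a flowchart tool is not at hand, the same inventory can be kept in code. The snippet below is a minimal sketch, not a prescribed format: the process names and actors are hypothetical, and the numeric ordering of the levels (following the legend from bottom to top) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class AutomationLevel(Enum):
    """Legend levels; the numeric order (least to most automated) is an assumption."""
    ABSENT = 0
    EXCEPTIONALLY_MANUAL = 1
    NON_REPEATABLE = 2
    SEMI_AUTOMATED = 3
    AUTOMATED = 4


@dataclass(frozen=True)
class Process:
    name: str
    actor: str
    level: AutomationLevel


def summarize(processes):
    """Count how many processes sit at each automation level."""
    counts = Counter(p.level for p in processes)
    return {level.name: counts.get(level, 0) for level in AutomationLevel}


# Hypothetical processes and actors, for illustration only.
current = [
    Process("data labelling", "data curator", AutomationLevel.SEMI_AUTOMATED),
    Process("model training", "data scientist", AutomationLevel.NON_REPEATABLE),
    Process("results reporting", "data scientist", AutomationLevel.EXCEPTIONALLY_MANUAL),
    Process("model monitoring", "ML engineer", AutomationLevel.ABSENT),
]

summary = summarize(current)
print(summary)
```

A summary like this gives a quick count of how many processes sit at each level, which is easier to show to the business than the diagram alone.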
Now that you know where the company processes are, it is time to define where the processes should be.
2. Where should we be?
Revisit the process workflow and remove all manual or non-repeatable steps, ensuring the workflow and data flow are standardized using automated tools.
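Comparing the current diagram with the target one gives a list of gaps to close. The snippet below is a rough sketch of that comparison; the process names, the level assigned to each process and the ranking of levels are all assumptions made for illustration.

```python
# Rank of each legend level; higher means more automated (an assumed ordering).
LEVEL_RANK = {
    "absent": 0,
    "exceptionally manual": 1,
    "non-repeatable": 2,
    "semi-automated": 3,
    "automated": 4,
}


def automation_gaps(current, target):
    """Return (process, current level, target level) for every process
    that is below its target level, largest gap first."""
    gaps = [
        (name, current.get(name, "absent"), wanted)
        for name, wanted in target.items()
        if LEVEL_RANK[current.get(name, "absent")] < LEVEL_RANK[wanted]
    ]
    return sorted(gaps, key=lambda g: LEVEL_RANK[g[2]] - LEVEL_RANK[g[1]], reverse=True)


# Hypothetical current and target diagrams, for illustration only.
current = {
    "model training": "non-repeatable",
    "results reporting": "exceptionally manual",
    "model deployment": "automated",
}
target = {
    "model training": "automated",
    "results reporting": "automated",
    "model deployment": "automated",
    "model monitoring": "semi-automated",
}

gaps = automation_gaps(current, target)
for name, now, wanted in gaps:
    print(f"{name}: {now} -> {wanted}")
```

Processes missing from the current diagram are treated as absent, so newly envisioned processes show up in the gap list as well.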
It is important to define central interfaces through which everyone can work and collaborate. These interfaces can be both a UI and a programmatic library for the ML platform.
The ML platform separates the workflows of different actors and at the same time enables work-sharing and re-use. It allows actors to focus on the question "what needs to be done?" and less on "how does it need to be done?".
Note that the future development of the ML platform also envisions a new set of actors: product engineering teams. The platform will support people without a machine learning background in training and deploying ML models easily, so that business problems get solved faster.
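To make the "what" versus "how" split concrete, here is a toy sketch of what a programmatic interface could look like. Everything in it is hypothetical: `MLPlatform`, `InMemoryPlatform`, `ship_classifier`, the method names and the dataset name are invented for illustration and do not describe any real platform.

```python
from typing import Protocol


class MLPlatform(Protocol):
    """Hypothetical central interface: callers say what they need, not how it runs."""

    def train(self, dataset: str, task: str) -> str:
        """Train a model for the task and return a model id."""
        ...

    def deploy(self, model_id: str) -> str:
        """Deploy the model and return an endpoint name."""
        ...


class InMemoryPlatform:
    """Toy stand-in that only records requests; a real platform would run pipelines."""

    def __init__(self):
        self.models = {}

    def train(self, dataset: str, task: str) -> str:
        model_id = f"model-{len(self.models) + 1}"
        self.models[model_id] = (dataset, task)
        return model_id

    def deploy(self, model_id: str) -> str:
        if model_id not in self.models:
            raise KeyError(f"unknown model: {model_id}")
        return f"endpoint/{model_id}"


def ship_classifier(platform: MLPlatform, dataset: str) -> str:
    """What a product engineer might write: no training or serving details."""
    return platform.deploy(platform.train(dataset, task="text-classification"))


endpoint = ship_classifier(InMemoryPlatform(), "support-tickets")
print(endpoint)
```

The caller depends only on the interface, so the team that owns the platform can change how training and deployment happen without touching the code of other actors.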
3. How to achieve it sooner?
Eventually, you should come up with a strategy to reach the target automation level with a central ML platform.
Depending on the company's budget, you can look for an ML platform as a service that covers all phases of model creation end to end, or design an ML platform architecture based on open-source tools.
In both scenarios, to choose a platform and tools, you need to develop an ML platform evaluation framework that can be used for quantitative comparison of different ML platforms and tools.
The evaluation framework can be a form containing the #2 diagram, coloured according to the automation level the ML platform provides, and a simple questionnaire that represents the concerns of different groups, such as data scientists, data analysts, data curators, security, legal and so on.
It is helpful to represent the sentiment of each answer in the questionnaire with an emoji (🔴, 🟡, 🟢); this way you can quickly compare answers between different evaluations.
At the end of the platform evaluation period, the most suitable solution can be identified by comparing the automation level diagrams as well as the answer sentiments of the questionnaires.
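The sentiment comparison can be made quantitative with a few lines of code. This is a minimal sketch under stated assumptions: the 0/1/2 scale for 🔴/🟡/🟢, the equal weight of every question, and the platform and question names are all invented for illustration.

```python
# Numeric score for each answer sentiment (an assumed scale).
SENTIMENT_SCORE = {"🔴": 0, "🟡": 1, "🟢": 2}


def questionnaire_score(answers):
    """Average sentiment of a filled-in questionnaire, normalised to 0..1."""
    if not answers:
        raise ValueError("questionnaire has no answers")
    total = sum(SENTIMENT_SCORE[s] for s in answers.values())
    return total / (2 * len(answers))


def rank_platforms(evaluations):
    """Sort platform names by questionnaire score, best first."""
    return sorted(
        evaluations,
        key=lambda name: questionnaire_score(evaluations[name]),
        reverse=True,
    )


# Hypothetical platforms and questions, for illustration only.
evaluations = {
    "platform A": {"experiment tracking": "🟢", "access control": "🟡", "data versioning": "🔴"},
    "platform B": {"experiment tracking": "🟢", "access control": "🟢", "data versioning": "🟡"},
}

ranking = rank_platforms(evaluations)
for name in ranking:
    print(name, round(questionnaire_score(evaluations[name]), 2))
```

In practice the groups will likely want different weights per question (for example, a 🔴 from security may be a blocker), so a single averaged score is only a starting point for the discussion.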
Useful resources:
Editable flowchart: https://whimsical.com/mlops-the-path-towards-ml-processes-automation-VLa4tLivK7xDYDjcv5Xtm2
Hidden technical debt in machine learning systems: https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
DevOps vs MLOps: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#devops_versus_mlops
Suggested solution diagrams do not reflect real company processes and should not be associated with any company the author worked for.