How we used Argo Workflows to power up our ML evaluation pipeline

Niels ten Boom · Published in Promaton · Sep 22, 2021


At Promaton we have an AI product that takes 3D CBCT scans as input and outputs results that can make the lives of dental practitioners much easier. We are constantly improving our models and code in order to make the best product out there!

When we want to deploy new models or code changes, we run our evaluation test set on the changes to assess any improvement in performance. This test set is composed of 30 scans which the models never saw during training. These scans vary from ~100MB up to ~1GB in file size. We do a thorough evaluation of each step of our ML pipeline, evaluating each model in isolation as well as the pipeline as a whole, so the processing time per scan is long.

Originally we ran this evaluation pipeline sequentially in Jenkins, which took about 4 hours to finish. This was not ideal for fast iteration.

Old setup where each scan gets evaluated one after the other vs. the desired setup where all the scans get evaluated in parallel.

Solution

So we would like to parallelize this process. This is not as straightforward as it sounds, because each evaluation needs quite some resources to run (a GPU and a large amount of RAM & CPUs), so spinning up subprocesses or threads on a single machine is not going to cut it. For this reason, processing needs to be distributed across several machines.

We were already using Kubernetes to orchestrate container workloads across a mix of CPU-only & GPU AWS EC2 machines. With this setup we can scale horizontally depending on the resources that are needed.

Therefore we decided to use Argo Workflows on top of Kubernetes to power up our ML pipelines! Argo Workflows is a Kubernetes-native pipelining framework which lets you define pipelines in YAML to do whatever you want, as long as the processes run in containers. We also have plans to define our inference pipelines as Argo Workflows, but that’s material for another blog post.

Implementation

To accomplish this parallelization across multiple GPU nodes, we make use of the Argo Workflows fan-out capabilities to dynamically generate workflow steps. Here we use a simple Python script to generate a JSON list of paths (you can use any programming language to accomplish this). You can see the full template with all the steps below:
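What follows is a simplified sketch of how such a WorkflowTemplate can be laid out; the metadata, volume claim, images and paths in these snippets are illustrative rather than our exact configuration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: evaluation-pipeline          # illustrative name
spec:
  entrypoint: main
  volumeClaimTemplates:
    - metadata:
        name: eval-data              # shared volume for scans and results
      spec:
        # Sharing one volume across pods on different nodes requires a
        # storage class that supports it, e.g. EFS on AWS.
        accessModes: ["ReadWriteMany"]
        resources:
          requests:
            storage: 50Gi
  templates:
    - name: main
      steps:
        - - name: git-pull-data-and-src    # pull source + data via git-lfs
            template: git-pull-data-and-src
        - - name: generate-path-list       # dump a JSON list of scan paths
            template: generate-path-list
        - - name: run-evaluation           # fans out: one pod per scan
            template: run-evaluation
            # fan-out wiring (withParam) shown in detail further down
        - - name: aggregate-results        # combine per-scan results into a report
            template: aggregate-results
    # ... the definitions of the four step templates follow here ...
```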

Each item in the spec.templates.steps[] list refers to a container definition. All the steps have a descriptive name; the git-pull-data-and-src step, for example, uses a Linux container to pull the source and data with git-lfs (we store the evaluation dataset in git).
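A rough sketch of what that step could look like; the repository URL is a placeholder, and any image with git and git-lfs installed will do:

```yaml
- name: git-pull-data-and-src
  container:
    image: alpine/git:latest       # illustrative; needs git + git-lfs
    command: [sh, -c]
    args:
      - |
        # Hypothetical repository URL; clone source + eval data onto
        # the shared volume, then fetch the LFS-tracked scans.
        git clone https://github.com/your-org/your-repo.git /data/src
        cd /data/src
        git lfs pull
    volumeMounts:
      - name: eval-data
        mountPath: /data
```

We will take a closer look at the two more interesting steps: generate-path-list & run-evaluation.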

In the generate-path-list step template we define which image we want to use, specify that it should mount the volume with the evaluation data on it, and lastly define the script we would like to run. This script dumps a JSON list containing the path of each scan on the mounted volume. This JSON list can be used by the next step to “fan-out” and evaluate each scan in a separate container process. Have a look at the Argo loop examples for the ins and outs of how this works!
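A minimal sketch of what this script template can look like, assuming the evaluation volume is mounted at /data and the scans sit in a flat directory (both hypothetical):

```yaml
- name: generate-path-list
  script:
    image: python:3.9              # illustrative image
    volumeMounts:
      - name: eval-data
        mountPath: /data           # hypothetical mount point
    command: [python]
    source: |
      import glob
      import json

      # Collect every scan on the mounted evaluation volume;
      # the directory layout here is hypothetical.
      paths = sorted(glob.glob("/data/scans/*"))

      # Whatever a script template prints to stdout becomes
      # outputs.result, which later steps can consume via withParam.
      print(json.dumps(paths))
```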

In the full WorkflowTemplate we tell Argo Workflows to create a step for each path in the JSON list that the generate-path-list step outputs, and to pass this path as an input to the run-evaluation step by using the withParam option. In the next step we pass this path as an input to the evaluation script.
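Zooming in on the run-evaluation entry in the steps list above, the fan-out wiring looks roughly like this (the scan-path parameter name is illustrative):

```yaml
- - name: run-evaluation
    template: run-evaluation
    arguments:
      parameters:
        - name: scan-path
          value: "{{item}}"      # one element of the JSON list per step
    # outputs.result is the JSON list printed by generate-path-list;
    # Argo creates one run-evaluation step per element in it.
    withParam: "{{steps.generate-path-list.outputs.result}}"
```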

The run-evaluation step takes the path of the scan as an input and hands it to the evaluation.py script, which evaluates the scan with the code & model from the new PR. It writes the results back to a temporary folder on the mounted volume, so that the next step can aggregate them and write a report. Our engineers can inspect this report and then decide if they want to merge this model into the staging or production branches.
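A sketch of what the run-evaluation template itself could look like; the image name and the command-line flags of evaluation.py are hypothetical:

```yaml
- name: run-evaluation
  inputs:
    parameters:
      - name: scan-path
  container:
    image: promaton/evaluation:latest     # hypothetical image name
    command: [python, evaluation.py]
    args:
      - "--scan-path"                     # hypothetical flag
      - "{{inputs.parameters.scan-path}}"
      - "--output-dir"                    # hypothetical flag
      - "/data/tmp/results"               # picked up by the aggregation step
    volumeMounts:
      - name: eval-data
        mountPath: /data
    resources:
      limits:
        nvidia.com/gpu: 1                 # schedules the pod onto a GPU node
```

The GPU resource limit is what makes Kubernetes place each of these pods on a node that actually has a GPU, which is where the horizontal scaling across EC2 machines comes in.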

This report contains metrics like precision, recall and IoU that quantify how well the ML models performed on the evaluation dataset. If the metrics of the challenger models are worse than those of the models currently running in production, then we know that we’ll have to take another look.
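To illustrate, an aggregation step along these lines could compute summary statistics over the per-scan results (the file layout and metric keys are hypothetical):

```yaml
- name: aggregate-results
  script:
    image: python:3.9
    volumeMounts:
      - name: eval-data
        mountPath: /data
    command: [python]
    source: |
      import glob
      import json
      import statistics

      # Read the per-scan result files written by the run-evaluation steps.
      results = [json.load(open(p)) for p in glob.glob("/data/tmp/results/*.json")]

      # Summarize each metric across the evaluation set.
      for metric in ("precision", "recall", "iou"):
          values = [r[metric] for r in results]
          print(f"{metric}: mean={statistics.mean(values):.3f}")
```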

This is what a successful workflow looks like (most of it; the whole workflow does not fit in one image):

Results

Going from a sequential evaluation process to a parallelized process on Argo Workflows cut the evaluation running time from 4 hours down to 40 minutes. Machine Learning Researchers and Engineers no longer have to wait half a day before they can evaluate their changes!

Join us!

Sounds cool? We are hiring: https://careers.promaton.com/
