Michael Gruner

Experiment Tracking with Data Version Control (DVC)

Part III: Experiment Tracking


 
Deevee performing experiments, as imagined by DALL·E
 

Welcome!


This is the third of a series of articles dedicated to DVC and the Iterative ecosystem. If you are new to the topic, I suggest you start from the first blogs instead!


Part III: Experiment tracking (You are here)

Part IV: Continuous Machine Learning (coming soon!)

Part V: Advanced DVC (coming soon!)


Experimentation is an implicit part of machine learning projects. Deep learning architectures have tens, if not hundreds, of hyperparameters that can affect the final model performance. Having appropriate artifact, code and parameter tracking is essential for quick experiment iteration. This post shows how you can use DVC to run new ML experiments, analyze and compare their results, discard unsuccessful ones and keep the ones that actually improved the model. The blog assumes the reader has a basic understanding of the DVC+Git workflow, so if that is not your case I recommend starting with the previous entries first.


 
 

Introduction


In machine learning, experimentation is the cornerstone of innovation. Whether you're fine-tuning hyperparameters, trying out new algorithms, or experimenting with different data preprocessing techniques, each iteration brings you closer to building a better model. Managing these experiments, however, can quickly become complex, and you can easily lose track of what you have tried or, worse yet, lose a promising experiment among all the chaos.


Challenges in Experiment Management


If you have worked on machine learning projects in the past, you know how easily they get polluted. In our experience, the biggest challenges can be summarized as:

  • Tracking Changes: Keeping track of code changes, data versions, and parameter settings across experiments can be difficult.

  • Reproducibility: Without proper tracking, reproducing past results can be nearly impossible.

  • Collaboration Barriers: Sharing experiments with team members often involves manual processes and extensive documentation.

  • Resource Management: Running multiple experiments can strain computational resources if not managed efficiently.


These challenges will slow down your progress and hinder collaboration. You probably remember the Jupyter Notebook with thousands of cells, each holding a different experiment; the time you spent looking for a script you sent your colleague via email; or the time you broke your environment beyond repair trying out something new.


Versioning and tracking allow developers to make aggressive changes without fear, and fearless developers make progress faster.

Whether you're new to DVC or looking to enhance your workflow, this guide will provide practical steps and examples to get you started.


What Are DVC Experiments?


Unlike traditional version control systems like Git, which are optimized for code, DVC is designed to handle large datasets and machine learning pipelines. DVC (Data Version Control) Experiments extend DVC's core functionalities by providing a robust mechanism to manage and track machine learning experiments.


One option is to handle experiments as Git branches. While this could potentially work in practice, there are several drawbacks:

  • Branch overhead: Some model explorations can spawn hundreds of experiments. It would be highly inconvenient to create a branch for each of them.

  • Hyper-parameter sweeping: It's common to perform a sweep of certain parameters in order to understand their impact on the model. The Git branching strategy wouldn't work well here.

  • Unnecessary process: Most experiments are meant to be discarded. It would be very tedious and inefficient to go through the process of creating a Git branch just to remove it moments later.

  • Merge conflicts: I'm sure you've struggled with this one already :)


DVC proposes a different approach where it keeps the benefits of versioning and tracking but without bloating your Git repository. All the experiments you spawn with DVC are run in a hidden workspace, separate from your Git repository. The following figure illustrates this concept.

DVC experiments run in a workspace separate from your Git repository, avoiding bloating and pollution.

All the experiments are born from the latest commit, as if they were branches, but they do not form part of the Git history.

DVC allows you to track your machine learning experiments without bloating your Git repository.

By handling them this way, they can be compared, discarded or persisted with ease. For example, let's say Experiment A from the figure above seems promising; at that point you apply it to your Git repository and discard the rest. The following figure shows this concept.


When an experiment is successful it is made part of the Git repository and the rest are discarded.

Creating Experiments with DVC


Time for the fun part! How do you actually run an experiment with DVC? There are three ways to spawn new experiments:

  1. Direct experiment execution

  2. Experiment queue

  3. Hyperparameter sweeping


Let's study each of them in the next sections and, for that, we'll use the skin disease classification project from our previous blog post. As a refresher, the project structure looks something like the following:


A pipeline with preprocess, train and test stages, each with its own parameters, plots and metrics. The typical structure of a DVC project: in this case, a skin disease classifier.

Direct Experiment Execution


This mode is ideal for testing quick, independent changes. The results will be recorded as independent experiments so you can compare them later on without needing to commit them as separate branches!


Step 1: Modify Parameters

Let's see if we can improve our model metrics by training for more epochs. Open the params.yaml file and increase the boosting rounds from 50 to 100.

 train:
-  boost_rounds: 50
+  boost_rounds: 100
   learning_rate: 0.01
   max_depth: 10
 test:
   threshold: 0.3

Step 2: Run the Experiment


Execute the experiment with DVC by running dvc exp run:
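A minimal invocation could look like this (the experiment name boost-100 is just an illustrative label):

dvc exp run --name boost-100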

DVC detects changes in params.yaml and reruns the pipeline stages that depend on it. The --name parameter allows you, as you may have imagined, to specify a descriptive name. Otherwise, a random one will be assigned. Your name must follow the rules here.


Step 3: View Experiment Results


At this point your workspace contains the result of your experiment. As usual, you can run the following to see if there was any improvement:
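For instance, dvc params diff and dvc metrics diff summarize how the parameters and metrics changed with respect to the last commit:

dvc params diff
dvc metrics diff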

Equivalently, and since this is an experiment, we can visualize the run in the experiments view by using dvc exp show. This will become more useful later, when we actually have multiple experiments.
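dvc exp show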

At this point both visualizations show the same thing: increasing the number of training epochs slightly improves the testing metrics. Let's now try increasing the tree depth.


Step 4: Try a New Experiment


Let's revert params.yaml to its previous state (boost_rounds=50) and increase the tree depth in a different way. The --set-param option allows you to vary a parameter without needing to manually change the params.yaml file.
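For example, assuming we want to try a maximum depth of 15 (both the value and the max-depth-15 label are just illustrative choices):

dvc exp run --name max-depth-15 --set-param train.max_depth=15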

And again, let's visualize our experiments so far:
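dvc exp show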

Now you can see and compare both experiments simultaneously even though you have not committed anything to Git! You can use the --md parameter, which prints out the table in Markdown format, useful when you are creating a report, for example. The --json or --csv parameters serve a similar purpose.


Step 5: Discard Unsuccessful Experiments


So far, our max-depth experiment wasn't fruitful. Time to discard it! For that, use dvc exp remove:
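Assuming the illustrative max-depth-15 label from the previous step:

dvc exp remove max-depth-15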


Step 6: Absorb a Successful Experiment


Increasing the epochs actually improved the testing metrics. Let's absorb it right away. At this point you probably want to explore the plots and other information from your project. Let's apply the experiment to your workspace by using dvc exp apply:
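Using the illustrative boost-100 label from the first run:

dvc exp apply boost-100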

If you decide you're happy with the run, you can create a branch out of it using dvc exp branch:
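Here, boost-100-branch is just an example branch name:

dvc exp branch boost-100 boost-100-branch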

Finally, and this part is a little bit tricky: if we want to merge that branch into main, we need to clean our workspace, merge, and sync DVC with Git:
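A minimal sketch of that sequence, assuming the branch name used above and that your default branch is called main:

git checkout .             # discard the applied experiment from the workspace
git checkout main          # make sure we are on main
git merge boost-100-branch # bring the experiment into main
dvc checkout               # sync the DVC-tracked files with the new Git state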

At this point DVC is fully synchronized with Git and you have successfully absorbed your experiment.


Remember to do a dvc checkout whenever you switch to a different Git commit to keep both tools synchronized.

Utilizing the DVC Experiment Queue


When you have multiple experiments to run, such as testing various combinations of hyperparameters, the experiment queue becomes useful.


Step 1: Queue Experiments


Let's attempt to improve the model by experimenting with different learning rates. Since each experiment may potentially last a very long time, we might schedule the system to run them all overnight, for example. Let's use dvc exp run again, this time with the --queue parameter, to queue experiment runs.
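A possible way to queue them (the lr-* names are just descriptive labels):

dvc exp run --queue --name lr-0.001 --set-param train.learning_rate=0.001
dvc exp run --queue --name lr-0.1 --set-param train.learning_rate=0.1
dvc exp run --queue --name lr-0.2 --set-param train.learning_rate=0.2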


Besides the 0.01 learning rate we are currently using, we have queued three experiments trying out 0.001, 0.1 and 0.2. If we now view the experiments we can see that they are marked as "queued":
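dvc exp show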


Step 2: Run All Queued Experiments

To start processing the experiments in the queue use dvc queue start. In this case, since the model is small enough, we can launch them in 3 simultaneous jobs by using the --jobs parameter. If this is not specified, the experiments will run sequentially.
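dvc queue start --jobs 3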

At any point in time, you can check their status by using dvc queue status.
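dvc queue status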

Similarly, you can view the logs of each run by using dvc queue logs. For example, if we wanted to see the logs of the "lr-0.001" experiment:
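dvc queue logs lr-0.001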


Step 3: Compare Results

Now we can run dvc exp show to compare our new runs!
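Something along these lines, where the exact regular expressions depend on your column names:

dvc exp show --drop '.*' --keep 'Experiment|test'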

Note how I've used --drop and --keep to show only the experiment name and the test metrics, since that's what I'm mostly interested in.


Step 4: Absorb What Works and Discard the Rest

We've hit a gold mine! Selecting a learning rate of 0.1 significantly improves our testing metrics. Let's use another approach to absorb the experiment. Instead of creating a branch, let's simply apply the successful experiment and commit the changes.
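Again assuming the lr-0.1 label used when queueing:

dvc exp apply lr-0.1
git add .
git commit -m "Use a learning rate of 0.1"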

Finally, discard the experiments and move on!
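For example, removing the remaining runs by name:

dvc exp remove lr-0.001 lr-0.2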


Hyperparameter Sweeps


By now you probably have the gist of how to handle experiments using DVC. There's one missing detail though: how can we perform hyperparameter sweeps?


If you are not familiar with hyperparameter sweeps, they are a way of exploring a large portion of the model's search space in an attempt to find the combination of hyperparameters that achieves the lowest loss.


DVC supports two types of sweeps:

  • Choices: --set-param param=A,B,C

  • Ranges: --set-param param=range(start,stop,step)


For example, the following command:

dvc exp run --queue --set-param "test.threshold=0.4,0.5" --set-param "train.max_depth=range(11,14)"

will spawn the following 6 experiments as a result of combining the sweeps above:


Threshold    Max Depth
0.4          11
0.4          12
0.4          13
0.5          11
0.5          12
0.5          13

The configuration above will queue the six experiments listed; you can then start the queue as before:
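dvc queue start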


And from here on, you proceed as you normally would!


Hydra Advanced Usage


DVC uses a tool named Hydra to perform these sweeps. Hydra, however, allows you to do much more powerful things which, unfortunately, fall outside the scope of this post.


Refer to the DVC user guide if you are interested in these use cases.


Collaborative Experimentation


That wraps up our overview of DVC experiments. I didn't want to finish without going over how you can share experiments with your teammates. Even though these experiments are not versioned with Git, you can leverage your DVC remote to share and pull experiments.


Pushing Experiments

In order to share an experiment, use the dvc exp push command.
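Here, origin is assumed to be your Git remote; replace <experiment-name> with the experiment you want to share:

dvc exp push origin <experiment-name>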



Pulling Experiments

Similarly, your teammate can pull the experiment to their workspace by using the dvc exp pull command:
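dvc exp pull origin <experiment-name>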



While it is more typical to share full-fledged branches, there are definitely occasions where sharing a simple experiment can be useful.


Key Takeaways

  • Experiments in machine learning are the cornerstone of innovation.

  • DVC experiments allow you to quickly track experiments without bloating your Git repository.

  • Use dvc exp run to directly run an experiment.

  • Use dvc exp run --queue to queue experiments and run them in batch later.

  • Use Hydra syntax to combine choices and ranges to perform sweeps or grid searches.

  • Share your experiments using dvc exp push and dvc exp pull.


Need Help with your Machine Learning Project?

Let's talk! RidgeRun.ai offers Deep Learning and Computer Vision consulting services for a wide variety of needs. Feel free to reach out to contactus@ridgerun.ai so we can schedule a call and discuss your project requirements!
