Versioned Data Labeling With Pachyderm + Label Studio

Jimmy Whitaker

June 30, 2022

At Pachyderm, we’re constantly looking for ways to improve our user experience. One of the most common workloads we see machine learning teams tackle is data labeling. A while back, we created a “light” integration between Label Studio and Pachyderm that added data versioning and data-driven pipelines to the labeling workflow. Our new integration focuses on making the setup easier to configure and more scalable through batched commits.

Check out the step-by-step walkthrough on Medium or watch our demo.

What is Label Studio?

With Label Studio, data science teams can label and classify data of any type. Label Studio offers the flexibility you need, whatever your machine learning project. Our first integration with Label Studio launched in 2021 and has been used in data science projects that process rich datasets.

Why use Pachyderm when labeling data?

Pachyderm’s containerized pipelines enable data science teams to transform any type of data using any programming language. When your machine learning stack uses Pachyderm, your data is automatically versioned at every step of the machine learning lifecycle, from start to finish.
