In the real world, machine learning models require iteration and careful dataset curation. We’ve seen many examples of data bugs and bias in models trained on uncurated datasets, and fixing these bugs takes human time and expertise.
Continuously improving data is crucial, which is why we need the right tooling to manage it. We need to be able to correct mislabeled data, version it, and incorporate it into our models. In Versioning and Labeling - Better Together, we walked through integrating Label Studio, a versatile, open source labeling tool, with Pachyderm. Check out our blog or the example here.