Data prep is hard.
By some estimates, data scientists spend the majority of their time labeling, managing, and cleaning datasets. Even with a perfect dataset, it’s hard to track how a specific label or value affects the overall model.
In our recent joint webinar with Toloka, we discussed how to balance automated pipelines with human intelligence. By including humans in the training, validation, monitoring, and retraining phases, we can iterate faster and with more precision.
Pachyderm and Toloka have been working on a joint example that passes annotated clickbait data from Toloka through Pachyderm pipelines. Toloka’s crowdsourced network supplies the labeled data, which Pachyderm then versions and tracks as it moves through the pipelines.