As AI technologies become increasingly integrated into all areas of our lives, organizations need to bring to AI development the same level of reliability that exists in traditional software development practices. Data versioning is one of the critical components of a robust AI development workflow. With reproducible pipelines and integration with leading data science tools, such as JupyterHub and Kubeflow, Pachyderm is just the solution many data scientists are looking for.
In this presentation, Svetlana Karslioglu, a Senior Technical Writer at Pachyderm, talks about reproducibility and data versioning, and about how not tracking your data can contribute to data science project failures even when seemingly everything goes right. Bias can sneak into even the most reliable datasets and produce misleading results that affect the lives of many people.