Data lineage is the record of your data’s entire life cycle, from start to finish: where it came from, the journey it takes over time, and what happens to it as it passes through each transformation and change.
In AI/ML, that means tracking changes to your data, models, results, and code, as well as how all those changes link together. Data science teams may track hundreds of models and run thousands of training sessions and experiments. Pachyderm’s data lineage system lets them reproduce any of those training results exactly, so they can see what went right or what went wrong.
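To make the idea of linked changes concrete, here is a minimal Python sketch of a lineage record. It is an illustration only, not Pachyderm's implementation or API: the `fingerprint` and `lineage_record` helpers, the field names, and the sample byte strings are all hypothetical. The assumption is that each run can be identified by hashing the exact data, code, and parameters it used, plus the identifiers of the runs it depends on.

```python
import hashlib
import json


def fingerprint(content: bytes) -> str:
    """Return a content hash identifying one exact version of an artifact."""
    return hashlib.sha256(content).hexdigest()


def lineage_record(data: bytes, code: bytes, params: dict, parents=()) -> dict:
    """Link one run to the exact data, code, and parameters it used.

    Hypothetical helper for illustration; not part of any real library.
    """
    record = {
        "data": fingerprint(data),
        "code": fingerprint(code),
        "params": params,
        "parents": sorted(parents),  # ids of the upstream runs this one consumed
    }
    # Derive the run id from the inputs, so identical inputs give the same id.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["run_id"] = fingerprint(payload)
    return record


# A cleaning step, followed by a training step that depends on it.
raw = lineage_record(b"rows-v1", b"clean.py v1", {"dropna": True})
model = lineage_record(
    b"clean-rows-v1", b"train.py v1", {"lr": 0.01}, parents=[raw["run_id"]]
)
```

Because every identifier is derived from content, changing the data, the code, or a parameter yields a different `run_id`, and the `parents` field lets you walk back from a model to everything that produced it.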