GitHub Examples Archives

Intro to Pachyderm Tutorial

This notebook provides an introduction to Pachyderm, using the pachctl command-line utility to illustrate the basics of data repositories and pipelines.

Boston Housing Prices

A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.

Boston Housing 201

Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.

A spout is a type of pipeline that ingests streaming data (message queues, database transaction logs, event notifications, and so on), acting as a bridge between an external stream of data and a Pachyderm repo.

Market Sentiment

Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model will automatically retrain and deploy.

Object Detection

Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

JupyterLab Mount Ext

A notebook showing how to use the JupyterLab Pachyderm Mount Extension to mount Pachyderm data repositories into your Notebook environment.

Jsonnet Pipeline Specs

A notebook that introduces Jsonnet Pipeline Specs and shows how to use them to templatize common pipelines.

Label Studio Integration

Incorporate data versioning into any labeling project with Label Studio and Pachyderm.

Superb AI Integration

This example shows how you can create a Pachyderm pipeline to automatically version and save data you've labeled in Superb.ai to use in downstream machine learning workflows.

Toloka Integration

Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Churn Prediction with Snowflake

Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Breast Cancer Detection Image Processing Example

Apache Spark – MLflow

End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

Distributed hyperparameter tuning

This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
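
The idea of evaluating one function over many parameter sets can be sketched in a few lines of Python. This is not the example's actual code: the `score` objective, the `lr` and `depth` parameter names, and the use of a local thread pool in place of distributed workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score(params):
    """Hypothetical objective (lower is better); stands in for training and evaluating a model."""
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 3) ** 2


def grid(space):
    """Expand a dict of candidate values into one dict per parameter combination."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def tune(space, workers=4):
    """Evaluate every parameter set independently and return (best_score, best_params)."""
    candidates = grid(space)
    # Each evaluation is independent of the others, which is what makes the
    # work easy to spread across workers. Threads are only a local stand-in here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, candidates))
    return min(zip(scores, candidates), key=lambda pair: pair[0])


best_score, best_params = tune({"lr": [0.01, 0.1, 1.0], "depth": [1, 3, 5]})
print(best_params, best_score)
```

Because no evaluation depends on another, the same structure carries over to a setting where each parameter set is handled by a separate worker and the results are gathered in a final step.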