GitHub Examples
Here are some curated examples from GitHub that show Pachyderm in action.

Intro to Pachyderm Tutorial
This notebook introduces Pachyderm, using the pachctl command-line utility to illustrate the basics of data repositories and pipelines.

Boston Housing Prices
A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.

Boston Housing 201
Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.

Stream Data Processing
A spout is a type of pipeline that ingests streaming data (message queues, database transaction logs, event notifications, …), acting as a bridge between an external data stream and a Pachyderm repo.
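
The bridging idea can be sketched in plain Python. This is only an illustration under stated assumptions, not Pachyderm's spout API: an in-memory queue stands in for the external stream, a temporary directory stands in for the repo, and `drain_to_repo` is a hypothetical helper. A real spout runs continuously, whereas this sketch drains the queue once.

```python
import queue
import tempfile
from pathlib import Path


def drain_to_repo(stream: queue.Queue, repo_dir: Path) -> int:
    """Write each pending message to its own file and return the count written."""
    written = 0
    while True:
        try:
            message = stream.get_nowait()
        except queue.Empty:
            break  # nothing left to ingest right now
        # One file per message, named by arrival order.
        (repo_dir / f"message-{written:04d}.txt").write_text(message)
        written += 1
    return written


# Stand-in for an external stream (hypothetical events).
stream = queue.Queue()
for event in ("order created", "order paid", "order shipped"):
    stream.put(event)

# Stand-in for a data repository.
repo_dir = Path(tempfile.mkdtemp())
count = drain_to_repo(stream, repo_dir)
files = sorted(p.name for p in repo_dir.iterdir())
```

Writing one file per message keeps each event individually addressable, so downstream steps can process events independently.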

Market Sentiment
Train and deploy a fully automated BERT model for financial market sentiment. As new data is manually labeled, the model automatically retrains and redeploys.

Object Detection
Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

JupyterLab Mount Ext
A notebook showing how to use the JupyterLab Pachyderm Mount Extension to mount Pachyderm data repositories into your Notebook environment.

Jsonnet Pipeline Specs
A notebook that introduces Jsonnet Pipeline Specs and shows how to use them to templatize common pipelines.
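
To show what templatizing a pipeline means, here is a rough Python analogue; the notebook itself uses Jsonnet, not Python. The function `make_pipeline_spec`, the repo name `images`, and the image `example/resize:latest` are hypothetical. The field names (`pipeline`, `input.pfs`, `transform`) follow the general layout of a Pachyderm pipeline spec as I understand it, so check them against the pipeline spec reference for your version.

```python
import json


def make_pipeline_spec(name: str, input_repo: str, image: str, cmd: list[str]) -> dict:
    """Fill a shared pipeline template with per-pipeline values."""
    return {
        "pipeline": {"name": name},
        "input": {"pfs": {"repo": input_repo, "glob": "/*"}},
        "transform": {"image": image, "cmd": cmd},
    }


# Two pipelines that differ only in one parameter (hypothetical values).
specs = [
    make_pipeline_spec(
        name=f"resize-{size}",
        input_repo="images",
        image="example/resize:latest",
        cmd=["python", "resize.py", str(size)],
    )
    for size in (128, 256)
]

spec_json = json.dumps(specs[0], indent=2)
```

The point of the template is that the shared structure is written once and only the varying parameters are passed in.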

Label Studio Integration
Incorporate data versioning into any labeling project with Label Studio and Pachyderm.

Superb AI Integration
Combine data labeling in Superb AI with data versioning in Pachyderm.

Toloka Integration
Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Churn Prediction with Snowflake
Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Breast Cancer Detection
A breast cancer detection system based on radiology scans, scaled and visualized using Pachyderm.

Apache Spark – MLflow
End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

Distributed Hyperparameter Tuning
This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
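
As a minimal sketch of that pattern, the snippet below evaluates a function over a grid of parameter sets in parallel. It is not the linked example's code: a local thread pool stands in for distributed workers, and the `score` function and parameter values are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score(params: dict) -> float:
    """Toy objective: higher is better, with its peak at lr=0.1, depth=4."""
    return -((params["lr"] - 0.1) ** 2) - (params["depth"] - 4) ** 2


# Every combination of the candidate values is one parameter set.
grid = [
    {"lr": lr, "depth": depth}
    for lr, depth in product((0.01, 0.1, 1.0), (2, 4, 8))
]

# Each parameter set is evaluated independently, so the work parallelizes cleanly.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(score, grid))

best_score, best_params = max(zip(scores, grid), key=lambda pair: pair[0])
```

Because the evaluations do not depend on each other, the same structure carries over when each parameter set is handled by a separate worker.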