GitHub Examples

Here are some curated GitHub examples of Pachyderm in action.

Getting Started
Intro to Pachyderm Tutorial

This notebook introduces Pachyderm, using the pachctl command-line utility to illustrate the basics of data repositories and pipelines.

Getting Started
Boston Housing Prices

A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.

Getting Started
Boston Housing 201

Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.

Getting Started
Stream Data Processing

A spout is a type of pipeline that ingests streaming data (message queues, database transaction logs, event notifications, and so on), acting as a bridge between an external stream of data and a Pachyderm repo.

Getting Started
Market Sentiment

Train and deploy a fully automated financial market sentiment BERT model. As data is manually labeled, the model automatically retrains and redeploys.

Getting Started
Object Detection

Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

Notebook
JupyterLab Mount Ext

A notebook showing how to use the JupyterLab Pachyderm Mount Extension to mount Pachyderm data repositories into your notebook environment.

Notebook
Jsonnet Pipeline Specs

A notebook introducing Jsonnet Pipeline Specs and showing how to use them to templatize common pipelines.

Data Labeling
Label Studio Integration

Incorporate data versioning into any labeling project with Label Studio and Pachyderm.

Data Labeling
Superb AI Integration

Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Data Labeling
Toloka Integration

Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Data Warehouse
Churn Prediction with Snowflake

Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Machine Learning
Breast Cancer Detection

A breast cancer detection system based on radiology scans, scaled and visualized using Pachyderm.

Machine Learning
Apache Spark – MLflow

End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

Advanced
Distributed Hyperparameter Tuning

This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
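
As a rough illustration of the general idea (not the code from the linked example), the sketch below expands a parameter space into independent parameter sets and evaluates each one separately. The `score` objective, the parameter names, and the local thread pool standing in for distributed workers are all assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score(params):
    """Hypothetical objective (lower is better); stands in for training a model."""
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


def parameter_grid(space):
    """Expand {name: [values]} into one dict per combination of values."""
    names = sorted(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


def evaluate_all(space, max_workers=4):
    """Evaluate every parameter set independently; return (params, score) pairs."""
    grid = parameter_grid(space)
    # A local thread pool is used here only as a stand-in for distributed workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score, grid))
    return list(zip(grid, scores))


results = evaluate_all({"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
best_params, best_score = min(results, key=lambda pair: pair[1])
print(best_params, best_score)
```

Because each parameter set is evaluated without reference to the others, the work can be split across as many workers as are available, and the results only need to be compared at the end.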