GitHub Examples

Here are curated GitHub examples of Pachyderm in action.

Getting Started

Intro to Pachyderm Tutorial

This notebook introduces Pachyderm, using the pachctl command-line utility to illustrate the basics of data repositories and pipelines.

Getting Started

Boston Housing Prices

A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.

Getting Started

Boston Housing 201

Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.

Getting Started

Stream Data Processing

A spout is a type of pipeline that ingests streaming data (message queues, database transaction logs, event notifications, and so on), acting as a bridge between an external data stream and a Pachyderm repo.
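The bridging idea can be sketched in a few lines of Python. This is not the linked example's code and it does not call any Pachyderm API: the `drain_stream` function, the in-memory `queue.Queue` standing in for the external stream, and the local directory standing in for the repo are all assumptions made for illustration.

```python
import queue
from pathlib import Path


def drain_stream(stream, out_dir, max_messages):
    """Copy up to max_messages messages from a queue into files in out_dir.

    Hypothetical stand-in for a spout: `stream` plays the external
    message source and `out_dir` plays the output repo.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index in range(max_messages):
        try:
            # Wait briefly for the next message; stop when the stream is idle.
            message = stream.get(timeout=0.1)
        except queue.Empty:
            break
        # One file per message, named so that files sort in arrival order.
        path = out_dir / f"message-{index:06d}.txt"
        path.write_text(message)
        written.append(path)
    return written
```

A real spout would typically run continuously rather than stop after a fixed number of messages; the bound here only keeps the sketch easy to run and test.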

Getting Started

Market Sentiment

Train and deploy a fully automated financial market sentiment BERT model. As new data is manually labeled, the model automatically retrains and redeploys.

Getting Started

Object Detection

Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.

Notebook

JupyterLab Mount Extension

A notebook showing how to use the JupyterLab Pachyderm Mount Extension to mount Pachyderm data repositories into your Notebook environment.

Notebook

Jsonnet Pipeline Specs

A notebook that introduces Jsonnet Pipeline Specs and shows how to use them to templatize common pipelines.

Data Labeling

Label Studio Integration

Incorporate data versioning into any labeling project with Label Studio and Pachyderm.

Data Labeling

Superb AI Integration

This example shows how you can create a Pachyderm pipeline to automatically version and save data you've labeled in Superb.ai to use in downstream machine learning workflows.

Data Labeling

Toloka Integration

Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.

Data Warehouse

Churn Prediction with Snowflake

Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.

Machine Learning

Apache Spark – MLflow

End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.

Advanced

Distributed Hyperparameter Tuning

This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
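As a rough local illustration of evaluating a function on many parameter sets at once, the sketch below expands a search space into individual parameter sets and scores each one independently. It is not the linked example's code: the names `parameter_grid`, `evaluate`, and `tune`, the toy objective, and the thread pool standing in for distributed workers are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def parameter_grid(space):
    """Expand {name: [candidate values]} into a list of parameter sets."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def evaluate(params):
    """Toy objective standing in for model training; lower is better."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 3) ** 2


def tune(space, max_workers=4):
    """Score every parameter set in parallel and return the best one."""
    grid = parameter_grid(space)
    # Each parameter set is scored independently, so the work can be
    # spread across workers without any coordination between them.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, grid))
    best_score, best_params = min(zip(scores, grid), key=lambda pair: pair[0])
    return best_params, best_score
```

Because no parameter set depends on another, the same structure carries over to a setup where each set is handed to a separate worker and the scores are gathered in a final step.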