Pachyderm has been acquired by Hewlett Packard Enterprise.

Product Overview

Pachyderm is cost-effective at scale, enabling data engineering teams to automate complex pipelines with sophisticated data transformations.

Automatic Detection

Data-driven pipelines automatically trigger based on detecting data changes.

Version Control

Immutable data lineage with data versioning of any data type.

Autoscaling

Autoscaling and parallel processing built on Kubernetes for resource orchestration.

Automatic Deduplication

Uses standard object stores for data storage with automatic deduplication.

Cloud & On-prem

Runs across all major cloud providers and on-premises installations.

Pachyderm Editions

Pachyderm is available in two editions, Enterprise and Community. Choose the edition that is right for your use case.

Enterprise

For organizations that require advanced features and unlimited potential.

  • Unlimited Data-Driven Pipelines
  • Unlimited Parallel Processing
  • Role-Based Access Control (RBAC)
  • Pluggable Authentication
  • Enterprise Support
Contact Sales or start a 30-day free trial.

Community

For small teams that prefer to build and support their own software.

Complete data-driven pipeline solution with data versioning and data lineage.

Free Download

Scalability

Deliver reliable results while optimizing resource utilization and maximizing developer efficiency.

Run complex data pipelines and sophisticated data transformations with autoscaling and parallelism.

Deduplication of data and code saves infrastructure costs.

Data-Driven Pipelines

Automatically trigger pipelines based on data changes.

Orchestrate batch or real-time data pipelines from any data source.

Diff-based automation, like a CI/CD system but for data.
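
To make the diff-based idea concrete, here is a minimal conceptual sketch in Python. It is not Pachyderm's API or implementation; the function names (`snapshot`, `changed_paths`, `run_pipeline`) and the sample files are hypothetical. It assumes each data version can be summarized as a mapping from file path to content hash, so that only added or modified files are handed to the transformation.

```python
import hashlib


def snapshot(files):
    """Map each path to a SHA-256 digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_paths(old, new):
    """Return paths that were added or modified between two snapshots."""
    return sorted(p for p, digest in new.items() if old.get(p) != digest)


def run_pipeline(old_files, new_files, transform):
    """Apply `transform` only to files that differ from the previous version."""
    todo = changed_paths(snapshot(old_files), snapshot(new_files))
    return {path: transform(new_files[path]) for path in todo}


# Hypothetical data: version 2 modifies b.csv and adds c.csv.
v1 = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
v2 = {"a.csv": b"1,2,3", "b.csv": b"4,5,7", "c.csv": b"8,9"}

results = run_pipeline(v1, v2, lambda data: data.upper())
print(sorted(results))  # ['b.csv', 'c.csv']
```

The unchanged file `a.csv` is never passed to the transformation, which is the same economy a CI/CD system gets by rebuilding only what a code diff touches.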

Deduplication

Versioned data is automatically deduplicated.

Intelligently process only changed and dependent data.

Track every change automatically as you work.
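
One common way to deduplicate versioned data is content addressing, sketched below in Python. This is an illustration of the general technique, not a description of Pachyderm's storage layer; the `ContentStore` class and the sample files are hypothetical. It assumes each file is keyed by the SHA-256 hash of its bytes, so identical content is stored once no matter how many commits or paths refer to it.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical bytes are stored only once."""

    def __init__(self):
        self.blobs = {}    # digest -> bytes
        self.commits = []  # each commit is a manifest: path -> digest

    def commit(self, files):
        """Record a new version and return its commit index."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # no-op if already stored
            manifest[path] = digest
        self.commits.append(manifest)
        return len(self.commits) - 1

    def read(self, commit_id, path):
        """Fetch a file's contents as of a given commit."""
        return self.blobs[self.commits[commit_id][path]]


store = ContentStore()
c0 = store.commit({"a.txt": b"hello", "b.txt": b"world"})
c1 = store.commit({"a.txt": b"hello", "b.txt": b"world!", "copy.txt": b"hello"})

print(len(store.blobs))  # 3 unique blobs back 5 file versions
```

Every earlier version stays readable (`store.read(c0, "b.txt")` still returns `b"world"`), while storage grows only with content that is actually new.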

Autoscale

Autoscale jobs up and down based on resource demand.

Automatically parallelized processing of large data sets.

Full process visibility and monitoring using Kubernetes-native tools.

Reproducibility

Ensure compliance via immutable data lineage.

Prevent data loss via automatic versioning of all data types.

Increase team efficiency via a Git-like structure of commits, branches, and repositories.

Data Lineage

Immutable data lineage of all data and process steps.

Track any result all the way back to its raw input.
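
Tracing a result back to its raw input amounts to walking a provenance graph. The Python sketch below illustrates that idea only; the `trace_to_raw` function, the `name@commit` labels, and the provenance mapping are hypothetical and do not reflect Pachyderm's data model or API. It assumes each artifact records which artifacts it was computed from.

```python
def trace_to_raw(artifact, parents):
    """Return the raw inputs (artifacts with no parents) that `artifact` derives from."""
    raw, stack, seen = set(), [artifact], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        upstream = parents.get(node, [])
        if not upstream:
            raw.add(node)  # nothing upstream, so this is a raw input
        stack.extend(upstream)
    return sorted(raw)


# Hypothetical provenance: each artifact lists the artifacts it was computed from.
provenance = {
    "report@c3": ["model@c2", "features@c1"],
    "model@c2": ["features@c1"],
    "features@c1": ["raw_images@c0", "labels@c0"],
}

print(trace_to_raw("report@c3", provenance))  # ['labels@c0', 'raw_images@c0']
```

Because each label pins a specific commit, the walk identifies not just which datasets fed a result but which versions of them.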

Version Control

Full versioning of all data and metadata.

Automatic Git-like tracking of every change.

Flexibility

Leverage your infrastructure investments and run on your existing cloud or on-premises infrastructure.

Process data of any type, size, or scale in batch or real-time pipelines.

Support effective team collaboration through a Git-like structure of commits.

Code and Data Agnostic

Container-native pipelines empower developer autonomy.

Use whichever languages and libraries are best for the job.

Seamlessly ingest from streaming, real-time, or batch data sources.

Infrastructure Agnostic

Runs in all major cloud providers and on-premises data centers.

Integrates with existing tools – CI/CD, logging, auth, and data APIs.

Integrates with standard data processing and machine learning tools.

Composability

Easily share data sets or pipelines across teams or use cases.

Make any process data-driven by subscribing to data repo changes.

Microservices-like approach increases reuse and collaboration.

Console

Console is a complete web UI for visualizing running pipelines and exploring your data.

  • Map out the overall structure and flow of all pipelines.
  • View repositories, commit histories, and preview data directly in your browser.
  • Follow job statuses, pipeline processes, and execution history.

Notebook

A JupyterLab mount extension that selectively maps the contents of data repositories directly into your Jupyter environment.

  • Ideal for data scientists exploring and analyzing data.
  • Run and test pipeline code against versioned data.
  • Create reliable, shareable development environments.

Enterprise Administration

Robust tools for deploying and administering Pachyderm at scale across different teams in your organization.

  • Centralized licensing and administration of all clusters.
  • Authentication against any OIDC provider.
  • Role-based access control (RBAC) support for governance and data privacy.

See Pachyderm In Action

Watch a short demo that shows the product in action.

Transform your data pipeline

Learn how companies around the world are using Pachyderm to automate complex pipelines at scale.

Request a Demo