Pachyderm 2.3 Release

At Pachyderm, we want to make it easy for our customers to automate their data transformations with data-driven pipelines. We provide a highly scalable, Kubernetes-native solution that supports any data type and any language. For the 2.3 release, we have concentrated our efforts on making Pachyderm even easier to deploy and configure, so that enterprise and open source users can get up and running faster than before. We’re also excited to announce support for ARM64, which can significantly reduce the cost of running pipelines by letting you use ARM64-based instances. For the detailed list of changes, please see our changelog on GitHub.

Easier Deployment & Configuration

Pachyderm provides a cost-effective way to scale and automate complex pipelines and sophisticated data transformations, but deploying and configuring a product on top of Kubernetes can be challenging. This is especially true for products with multiple components that need to be upgraded and maintained on long-lived clusters. We want to make deployment a snap, so 2.3 simplifies both deployment and upgrades through:

  • Streamlined network ingress: Pachyderm now includes a built-in proxy that routes all network traffic through a single port. This dramatically reduces the amount of network configuration required during deployment, and ensures seamless communication with Pachyderm components such as the Pachyderm Console.
  • Configuration via environment variables: A GitOps approach to managing configuration via environment variables can significantly reduce the complexity and workload involved in maintaining and upgrading long-lived Pachyderm clusters. With 2.3, Pachyderm is moving toward supporting all configuration options via environment variables, which can be managed with Kubernetes tooling including ArgoCD and Vault. As part of this improvement, the Configuration Pod has been eliminated, which also simplifies maintenance and upgrades.
  • Bundled logging with Loki: Pachyderm pipelines are much easier to debug if you’ve got a full set of logs. Pachyderm now comes bundled with Loki to simplify deployment and ensure logs are retained for debugging. Customers can easily swap Loki for a different solution if they so choose.

ARM64 Support

Pachyderm now supports ARM64 architectures in addition to x86. Customers are increasingly looking to reduce cloud costs by running on ARM64-based instances such as AWS Graviton. ARM64 support also improves Pachyderm’s performance for local deployments on M1-based Macs.

Other Enhancements

Pachyderm 2.3 also includes these previously announced features:

  • Pachyderm Snowflake Integration: It’s now even easier to get your Snowflake data into Pachyderm. Whether you’re looking to build ML models with your Snowflake data or just want to gain versioning as part of your data transformations, Pachyderm makes it easy to work with your Snowflake data. Read our blog for more details!
  • Console Now Available to Community Edition Users: We’re happy to announce that Community Edition users can now use Pachyderm Console to visualize and interact with their pipelines, repositories, and Directed Acyclic Graphs (DAGs). Read our blog for more details!

Upgrading to 2.3

Upgrading from any 2.x release to 2.3 should be seamless, with no breaking changes. Please note that our built-in proxy for simplified network configuration is optional in 2.3 but will be mandatory in 2.4. Pachyderm support is available to assist Enterprise customers with upgrades from 1.13 or older versions and will continue to address any upgrade issues. Please contact Pachyderm support if you need any help upgrading.

Pachyderm 1.13 and 2.0 EOL

Pachyderm’s policy is to support the current minor release and the two before it. With the release of 2.3, that means 2.3, 2.2, and 2.1 are supported, and we’ll be ending support for 1.13 (which had been extended) and 2.0. Pachyderm will not backport any fixes to releases before 2.1, and will ask customers on older releases to upgrade in order to resolve their issues.
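
If you track cluster versions in your own tooling, the support window above can be expressed as a small check. The following is only an illustrative sketch, not anything Pachyderm ships: the function names are hypothetical, and it assumes the window is counted within a single major version. Exceptions such as the extended support 1.13 received are not modeled.

```python
# Illustrative sketch of a "current plus last two minor releases" policy.
# Function names are hypothetical; this is not part of any Pachyderm tool.

def supported_minors(current, window=2):
    """Return the supported (major, minor) pairs, newest first.

    Assumption: the window never crosses a major-version boundary.
    """
    major, minor = current
    oldest = max(minor - window, 0)
    return [(major, m) for m in range(minor, oldest - 1, -1)]


def is_supported(version, current, window=2):
    """True if `version` falls inside the support window for `current`."""
    return version in supported_minors(current, window)


if __name__ == "__main__":
    print(supported_minors((2, 3)))      # the window as of the 2.3 release
    print(is_supported((2, 0), (2, 3)))  # 2.0 falls outside the window
```

With 2.3 as the current release, the window covers 2.3, 2.2, and 2.1, which matches the end of support for 2.0 described above.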