It’s here! We’re excited to announce the release of Pachyderm 1.10, and trust us when we say it comes with a trunk-load of new features that you’ll love and want to deploy right away. Terrible puns aside, deploying Pachyderm has never been easier: head over to pachyderm.com/getting-started/ and deploy it on your choice of infrastructure.
Focusing on the broader ecosystem & community
We spent 2019 on the road, going to conference after conference and talking to thousands of data scientists, developers, and MLOps engineers. But more than anything, we listened. We wanted to know the struggles and challenges that data science teams face every day.
After listening to various use cases and learning how teams struggle to make things work, we continue to see that data science is not owned by individual contributors; it’s a collective responsibility.
Pachyderm has always been a platform focused deeply on collaboration, but with version 1.10 we looked to build on that rock-solid foundation and make collaboration a way of life from end to end.
S3 Gateway expansion brings support for Kubeflow and more
Ever since we announced the S3 gateway in version 1.9, we knew that it would open up a lot of possibilities to integrate with other great data science tools. With Kubeflow as our muse, we engineered a new configuration of the S3 gateway that makes it possible to integrate Kubeflow and Pachyderm directly. As of 1.10, you’ll be able to leverage Pachyderm’s powerful data lineage capabilities with TFJobs (or any other Kubeflow run) directly from within the Kubeflow ecosystem.
And while not technically a 1.10 feature, the S3 Gateway is now completely FREE. As of 1.9.10, users will no longer need an enterprise subscription to leverage this awesome feature.
Before 1.10, you could use Jupyter notebooks with Pachyderm and do some manual twists and turns to link them together, but it was far from simple. As of version 1.10, that changes with a fully supported JupyterHub integration for Pachyderm Enterprise users. We deliver a smoothly scripted way to deploy JupyterHub, so you don’t have to do it from scratch or write your own connection point. We also connected JupyterHub auth to Pach Auth, which makes sign-on seamless: when you sign into JupyterHub, you’re automatically signed into Pachyderm too.
Data science is hard enough as it is and the last thing you need is another tool with a steep learning curve. You want to pick up a command line and get right to work, not spend days or weeks bogged down in the intricate details of a new CLI tool. With the 1.10 release, we created the Pachyderm Shell, which makes interacting with Pachyderm much easier by delivering a time-saving auto-completion feature, combined with helpful suggestions displayed directly in the prompt.
Spouts have become one of the most widely used features in Pachyderm. Spouts allow you to easily stream data into a Pachyderm repo from outside sources and create automated commits. In addition to a number of smaller robustness fixes, we’ve added the Spout Marker feature to make spouts stateful. Now, if your spout or stream errors, you can use the marker to find the last successful message and pick up right where you left off.
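To make the idea of a stateful marker concrete, here is a minimal sketch in plain Python. It does not use the Pachyderm API: the marker is just a local file, and the names read_marker, write_marker, consume, and demo are hypothetical helpers invented for this illustration. It only shows the general pattern of recording the last successful message and resuming after it.

```python
import os
import tempfile


def read_marker(marker_path):
    """Return the offset of the last successfully processed message, or -1."""
    try:
        with open(marker_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return -1


def write_marker(marker_path, offset):
    """Persist the offset atomically so a crash never leaves a partial marker."""
    tmp_path = marker_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
    os.replace(tmp_path, marker_path)


def consume(messages, marker_path, handle):
    """Process every message after the marker, advancing it after each success."""
    start = read_marker(marker_path) + 1
    for offset in range(start, len(messages)):
        handle(messages[offset])  # may raise; the marker then stays where it was
        write_marker(marker_path, offset)


def demo():
    """Simulate a failure on the third message, then a clean restart."""
    messages = ["a", "b", "c", "d"]
    processed = []
    state = {"fail_once": True}

    def handle(msg):
        if msg == "c" and state["fail_once"]:
            state["fail_once"] = False
            raise RuntimeError("simulated stream error")
        processed.append(msg)

    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        try:
            consume(messages, marker, handle)
        except RuntimeError:
            pass  # the simulated spout crashes here
        consume(messages, marker, handle)  # the restart resumes at "c"
    return processed
```

Calling demo() processes each message exactly once despite the simulated crash, because the second run starts from the offset stored in the marker rather than from the beginning.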
Joins are a really common input pattern in Pachyderm. They give data science teams a powerful way to combine two data sets on particular join keys. We released the first implementation of joins back in September, and in 1.10 we made some significant performance and resource usage improvements.
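As a rough illustration of what combining two data sets on a join key means, the sketch below pairs files from two inputs whose names share a key. It is plain Python, not Pachyderm’s implementation: it uses regular expressions as a stand-in for Pachyderm’s own pattern matching, and the file names and the join_on_key and demo_join helpers are hypothetical.

```python
import re
from collections import defaultdict


def join_on_key(left_paths, right_paths, left_pattern, right_pattern):
    """Pair files from two inputs whose first capture group is equal."""
    # Index the right-hand input by its extracted key.
    right_by_key = defaultdict(list)
    for path in right_paths:
        match = re.fullmatch(right_pattern, path)
        if match:
            right_by_key[match.group(1)].append(path)

    # Emit one pair per matching (left, right) combination.
    pairs = []
    for path in left_paths:
        match = re.fullmatch(left_pattern, path)
        if match:
            for other in right_by_key.get(match.group(1), []):
                pairs.append((path, other))
    return pairs


def demo_join():
    """Join hypothetical purchase and return files on their numeric ID."""
    purchases = ["/purchase-001.csv", "/purchase-002.csv", "/purchase-003.csv"]
    returns = ["/return-002.csv", "/return-003.csv", "/return-004.csv"]
    return join_on_key(
        purchases, returns, r"/purchase-(\d+)\.csv", r"/return-(\d+)\.csv"
    )
```

In this sketch only IDs 002 and 003 appear in both inputs, so demo_join() yields two pairs; files without a partner on the other side are dropped.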
60+ additional bug fixes and stability improvements
As with every release of Pachyderm, we’ve also included a number of stability improvements, performance tweaks, and general bug fixes. You can check out the changelog of those fixes right here.
Pachyderm: Community is getting a new license
The open source community is changing fast and with that, the definition of open source is evolving to keep up. It’s amazing how many successful OSS-based businesses have been able to thrive as they develop and distribute free source code. Over the last few years, we’ve seen nearly every major open source business make adjustments to their licenses that enable them to protect their business, while still being able to make code open to everyone.
As part of that movement, Pachyderm also needs to balance our open source roots with the needs of our business in order to continue providing value to our users, communities, and customers alike. With that, we’re following in the footsteps of other major open source companies and updating our community license so that we can protect our business without hindering our ability to remain an open source project.
Strictly speaking, Pachyderm is now “source-available.” Many people use the phrase “open source” in a loose sense to mean that you can freely download, modify, and redistribute the code, and those things are all true of the code under the Pachyderm Community License. However, in the strictest sense, “open source” means a license that meets the Open Source Definition or is approved by the Open Source Initiative (“OSI”). The Pachyderm Community License is not approved by the OSI and likely would not be, because it excludes the use case of creating a competing offering of the code.
The full license is available in our GitHub repo and we’ve provided a detailed FAQ to help clarify things for our users.
Upgrading from 1.9
Upgrading from 1.9.x is a simple and straightforward process. Just head over to our upgrades & migration instructions and you’ll be running the latest release in no time.
Pachyderm wouldn’t be what it is today without the support of both our open source community and our enterprise customers. We thank you all very much for your support and we look forward to seeing you in the Slack channel.