The Pachyderm team is proud to announce our first major release of 2019: Pachyderm v1.9. Since releasing v1.8 in November 2018 we’ve been hard at work, and v1.9 has the feature list to show for it. Let’s dive right in; there’s a lot to cover.
Support For Streaming Data
With v1.9 we’ve introduced a new Pachyderm concept called Spouts. Spouts bring the same provenance that Pachyderm provides for batch data to streaming data, and as far as we know, Pachyderm is the first data science platform to do so. With Spouts, teams can continuously ingest streaming data from platforms like Kafka, NiFi, and RabbitMQ directly into Pachyderm and use that data in their pipelines.
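To make the idea concrete, here is a minimal Python sketch of the producing side of a Spout. It assumes that a spout container emits data by writing a tar stream to the named pipe at /pfs/out (check the spout documentation for your version before relying on this). The helper names emit and write_record and the file names are hypothetical, and the message source itself (Kafka, RabbitMQ, and so on) is left out.

```python
import io
import tarfile
import time


def write_record(stream, name, payload):
    """Append one file (name plus raw bytes) to an already open tar stream."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    info.mtime = int(time.time())
    stream.addfile(info, io.BytesIO(payload))


def emit(out, records):
    """Write (name, bytes) pairs to the file-like object `out` as one tar stream."""
    # "w|" is tarfile's non-seekable stream mode, which suits a pipe.
    with tarfile.open(fileobj=out, mode="w|") as stream:
        for name, payload in records:
            write_record(stream, name, payload)
```

Inside a spout container, the call site might look like `with open("/pfs/out", "wb") as out: emit(out, records)`, where records comes from whatever consumer loop reads your message queue. Keeping the tar writing separate from the output path makes the helper easy to exercise against an in-memory buffer.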
Pachyderm S3 Gateway
Back in February we mentioned that in 2019 Pachyderm would have new ways to access and manage data. Pachyderm v1.9 is the first major step towards honoring that promise: through the S3 gateway, users can now interact with their data directly instead of being constrained to the Pachyderm API. This saves users from having to build extra pipeline steps to egress data out of Pachyderm just so it can be consumed by external resources.
Transactions - Batch Your Pachctl PFS Commands
Another exciting addition in this release is a new set of Pachyderm commands called transactions. Transactions let users batch multiple pachctl commands into a single transaction and have the resulting changes applied atomically across multiple Pachyderm elements.
Expanded Error Handling
In some cases, flaws or gaps in your data are expected. In these scenarios, teams need to be able to build pipelines where an expected error won’t bring the entire pipeline to a halt. As of v1.9, users can configure pipelines with error handling code that tells the pipeline how to proceed when it encounters certain errors.
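As a rough illustration of what such a configuration could look like, the Python sketch below builds a pipeline spec that pairs the normal command with an error handler. It assumes the handler is declared through an err_cmd field in the transform section of the pipeline specification; confirm the exact field name and semantics against the pipeline specification reference for your version. The pipeline name, repo, image, and script paths are hypothetical placeholders.

```python
import json


def build_pipeline_spec(name, repo, image, cmd, err_cmd):
    """Return a pipeline spec dict with an error handling command attached."""
    return {
        "pipeline": {"name": name},
        "input": {"pfs": {"repo": repo, "glob": "/*"}},
        "transform": {
            "image": image,
            "cmd": cmd,
            # Assumption: err_cmd is run when cmd fails on a datum.
            "err_cmd": err_cmd,
        },
    }


# All values below are hypothetical placeholders.
spec = build_pipeline_spec(
    name="clean-records",
    repo="raw-records",
    image="example/clean-records:latest",
    cmd=["python3", "/app/clean.py"],
    err_cmd=["python3", "/app/on_error.py"],
)

print(json.dumps(spec, indent=2))
```

The printed JSON can be saved to a file and passed to pachctl when creating the pipeline. Generating the spec from code rather than writing it by hand is only a convenience here; a hand-written JSON file with the same fields would serve equally well.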