Pachyderm Hub Is Now in Production


Today we’re announcing the production-ready availability of Pachyderm Hub, our powerful end-to-end machine learning and data lineage platform in the cloud. Hub delivers all the key features of the Pachyderm suite in a completely managed, cloud-native environment. You no longer need in-house Kubernetes and cloud infrastructure expertise: Pachyderm Hub lets you spin up a cluster in minutes.

While Pachyderm has always delivered the power teams need to do data science at scale, you also needed a strong systems administration and architecture team to get Kubernetes and containers into production. Kubernetes is far more than software you download and install. It needs monitoring and management, backup, redundancy, and capacity and upgrade planning, to name just a few. That represents a real challenge for some organizations. Now we’ve removed those barriers to entry, which makes it possible for smaller teams to start doing data science faster.

Pachyderm Hub includes all of the features that were challenging for teams to set up, like autoscaling with GPUs, security and isolation, backup, and automation. Now you can rely on Pachyderm’s trained team of Kubernetes experts to handle the infrastructure for you and concentrate on data science instead of systems administration. Kubernetes is powerful, but the real power comes from the applications you can run on it.

You simply bring your code and data; Pachyderm Hub handles all the infrastructure.

More than anything, Pachyderm Hub delivers on our ultimate vision for a fully collaborative, shareable, and reproducible data science platform. It does for data what Git did for code, bringing powerful version control and collaboration to your AI application development. Pachyderm Hub makes team collaboration and the sharing of data, code, and infrastructure seamless. Team members can create and share workspaces and invite other team members to collaborate, while behind the scenes the platform scales transparently as you add workloads.

Since our launch in 2014, Pachyderm’s data science platform has quickly become a foundation for the emerging Canonical Stack (CS) in machine learning. Because other platforms only let you track metadata, they lack the true iron-clad immutability of a robust version-controlled file system. If your data can change after you’ve recorded your metadata, then you can’t reproduce critical data science results. Our customers understand the need for true data lineage and data versioning. That’s why, over the past year, Pachyderm has attracted tremendous new enterprise customers like Shell, LogMeIn, Battelle Ecology, and AgBiome, as well as multiple government agencies, pharmaceutical and bioinformatics companies, two major North American banks, and other Fortune 500 powerhouses.
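To make the metadata-versus-versioning point concrete, here is a toy sketch in Python. It is not how Pachyderm is implemented; the `ImmutableStore` class, the `demo` function, and the `train.csv` file are hypothetical names invented for illustration. It only shows why a record of where data lived stops being useful once that data is overwritten, while a content-addressed commit still returns the original bytes.

```python
import hashlib


class ImmutableStore:
    """Toy content-addressed store: a commit ID is the SHA-256 of the data."""

    def __init__(self):
        self._blobs = {}

    def commit(self, data: bytes) -> str:
        commit_id = hashlib.sha256(data).hexdigest()
        # Store a private copy; existing commits are never overwritten.
        self._blobs.setdefault(commit_id, bytes(data))
        return commit_id

    def read(self, commit_id: str) -> bytes:
        return self._blobs[commit_id]


def demo():
    # A mutable location, standing in for a bucket or shared drive.
    bucket = {"train.csv": b"a,b\n1,2\n"}

    # Metadata-only tracking: remember where the data lived.
    metadata_record = {"path": "train.csv"}

    # Versioned tracking: remember exactly what the data was.
    store = ImmutableStore()
    commit_id = store.commit(bucket["train.csv"])

    # Someone later overwrites the file in place.
    bucket["train.csv"] = b"a,b\n9,9\n"

    from_metadata = bucket[metadata_record["path"]]
    from_commit = store.read(commit_id)
    return from_metadata, from_commit


if __name__ == "__main__":
    from_metadata, from_commit = demo()
    print("metadata record now points at:", from_metadata)
    print("commit still returns:", from_commit)
```

In this sketch the metadata record silently resolves to the overwritten file, while the commit ID keeps resolving to the data the experiment actually used.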

Beyond data versioning, customers choose Pachyderm because they need a clean, simple, and elegant data science pipeline system that delivers data science at scale. The Pachyderm platform allows teams to bring any framework, language, or library and fit them together into a smooth, automated workflow with ease. If it can run in a Docker container, it can run on Pachyderm. Customers aren’t locked into one tool like Spark, R, or Python, or one machine learning framework like PyTorch or TensorFlow. They can use it all on one complete system.

Launch your free workspace today!

If you’re ready to run production workloads, you can easily upgrade to benefit from GPUs and enterprise team collaboration. Just talk with one of our experts, or join our open source community on Slack and check out the Pachyderm codebase on GitHub.

About the Author

Joey Zwicker

Joey is Co-Founder and COO of Pachyderm. He is a multitasking wizard and leverages his technical background to be better at the enormous breadth of tasks actually on his plate.

About the Author

Joe Doliner

JD is Co-Founder and CEO of Pachyderm. He has been an open source software aficionado since before he learned how to program, and prior to founding Pachyderm was an amateur fashion designer.