Pachyderm has been acquired by Hewlett Packard Enterprise.

Productivity increases in Pachyderm 2.7

We are excited to announce Pachyderm SDK as our new Python client! The general availability of Pachyderm 2.7 also includes a slew of eagerly awaited features, such as ChatGPT integration in our docs, Pachyderm Console improvements, and better debugging functionality. We’ve also added preflight checks to the Helm chart, so you can do a dry run of an upgrade and confirm that it will succeed. The themes of this release are increased productivity, a better developer experience, and improved usability.

Simpler, easier Python client

The hallmark feature of Release 2.7 is Pachyderm SDK. It improves on the previous client, Python-Pachyderm, by being simpler and easier to use. The client’s generated code includes type hints, so most IDEs can parse it and it works well with autocompletion. The generated code is also simpler, clearer, and easier to debug. The API organizes methods and message objects more intuitively, grouping them by the service they are associated with, such as Auth, RBAC, Pachyderm File System, and Pachyderm Pipeline System. In addition, some methods were renamed to remove redundancies and reduce confusion.

Pachyderm 2.7 supports both the Pachyderm SDK and Python-Pachyderm clients. With the release of Pachyderm SDK, we are deprecating Python-Pachyderm and will stop adding support for it in future releases. However, Python-Pachyderm will remain fully supported by HPE through the full lifecycle of 2.7, as per our End of Life policy. If needed, customers can contact Customer Engineering for help migrating their code to Pachyderm SDK.

ChatGPT-powered chatbot in technical documentation

If you have used our technical documentation recently, you may have noticed some big changes. We have reorganized the documentation home page to make it more intuitive and added a chatbot built on OpenAI’s API. As the home page screenshot below shows, navigation is now much simpler, with cards highlighting common topics such as getting started, what’s new, tutorials, and deployment options. We’ve also added a new changelog page that focuses on the user-facing changes in each release. This cuts through the noise of our nightly builds for those who previously followed our GitHub releases, which are still accessible.

[Screenshot: the reorganized documentation home page]

Navigating past the home page reveals our Chatbot (in beta). Given the rapid adoption of large language models (LLMs) and our thought leadership in ML, our documentation engineers couldn’t resist the opportunity to integrate ChatGPT with Pachyderm’s technical documentation. They implemented Pachyderm-specific retrieval-augmented generation (RAG) to work around the limits of ChatGPT’s knowledge base, which only extends to September 2021.

At a high level, this RAG approach involves two key steps. First, your question or input prompt is used to fetch relevant Pachyderm documents. Second, these documents are fed, along with your question, into ChatGPT, which generates a response grounded in that context. We think this feature will make it faster to find the right docs, and we invite you to try the Chatbot and send us feedback via Slack.
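The two RAG steps can be illustrated with a short Python sketch. This is a toy example, not Pachyderm’s implementation: it ranks a small, hypothetical SAMPLE_DOCS corpus by bag-of-words cosine similarity (production RAG systems typically use vector embeddings instead), and it stops at assembling the prompt rather than calling ChatGPT. The helper names tokenize, retrieve, and build_prompt are our own.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for Pachyderm's documentation pages.
SAMPLE_DOCS = {
    "Debug dump": "pachctl debug dump captures information on Postgres, Kubernetes events and Helm.",
    "Create a repo": "Use pachctl create repo to create a new versioned data repository.",
    "Console": "Console is the web-based GUI for Pachyderm pipelines and repos.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Step 1: return the titles of the k docs most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda title: cosine(q, Counter(tokenize(docs[title]))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: dict[str, str], k: int = 2) -> str:
    """Step 2: combine the retrieved docs and the question into one prompt for the LLM."""
    context = "\n\n".join(f"## {title}\n{docs[title]}" for title in retrieve(question, docs, k))
    return (
        "Answer the question using only the documentation below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # The resulting prompt would then be sent to the LLM (not shown here).
    print(build_prompt("How do I create a repo?", SAMPLE_DOCS, k=1))
```

Because the model only sees the retrieved pages, its answer can draw on documentation written after its training cutoff; the quality of the answer then depends heavily on how well step 1 ranks the documents.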


Console drives productivity improvements

We have made several usability and performance improvements to Console, our web-based graphical user interface (GUI) for Pachyderm. Users can now add and edit user role assignments directly in Console, as shown below.

[Screenshot: adding and editing user role assignments in Console]

We have enhanced the appearance and interactivity of the DAG viewer. For example, in the screenshot below, clicking the “edges” pipeline or the “edges” output repo drills down to the pipeline’s code or the repo’s file details, respectively, while clicking an error message takes you to the relevant logs.

[Screenshot: the DAG viewer in Console]

Additional improvements include the ability to delete or download multiple files at once. We’ve also resolved a user-reported GitHub issue about font and styling problems in air-gapped deployments: we now bundle all of our fonts and CSS into the Pachyderm images instead of pulling them from public CDNs, so everyone gets a consistent experience, even without internet access.

Faster, easier debugging 

When you’re working with multistage DAGs, a detailed debug dump is extremely useful, whether you are troubleshooting on your own or with our Customer Engineering team. The pachctl debug dump command now collects more detail, including information on Postgres, Kubernetes events, and Helm, giving you a fuller snapshot of your Pachyderm instance. The end result is more time spent creating and scaling pipelines and less time spent troubleshooting.
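Because the dump now contains more information, a quick inventory of what it holds can help you decide where to look first. The sketch below assumes the dump was saved locally as a compressed tar archive (for example, dump.tgz). It makes no assumptions about the archive’s internal layout and simply counts files under each top-level directory. The summarize_dump helper is hypothetical and is not part of pachctl.

```python
import posixpath
import sys
import tarfile
from collections import Counter


def summarize_dump(path: str, depth: int = 1) -> Counter:
    """Count regular files in a tar archive, grouped by their first `depth` path components."""
    counts: Counter = Counter()
    # "r:*" lets tarfile detect gzip, bzip2 or xz compression automatically.
    with tarfile.open(path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks and other special entries
            # Normalize names such as "./postgres/x" or "/postgres/x" to "postgres/x".
            parts = posixpath.normpath(member.name).lstrip("/").split("/")
            # Files that sit shallower than `depth` are reported together.
            key = "/".join(parts[:depth]) if len(parts) > depth else "(top level)"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python summarize_dump.py dump.tgz
    for group, n in sorted(summarize_dump(sys.argv[1]).items()):
        print(f"{n:6d}  {group}")
```

Raising depth to 2 breaks each top-level group down one level further, which can help when a single directory holds most of the files.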

Summary 

For data and ML engineers working with large and complex datasets, Pachyderm is a flexible data management solution that automates and quickly scales data pipelines. Pachyderm 2.7 adds a new Python client (Pachyderm SDK), an AI-powered Chatbot for querying the technical docs, Console improvements, and a more detailed debug dump for faster troubleshooting. For more details, check our changelog or our technical documentation. As always, we appreciate your feedback and look forward to hearing about your experiences with Pachyderm 2.7.

Are you seeking an enterprise data pipelining or data versioning solution? Take the next step and schedule a demo of HPE Machine Learning Data Management Software, an enterprise version of Pachyderm that can revolutionize your data management and collaboration experience.