A datum is the smallest indivisible unit of computation within a job. A job can have one, many or no datums. Each datum is processed independently with a single execution of the user code and then the results of all the datums are merged together to create the final output commit.
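The per-datum model described above can be sketched in a few lines of Python. This is a toy illustration only, not Pachyderm's implementation or API: `run_job`, `shout`, and the `(file name, contents)` datum shape are hypothetical names chosen for the example, and the merge step is approximated with a plain dictionary update.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical datum shape for this sketch: (file name, file contents).
Datum = Tuple[str, str]


def run_job(datums: List[Datum],
            user_code: Callable[[Datum], Dict[str, str]]) -> Dict[str, str]:
    """Run user_code once per datum, then merge the per-datum results."""
    output_commit: Dict[str, str] = {}
    for datum in datums:
        # Each execution sees exactly one datum and nothing else.
        result = user_code(datum)
        # Toy merge: later results overwrite earlier ones on a path clash.
        output_commit.update(result)
    return output_commit


def shout(datum: Datum) -> Dict[str, str]:
    """Stand-in for user code: upper-case the contents of one file."""
    name, contents = datum
    return {name + ".out": contents.upper()}


merged = run_job([("a.txt", "hello"), ("b.txt", "world")], shout)
empty = run_job([], shout)  # a job with no datums yields an empty output
```

Because `shout` never looks beyond the datum it is handed, the loop could process the datums in any order, or in parallel, without changing the merged result.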
Install the client
- Run the corresponding steps for your operating system:
- For macOS, run:
brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.10
- For a Debian-based Linux 64-bit or Windows 10 or later running on WSL:
curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v1.10.*/pachctl_1.10.*.deb && sudo dpkg -i /tmp/pachctl.deb
- For all other Linux flavors:
curl -o /tmp/pachctl.tar.gz -L https://github.com/pachyderm/released
- Verify that the installation was successful by running
pachctl version --client-only:
$ pachctl version --client-only
COMPONENT           VERSION
pachctl             1.10.x
If you run pachctl version without --client-only, the command times out. This is expected behavior because
pachd is not yet running.
Pachyderm is a powerful data science platform. To make the most of it, take a moment to review the core Pachyderm concepts below.
A Pachyderm repository is a location where you store your data inside Pachyderm. Similar to other version-control systems, a Pachyderm repository tracks all the changes to its contents, and creates a history of data modifications that you can access and review. You can store virtually any type of file in a Pachyderm repo.
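To make the idea of a tracked history concrete, here is a minimal in-memory sketch of a versioned store. `ToyRepo`, `put_file`, and `get_file` are hypothetical names for this illustration and do not correspond to Pachyderm's client API; a real repo stores data far more efficiently than the full snapshots copied here.

```python
from typing import Dict, List


class ToyRepo:
    """Hypothetical stand-in for a versioned repo (not Pachyderm's API)."""

    def __init__(self) -> None:
        # Commit 0 is the empty repo; each later commit is a full snapshot.
        self.commits: List[Dict[str, bytes]] = [{}]

    def put_file(self, path: str, data: bytes) -> int:
        """Record a change as a new commit and return its index."""
        snapshot = dict(self.commits[-1])  # copy the previous state
        snapshot[path] = data
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def get_file(self, path: str, commit: int = -1) -> bytes:
        """Read a file as it existed at the given commit (default: latest)."""
        return self.commits[commit][path]


repo = ToyRepo()
first = repo.put_file("data.csv", b"a,b\n1,2\n")
second = repo.put_file("data.csv", b"a,b\n1,2\n3,4\n")
```

Reading `data.csv` at `first` still returns the original contents after the second write, which is the property the paragraph above describes: earlier states remain available for review.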
Pachyderm Pipelines are the computational component of the Pachyderm platform. They are responsible for reading data from a specified source, such as a Pachyderm repo, transforming it according to the pipeline configuration, and writing the result to an output repo. Pachyderm pipelines can be easily chained together to create a directed acyclic graph (DAG).
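Chaining pipelines into a DAG means every pipeline can run only after the repos it reads from have data. The sketch below shows one way to derive such an order with Python's standard `graphlib` module (Python 3.9 or later). The repo and pipeline names (`images`, `edges`, `montage`) are assumptions made for the example, and this is not how Pachyderm schedules work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each pipeline's output repo maps to the repos it reads.
pipeline_inputs = {
    "edges": {"images"},             # edges reads the images repo
    "montage": {"images", "edges"},  # montage reads images and edges
}

# static_order() yields every node after all of its inputs.
order = list(TopologicalSorter(pipeline_inputs).static_order())
```

Here `images` is a plain input repo with no upstream pipeline, so it comes first. If the mapping contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which matches the requirement that the graph be acyclic.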
We have a version of Pachyderm for everyone at every stage of development.
Our free and open-source version of Pachyderm is built and backed by a community of experts. With Pachyderm Community Edition, you can quickly and easily build, train, and deploy your data science workloads on whatever Kubernetes deployment you call home.
Pachyderm Hub is hosted and managed Pachyderm for those who want everything Pachyderm has to offer without the hassle of managing infrastructure themselves. With Hub, you can version data, deploy end-to-end pipelines, and more, all with little to no setup, and it’s free!
Pachyderm Enterprise is our complete version-controlled data science platform, packed with all the essentials enterprise organizations need. It is the choice for individuals or teams who need an extra layer of security and prefer to deploy on their own infrastructure.