Autoscaling Pipelines On AWS

Tyler Britten

January 7, 2022

Some steps in your machine learning pipelines may need a lot of extra horsepower to finish in a reasonable amount of time. In Pachyderm, the number of pipeline workers can be increased manually using the parallelism_spec, but that still requires the underlying compute to be available to run those workers.

We can also add more nodes to our cluster, but what if we only need those workers for a short period of time? Obviously, you don’t want to pay for idle compute.

But this is computer technology designed for people who build computer technology (and, specifically, the most advanced software humans know how to build). That means there has to be a way to automate this, right? Isn’t that what we’re all about?

Cluster Autoscaling

One advantage of running Pachyderm on Kubernetes is that we can use new technologies built for Kubernetes. One of those is the cluster autoscaler, which is supported on over 14 cloud providers today. Here’s how it works.

In environments that let you add compute nodes programmatically, the cluster autoscaler adds nodes when pods can’t be scheduled for lack of capacity. It keeps adding them until the demand is satisfied or the configured maximum node count is reached. Once the demand recedes (your job finishes), the autoscaler removes underused nodes, down to the configured minimum.
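To make that scale-up/scale-down rule concrete, here is a deliberately simplified Python model of it. It is a sketch, not how the real cluster autoscaler works internally. The real one simulates scheduling pending pods against node groups and waits before removing underused nodes. This version simply divides the total requested CPU by a hypothetical per-node CPU and clamps the result to the configured minimum and maximum.

```python
import math


def desired_node_count(requested_cpu: float, node_cpu: float,
                       min_nodes: int, max_nodes: int) -> int:
    """Nodes needed to fit the requested CPU, clamped to the configured bounds.

    Simplified model: treats CPU as a single pool and ignores memory,
    bin-packing, and the scale-down delay a real autoscaler applies.
    """
    if node_cpu <= 0:
        raise ValueError("node_cpu must be positive")
    needed = math.ceil(requested_cpu / node_cpu)
    # Never drop below the minimum (idle cluster) or exceed the maximum (cost cap).
    return max(min_nodes, min(needed, max_nodes))


# Hypothetical node group: 4-vCPU nodes, between 1 and 10 nodes.
print(desired_node_count(0, node_cpu=4, min_nodes=1, max_nodes=10))    # idle -> 1
print(desired_node_count(18, node_cpu=4, min_nodes=1, max_nodes=10))   # 18 / 4 rounded up -> 5
print(desired_node_count(100, node_cpu=4, min_nodes=1, max_nodes=10))  # capped -> 10
```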

This way, you pay for the extra resources only when you actually need them. The main downside of the cluster autoscaler is the time it takes to scale the cluster. The request for more or fewer nodes happens relatively quickly, generally in under a minute, but your cloud provider may take a few minutes to add or remove the nodes. Meanwhile, your pipeline job is limited to the nodes that are already available. On the plus side, the autoscaler adds regular nodes, so you can still use GPU nodes, Spot instances, and so on.

There is plenty of documentation on how to set up the cluster autoscaler on AWS, so we won’t go into detail here. After all, if you’re reading this (and understand what’s been said), you’re already well versed in the skill of looking up and using relevant documentation.

Fargate

There’s another way to get this done. If you’re running Pachyderm on AWS EKS, another option for scaling workers is AWS Fargate. Fargate lets users of ECS or EKS schedule containers without managing the underlying nodes, because AWS provisions them automatically. The main advantages of Fargate over the cluster autoscaler are that it’s much simpler to configure and you’re not limited to the node configurations in your existing pool.

The three main downsides to Fargate are:

Spot capacity is not available for Fargate on EKS.
GPUs are not available for Fargate.
Fargate only supports specific CPU/memory combinations, and your pods must conform to them.
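Before you label a pipeline for Fargate, it can help to check its spec against these constraints. Here’s a minimal Python pre-flight check for a Pachyderm pipeline spec loaded as a dict. It assumes GPUs are requested through a gpu entry (with a number field) under resource_requests or resource_limits, as in Pachyderm’s pipeline spec, so verify those field names for your Pachyderm version. fargate_blockers is our own name. Spot usage is a node-group setting rather than part of the pipeline spec, so this check can’t see it. Whether the CPU/memory request fits a Fargate size is a separate question.

```python
from __future__ import annotations


def fargate_blockers(spec: dict) -> list[str]:
    """Return reasons a Pachyderm pipeline spec is a poor fit for Fargate on EKS.

    Assumption: GPUs appear as a "gpu" entry with a "number" field under
    resource_requests or resource_limits; check your Pachyderm spec docs.
    """
    reasons = []
    for section in ("resource_requests", "resource_limits"):
        gpu = spec.get(section, {}).get("gpu")
        if gpu and gpu.get("number", 0) > 0:
            reasons.append(f"{section} requests GPUs, which Fargate does not provide")
    requests = spec.get("resource_requests", {})
    if not requests.get("cpu") and not requests.get("memory"):
        # With no requests at all, Fargate falls back to its smallest size.
        reasons.append("no cpu or memory requests; Fargate would use 0.25 vCPU / 0.5 GB")
    return reasons


print(fargate_blockers({"resource_requests": {"cpu": 1, "memory": "4Gi"}}))  # []
```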

As long as you can live with those constraints, Fargate is a good fit. Let’s walk through setting up and using Fargate with Pachyderm.

Here are the basic prerequisites:

A Pachyderm installation set up on EKS according to the documentation.
Familiarity with the Fargate on EKS documentation.

First, we need to create a Fargate pod execution role, which we’ll do in the AWS Management Console. This is the IAM role that the Fargate infrastructure uses to run your pods. It lets the kubelet on Fargate register with your cluster’s Kubernetes API, and it lets Fargate pull container images from Amazon ECR.

Be aware that this is not the role your pods’ containers run as. If your pipeline code needs its own AWS permissions, configure those separately.

Here are the steps:

We’ll go into the IAM console, select Roles, and then choose Create role.

Next, we’ll select EKS from the list of services, choose EKS – Fargate pod as the use case, and then click Next.

Keep clicking Next until you reach the Review page. Give your role a name, and then click Create role.
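If you’d rather script the console steps above, you can create the same role through the IAM API. The Python sketch below builds a trust policy that lets the EKS Fargate service principal (eks-fargate-pods.amazonaws.com) assume the role. It then attaches the AWS-managed AmazonEKSFargatePodExecutionRolePolicy, which is the policy the console attaches for the EKS – Fargate pod use case. The function takes any boto3-style IAM client as a parameter, so you can exercise it without touching AWS. The function name and the role name in the usage comment are hypothetical.

```python
import json

# Trust policy allowing the EKS Fargate service to assume this role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "eks-fargate-pods.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS-managed policy for Fargate pod execution roles (ECR image pulls).
FARGATE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy"


def create_pod_execution_role(iam, role_name: str) -> str:
    """Create the role with a boto3-style IAM client and return its ARN."""
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=FARGATE_POLICY_ARN)
    return response["Role"]["Arn"]


# Usage (needs boto3 and AWS credentials with IAM permissions), e.g.:
#   create_pod_execution_role(boto3.client("iam"), "pachyderm-fargate-pod-role")
```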

We now have a Fargate pod execution role. Next, we need a Fargate profile, which identifies the pods that should run on Fargate instead of on the existing nodes. You can create it with either eksctl or the AWS CLI (aws eks create-fargate-profile).

To demonstrate, we’ll use eksctl to create a profile that selects pods in the pachyderm namespace that carry the label node=fargate.

    eksctl create fargateprofile \
      --cluster eksctl-tyler-pachyderm \
      --name fargate-pachyderm \
      --namespace pachyderm \
      --labels node=fargate
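The matching rule is worth spelling out, because a pod that lacks the label is scheduled normally rather than on Fargate. Here’s a small Python model of it, based on how the EKS documentation describes Fargate profile selectors. A pod must be in the selector’s namespace and carry all of the selector’s labels (extra labels are fine), and it matches the profile if it satisfies any one selector. The sketch only models the matching, not the scheduler itself. The function names and the app label in the examples are hypothetical.

```python
from __future__ import annotations


def matches_selector(namespace: str, labels: dict[str, str], selector: dict) -> bool:
    """True if the pod is in the selector's namespace and has every selector label."""
    if namespace != selector["namespace"]:
        return False
    return all(labels.get(key) == value
               for key, value in selector.get("labels", {}).items())


def runs_on_fargate(namespace: str, labels: dict[str, str], selectors: list[dict]) -> bool:
    """A pod matches a Fargate profile if it matches any one of its selectors."""
    return any(matches_selector(namespace, labels, s) for s in selectors)


# The profile created above: one selector, pachyderm namespace + node=fargate.
PROFILE_SELECTORS = [{"namespace": "pachyderm", "labels": {"node": "fargate"}}]

print(runs_on_fargate("pachyderm", {"node": "fargate", "app": "edges"}, PROFILE_SELECTORS))  # True
print(runs_on_fargate("pachyderm", {"app": "edges"}, PROFILE_SELECTORS))                     # False
print(runs_on_fargate("default", {"node": "fargate"}, PROFILE_SELECTORS))                    # False
```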

Now that it’s set up, we can configure our Pachyderm pipeline to use it. Here’s our OpenCV edge-detection example, modified to run on Fargate:

{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "metadata": {
    "labels": {
      "node": "fargate"
    }
  },
  "resource_requests": {
    "memory": "4Gi",
    "cpu": 1
  },
  "parallelism_spec": {
    "constant": 5
  },
  "autoscaling": true,
  "input": {
    "pfs": {
      "glob": "/*",
      "repo": "images"
    }
  },
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  }
}

Let’s walk through the settings that differ from the original example.

First, under metadata, we’ve added the node: fargate label that the Fargate profile selects on. Pachyderm applies this label to the pipeline’s worker pods, which tells EKS to schedule them on Fargate.

Next is resource_requests. If you don’t set it, Fargate assigns the smallest configuration (0.25 vCPU and 0.5 GB of memory), and your workers will often crash because they run out of memory. We’ve chosen a request that fits within the CPU/memory combinations Fargate supports, as listed in the AWS documentation.
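As a rough check before you pick values, here is a minimal Python sketch of how a request maps to a Fargate size. It relies on two behaviors described in the EKS Fargate documentation. First, EKS adds 256 MB to each pod’s memory reservation for Kubernetes components (kubelet, kube-proxy, containerd). Second, Fargate rounds up to the closest supported vCPU/memory combination. The size table reflects the combinations documented around the time of writing (up to 4 vCPU), so check the current AWS docs. The sketch also treats Gi and GB as the same unit and only counts the requests in the spec, so any other containers in the worker pod would push the real size higher. The function names are our own.

```python
from __future__ import annotations

# (vCPU, memory GB) combinations for Fargate on EKS, up to 4 vCPU, as
# documented around the time of writing. Ordered smallest first.
FARGATE_SIZES = (
    [(0.25, m) for m in (0.5, 1, 2)]
    + [(0.5, m) for m in (1, 2, 3, 4)]
    + [(1, m) for m in range(2, 9)]      # 2-8 GB
    + [(2, m) for m in range(4, 17)]     # 4-16 GB in 1 GB steps
    + [(4, m) for m in range(8, 31)]     # 8-30 GB in 1 GB steps
)

# EKS reserves this much extra memory per pod for Kubernetes components.
K8S_OVERHEAD_GB = 0.25


def parse_memory_gb(quantity: str) -> float:
    """Convert a Kubernetes quantity like '4Gi' or '512Mi' (Gi treated as GB)."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported memory quantity: {quantity}")


def fargate_size(cpu: float, memory: str) -> tuple[float, float]:
    """Smallest listed size with at least the requested CPU and memory + overhead."""
    needed_memory = parse_memory_gb(memory) + K8S_OVERHEAD_GB
    for size_cpu, size_memory in FARGATE_SIZES:
        if size_cpu >= cpu and size_memory >= needed_memory:
            return size_cpu, size_memory
    raise ValueError("request exceeds the largest size in this table")


print(fargate_size(1, "4Gi"))  # the pipeline above -> (1, 5)
```

For the pipeline above (1 CPU, 4Gi), the 256 MB overhead pushes the pod past the 1 vCPU / 4 GB size, so under these assumptions it lands on 1 vCPU / 5 GB.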

The next setting is parallelism_spec, which sets the number of pipeline workers; we’ve set it to 5. Finally, autoscaling is set to true. When there are no jobs, Pachyderm scales the pipeline down to zero workers, and when a job arrives, it scales back up to as many as the 5 workers set in parallelism_spec.
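To see what those two settings mean for capacity, here’s a rough Python model. It’s a simplification rather than Pachyderm’s actual scaling logic, and it makes three assumptions. An idle pipeline has zero workers. A busy one runs up to the parallelism_spec limit but never more workers than it has datums to process. Each worker lands on a 1 vCPU / 5 GB Fargate size, which is where a 1 CPU / 4Gi request ends up once EKS adds its 256 MB per-pod overhead and Fargate rounds up. The names are hypothetical.

```python
def desired_workers(pending_datums: int, parallelism: int) -> int:
    """Simplified autoscaling model: zero when idle, else capped by parallelism."""
    if pending_datums <= 0:
        return 0  # no work: the pipeline scales down to zero workers
    # Assumption: never more workers than there are datums to process.
    return min(parallelism, pending_datums)


# Assumed Fargate size per worker for the 1 CPU / 4Gi request above.
WORKER_VCPU, WORKER_GB = 1, 5

for pending in (0, 3, 50):
    n = desired_workers(pending, parallelism=5)
    print(f"{pending} datums: {n} workers, {n * WORKER_VCPU} vCPU, {n * WORKER_GB} GB on Fargate")
```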

Deploying the pipeline

Here are the pods in our Pachyderm cluster before we create the pipeline:

Once we create the pipeline, its first pod appears in the Pending state. Fargate needs a bit of time to provision capacity to run the pod.

In this case, it took over 90 seconds before the pod first started running.
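If you want to measure that startup delay yourself rather than eyeballing kubectl get pods, you can compare a pod’s creation time with the time its container started running. Here’s a small Python helper that does this on the JSON from kubectl get pod <pod-name> -n pachyderm -o json. It uses the standard Kubernetes fields metadata.creationTimestamp and status.containerStatuses[].state.running.startedAt. The function names are our own.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_k8s_time(ts: str) -> datetime:
    """Parse a Kubernetes UTC timestamp such as '2022-01-07T15:04:05Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_to_running(pod: dict) -> float | None:
    """Seconds from pod creation until its first container started, or None if pending."""
    created = parse_k8s_time(pod["metadata"]["creationTimestamp"])
    started = [
        parse_k8s_time(status["state"]["running"]["startedAt"])
        for status in pod.get("status", {}).get("containerStatuses", [])
        if "running" in status.get("state", {})
    ]
    if not started:
        return None  # no container running yet (e.g. still Pending on Fargate)
    return (min(started) - created).total_seconds()
```

To use it, load the saved output with json.load and pass the resulting dict to seconds_to_running.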

Once the job completes after processing all the images, the pipeline scales back to zero pods and the Fargate instance is gone.

Let’s put some more data into the pipeline and see how it scales.

As we can see, the master worker starts up first, as in the first run, but then more pods (and Fargate instances) are added. Then, just like before, they’re scaled down once the job is complete.
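To confirm that the instances really go away, you can look at the nodes themselves. Each Fargate pod appears in the cluster as its own node, labeled eks.amazonaws.com/compute-type=fargate. This Python helper filters the output of kubectl get nodes -o json down to those nodes. The helper name is hypothetical.

```python
from __future__ import annotations


def fargate_node_names(node_list: dict) -> list[str]:
    """Names of nodes in a `kubectl get nodes -o json` result that are Fargate nodes."""
    return [
        node["metadata"]["name"]
        for node in node_list.get("items", [])
        if node["metadata"].get("labels", {}).get("eks.amazonaws.com/compute-type") == "fargate"
    ]
```

For a quick look without any code, kubectl get nodes -l eks.amazonaws.com/compute-type=fargate gives the same list. Run it during a job and again after it finishes to watch the Fargate nodes appear and disappear.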