Building Experiment Tracking at Scale with Weights & Biases + Pachyderm

Meet Your Hosts

Jimmy Whitaker

Chief Scientist of AI @ Pachyderm

Andrea Parker

Growth ML Engineer @ Weights & Biases

The experimentation process doesn’t end once the model is deployed. Teams must monitor their models in production and use their findings to iterate further. And when you're dealing with tens to hundreds of models, you need to monitor and automate at scale!

Join us as we cover:

  • What is experiment tracking and why is it important for data scientists
  • How reproducibility and lineage factor into retraining your models
  • How you can use tools like Pachyderm and Weights & Biases to iterate at scale

Webinar Transcript

Jimmy: Super excited to be talking about this today. My background is I used to be a machine learning engineer and then researcher and then director of applied machine learning for a while. So a lot of things that we're going to talk about today are near and dear to my heart, so I'm excited about that. I'm Jimmy Whitaker, so I'm Chief Scientist at Pachyderm. And what we focus on is how to scale not only data processing but also machine learning workflows by making them data driven. And with that, I'll hand it over to Andrea.

Andrea: Thanks, Jimmy. Hey, my name is Andrea. My background is in information retrieval and natural language processing. I've been an MLE and working in various data science roles for about a decade. I'm really eager to share with you today how Weights & Biases can become part of your machine learning workflow to enable scalability along with Pachyderm.

Jimmy: Great. And I think with that, I'll hand it back over to you, Andrea, to talk about experiment tracking.

Experiment Tracking: The Basics

Andrea: Thanks so much, Jimmy. So today my portion's going to be covering experiment tracking 101 and how Weights & Biases can enable you from the beginning of your experimental workflow all the way through the end, enabling teamwork, collaboration, interpretability, and all those other good things that you need to make a solid performing machine learning model. So experiment tracking has several major components. We have our code versioning, our configuration versioning, our data set versioning, and our model versioning. And every team at your company has a set of tools that helps them do their job well. Weights & Biases aims to serve as a single system of record, cataloging your work and making it reproducible and repeatable with only a few lines of code. And whether you're new to ML or more seasoned, you may know that there's quite a gap in the ML tooling landscape, whereas project managers and CI/CD folks have all these tools. But a lot of the time as MLEs, we end up emailing our config files or some screenshots of our model performance curves to each other. And that's just a really convoluted, complicated way to do things.

To that end, Weights & Biases offers these six or so major features: data set versioning in the form of artifacts; tables; our experiment tracking, which Weights & Biases is known for; Sweeps, which is your hyperparameter optimization; your evaluation section; and, coming soon, production monitoring. As I alluded to before, in the before times, we had our log files that we were sharing with our colleagues, screenshots, and a lack of a centralized workflow to enable collaboration and efficient development as an MLE. So once you import your Weights & Biases library with five lines of code and log your experiment with one line of code - we have a variety of callbacks for all the major machine learning libraries, PyTorch, TensorFlow, anything under the sun - you'll get these awesome performance curves over here. These update, so you can stash one of these performance curve charts into what is called a report, and it will be a dynamic chart. So as you run more runs of your model, as you get better performance, this chart will be dynamically reflected. So you can share a report with a colleague, and over time it will update as you have better configurations of your model.
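As a concrete illustration of the few lines of code Andrea describes, here's a minimal sketch of instrumenting a training loop with the W&B Python SDK. The project name, config values, and metric names are placeholders, not taken from the webinar:

```python
def train_with_tracking(config):
    """Minimal W&B instrumentation sketch: one init call, one log
    call per step. Replace the loop body with real training code."""
    import wandb  # imported here so the sketch stays self-contained

    run = wandb.init(project="my-project", config=config)  # placeholder project name
    for epoch in range(config["epochs"]):
        # ... train one epoch and compute real metrics here ...
        metrics = {"epoch": epoch, "loss": 1.0 / (epoch + 1)}
        run.log(metrics)  # each call adds a point to the live charts
    run.finish()

# The config dict is recorded alongside the run, so hyperparameters
# are versioned together with the performance curves.
default_config = {"epochs": 5, "learning_rate": 3e-4, "batch_size": 32}
```

For Keras, PyTorch Lightning, and similar frameworks, the callbacks Andrea mentions can replace the manual `log` calls entirely.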

Additionally, after you've kind of created that one chart of performance curve, you can log all kinds of things such as your images, any kind of dataset, or actually, any arbitrary file can be logged using what's called an artifact. And so those are then visualized into your workspace. Additionally, as you build and train your models and take your datasets and do some preprocessing and other work, you want to see what's going on in terms of an end-to-end process. So that's where logging an artifact and producing a graph like this comes into play. You can take your datasets, you can take your models, and you can log all kinds of, like I said, arbitrary forms of data and produce these graphs. And so this really helps you, or a colleague who's trying to replicate your work when you come back to it several months later, and you don't know what preprocessing steps you took on data. You don't know which model was most performant and why you selected that. And so for reproducibility purposes, this is amazing.
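Logging an arbitrary file or directory as an artifact, as described above, looks roughly like this; the project, artifact name, and job type are illustrative placeholders:

```python
def log_dataset_artifact(path):
    """Sketch: version a dataset directory as a W&B artifact so it
    shows up in the lineage graph. Names here are placeholders."""
    import wandb  # local import keeps the sketch self-contained

    run = wandb.init(project="my-project", job_type="dataset-upload")
    artifact = wandb.Artifact("raw-images", type="dataset")
    artifact.add_dir(path)      # or artifact.add_file(...) for a single file
    run.log_artifact(artifact)  # a changed dataset produces a new artifact version
    run.finish()
```

Because each logged artifact version is content-addressed, re-running this months later makes it clear exactly which dataset version fed which model.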

Like I mentioned before, there's a Sweeps functionality in Weights & Biases. And you may have suffered through having to look at your hyperparameters, dozens or hundreds of them, and log that performance by hand. So with one line of code, you can say, "Hey, Weights & Biases, run this hyperparameter sweep for me and report on the most performant metrics." Again, this is logging all of this to a dashboard. And there's even some handy functionality that you can see on the right there that allows you to look at positive and negative correlation of your hyperparameters, as well as the feature importance. And so there's a little magic wand button, and you literally just click on that, and it will produce a nice ranked list there for you to look at in terms of feature importance and positive and negative correlation.
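The one-line sweep launch sits on top of a small configuration. A hedged sketch, where the metric name and hyperparameter ranges are made up for illustration:

```python
# Sweep configuration in W&B's dictionary format. The metric and
# parameter names below are illustrative, not from the webinar.
sweep_config = {
    "method": "bayes",  # also: "grid" or "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def launch_sweep(train_fn, count=20):
    """Register the sweep, then run `count` trials with an agent.
    `train_fn` is your usual training function with wandb.init inside."""
    import wandb

    sweep_id = wandb.sweep(sweep_config, project="my-project")
    wandb.agent(sweep_id, function=train_fn, count=count)
```

The dashboard then aggregates all trials, which is where the correlation and feature-importance views Andrea mentions come from.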

And so all of these functionalities end up fitting into what's called a report. And a report is what you can share with technical and nontechnical colleagues. They can comment, leave in-line comments, and probe and ask questions about what you've written. And so this really enables cross-functionality and teamwork with folks outside of your immediate MLE bubble. And like I mentioned, the charts are dynamic, so as you create new runs, as you find more performant versions of your model, these reports automatically update to reflect that. So you're not looking at version 1.2 while your colleague is looking at version 0.9, comparing two different things. Again, these reports can be produced with all kinds of data in line. As you see here on the right, we have molecular data. We have self-driving car data. And this table on the left here - these charts can also be automatically generated inside Weights & Biases. So at the end, you basically have a report and collateral left over from your experiment that can be used six months down the line, several years down the line, to replicate your work and to share all of your findings with technical and nontechnical colleagues.

And so to conclude why experiment tracking with Weights & Biases is amazing for machine learning: as individual practitioners, as MLEs, we get to iterate faster on model building, and Weights & Biases serves as a scribe, automatically taking notes for you. Using experiment tracking as a team really boosts your productivity, your efficiency, and your collaboration. And you have decreased temporal and financial costs around model building, as you don't lose data and you don't lose intermediate artifacts. And experiment tracking is amazing if you're writing papers for conferences or things like that, because of reproducibility and the ability for cross-[inaudible] and collaboration. And so in conclusion, the three main functionalities I personally love about Weights & Biases as an MLE are that it allows me to rapidly iterate, to reproduce my models and the models of other people, and to collaborate with a wide range of technical and non-technical colleagues.

Experiment Tracking at Scale

Jimmy: Awesome. This is great, Andrea. So personally I've used Weights & Biases for a lot of different things in my work, even as a team and everything. Now one thing that we usually see when we're using Weights & Biases is we have this amazing way to visualize all the different results that we have. We have amazing ways to track the different experiments that we've done, compare them, and a lot of the amazing things that Andrea just told us about. But what I've run into myself is that scaling up usually involves an additional level of concerns. Weights & Biases is able to track some of these things, but once these artifacts start scaling out, or if you start trying to do some pretty aggressive data processing before you create the data set, then scaling those things up can start to be a different type of challenge. How are you scaling up that processing over a bunch of different data points? How are you dealing with changing files over time? And in particular, once you have your experiments completed, how do you start to automate the things that are being done in your experiments while also maintaining the tracking of everything going on? So basically what this means is when we start scaling things up, we end up scaling up usually three different components.

So the first component is our data. For instance, I used to work a lot on speech recognition and NLP, and the size and quantity of the data changes rapidly. It seems like we always try to start with a really small data set or something simple just to prove out that our modeling approach or our base use case can be solved. But once we start applying it to the real world and start iterating on our data sets, they can grow really rapidly. I think our data sets started with a few gigs of speech recognition data, audio and transcript pairings. And then once we started really iterating and trying to make the model more powerful, we got to the terabyte range rather quickly. So versioning and actually being able to iterate on those data sets in a reliable fashion, while also letting them grow and doing experiment tracking all together, was becoming quite a challenge when we started scaling up that data.

The second one is some of the compute costs associated with that. So whenever our data set would change, we would typically have to reprocess a lot of that data to create the new data set. For instance, again, in speech recognition or even in image processing, you may have all these images or all these audio files that you need to either sample or take chunks out of, or we can imagine in computer vision, which we'll take a look at here in a second, where you have to create these bounding boxes or crops of things in order to create a reproducible data set. And creating all these things can be a pretty huge computational cost, especially as your data size and quantity are growing. Not to mention, the more experiments you're doing - if you're doing a lot of sweeps and things, for instance, in Weights & Biases - then you're adding a lot more compute costs. So you don't want to spend tons and tons of your compute just reprocessing your data sets again and again. Ideally, you'd like to spend that time actually training models and getting something of value. So being able to store that data set or have that data set be reproducible, and even incrementally iterate on that data set, where you don't have to reprocess everything all over again, can be a huge, huge benefit.

And then the other one, which Andrea even mentioned, is your team size growing, as you have additional people contributing not only to the code base, but also to the quality of the data and the number of experiments that are done. As this starts to increase, not only is your data changing over time, not only are your compute jobs and pre-processing components changing as you iterate on those things, but the sheer number of people interacting with the code base can make things a lot more difficult. And having transparency, monitoring, and even some of the reports that Andrea was mentioning can be really, really crucial just for communication, because once you start adding people, the bottleneck typically becomes how quickly you can iterate on something and how you share that communication across the team.

And so these three components are usually, at least from what I've seen, the changes that start becoming hard to wrangle once you start scaling up. These are kind of the three pillars where I typically see scaling become the most difficult inside an organization. So when we talk about Pachyderm - I'll describe it a little bit, and then we'll go into how I was actually a user of Pachyderm before I joined and some of the things that we did, but also what Pachyderm actually solves. Pachyderm itself is kind of two components: a data versioning system and a data pipeline system. The data versioning system has this Git-like syntax for managing data artifacts. So for instance, if I had data sets, I could have version one of my data set and then create a new commit with a bunch of image files, for example, and that's the new version of my data set. But I still have a record and a history of everything else that's happened there. Basically, the key point is that everything is tracked inside of data repositories. The second component is our data pipelines. And the data pipelines are interesting. There are a lot of different tools that have data pipelines, but what makes Pachyderm's pipelines different is that they are tightly coupled to the data versioning system, or the data versioning repositories inside of Pachyderm. So what this means is that whenever our data artifacts change, whenever our data set changes, it can basically notify the pipelines that they need to rerun. So we get this event-driven, data-driven architecture that can be really powerful. Whenever I iterate on my data set, this automatically reruns my pipeline so that the output artifacts are kept in sync with the input artifacts. And by combining these two things together, we get full data lineage across everything that's happened in our system.
And so this can be really, really powerful, especially as you're starting to try to automate some of not only the experimentation process, but as you're trying to productionize systems and work with things in that realm.
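The Git-like workflow Jimmy describes can be driven from the pachctl CLI. Here's a sketch wrapped in Python; the repo name is illustrative, and it assumes pachctl is installed and pointed at a running cluster:

```python
import subprocess

def pachctl(*args):
    """Thin wrapper over the pachctl CLI (assumes pachctl is installed
    and configured against a Pachyderm cluster)."""
    result = subprocess.run(["pachctl", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout

def version_dataset(repo, local_path):
    """Create a repo (first time only), commit data into it, and list
    its commit history. Each put-file creates a new commit, and every
    earlier version of the data stays addressable."""
    pachctl("create", "repo", repo)
    pachctl("put", "file", f"{repo}@master:/", "-f", local_path, "-r")
    return pachctl("list", "commit", repo)
```

Any pipeline subscribed to that repo is notified of the new commit and reruns automatically, which is the event-driven behavior described above.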

Now where Pachyderm fits in kind of in the overall structure of the stack is Pachyderm's built really abstractly. It's very much a platform that you can do a lot of different things on. And so what this means is we can process any data, so this is audio, video, images. All of these data types can be versioned and iterated on. And there's some amazing things inside of Pachyderm where you can deduplicate images that have come in or deduplicate any binary artifact and also get a lot of benefits of knowing what's changed between commits and things like that. Or we can even pull from different sources and version the outputs from those sources. But then how it's actually structured or what it's actually built on, it's built on top of Kubernetes and object storage. So this means not only can we scale the amount of data that's versioned inside of Pachyderm, but we can also scale the amount of compute that's on top of that version data inside of Pachyderm. And then particularly where this connects very nicely with Weights & Biases is that we have a number of integrations that tie into this platform. And they can tie into different places. They can tie into the data management layer or even the pipeline system.

And so this is really great. For instance, with Weights & Biases - and we'll see this here in a second - with the Python SDK for Weights & Biases, we can run arbitrary jobs inside Pachyderm and log things to Weights & Biases. If we want to save an output artifact inside of Pachyderm but also have it referenced in the artifacts inside of Weights & Biases, we can do that. And there are plenty of other things we can do. We can automate the generation of reports inside Weights & Biases. And anything that's being scaled out from a data processing standpoint can still all live in Pachyderm - you can scale it according to the data versioning there - but also log that inside of Weights & Biases and see everything represented there. And so this is really awesome. And not only that, we can actually tie the job run to the exact set of changes that happened in Pachyderm. So not only do you maintain reproducibility inside of the Weights & Biases realm, but you can also maintain that exact same correlation and reproducibility inside the Pachyderm realm.
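One way to get the shared-identifier behavior Jimmy describes is to name the W&B run after the Pachyderm job that launched it. This sketch assumes the code runs inside a pipeline worker, where Pachyderm exposes the job ID through the `PACH_JOB_ID` environment variable (an assumption to verify against your Pachyderm version):

```python
import os

def init_tracked_run(project):
    """Start a W&B run named after the Pachyderm job that launched it,
    so the run and the data commit share one identifier. Falls back to
    a placeholder name when run outside a Pachyderm worker."""
    import wandb  # local import keeps the sketch self-contained

    job_id = os.environ.get("PACH_JOB_ID", "local-dev")
    return wandb.init(project=project, name=job_id, tags=["pachyderm"])
```

With that in place, searching either system for the job ID surfaces the same run, which is exactly the cross-referencing shown later in the demo.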

So it's a very general platform, and the Weights & Biases integration is a really interesting way to show how to do some of these things and get a lot of benefits from two amazing platforms. And at the end of the day, the goal really is to have reproducible anything. We want to be able to version things, we want to be able to collaborate, and we want to be able to produce things like data artifacts or models, materialize views from SQL queries, and then do some data processing on that, whatever that is. We want all of that to be reproducible, and to know what's changed in order to get there, because just as we talked about, our data's always going to be changing, and our source code's always going to be changing. So how do we marry the two together and make sure that all the artifacts and all the processing are reproducible, while also not making the whole system more complex, and while still working with best-in-class tools? So this is really exciting. And even when we start thinking about the broader team, everyone contributing in these different layers can still have their work reproduced and compared. And you have a reasonable system of lineage throughout everything. So this is what I'm super excited about, so we're really happy about that.

And here I just have a-- this is more on how things would look inside of Pachyderm, just to explain a little bit more how data is used as a triggering mechanism for computing jobs or kicking off runs. So if we're just looking horizontally at run one, we see that we have version one of our raw data set and version one of our cleaning and transformation code. This produces version one of our clean data, and then we have a model training job that's going to be able to compute once we have our clean data. And then this is going to produce some model artifacts. So in run one, everything is version one, and this is super simple. We know how to do this on our local system or inside a variety of environments; we kick off a run. But where things get really interesting, and this is where Pachyderm really starts to shine, is when we have run two. So say we've iterated on our data cleaning code. If we redeploy a Pachyderm pipeline - Pachyderm pipelines kind of act as a living system because they are data driven - then the iteration on that code will actually kick off a new job that pulls in version one of my data, because that's the only data set that I have right now. It will produce version two of my clean data set, and then it will run the same training code, in this case producing version two of my model. And so this is really interesting, because we've only updated our code, and everything else was automated to give us a reproducible, fully end-to-end data lineage throughout all the stages.

But then if we iterate on our raw data set, which we'll do again here in a second - when we iterate on that data set, then we actually see how everything flows. We go from version two of my raw data set, it kicks off a job for my data cleaning, which then kicks off the model training step, and then I end up with version three of my model with a full history and lineage of everything that's happened. So this is a really nice way to understand and visualize what ends up happening inside of Pachyderm, how we actually use data as these event triggers for pipelines, and how you can create this full topology of data processing using Pachyderm. And what we'll see here in a second is combining this with Weights & Biases, so that when our code runs we're also getting all the benefits of being able to visualize, compare, or create reports or anything else inside of Weights & Biases.

Experiment Tracking Pipeline Example

Jimmy: All right. So with that, let's talk about how this actually works. With Pachyderm and Weights & Biases, the integration we're going to see is actually very simple. All we really need in order to do all these things is our Weights & Biases key inside of the code that's going to execute. Because Pachyderm is built on top of Kubernetes, all of the pipelines themselves are Docker containers. And so we can see the configuration here for what a pipeline looks like inside of Pachyderm and what it actually entails. So I have my pipeline name - in this case it's going to be just model, so my modeling pipeline - and it's going to grab an input. In our case we're going to look at an object detection training pipeline. We're going to use a reduced version just for time's sake - it can take a really long time to train these models. So we're going to use a reduced version of the COCO data set, COCO128. And we have a data repository that's going to contain that data set.

I'll skip over the glob pattern. That's basically a way that we can parallelize a lot of data preprocessing, which we won't look into in too much detail, but there are some really powerful things we can do there. And then we have this transform, and this transform will actually take a Docker image - because we're built on top of Kubernetes, all of our pipelines end up being Docker containers that can be built with any type of code. So if you're using Python or Go or Java or Scala, anything can be contained in a Pachyderm pipeline. And we're also going to pass a Pachyderm secret. The secret is a secure way to keep your Weights & Biases key private and separate from everything so that there are no vulnerabilities or anything like that. It uses the Kubernetes Secrets manager under the hood to make sure that everything is safe and secure and doing the right stuff there. And we'll see how to actually create that secret.

And then finally, what we're actually doing inside of the container when it runs is executing this command. I think this is an old one - we were doing an object detection one instead of [inaudible] - but basically, this command could kick off a Python process. It could kick off any type of process depending on what's inside that container. In our case, it will kick off a Python process. We'll tell it what the data location is for our input data - our input PFS that we see here is actually going to get mapped into the file system inside of the Pachyderm pipeline. So /pfs/[inaudible] is what I have here. I think I just have a typo in this slide. This should be /pfs/COCO128. And then that would be the location of our data set. And then we also pass the Weights & Biases project name where we're going to log everything we want.
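Putting the pieces just described together, the pipeline spec looks along these lines. The image tag, script flags, and secret name are illustrative stand-ins, not the exact demo spec:

```json
{
  "pipeline": { "name": "model" },
  "input": {
    "pfs": { "repo": "coco128", "glob": "/" }
  },
  "transform": {
    "image": "myregistry/object-detection:latest",
    "cmd": ["python", "train.py",
            "--data-dir", "/pfs/coco128",
            "--wandb-project", "object-detection"],
    "secrets": [
      { "name": "wandb-secret", "env_var": "WANDB_API_KEY", "key": "WANDB_API_KEY" }
    ]
  }
}
```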

So with that, I think we can shift over to seeing the demo in action.

All righty. I hope everyone can see my screen, but someone shout at me if that's not the case. All right. So what we have here is this object detection with Pachyderm and Weights & Biases. All this code is on our GitHub. I'll drop a link to that here in just a moment, and you can open it and run it and do whatever you like there to see and understand how this works. So in addition to just having a JupyterLab notebook environment up and running, I also have my Weights & Biases project up and running as well. So this is my account. And then I also have a Pachyderm cluster that's up and running. This is the Pachyderm console. This is where, once I start creating some things, I'll see data repositories and pipelines and jobs, and we can explore all the stuff that's happened in Pachyderm. But underneath the hood, this is built on top of, again, Kubernetes and object storage, and Pachyderm's handling all that stuff. This is kind of the UI view into what's going on inside there.

In addition to that, I have one other thing. This is our Pachyderm mount extension. And this just means I can actually interact with the data that's been versioned inside of Pachyderm. So when I create data repositories, I can look through the data that's inside those repositories just to understand it, and even run examples of code against what I'm seeing inside of Pachyderm. It's just a nice, easy way to make it seem like the versioned data is on your system. So the first thing that we're going to do - this is going to be a short and simple data processing - or, I'm sorry, object detection modeling pipeline. So like we talked about, step zero: we're going to have our Pachyderm cluster and our Weights & Biases account. And the first thing I'm going to do is download my data set, and then I'm going to create a data repository called COCO128. I can see it created a data repository, but there are no branches and no data in it yet. And then I'm going to upload my dataset to that data repository. Going to shrink that down. It might take a quick second to upload everything. And what this is going to do is create a new commit inside my COCO128 data repository, and this new commit is going to contain all the data for my data set. I created it on COCO-- so, I'm using the Pachyderm CLI, so I'm just doing some shell commands inside of this Notebook. I'm just using put file, and then I'm referencing the data repository, the branch in that data repository that I want, and then if I want any specific path locations, I can do that.

So then now just to check and see everything that worked out all right, I can mount that data repository, and I can see what's inside of my data set and see I have a bunch of training images. And if I open one of these images, pretty uninteresting, but yeah, nice little room with some blinds in there.

So this is great - I can see everything that's happened inside of my data set, and I can then unmount that data if I'd like to. So the next thing I'm going to do is check over here at my Pachyderm console, and I can see that I have one commit, and this commit has my full data set in it. So this is just a way of checking and seeing, "Oh, okay, we have this commit ID." Everything is tracked inside of Pachyderm, and nothing's happened inside of Weights & Biases yet for us. The next thing I'm going to do is create and deploy this Weights & Biases secret. What the secret actually looks like under the hood is just a simple JSON file. This is basically how my pipeline, once I create it, is going to be able to understand how to communicate with Weights & Biases. So that's the wrong thing. So if I look at my secret, this is essentially what the secret format is, and it's pretty simple. We'll just paste our Weights & Biases secret right in there-- oh sorry, our API key right in there. So once we do that, we're going to be able to create this Weights & Biases secret. I've already created it to test that this demo's going to work. And then finally, we're going to create our Weights & Biases model training pipeline. So if I look inside what this training configuration looks like for a Pachyderm pipeline, it's pretty straightforward. This is just, in our case, another JSON file, but it has the input that we saw earlier. So the COCO128 dataset is our input. We can see the transformation that's happening - it's running this python train.py command, and we're passing it some file paths that are going to exist inside the container. We're giving it a Docker image tag, which is where all my code is packaged up.
So the fact that we're using Docker images means that we have a completely reproducible scenario for our code - not only the code itself, but also the full system-level and operating-system-level dependencies, all contained in a single place. So everything will execute the exact same way every single time.
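The secret file the key gets pasted into is a Kubernetes Secret manifest; a sketch of the format, where the secret name and key field are assumptions rather than the exact demo values:

```json
{
  "apiVersion": "v1",
  "kind": "Secret",
  "metadata": { "name": "wandb-secret" },
  "type": "Opaque",
  "stringData": { "WANDB_API_KEY": "<paste your W&B API key here>" }
}
```

Because Pachyderm stores this via Kubernetes Secrets, the key never has to appear in the pipeline spec or the container image itself.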

And then finally, we have our Weights & Biases secret - our API key that we pasted in here as a secret - so that my Weights & Biases code inside of this Python file will know how to execute and what to execute. So after I've created that pipeline, I can go over here and see that my model pipeline has been created. If I look at this model pipeline - I may have to refresh one thing real quick - I can see the spec that we just looked at in JSON format, and then I can also see the jobs. So for instance, this job was just created. It'll take just a minute for it to download my Docker container and start executing. And I can see that I have this job ID associated with it as well. And eventually, once it actually kicks off, I'll be able to look at my logs and everything else that goes along with that. So while we're waiting on this, we'll give it just a couple of minutes to kick off and start running. But we can see, yeah, this is just a simple way for everything to run nice and neat. And so I have [done?] projects over here, but as soon as this kicks off, we'll be able to see my new project ID. And let's see if I'm just out of sync. All right, so we're still waiting for it to download the container, and once that container is kicked off, we'll be able to see that it will start populating Weights & Biases with all the information.

And I must have forgotten to pray to the demo gods as soon as I started this, because things are running a little slower, as always, than you would actually like in a live demo. There we go. All right. So things are running. Now if I go over to the pipeline, I can see that I have one running job that was created three minutes ago. If I look at the logs, I can see it's going to start spitting out a variety of things to process the dataset, and also we'll see it start doing some of the training code and everything else like that. If we pivot over to the Weights & Biases project, this may take a couple seconds for it to recognize that the job is running or to actually accumulate all the information, but we can see - great. This created my object detection project here in Weights & Biases, and I can also see that I have one run here, and it's just spinning up to train my model. But I have one run. And we'll see this semi-chaotic-looking hash value right here. But what's interesting is this hash value is actually the same one as the Global ID here inside of Pachyderm. So we can see the 1CF99 is the same as this 1CF99 here. So what this means is that we have this unique identifier that not only ties you to what's going on inside of Weights & Biases but is also the exact same ID that will give you all the specifics of what that job means inside of Pachyderm. So for instance, we can actually do an interesting filter by Global ID, and when we apply that, we can see all the different changes in the state of our model training pipeline, as well as the state of our input dataset, all at the same time.

So this will take a little bit of time to run, so let me check where we are in the training pipeline. I have to refresh these logs. We're maybe two or three epochs into training, and once that's finished, we'll be able to see the model artifacts and everything. Now, in the interest of time, I'm not going to wait for the model to fully train; we just want to see that, yes, it's logging the information to Weights & Biases, which is great. I'll wait for this to load so we can see the training arguments coming in as well. We have the global step, and then we'll have the loss values, the training accuracies, and everything else we're populating from the training pipeline as we go. Now, the other important thing to note: remember, when we created this pipeline inside Pachyderm, we just ran a create pipeline command with our Weights & Biases pipeline specification. But we were actually deploying this pipeline in the process, because, as we recall, Pachyderm pipelines are data driven. We had already created a commit inside our COCO128 data repository, which means there was a commit to kick off this model training process. It also means that any time we modify that dataset, Pachyderm will kick off another training job, because of the way we set up the pipeline specification.
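The global step and loss values scrolling by come from per-step metric logging inside the training loop. A minimal sketch of that pattern is below; it uses a stand-in logger so it runs anywhere, but in the real pipeline you would pass `wandb.log` in its place (the metric names are illustrative):

```python
def train_steps(losses, log_fn):
    """Log one metrics dict per training step, mirroring how a
    training loop reports global_step and loss via wandb.log."""
    for step, loss in enumerate(losses):
        log_fn({"global_step": step, "train/loss": loss})

# Stand-in for wandb.log so the sketch runs without a W&B account;
# inside the pipeline you would call train_steps(losses, wandb.log).
logged = []
train_steps([0.92, 0.61, 0.40], logged.append)
```

Because Weights & Biases only needs a callable that accepts a metrics dict, the same loop works unchanged whether it runs on a laptop or inside a Pachyderm worker.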

So this is really exciting and really cool. The model is in the process of training, [laughter] but if I put a new image file into my data repository for the COCO128 dataset, we can go over here and see that my pipeline starts a new job. We already have a job ID associated with it, this 71FC9, and once it's fully kicked off, we'll see it populate here so that we have a comparison across all the jobs that get created inside Pachyderm as they run and iterate. I think this pipeline is going to take a little longer to kick off, so we'll give it a second. But, yeah, those are essentially the main things we're going to look at.

And there are other things that can happen here. Andrea talked about sweeps, and also artifacts and those kinds of things. The exciting part is that all of the code written inside our Pachyderm pipeline-- for instance, if I look at the training.py file, I'm using the PyTorch Lightning Weights & Biases logger in this case-- any Weights & Biases code you'd potentially want to use lives inside this Docker image that gets executed. So if I want to add additional things, like logging artifacts to Weights & Biases Artifacts, or creating a report, any code you can write for a Weights & Biases project, you can also write inside a Pachyderm pipeline. Things are super general: essentially anything you want to execute, you can make data driven, triggered by events that happen inside your data. So that's super exciting, and we love seeing that.
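As a concrete example of "any Weights & Biases code works inside the pipeline," here is a hedged sketch of logging a trained checkpoint as a W&B Artifact tagged with the Pachyderm commit that produced it. The naming convention, project name, and function names are hypothetical; only the `wandb.init` / `wandb.Artifact` / `log_artifact` calls are standard W&B API:

```python
def artifact_name(repo: str, commit_id: str) -> str:
    # Hypothetical convention: tag the model artifact with the Pachyderm
    # repo name and a short prefix of the commit that produced it.
    return f"{repo}-{commit_id[:8]}"

def log_model(checkpoint_path: str, repo: str, commit_id: str):
    # Deferred import: this runs inside the pipeline's Docker image,
    # where wandb is installed and WANDB_API_KEY is set via the secret.
    import wandb
    run = wandb.init(project="object-detection", job_type="train")
    art = wandb.Artifact(artifact_name(repo, commit_id), type="model")
    art.add_file(checkpoint_path)
    run.log_artifact(art)
    run.finish()
```

Because the artifact name encodes the data commit, you can trace any model version in Weights & Biases straight back to the exact dataset state in Pachyderm that trained it.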

I'm trying to click over here one more time [laughter]. This is maybe going to take longer than I'd like to kick off; I think it's waiting for the first job to complete, so that's probably something we won't have time to wait and see. We can see that training is still going on for that one, but eventually we'll see one additional line item here, and then we'd be in a comparable state between all the different runs of our project. I'll pull this back up as we go through Q&A just so people can keep me honest and see that things are correlating. But I believe that concludes our demo, so we can go back to the slides we have here. Andrea, anything you wanted to add during the process?

Andrea: I don't think so. That was great.

Jimmy: Awesome. Cool. So that takes care of our demo. I'll pull it up again at the end so people can see the comparison of these two pipelines once the second one finishes running. I'm using a pretty small cluster, so I think we're just waiting for resources to be allocated for that other pipeline.