Machine Learning For Video Analysis Pipeline Case Study

Key Benefits

Automates and orchestrates multiple ML pipelines to solve complex challenges
Speeds debugging and output analysis through data versioning
Scales ML pipelines through automatic parallel processing to handle large volumes of video
Lowers compute and storage costs with automatic "diff-based" processing
Saves thousands of work hours per year through a combination of pipelines

We’ve tried solutions such as Argo Workflows or DVC that delivered some of the same capabilities, but it’s cumbersome to juggle different tools. The benefit of Pachyderm is that all of these features are highly coupled in one coherent platform, which works really well for us.

Business Challenge:
Create a Modular Approach to Complex AI

RTL Nederlands broadcasts to millions of daily TV viewers, along with delivering streaming content that garners hundreds of millions of monthly views online. Its parent, RTL Group, is Europe’s largest broadcaster and part of Bertelsmann, one of the world’s largest media conglomerates.

One of the key growth metrics for RTL Nederlands is viewership, but optimizing the value and discoverability of video assets is an extremely labor-intensive endeavor. That makes it ripe for automation, and the team applied machine learning to optimize key aspects of its video platform, like creating thumbnails and trailers, picking the right thumbnail for those trailers, and inserting ad content into video streams. The right thumbnail might not seem crucial but it makes all the difference in the world to whether someone clicks or passes a video by forever.

“We want to use artificial intelligence to make sure we optimally apply our human intelligence, so our teams can be more creative and connected,” says Vincent Koops, senior data scientist at RTL Nederlands. “The problem is that it’s not easy to apply AI to managing unstructured content; these content operations are computationally complex, making video AI challenging from a data science perspective.”

The team solved this problem by breaking complicated tasks into simpler subtasks, eliminating larger task-specific models in favor of an assembly of reusable modules. This modular approach to machine learning allowed the company to train the AI on various elements of the video stream across visual (frame extraction, shot segmentation, facial recognition), audio (tagging, speech identification, musical genre) and text (language detection, key phrases) subtasks.
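The modular idea can be sketched in a few lines of Python. This is only an illustration, not RTL Nederlands' code: the module names, the `MODULES` registry and the `run_task` helper are hypothetical, and each function is a trivial stand-in for what would be a trained model in practice.

```python
from typing import Callable, Dict, List

# A module takes a video description and returns the metadata it extracted.
Module = Callable[[dict], dict]


def extract_frames(video: dict) -> dict:
    # Stand-in for frame extraction: pretend to sample one frame per second.
    return {"frames": list(range(video["duration_s"]))}


def detect_language(video: dict) -> dict:
    # Stand-in for a language detection model.
    return {"language": video.get("subtitle_lang", "unknown")}


def tag_audio(video: dict) -> dict:
    # Stand-in for an audio tagging model.
    return {"audio_tags": ["speech"] if video.get("has_dialogue") else ["music"]}


# Hypothetical registry of reusable modules, keyed by subtask name.
MODULES: Dict[str, Module] = {
    "frames": extract_frames,
    "language": detect_language,
    "audio": tag_audio,
}


def run_task(video: dict, module_names: List[str]) -> dict:
    """Assemble a task from reusable modules and merge their outputs."""
    result: dict = {}
    for name in module_names:
        result.update(MODULES[name](video))
    return result


# Two different tasks reuse the same audio module.
THUMBNAIL_TASK = ["frames", "audio"]
CAPTION_TASK = ["language", "audio"]
```

Because every task is just a list of module names, a new use case can reuse existing modules instead of requiring a new task-specific model.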

Pachyderm's ability to weave multiple ML pipelines together to solve more complex use cases was critical for us because we need to be very flexible to meet our business demands.

Technical Challenge:
Scalable Processing of Video Data

Pachyderm provides the data layer that allows machine learning teams to productionize and scale their machine learning lifecycle. With Pachyderm’s industry-leading data versioning, pipelines and lineage, teams gain data-driven automation, petabyte scalability and end-to-end reproducibility. For RTL Nederlands, Pachyderm was the key to combining and orchestrating the various subtasks into a unified way to process videos at scale. Video processing is also resource intensive, so Pachyderm’s incremental processing mattered: the team processes only videos that are new or have changed, rather than reprocessing everything from scratch. This sped up their pipelines considerably, saving time and money.
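The idea of processing only new or changed videos can be illustrated with a content-hash check. This is a minimal sketch of the general technique, not Pachyderm's implementation; `process_incrementally` and the `seen` dictionary are hypothetical names introduced here.

```python
import hashlib
from typing import Callable, Dict, List


def content_hash(data: bytes) -> str:
    """Fingerprint a file by its content, so renames or re-uploads of the
    same bytes are not mistaken for changes."""
    return hashlib.sha256(data).hexdigest()


def process_incrementally(
    files: Dict[str, bytes],
    seen: Dict[str, str],
    process: Callable[[str, bytes], None],
) -> List[str]:
    """Run `process` only on files that are new or whose content changed.

    `seen` maps path -> hash from earlier runs and is updated in place.
    Returns the paths that were actually processed.
    """
    processed = []
    for path, data in sorted(files.items()):
        digest = content_hash(data)
        if seen.get(path) == digest:
            continue  # unchanged since the last run: skip the expensive work
        process(path, data)
        seen[path] = digest
        processed.append(path)
    return processed
```

On a second run with one modified file and one new file, only those two files are processed; everything else is skipped, which is where the compute savings come from.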

RTL Nederlands also needed to track metadata and content extracted from video streams to assess and improve the effectiveness of its AI models. Pachyderm’s immutable lineage ensured this end-to-end reproducibility. Even more importantly, Pachyderm allows teams to rewind to older versions of the code, data or models, so if a new thumbnail or clip wasn’t performing as well as a previous version, they could roll back to the better-performing version.
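As a rough illustration of immutable versioning with rollback, the sketch below keeps an append-only list of snapshots. The `VersionedStore` class is hypothetical and far simpler than a real data versioning system; it only shows why a rollback never destroys history.

```python
from typing import Dict, List


class VersionedStore:
    """Append-only history of snapshots (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._commits: List[Dict[str, str]] = []

    def commit(self, snapshot: Dict[str, str]) -> int:
        # Store a copy so later mutation by the caller cannot alter history.
        self._commits.append(dict(snapshot))
        return len(self._commits) - 1

    def checkout(self, commit_id: int) -> Dict[str, str]:
        # Return a copy of the snapshot recorded at `commit_id`.
        return dict(self._commits[commit_id])

    def head(self) -> Dict[str, str]:
        return self.checkout(len(self._commits) - 1)

    def rollback(self, commit_id: int) -> int:
        # Rolling back adds a NEW commit with the old content; the
        # underperforming version stays in history for later analysis.
        return self.commit(self._commits[commit_id])
```

For example, committing `{"thumbnail": "v1.jpg"}`, then `{"thumbnail": "v2.jpg"}`, then calling `rollback(0)` leaves three commits in history with the head pointing at the `v1.jpg` content again.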

Lastly, video data is petabyte-scale data, which made Pachyderm’s ability to scale to multiple petabytes crucial to meeting RTL Nederlands’ goals. Pachyderm’s inherent parallel processing and code agnosticism allowed the team to handle a huge volume of video without code changes, while its scalable data versioning optimized storage and computational costs.

Pachyderm’s pipelines are the building blocks that enable us to solve very complex problems. Effectively applying AI to just one of these tasks can save over a thousand hours a year.

Technical Challenge:
Orchestration Based on Data Changes

With Pachyderm, RTL Nederlands can easily combine various AI models to solve much more complex video challenges, such as identifying the optimum point for ad insertion, or selecting clips for a compelling thumbnail.

Determining ad insertion can be tricky. Business rules dictate ad frequency, but just playing an ad at predetermined intervals will disrupt dialogue or ruin a key scene, substantially degrading the viewing experience. Ideally, the ad should appear between scene and dialogue transitions – something the team at RTL Nederlands has trained its AI to recognize.

In this instance, videos are housed in the cloud on Azure, S3 or a similar service, and imported into a video repository in Pachyderm, where pipelines extract the information necessary to detect an ideal ad insertion point, staying in line with business rules about ad frequency, content restrictions, and more. Boundary detection determines shot transitions with high probability based on frame-to-frame changes in color histograms. Speech detection then ensures there is a break in the dialogue so that speakers aren’t cut off mid-sentence by an unexpected ad. Combined, these models allow automated ad insertion without degrading the viewing experience.
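A simplified version of histogram-based boundary detection, combined with speech gaps, might look like the following. It is a sketch under stated assumptions: frames are flat lists of grayscale intensities (0 to 255) rather than color images, the distance threshold of 0.5 is arbitrary, the silence intervals are assumed to come from a separate speech detector, and the business rules on ad frequency are left out. None of the function names come from RTL Nederlands' pipelines.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one non-empty frame."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def shot_boundaries(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Indices i where frame i differs sharply from frame i - 1."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]


def ad_points(boundaries: List[int], silences: List[Tuple[int, int]]) -> List[int]:
    """Keep only the boundaries that fall inside a (start, end) silence interval,
    so an ad never interrupts a speaker."""
    return [b for b in boundaries if any(start <= b <= end for start, end in silences)]
```

Small lighting changes within a shot leave the histogram nearly unchanged, while a cut to a different scene shifts most of the mass to other bins, which is why a simple threshold on the distance already separates the two cases reasonably well.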

Koops notes that this approach – creating simple subtasks that are orchestrated through Pachyderm to solve complex challenges – can apply across domains. It allows the company to use or adapt publicly available AI models to its unique needs. “This sort of orchestration works for any task that can be distilled down into smaller components; it’s much more efficient and flexible than creating monolithic models, and with Pachyderm it’s easy to recombine the components to solve other interesting challenges.”

One of the powerful features of Pachyderm is that data is treated as a first-class citizen. Automation is as simple as adding data to the input repository to trigger pipelines downstream, which creates a nice, traceable way of computing output.
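The data-driven trigger described here can be mimicked with a toy model in which committing an item to a repository runs every pipeline subscribed to it. The `DataDrivenDag` class is hypothetical and is not Pachyderm's API; it only illustrates how adding data, rather than calling a scheduler, drives the downstream work and leaves a traceable record.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class DataDrivenDag:
    """Toy model: pipelines subscribe to repos and run on every commit."""

    def __init__(self) -> None:
        # repo name -> items committed to it
        self.repos: DefaultDict[str, List[str]] = defaultdict(list)
        # input repo -> [(output repo, transform)]
        self.subscribers: DefaultDict[str, List[Tuple[str, Callable[[str], str]]]] = defaultdict(list)
        # (input repo, output repo, input item) records for traceability
        self.lineage: List[Tuple[str, str, str]] = []

    def add_pipeline(self, input_repo: str, output_repo: str,
                     transform: Callable[[str], str]) -> None:
        self.subscribers[input_repo].append((output_repo, transform))

    def commit(self, repo: str, item: str) -> None:
        """Adding data is the only trigger; downstream pipelines follow."""
        self.repos[repo].append(item)
        for output_repo, transform in self.subscribers[repo]:
            result = transform(item)
            self.lineage.append((repo, output_repo, item))
            self.commit(output_repo, result)  # cascades further downstream


dag = DataDrivenDag()
dag.add_pipeline("videos", "frames", lambda v: v + ".frames")
dag.add_pipeline("frames", "thumbnails", lambda f: f + ".thumb")
dag.commit("videos", "ep1.mp4")
```

After the single `commit` call, the `thumbnails` repo contains the derived item and `dag.lineage` records each hop from `videos` to `frames` to `thumbnails`.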

The Future
Expanding Viewership by Increasing Value of Broadcast Assets

The team continues to focus on its key growth metric, viewership, and on optimizing the value and discoverability of video and other broadcast assets. Handing this labor-intensive work to ML reduces the manual effort involved.

With their new scalable platform, they are creating offerings that are broader in scope and add services that their customers need.

Applying ML to Increase the Value of Media Assets
