RTL Nederlands broadcasts to millions of daily TV viewers, along with delivering streaming content that garners hundreds of millions of monthly views online. Its parent, RTL Group, is Europe’s largest broadcaster and part of Bertelsmann, one of the world’s largest media conglomerates.
One of the key growth metrics for RTL Nederlands is viewership, but optimizing the value and discoverability of video assets is extremely labor-intensive. That makes it ripe for automation, so the team applied machine learning to key aspects of its video platform: creating thumbnails and trailers, picking the right thumbnail for each trailer, and inserting ad content into video streams. The choice of thumbnail might not seem crucial, but it strongly influences whether someone clicks on a video or passes it by.
“We want to use artificial intelligence to make sure we optimally apply our human intelligence, so our teams can be more creative and connected,” says Vincent Koops, senior data scientist at RTL Nederlands. “The problem is that it’s not easy to apply AI to managing unstructured content; these content operations are computationally complex, making video AI challenging from a data science perspective.”
The team solved this problem by breaking complicated tasks into simpler subtasks, eliminating larger task-specific models in favor of an assembly of reusable modules. This modular approach to machine learning allowed the company to train the AI on various elements of the video stream across visual (frame extraction, shot segmentation, facial recognition), audio (tagging, speech identification, musical genre) and text (language detection, key phrases) subtasks.
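As a rough illustration of this modular idea, the sketch below registers a few small feature extractors and lets two different tasks reuse them. Everything in it is hypothetical: the module names, the toy logic inside each module, and the shape of the video record are assumptions made for the example, not RTL Nederlands' actual models or data.

```python
from typing import Callable, Dict, List

# A module takes a video record and returns one extracted feature.
Module = Callable[[dict], object]


def shot_count(video: dict) -> int:
    """Visual module (toy): number of shots already segmented."""
    return len(video["shots"])


def has_speech(video: dict) -> bool:
    """Audio module (toy): True when the transcript is non-empty."""
    return bool(video["transcript"].strip())


def key_phrases(video: dict) -> List[str]:
    """Text module (toy): treat words of 8+ characters as key phrases."""
    words = video["transcript"].split()
    return sorted({w.lower() for w in words if len(w) >= 8})


# Registry of reusable modules (hypothetical names).
MODULES: Dict[str, Module] = {
    "shot_count": shot_count,
    "has_speech": has_speech,
    "key_phrases": key_phrases,
}

# Each task is just a list of module names; modules are shared across tasks.
THUMBNAIL_TASK = ["shot_count", "key_phrases"]
AD_INSERTION_TASK = ["shot_count", "has_speech"]


def run_task(video: dict, module_names: List[str]) -> Dict[str, object]:
    """Run the named modules on one video and collect their outputs."""
    return {name: MODULES[name](video) for name in module_names}
```

A new task is then a new list of module names rather than a new task-specific model, which is the reuse the paragraph above describes.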
Pachyderm provides the data layer that allows machine learning teams to productionize and scale their machine learning lifecycle. With Pachyderm’s industry-leading data versioning, pipelines and lineage, teams gain data-driven automation, petabyte scalability and end-to-end reproducibility. For RTL Nederlands, Pachyderm was the key to combining and orchestrating the various subtasks into a unified way to process videos at scale. Video processing is also resource intensive, and Pachyderm’s incrementality allowed the team to process only new or changed videos as they arrive, rather than reprocessing everything from scratch. This made the pipeline considerably faster, saving time and money.
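The idea behind incremental processing can be sketched in a few lines. The example below is a simplified illustration, not Pachyderm's API or implementation: it assumes each video is identified by a name and compared by a SHA-256 content hash, and the function names (`changed_items`, `process_incrementally`) are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def content_hash(data: bytes) -> str:
    """Fingerprint of an item's content."""
    return hashlib.sha256(data).hexdigest()


def changed_items(current: Dict[str, bytes], seen: Dict[str, str]) -> List[str]:
    """Names whose content is new or differs from the last processed version."""
    return sorted(
        name for name, data in current.items()
        if seen.get(name) != content_hash(data)
    )


def process_incrementally(
    current: Dict[str, bytes],
    seen: Dict[str, str],
    process: Callable[[str, bytes], None],
) -> List[str]:
    """Process only new or changed items, then record their hashes in `seen`."""
    todo = changed_items(current, seen)
    for name in todo:
        process(name, current[name])
        seen[name] = content_hash(current[name])
    return todo
```

Running `process_incrementally` twice on the same inputs does no work the second time; only an added or modified video triggers processing again.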
RTL Nederlands also needed to track metadata and content extracted from video streams to assess and improve the effectiveness of its AI models. Pachyderm’s immutable lineage ensured this end-to-end reproducibility. Even more importantly, Pachyderm allows teams to rewind to older versions of the code, data or models, so if a new thumbnail or clip wasn’t performing as well as a previous version, they could roll back to the better-performing version.
Lastly, video data is petabyte-scale data, which made Pachyderm’s ability to scale to multiple petabytes crucial to meeting RTL Nederlands’ goals. Pachyderm’s inherent parallel processing and code agnosticism allowed the team to handle a huge volume of video without code changes, while its scalable data versioning optimized storage and computational costs.
With Pachyderm, RTL Nederlands can easily combine various AI models to solve much more complex video challenges, such as identifying the optimum point for ad insertion, or selecting clips for a compelling thumbnail.
Determining where to insert an ad can be tricky. Business rules dictate ad frequency, but simply playing an ad at predetermined intervals can disrupt dialogue or ruin a key scene, substantially degrading the viewing experience. Ideally, the ad should appear at a scene transition that coincides with a break in dialogue, something the team at RTL Nederlands has trained its AI to recognize.
In this instance, videos are housed in the cloud on Azure, S3 or a similar service, and imported into a video repository in Pachyderm, where pipelines extract the information necessary to detect an ideal ad insertion point while staying in line with business rules about ad frequency, content restrictions, and more. Boundary detection determines shot transitions with high probability based on frame-to-frame changes in color histograms. Finally, speech detection ensures a break in the dialogue so that speakers aren’t cut off mid-sentence by an unexpected ad. Combined, these models allow automated ad insertion without degrading the viewing experience.
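The combination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not RTL Nederlands' pipeline: it assumes each frame is already summarized as a normalized color histogram, that speech detection has produced a list of silent intervals in seconds, and that the only business rule is a minimum gap between ads. The 0.5 threshold and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Histogram = Sequence[float]          # normalized: bins sum to 1.0
Interval = Tuple[float, float]       # (start_s, end_s) of a silent stretch


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def shot_boundaries(histograms: Sequence[Histogram], threshold: float = 0.5) -> List[int]:
    """Indices of frames whose histogram differs sharply from the previous frame."""
    return [
        i for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]


def in_silence(t: float, silences: Sequence[Interval]) -> bool:
    """True when time t falls inside any silent interval."""
    return any(start <= t <= end for start, end in silences)


def ad_insertion_points(
    histograms: Sequence[Histogram],
    fps: float,
    silences: Sequence[Interval],
    min_gap_s: float,
    threshold: float = 0.5,
) -> List[float]:
    """Shot boundaries that fall in silence and respect the minimum ad spacing."""
    points: List[float] = []
    last = 0.0  # treat the start of the video as the previous ad position
    for idx in shot_boundaries(histograms, threshold):
        t = idx / fps
        if in_silence(t, silences) and t - last >= min_gap_s:
            points.append(t)
            last = t
    return points
```

A candidate is kept only when all three conditions hold: the histogram jump marks a shot change, the moment is silent, and enough time has passed since the previous ad. Real systems would add further rules, such as content restrictions, as separate filters.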
Koops notes that this approach – creating simple subtasks that are orchestrated through Pachyderm to solve complex challenges – can apply across domains. It allows the company to use or adapt publicly available AI models to its unique needs. “This sort of orchestration works for any task that can be distilled down into smaller components; it’s much more efficient and flexible than creating monolithic models, and with Pachyderm it’s easy to recombine the components to solve other interesting challenges.”
The team continues to focus on its key growth metric of viewership, and on optimizing the value and discoverability of video and other broadcast assets. Handing this work to machine learning reduces the manual effort involved.
With their new scalable platform, they are creating new offerings that are broader in scope and adding services that their customers need.