Friday September 20, 2024 9:00am - 9:30am PDT
Media processing workflows are inherently complex, often requiring extensive state management and continuous updates to ensure media encodes remain current. At Netflix, we have developed the Plato media workflow platform, a key component of our larger Cosmos media processing system. This talk will delve into how we process millions of media workflow events daily, ensuring durability and scalability while enhancing the developer experience.


Enhancing RPC Durability in Media Workflows

Cosmos is a microservices-based platform in which each processing component is implemented as a microservice or a serverless function. Workflows must therefore make remote procedure calls (RPCs) asynchronously to execute tasks at scale. Plato handles RPCs using message-passing techniques, which makes otherwise flaky calls more durable and reliable. This lets our users build on a resilient RPC client foundation, mitigating the impact of potential failures on workflow continuity.


Scaling to Millions of Workflow Events

Media processing work is bursty, and the demand for producing encodes often exceeds available compute resources. To address this, Plato incorporates features like priority-based task queues, execution avoidance, and a combination of dynamic and static graph execution models. Together, these features enable us to process millions of workflow events daily. We will present real-world scenarios that showcase how these technologies allow Plato to efficiently scale up and durably execute millions of workflow events.


Prioritizing Developer Experience
While ensuring durability is crucial for our users, it cannot come at the cost of developer experience. The Plato platform lets users bring their own strongly typed data models. This ensures that workflow execution state can be stored and retrieved reliably, allows workflows to be tested against strong contracts, and lowers the barrier to entry for our users by improving the platform’s usability. We will highlight case studies that demonstrate how Plato provides a good developer experience and discuss some of the open challenges we are working on.


Conclusion

This talk will provide an overview of how Netflix implements durable executions to process media encodes at scale. Attendees will gain insights into the challenges and techniques that Netflix uses in the media processing space, with practical examples from the Plato platform that highlight our approach to durability, scalability, and developer experience.

References
  • For an overview of the underlying technologies and design principles of the Cosmos platform, please refer to our blog post here.
Speakers
Dmitry Vasilyev

Staff Engineer, Netflix
I'm a graduate of BSU in Minsk, Belarus. I spent 7 years building an online marketplace before joining Netflix in 2016. At Netflix I have been working on workflow orchestration distributed systems in the area of media processing. Currently my interests also include serverless multi...
Naveen Mareddy

Senior Staff Engineer - Content Infrastructure Solutions (CIS), Netflix
Naveen Mareddy is a Senior Staff Engineer in Netflix's Content Infrastructure Solutions (CIS) group, where he works at the intersection of media processing platforms and large-scale distributed cloud computing systems. His team is responsible for building and managing the infrastructure...
Wanderers Stage | Evergreen Ballroom (A-C)
