Keynote Speakers

Founder and CEO of Distance

Urho Konttori

Senior Director, Encoding Tech at Netflix

KEYNOTE TALKS

Anne Aaron - Engineering Large-Scale Live Video Streaming: Architecture and Encoding Challenges

Live video streaming at global scale presents a distinct set of research and engineering challenges, including massive, unpredictable audience spikes and strict real-time constraints across heterogeneous devices and networks. Meeting these requirements while maintaining smooth playback and high visual quality for millions of people tuning in at once requires rethinking traditional streaming architectures and encoding strategies.

In this keynote, we examine the end-to-end design of a large-scale live streaming system, from content production and cloud-based processing to content delivery networks and client playback. Using the Netflix Live platform as a concrete case study, we discuss the architectural choices behind a highly reliable Live Origin and share insights from past, ongoing, and future efforts to improve network efficiency and video quality under real-time constraints. The talk highlights open challenges and design trade-offs that are relevant to live streaming systems.

Pavlo Molchanov - Toward Generalized Image Processing with Foundation Models

Vision-language models are emerging as a new generalized interface for image processing, moving the field beyond separate pipelines for captioning, retrieval, recognition, and reasoning toward unified visual systems. These models consist of two core components: a language model for reasoning and a vision backbone that converts multimedia inputs into representations the language model can interpret. Vision backbones, or image foundation models, such as CLIP, DINOv2, SAM, and RADIO are shifting research and development toward generalized visual backbones that can support a wide range of image processing tasks with few-shot adaptation or minimal fine-tuning, while reducing the need for task-specific preprocessing. In this talk, I will discuss recent progress and remaining challenges in building such open models, with an emphasis on training and deployment efficiency. I will also highlight what remains unsolved for real-world deployment in robotics, autonomous vehicles, and general computer vision, including robustness, controllability, grounding, efficiency, and evaluation beyond closed benchmarks. The broader goal is to position VLMs not merely as a multimodal trend, but as a serious foundation for the next generation of generalized image processing systems.

Juha Alakarhu - Designing Cameras for Trust: Imaging Requirements and Challenges in Law Enforcement

Law-enforcement cameras are designed to document incidents and provide transparent, trustworthy evidence. Unlike consumer imaging devices, which are optimized primarily for subjective image quality, these systems must deliver true-to-life, tamper-resistant video capable of withstanding legal scrutiny. This talk examines the imaging design principles shared across law-enforcement camera platforms, from hardware choices to image-processing pipelines, and discusses the key requirements that shape their development. The role of AI is considered with particular emphasis on preserving evidentiary trust: improving usability and performance without introducing artifacts that could compromise authenticity. The talk also explores the relationship between human vision and camera capture, and why understanding the differences between the two is essential when interpreting recorded evidence. Finally, it outlines future challenges and opportunities in advancing law-enforcement imaging.