Keynote Speakers


KEYNOTE TALKS


Urho Konttori - Beyond the Windshield: Imaging, Perception, and Augmented Vision for the Physical World

Modern imaging systems are evolving beyond image capture and display toward integrated perception systems that help people understand and act in the physical world. Advances in sensors, computational imaging, computer vision, foundation models, and real-time graphics are enabling new forms of augmented vision that can enhance situational awareness across applications ranging from transportation and industrial operations to defence and aviation.

This talk explores the emerging architecture of augmented vision systems, where information from multiple sensors, imaging pipelines, and AI models is combined into a coherent visual experience for a human operator. Drawing on experience from the development of automotive, aerospace, and wearable visual technologies, the presentation discusses the challenges of latency, registration accuracy, visual trust, human factors, and deployment at scale. The talk also examines how recent progress in AI and multimodal perception is changing the role of image processing, from improving image quality to helping users understand complex environments under hard real-time requirements.

Finally, the presentation outlines key research opportunities at the intersection of image processing, computer vision, human perception, and interactive systems, and argues that the next generation of imaging technologies will be defined not only by what machines can see, but by how effectively visual information can support human decision-making.


Anne Aaron - Engineering Large-Scale Live Video Streaming: Architecture and Encoding Challenges

Live video streaming at global scale presents a distinct set of research and engineering challenges, including massive, unpredictable audience spikes and strict real-time constraints across heterogeneous devices and networks. Meeting these requirements while maintaining smooth playback and high visual quality for millions of people tuning in at once requires rethinking traditional streaming architectures and encoding strategies.

In this keynote, we examine the end-to-end design of a large-scale live streaming system, from content production and cloud-based processing to content delivery networks and client playback. Using the Netflix Live platform as a concrete case study, we discuss the architectural choices behind a highly reliable Live Origin and share insights from past, ongoing, and future efforts to improve network efficiency and video quality under real-time constraints. The talk highlights open challenges and design trade-offs that are relevant to live streaming systems.


Pavlo Molchanov - Toward Generalized Image Processing with Foundation Models

Vision-language models are emerging as a new generalized interface for image processing, moving the field beyond separate pipelines for captioning, retrieval, recognition, and reasoning toward unified visual systems. These models consist of two core components: a language model for reasoning and a vision backbone that converts multimedia inputs into representations the language model can interpret. Vision backbones, or image foundation models, such as CLIP, DINOv2, SAM, and RADIO, are shifting research and development toward generalized visual backbones that can support a wide range of image processing tasks with few-shot adaptation or minimal fine-tuning, while reducing the need for task-specific preprocessing. In this talk, I will discuss recent progress and remaining challenges in building such open models, with an emphasis on training and deployment efficiency. I will also highlight what remains unsolved for real-world deployment in robotics, autonomous vehicles, and general computer vision, including robustness, controllability, grounding, efficiency, and evaluation beyond closed benchmarks. The broader goal is to position VLMs not merely as a multimodal trend, but as a serious foundation for the next generation of generalized image processing systems.


Juha Alakarhu - Designing Cameras for Trust: Imaging Requirements and Challenges in Law Enforcement

Law-enforcement cameras are designed to document incidents and provide transparent, trustworthy evidence. Unlike consumer imaging devices, which are optimized primarily for subjective image quality, these systems must deliver true-to-life, tamper-resistant video capable of withstanding legal scrutiny. This talk examines the imaging design principles shared across law-enforcement camera platforms, from hardware choices to image-processing pipelines, and discusses the key requirements that shape their development. The role of AI is considered with particular emphasis on preserving evidentiary trust: improving usability and performance without introducing artifacts that could compromise authenticity. The talk also explores the relationship between human vision and camera capture, and why understanding the differences between the two is essential when interpreting recorded evidence. Finally, it outlines future challenges and opportunities in advancing law-enforcement imaging.