Special Sessions

ICIP 2026 features a diverse set of Special Sessions that spotlight emerging topics and innovative research directions in image processing. These sessions provide a focused forum for presenting cutting-edge work and fostering collaboration within specialized areas. Authors are invited to submit papers to any of the accepted Special Sessions listed below via the conference paper management system and contribute to advancing the frontiers of the field.

Multimodal Image and Video Processing for Human-Centric Understanding and Emotion-Aware Applications

Organizers:
– Imad Rida, Université de Technologie de Compiègne, Compiègne, France
– Xiaohan Yu, Macquarie University, Sydney, Australia
– Erik Cambria, Nanyang Technological University, Singapore, Singapore
– Xianxun Zhu, Shanghai University, Shanghai, China

Recent advances in sensing and generative modeling have led to a rapid growth of multimodal data, where visual streams (images, video) are tightly coupled with audio, language, physiological, and other sensor signals. However, most image and video processing pipelines at ICIP are still predominantly unimodal, limiting their robustness and ability to capture rich human-centric semantics such as intent, affect, and context. This special session aims to bring together researchers working at the intersection of image processing, multimodal learning, and real-world applications to advance multimodal image/video representation, fusion, and reasoning. The committed papers will cover methods for multimodal feature extraction from visual data and companion modalities; cross-modal alignment and contrastive learning; causal and robust fusion under noisy or missing modalities; efficient and privacy-preserving multimodal processing on edge devices; as well as evaluation protocols and benchmarks. Application domains include, but are not limited to, multimodal emotion and sentiment analysis, human–computer interaction, healthcare and assistive monitoring, autonomous driving and smart cities, remote sensing, and AR/VR. By focusing on image- and video-centric multimodal processing, the special session directly aligns with ICIP’s core themes while highlighting emerging topics such as foundation models for vision–language–audio, diffusion-based generative perception, and trustworthy, human-centered AI. The special session format is particularly suitable to showcase a coherent set of high-quality contributions spanning theory, algorithms, and applications, and to foster cross-pollination between the image processing, computer vision, affective computing, and multimodal learning communities.

Generative AI for Advanced Imaging: Restoration and Understanding

Organizers:
– Najib Ben Aoun, Al-Baha University, Saudi Arabia
– Imed Ben Dhaou, Dar Al-Hekma University, Saudi Arabia
– Sadique Ahmad, Prince Sultan University, Saudi Arabia

Recent advances in generative AI are transforming the field of image and video processing, enabling high-quality restoration, enhancement, and understanding across diverse imaging domains. Generative models, including diffusion-based architectures, GANs, and foundation models, are now capable of reconstructing images from degraded inputs, synthesizing realistic content, and producing interpretable representations for analysis. These capabilities are impacting a wide range of applications, from medical imaging and microscopy to remote sensing, mobile photography, and autonomous systems.

This special session aims to bring together contributions that leverage generative AI for advanced imaging tasks. Topics include, but are not limited to: image and video restoration, super-resolution, inpainting, denoising, multimodal fusion, cross-domain reconstruction, and scene understanding. Papers focusing on end-to-end generative imaging pipelines, self-supervised learning for restoration tasks, and novel evaluation metrics are also welcome.

By consolidating cutting-edge research in a dedicated session, we aim to foster interdisciplinary discussion and highlight the transformative potential of generative AI for imaging. The session provides a platform for presenting new algorithms, system-level designs, and high-impact applications, aligning with ICIP 2026’s mission to integrate deep domain knowledge with modern AI tools. It will appeal to both academic researchers and industry practitioners and is expected to drive further innovation in generative imaging technologies.

Algorithms and Applications of Embodied Image Processing

Organizers:
– Yang Liu, Tongji University, China & University of Toronto, Canada
– Jing Liu, University of British Columbia, Canada
– Dingkang Yang, Fudan University, China & Fysics AI, China
– Peng Sun, University of Ottawa, Canada

The integration of embodied intelligence with image processing is fundamentally transforming visual perception systems that operate in physical environments. To capture this evolution, the proposed special session focuses on embodied image processing, emphasizing algorithms specifically designed for systems that actively interact with real-world environments. It highlights a critical paradigm shift for robotic perception, autonomous systems, and human-centered applications. In contrast to traditional image processing in controlled settings, embodied systems face unique challenges including motion-induced artifacts, dynamic lighting, sensor mobility constraints, real-time requirements, and closed-loop perception-action coupling. Consequently, fundamentally new algorithmic approaches that are physically grounded and interaction-aware become necessary. While current IEEE ICIP regular tracks comprehensively address individual processing aspects, embodied systems require integrated solutions spanning multiple categories. Key areas include low-level processing for motion blur and adaptive exposure, mid-level algorithms for depth estimation and 3D reconstruction, and high-level understanding with physical plausibility constraints, all operating under strict real-time and resource constraints of mobile platforms. Given the cross-cutting nature of these challenges, a dedicated forum explicitly bridging sensing hardware, processing algorithms, and physical interaction requirements is essential.

The primary goal of the session is to advance the state-of-the-art in physically-grounded visual perception, resource-efficient processing for mobile robots, robust algorithms under motion and environmental variations, and 3D scene understanding for embodied agents. Furthermore, the organizing team brings proven expertise through successful special sessions at various venues, and leadership roles at top-tier conferences. Supported by six high-quality paper commitments from leading institutions and industry labs spanning diverse embodied vision challenges, the session aims to establish IEEE ICIP as the premier venue for image processing research bridging digital algorithms with physical world constraints.

Intelligent Brain Imaging for Healthcare

Organizers:
– Aiping Liu, University of Science and Technology of China
– Jiayue Cai, Shenzhen University, China
– Martin J. McKeown, University of British Columbia, Canada

Recent advances in neuroimaging have enabled unprecedented access to the structural, functional, and microstructural organization of the human brain. At the same time, the rapid evolution of artificial intelligence has created new opportunities to transform brain imaging into clinical knowledge. Despite this progress, significant challenges remain in translating complex brain imaging data into robust, interpretable, and clinically relevant biomarkers for diagnosis, prognosis, and treatment monitoring.

This special session focuses on Intelligent Brain Imaging for Healthcare, aiming to bring together cutting-edge research that integrates advanced image processing, machine learning, and domain-informed modeling for brain healthcare applications. This session will cover methodological advances across multiple imaging modalities, including MRI, fMRI, diffusion imaging, PET, and emerging neuroimaging techniques, as well as their applications in neurological and psychiatric disorders. Emphasis will be placed on intelligent analysis frameworks that address challenges including, but not limited to, high dimensionality, spatiotemporal complexity, inter-subject variability, limited annotation, multimodal integration and clinical interpretability. The papers featured in this session collectively address these challenges through novel methodological development. The expected impact of this special session is to highlight emerging trends, stimulate interdisciplinary collaboration, and accelerate the translation of intelligent brain imaging methods into real world healthcare applications.

Generative Visual Coding: Emerging Paradigms for Future Communication

Organizers:
– Jianhui Chang, China Telecom, China
– Hadi Amirpour, University of Klagenfurt, Austria
– Giuseppe Valenzise, Université Paris-Saclay, France

Generative Visual Coding (GVC) is an emerging paradigm that explores how generative models and structured visual representations can redefine visual communication. By integrating generative capabilities into the coding process, GVC enables new forms of representation, transmission, and reconstruction that enhance perceptual and semantic fidelity while improving communication efficiency. Beyond human-centric reconstruction, GVC supports machine- and task-oriented communication, where compact and semantically meaningful representations benefit downstream analysis and decision-making.

The paradigm also motivates theoretical study on how generative priors interact with information constraints, optimization objectives, and emerging concepts in semantic communication. As generative processes gain prominence, principled evaluation becomes increasingly essential, encouraging advances in quality assessment, distortion modeling, and the development of benchmark datasets for generative and hybrid codec systems. Efficiency remains central to deployment, underscoring the importance of model design, complexity optimization, and computational scalability.

GVC further extends to immersive and spatial communication, including three-dimensional and scene-level content. In these settings, generative models can infer geometry, semantics, and contextual relationships, enabling new modes of multi-view and interactive media delivery. Overall, GVC offers a unified framework that integrates generative modeling, visual coding, and intelligent communication, laying the groundwork for next-generation visual communication systems.

Coding Tools and Features in the new AV2 video coding standard

Organizers:
– Debargha Mukherjee, Google, USA
– Liang (Leo) Zhao, Tencent Media Lab, USA

Video on demand and conversational video together already account for nearly three-quarters of all Internet traffic today, and that share continues to grow. This trend drives ongoing research and development in academia and industry on video compression, to find even more compact representations of video. In 2015, the Alliance for Open Media (AOMedia) industry consortium was founded to develop and promote open media technologies, to foster fair competition and universal accessibility in media representation and delivery. Developing open video codecs was a big part of its charter. AOMedia’s first project was to develop and standardize a video codec called AV1 in 2018, and offer it under a royalty-free licensing model – leading to rapid adoption by content providers, software and hardware manufacturers, and various open-source projects. AOMedia is currently developing AV2, its next-generation video codec, which is anticipated to be finalized by the end of 2025. AV2 is projected to achieve approximately a 30% BD-rate reduction in terms of PSNR compared to AV1 with only a modest increase in decoding complexity. This significant improvement in coding efficiency is due to the adoption and integration of numerous new coding tools. These innovations span several key areas including block partitioning, intra prediction and intra mode coding, inter prediction, transforms, quantization, entropy coding, and loop filtering. The purpose of this Special Session is to disseminate detailed information for the first time on the advanced coding tools in each of these areas.

Visual information processing for human-centered immersive experiences

Organizers:
– Sara Baldoni, University of Padova, Italy
– Hadi Amirpour, University of Klagenfurt, Austria

Immersive systems such as Virtual and Extended Reality are becoming widespread thanks to the wide diffusion of relatively low-cost headsets and the increased immersivity and sense of presence they provide with respect to their 2D counterparts. However, the novelty of the involved technologies as well as the variety of available media types, together with the high number of applications, entail endless challenges for the research community. One key feature of immersive systems is that they inherently place users at the center of the experience, allowing them to actively explore, manipulate, and interact with content. As a result, immersive systems introduce new perceptual, behavioral, and interaction aspects that require dedicated investigation. This special session focuses on the role of visual information processing in enabling human-centered immersive experiences. The special session welcomes contributions about visual attention mechanisms, perceptual modelling, emerging media formats such as stereoscopic and omnidirectional imagery, light fields, point clouds, meshes, and gaussian splats. In addition, the special session calls for papers concerning the role of visual information processing in multimodal immersive applications and its contribution for the realization of high-quality experiences. Overall, the special session will provide complementary insights into how visual information plays a critical role in enhancing effectiveness, comfort, usability, and perceptual quality in next-generation immersive applications.

Task-Oriented Source Coding

Organizers:
– Anissa MOKRAOUI, Université Sorbonne Paris Nord, France
– Pierre DUHAMEL, Paris Saclay, CentraleSupélec, France

Recent advancements in machine vision have revealed a fundamental gap between traditional source coding, designed for human perception, and the needs of machine-centric inference tasks. This gap becomes particularly problematic in resource-constrained environments, such as edge devices, IoT vision systems, and autonomous platforms, where the mismatch leads to inefficient tradeoffs between bitrate, latency, and task performance. To address these challenges, Task-Oriented Source Coding (TOSC) has emerged as a powerful paradigm that shifts the focus from pixel-level distortion to task-level objectives. By combining representation learning, semantic compression, and task-aware optimization, TOSC provides an efficient way to optimize visual data for specific various vison tasks including classification, detection, segmentation, tracking, and action recognition.

This special session aims to advance task-oriented compression for machine vision by bridging information theory, deep learning, and system design. It will present recent advances in TOSC, ranging from learned compression architectures tailored to practical constraints to approaches addressing rate–distortion–task tradeoffs and their theoretical foundations. The outcomes are expected to contribute to more efficient visual communication strategies and to influence the development of next-generation vision systems as well as future standards for visual data transmission in real-time and resource-constrained environments.

Positioned within the IEEE ICIP framework, this special session provides a focused and technically rigorous forum that complements existing workshops. It enables a deeper and more integrated exploration of TOSC, fostering targeted cross-disciplinary exchanges among researchers in machine vision, image processing, and information theory, and contributing to the establishment of a unified foundation for task-driven compression.

Physics-Informed and Structured Neural Networks and their Bayesian Versions for Inverse Problems in Imaging Science

Organizers:
– Ali Mohammad-Djafari, CNRS, France
– Ozan Öktem, KTH, Stockholm, Sweden
– Li Wang, Central South University, Changsha, China
– Ning Chu, Ningbo Institute of Digital Twin, China

Physics-Informed Neural Networks (PINNs) and their Bayesian extensions (B-PINNs) have emerged as powerful approaches for solving challenging forward and inverse problems in imaging where physical constraints, measurement scarcity, and uncertainty quantification play critical roles. By embedding the governing partial differential equations (PDEs) directly into the training objective, PINNs mitigate the effects of noise, ill-posedness, and incomplete data while ensuring physically meaningful reconstructions. Bayesian PINNs further enhance this framework by introducing probabilistic priors, hierarchical structures, and variational or Monte Carlo inference schemes, thereby providing calibrated uncertainty estimates for both images and physical parameters.

Recent work by the organizers has demonstrated the effectiveness of Bayesian inference, regularization theory, and physics-informed deep learning for inverse problems and dynamical system identification. In particular, Bayesian PINNs for linear inverse problems, PINNs with unknown PDEs learned from multivariate time series, and digital twins for industrial applications provide concrete instances of this emerging paradigm. In parallel, structured neural networks inspired by variational formulations and iterative reconstruction, such as learned primal–dual schemes and unrolled optimization networks, have shown remarkable performance in tomographic and computational imaging. Their Bayesian and physics-informed extensions open new perspectives for robust and interpretable inverse imaging.

This Special Session aims to bring together leading researchers at the interface of inverse problems, physics-informed machine learning, and imaging science. The contributions span theoretical analysis of PINN optimization and identifiability, Bayesian formulations for uncertainty-aware inverse imaging, hybrid architectures combining deep networks with PDE-based regularization and structured NNs, and real-world applications in infrared thermography, nondestructive testing, tomographic reconstruction, and digital twins for industrial imaging systems. The novelty and cross-disciplinary relevance of PINNs, B-PINNs, and structured neural networks make this topic particularly timely for IEEE ICIP 2026.

Joint Advances in Radar Imaging, Remote Sensing and Computer Vision

Organizers:
– Abhijit Mahalanobis, University of Arizona, USA
– Saurabh Prasad, University of Houston, USA
– Emanuele Dalsasso, Centre Inria de l’Université Grenoble Alpes, FRA
– Banafsheh Saber Latibari, University of Arizona, USA

This special session invites researchers to submit papers on the applications of machine learning techniques for the processing of radar images, such as Synthetic Aperture Radar (SAR) and LiDAR. Recently, computer vision methods have been perfected for the analysis of photo- realistic color images. However, the extension of such methods to non-conventional imaging modalities, such as radar imaging, has yet to be fully investigated. On the methodological side, it encompasses generative AI for SAR images and LiDAR point clouds, vision-language models adapted to radar imaging, methods for analyzing SAR images using label-efficient learning via self-supervision, incorporation of physical priors, uncertainty estimation and zero/few-shot transfer learning across sensors, frequencies, polarizations, and acquisition modalities. On the applied side, applications can include rapid change detection and damage assessment for disasters, maritime and urban object analytics, and environmental monitoring, with growing emphasis on edge inference for time-critical response. By bringing together perspectives from academia, industry, and space agencies, the session aims to accelerate methods that generalize, explain, and operate under operational constraints, translating advances into measurable impact. The topics of interest include: multimodal fusion and retrieval involving SAR and/or LiDAR point cloud; SAR-tuned foundation/VLMs and open-vocabulary recognition; self-supervised pretraining, domain adaptation, detection, segmentation, and tracking under speckle and layover; multi-temporal modeling and change reasoning; 3D/height inference from interferometric/PolSAR cues; LiDAR point cloud classification and 3D reconstruction; active learning and efficient annotation; compression, pruning, and distillation for on-board deployment; benchmarks, metrics, and ethics for responsible SAR-enabled AI. Submissions spanning methodology, algorithms, datasets, and applications are encouraged.

QoE and Sustainability in Video Communications

Organizers:
– Christian Herglotz, Brandenburgische-Technische Universität Cottbus-Senftenberg, Germany
– Alexandre Mercat, Tampere University, Finland
– Steven Le Moan, NTNU, Gjøvik, Norway

Image and video processing technologies have become relevant contributors to global energy and resource consumption. At the same time, the demand for higher Quality of Experience (QoE) – through increased resolution, frame rate, dynamic range and fidelity– continues to rise, leading to significant computational and environmental costs. Consequently, research targeting the sustainable use of image and video processing technologies while delivering a high QoE is of high importance for the future of our planet.

This special session focuses on effective methods that reduce power and resource usage in video processing algorithms while maintaining or enhancing user-perceived quality. Applications of interest include (1) video compression, (2) prompt-based video generation, (3) video streaming, and (4) video enhancement and restoration.

Topics of interest include (but are not limited to):
• Standardization for sustainable video compression;
• Joint optimization of visual quality and energy consumption of video processing algorithms;
• Life-cycle assessment of video processing systems;
• Power-optimized hardware implementations;
• Metrics for high-quality and low-energy video assessment
• Sustainable and high-quality video streaming

Knowledge-Driven Multi-Modal Medical Image Analysis for Clinical Decision Support

Organizers:
– Guanghui Yue, School of Biomedical Engineering, Shenzhen University, China
– Dr. Wei Zhou, School of Computer Science and Informatics, Cardiff University, UK
– Weide Liu, College of Computing and Data Science (CCDS), Nanyang Technological University, Singapore
– Cheng Zhao, School of Biomedical Engineering, Shenzhen University, China

Medical imaging plays a fundamental role in disease diagnosis, treatment planning, and clinical decision-making. With the rapid advancement of imaging technologies, diverse modalities (such as ultrasound, CT, MRI, PET, and multimodal video sequences) are routinely employed to capture complementary anatomical, functional, and physiological information. In parallel, data-driven artificial intelligence (AI) methods have achieved impressive performance in tasks such as segmentation, detection, classification, and disease screening. Yet, most current AI-based medical image analysis approaches remain predominantly data-driven, relying heavily on large-scale annotated datasets and end-to-end learning. In real-world clinical environments, these methods often face challenges including limited labeled data, distribution shifts across institutions and devices, imaging artifacts and noise, modality discrepancies, and heterogeneous patient populations. As a result, purely data-driven models frequently exhibit limited generalization, reduced robustness, and weak interpretability — hindering their deployment in safety-critical clinical workflows. By contrast, routine clinical practice is inherently knowledge-driven. Radiologists and clinicians rely on rich domain knowledge such as anatomical and physiological priors, imaging physics, disease progression patterns, temporal consistency across follow-up scans, and established diagnostic guidelines or scoring systems. These forms of knowledge are still far from being fully exploited in current learning pipelines. Knowledge-driven multi-modal medical image analysis aims to close this gap by tightly integrating medical domain knowledge with multi-modal imaging data and advanced AI models. By embedding structured knowledge (e.g., anatomical atlases, ontologies, causal relationships), prior constraints, uncertainty modeling, and clinical reasoning into learning frameworks knowledge-driven approaches promise enhanced robustness, better sample efficiency, improved interpretability, and more clinically trustworthy decision support. This direction is inherently interdisciplinary, spanning medical imaging, AI and machine learning, computer vision, knowledge representation and reasoning, biomedical engineering, and clinical medicine. This session aims to bring together researchers and practitioners from these communities to share recent advances, novel methodologies, and real-world applications in knowledge-driven multi-modal medical image analysis, and to foster its translation into practical clinical systems.

The main topics of interest include, but are not limited to:
• Knowledge representation and modeling for medical image analysis (e.g., anatomical priors, physiological constraints, clinical rules);
• Knowledge-driven learning frameworks for multi-modal medical images and videos;
• Integration of medical domain knowledge with deep learning and foundation models;
• Knowledge-guided multi-modal image fusion, segmentation, and detection methods;
• Cross-modal and cross-view consistency modeling based on medical knowledge;
• Knowledge-driven disease diagnosis, prognosis, and clinical decision support systems;
• Interpretable and explainable AI methods for multi-modal medical imaging;
• Applications of knowledge-driven medical image analysis in real-world clinical scenarios.