Publications

Published

You Only Train Once: Differentiable Subset Selection for Omics Data

Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E Vogt

TMLR · 2026

Abstract

Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.

Published

Structure is Supervision: Multiview Masked Autoencoders for Radiology

Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

TMLR · 2026

Abstract

Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE consistently outperforms supervised and vision-language baselines. Furthermore, MVMAE-V2T provides additional gains, particularly in low-label regimes where structured textual supervision is most beneficial. Together, these results establish the importance of structural and textual supervision as complementary paths toward scalable, clinically grounded medical foundation models.

Workshop

Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography

Simon Böhi, Irene Cannistraci, Sergio Muñoz Gonzalez, Moritz Vandenhirtz, Sonia Laguna, Samuel Ruiperez-Campillo, Max Krähenmann, Andrea Agostini, Ece Ozkan, Thomas M. Sutter, Julia E Vogt

ICLR 2026 Workshop on Foundation Models for Science: Real-World Impact and Science-First Design · 2026

Abstract

Echocardiography is a widely used modality for cardiac assessment due to its non-invasive and cost-effective nature, but the sparse and heterogeneous spatiotemporal views of the heart pose distinct challenges. Existing masked autoencoder (MAE) approaches typically process images or short clips independently, failing to capture the inherent multi-view structure required for coherent cardiac representation. We introduce Latent Attention Masked Autoencoder (LAMAE), a foundation model architecture tailored to the multi-view nature of medical imaging. LAMAE augments the standard MAE with a latent attention module that enables information exchange across frames and views directly in latent space. This allows the model to aggregate variable-length sequences and distinct views, reconstructing a holistic representation of cardiac function from partial observations. We pretrain LAMAE on MIMIC-IV-ECHO, a large-scale, uncurated dataset reflecting real-world clinical variability. To the best of our knowledge, we present the first results for predicting ICD-10 codes from MIMIC-IV-ECHO videos. Furthermore, we empirically demonstrate that representations learned from adult data transfer effectively to pediatric cohorts despite substantial anatomical differences. These results provide evidence that incorporating structural priors, such as multi-view attention, yields significantly more robust and transferable representations.

Workshop

From Leads to Latents: Attention-Driven Masked Autoencoder for ECG Times Series

Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Simon Böhi, Sonia Laguna, Irene Cannistraci, Andrea Agostini, Ece Ozkan, Thomas M. Sutter, Julia E. Vogt

ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling · 2026

Under Review

Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion

Sonia Laguna, Jorge da Silva Gonçalves, Moritz Vandenhirtz, Alain Ryser, Julia E Vogt*, Irene Cannistraci*

Under Review (*Equal senior authorship) · 2026

Under Review

TOAST: Transformer Optimization using Adaptive and Simple Transformations

Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas Sutter, Emanuele Rodolà, Bastian Rieck, Julia Vogt

Under Review · 2025

Abstract

Foundation models achieve State-of-the-Art (SOTA) performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or fine-tuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce TOAST (Transformer Optimization using Adaptive and Simple Transformations), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformation or even the identity, without any additional training. Across SOTA pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.

Published Spotlight · Top 5%

From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication

Irene Cannistraci, Luca Moschella, Marco Fumero, Valentino Maiorca, Emanuele Rodolà

ICLR 2024 · 2024

Abstract

It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors. To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse.

Published

MV-MS-FETE: Multi-view multi-scale feature extractor and transformer encoder for stenosis recognition in echocardiograms

Danilo Avola, Irene Cannistraci, Marco Cascio, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti, Emanuele Rodolà, Luca Solito

Computer Methods and Programs in Biomedicine · 2024

Abstract

Background: aortic stenosis is a common heart valve disease that mainly affects older people in developed countries. Its early detection is crucial to prevent the irreversible disease progression and, eventually, death. A typical screening technique to detect stenosis uses echocardiograms; however, variations introduced by other tissues, camera movements, and uneven lighting can hamper the visual inspection, leading to misdiagnosis. To address these issues, effective solutions involve employing deep learning algorithms to assist clinicians in detecting and classifying stenosis by developing models that can predict this pathology from single heart views. Although promising, the visual information conveyed by a single image may not be sufficient for an accurate diagnosis, especially when using an automatic system; thus, this indicates that different solutions should be explored. Methodology: following this rationale, this paper proposes a novel deep learning architecture, composed of a multi-view, multi-scale feature extractor, and a transformer encoder (MV-MS-FETE) to predict stenosis from parasternal long and short-axis views. In particular, starting from the latter, the designed model extracts relevant features at multiple scales along its feature extractor component and takes advantage of a transformer encoder to perform the final classification. Results: experiments were performed on the recently released Tufts medical echocardiogram public dataset, which comprises 27,788 images split into training, validation, and test sets. Due to the recent release of this collection, tests were also conducted on several state-of-the-art models to create multi-view and single-view benchmarks. For all models, standard classification metrics were computed (e.g., precision, F1-score). The obtained results show that the proposed approach outperforms other multi-view methods in terms of accuracy and F1-score and has more stable performance throughout the training procedure. Furthermore, the experiments also highlight that multi-view methods generally perform better than their single-view counterparts. Conclusion: this paper introduces a novel multi-view and multi-scale model for aortic stenosis recognition, as well as three benchmarks to evaluate it, effectively providing multi-view and single-view comparisons that fully highlight the model's effectiveness in aiding clinicians in performing diagnoses while also producing several baselines for the aortic stenosis recognition task.

Published

Lob-based deep learning models for stock price trend prediction: a benchmark study

Marco Prata, Giuliano Masi, Lorenzo Berti, Valerio Arrigoni, Alessandro Coletta, Irene Cannistraci, Stanislav Vyetrenko, Paola Velardi, Novella Bartolini

Artificial Intelligence Review · 2024

Abstract

The recent advancements in Deep Learning (DL) research have notably influenced the finance sector. We examine the robustness and generalizability of fifteen state-of-the-art DL models focusing on Stock Price Trend Prediction (SPTP) based on Limit Order Book (LOB) data. To carry out this study, we developed LOBCAST, an open-source framework that incorporates data preprocessing, DL model training, evaluation, and profit analysis. Our extensive experiments reveal that all models exhibit a significant performance drop when exposed to new data, thereby raising questions about their real-world market applicability. Our work serves as a benchmark, illuminating the potential and the limitations of current approaches and providing insight for innovative solutions.

Published

Bootstrapping Parallel Anchors for Relative Representations

Irene Cannistraci, Luca Moschella, Valentino Maiorca, Marco Fumero, Antonio Norelli, Emanuele Rodolà

ICLR 2023, Tiny Papers · 2023

Abstract

The use of relative representations for latent embeddings has shown potential in enabling latent space communication and zero-shot model stitching across a wide range of applications. Nevertheless, relative representations rely on a certain amount of parallel anchors to be given as input, which can be impractical to obtain in certain scenarios. To overcome this limitation, we propose an optimization-based method to discover new parallel anchors from a limited number of seeds.

Published

From Charts to Atlas: Merging Latent Spaces into One

Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Liò, Emanuele Rodolà

NeurIPS 2023, NeurReps Workshop · 2023

Abstract

Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation, a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean.

Published

Real-Time GAN-Based Model for Underwater Image Enhancement

Danilo Avola, Irene Cannistraci, Marco Cascio, Luigi Cinque, Anxhelo Diko, Damiano Distante, Gian Luca Foresti, Alfredo Mecca, Ivan Scagnetto

ICIAP 2023 · 2023

Abstract

Enhancing image quality is crucial for achieving an accurate and reliable image analysis in vision-based automated tasks. Underwater imaging encounters several challenges that can negatively impact image quality, including limited visibility, color distortion, contrast sensitivity issues, and blurriness. Among these, depending on how the water filters out the different light colors at different depths, the color distortion results in a loss of color information and a blue or green tint to the overall image, making it difficult to identify different underwater organisms or structures accurately. Improved underwater image quality can be crucial in marine biology, oceanography, and oceanic exploration. Therefore, this paper proposes a novel Generative Adversarial Network (GAN) architecture for underwater image enhancement, restoring good perceptual quality to obtain a more precise and detailed image. The effectiveness of the proposed method is evaluated on the EUVP dataset, which comprises underwater image samples of various visibility conditions, achieving remarkable results. Moreover, the trained network is run on the RPi4B as an embedded system to measure the time required to enhance the images with limited computational resources, simulating a practical underwater investigation setting. The outcome demonstrates the presented method applicability in real-world underwater exploration scenarios.

Workshop

Infusing invariances in neural representations

Irene Cannistraci, Marco Fumero, Luca Moschella, Valentino Maiorca, Emanuele Rodolà

ICML 2023, TAG-ML Workshop · 2023

Abstract

It has been observed that inner representations learned by different neural networks conceal structural similarities when the networks are trained under similar inductive biases. Exploring the geometric structure of latent spaces within these networks offers insights into the underlying similarity among different neural models and facilitates reasoning about the transformations that connect them. Identifying and estimating these transformations presents a challenging task, but it holds significant potential for various downstream tasks, including merging and stitching different neural architectures for model reuse. In this study, drawing on the geometrical structure of latent spaces, we show how it is possible to define representations that incorporate invariances to the targeted transformations in a single framework. We experimentally analyze how inducing different invariances in the representations affects downstream performances on classification and reconstruction tasks, suggesting that the classes of transformations that relate independent latent spaces depend on the task at hand. We analyze models in a variety of settings including different initializations, architectural changes, and trained on multiple modalities (e.g., text, images), testing our framework on 8 different benchmarks.

Published

A Novel GAN-Based Anomaly Detection and Localization Method for Aerial Video Surveillance at Low Altitude

Danilo Avola, Irene Cannistraci, Marco Cascio, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Romeo Lanzino, Marco Mancini, Alfredo Mecca, Daniele Pannone

Remote Sensing · 2022

Abstract

The last two decades have seen an incessant growth in the use of Unmanned Aerial Vehicles (UAVs) equipped with HD cameras for developing aerial vision-based systems to support civilian and military tasks, including land monitoring, change detection, and object classification. To perform most of these tasks, the artificial intelligence algorithms usually need to know, a priori, what to look for, identify. or recognize. Actually, in most operational scenarios, such as war zones or post-disaster situations, areas and objects of interest are not decidable a priori since their shape and visual features may have been altered by events or even intentionally disguised (e.g., improvised explosive devices (IEDs)). For these reasons, in recent years, more and more research groups are investigating the design of original anomaly detection methods, which, in short, are focused on detecting samples that differ from the others in terms of visual appearance and occurrences with respect to a given environment. In this paper, we present a novel two-branch Generative Adversarial Network (GAN)-based method for low-altitude RGB aerial video surveillance to detect and localize anomalies. We have chosen to focus on the low-altitude sequences as we are interested in complex operational scenarios where even a small object or device can represent a reason for danger or attention. The proposed model was tested on the UAV Mosaicking and Change Detection (UMCD) dataset, a one-of-a-kind collection of challenging videos whose sequences were acquired between 6 and 15 m above sea level on three types of ground (i.e., urban, dirt, and countryside). Results demonstrated the effectiveness of the model in terms of Area Under the Receiving Operating Curve (AUROC) and Structural Similarity Index (SSIM), achieving an average of 97.2% and 95.7%, respectively, thus suggesting that the system can be deployed in real-world applications.

Preprint

AI-based Data Preparation and Data Analytics in Healthcare: The Case of Diabetes

Mattia Maranghi, Aris Anagnostopoulos, Irene Cannistraci, Ioannis Chatzigiannakis, Francesco Croce, Giuseppe Di Teodoro, Matteo Gentile, Giorgia Grani, Maurizio Lenzerini, Stefano Leonardi, et al.

arXiv preprint · 2022

Abstract

The Associazione Medici Diabetologi (AMD) collects and manages one of the largest worldwide-available collections of diabetic patient records, also known as the AMD database. This paper presents the initial results of an ongoing project whose focus is the application of Artificial Intelligence and Machine Learning techniques for conceptualizing, cleaning, and analyzing such an important and valuable dataset, with the goal of providing predictive insights to better support diabetologists in their diagnostic and therapeutic choices.