publications
publications by category in reverse chronological order. generated by jekyll-scholar.
2025
- Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance. Reza Esfandiarpoor*, George Zerveas*, Ruochen Zhang, Macton Mgonzo, Carsten Eickhoff, and Stephen H. Bach. arXiv preprint, 2025.
Recent advancements in large language models (LLMs) have allowed the augmentation of information retrieval (IR) pipelines with synthetic data in various ways. Yet, the main training paradigm remains: contrastive learning with binary relevance labels and the InfoNCE loss, where one positive document is compared against one or more negatives. This objective treats all documents that are not explicitly annotated as relevant on an equally negative footing, regardless of their actual degree of relevance, thus (a) missing subtle nuances that are useful for ranking and (b) being susceptible to annotation noise. To overcome this limitation, in this work we forgo real training documents and annotations altogether and use open-source LLMs to directly generate synthetic documents that answer real user queries according to several different levels of relevance. This fully synthetic ranking context of graduated relevance, together with an appropriate list-wise loss (Wasserstein distance), enables us to train dense retrievers in a way that better captures the ranking task. Experiments on various IR datasets show that our proposed approach outperforms conventional training with InfoNCE by a large margin. Without using any real documents for training, our dense retriever significantly outperforms the same retriever trained through self-supervision. More importantly, it matches the performance of the same retriever trained on real, labeled training documents of the same dataset, while being more robust to distribution shift and clearly outperforming it when evaluated zero-shot on the BEIR dataset collection.
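As a rough illustration of the list-wise objective, the sketch below computes a 1-D Wasserstein distance between the distribution induced by retriever scores and the distribution induced by graded relevance labels; the candidate ordering, softmax normalization, and label grades are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def wasserstein_listwise_loss(scores, relevance_grades):
    """Illustrative list-wise loss: 1-D Wasserstein distance between the
    distribution induced by retriever scores and the distribution induced
    by graded relevance labels over the same candidate list.

    scores:           (batch, list_size) raw similarity scores
    relevance_grades: (batch, list_size) graded labels, e.g. 0..3
    """
    # Normalize both sides into probability distributions over the list.
    pred = torch.softmax(scores, dim=-1)
    target = relevance_grades.float()
    target = target / target.sum(dim=-1, keepdim=True).clamp(min=1e-8)

    # Treating list positions as points on a line with unit spacing, the
    # Wasserstein-1 distance between two discrete distributions is the sum
    # of absolute differences of their CDFs.
    pred_cdf = torch.cumsum(pred, dim=-1)
    target_cdf = torch.cumsum(target, dim=-1)
    return (pred_cdf - target_cdf).abs().sum(dim=-1).mean()

# Toy example: 4 candidates per query with relevance grades 3 > 2 > 1 > 0.
scores = torch.tensor([[2.1, 1.3, 0.2, -0.5]], requires_grad=True)
grades = torch.tensor([[3, 2, 1, 0]])
loss = wasserstein_listwise_loss(scores, grades)
loss.backward()
print(float(loss))
```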
- An Adaptive Method for Weak Supervision with Drifting Data. Alessio Mazzetto, Reza Esfandiarpoor, Akash Singirikonda, Eli Upfal, and Stephen H. Bach. AISTATS, 2025.
We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary case, where the accuracy of the weak supervision sources can drift over time, e.g., because of changes in the underlying data distribution. Due to the drift, older data could provide misleading information to infer the label of the current data point. Previous work relied on a priori assumptions on the magnitude of the drift to decide how much data to use from the past. In contrast, our algorithm does not require any assumptions on the drift, and it adapts based on the input by dynamically varying its window size. In particular, at each step, our algorithm estimates the current accuracies of the weak supervision sources by identifying a window of past observations that guarantees a near-optimal minimization of the trade-off between the error due to the variance of the estimation and the error due to the drift. Experiments on synthetic and real-world labelers show that our approach adapts to the drift.
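The window-selection idea can be illustrated with a simplified heuristic (not the paper's algorithm, and without its guarantees): keep extending the look-back window while the recent and older halves of the window agree to within a Hoeffding-style deviation bound, so that drifted observations are excluded from the accuracy estimate.

```python
import numpy as np

def adaptive_window_accuracy(observations, delta=0.05):
    """Estimate the current accuracy of a weak labeler from a stream of 0/1
    observations (most recent last), choosing the look-back window adaptively.
    Illustrative heuristic only: the window keeps growing while the means of
    its recent and older halves differ by less than a Hoeffding-style bound."""
    obs = np.asarray(observations, dtype=float)[::-1]  # most recent first
    window = 1
    for w in range(2, len(obs) + 1):
        recent, old = obs[:w // 2], obs[w // 2:w]
        # Deviation that estimation noise alone could explain for the two halves.
        bound = np.sqrt(np.log(2.0 / delta) / (2 * len(recent))) + \
                np.sqrt(np.log(2.0 / delta) / (2 * len(old)))
        if abs(recent.mean() - old.mean()) > bound:
            break                      # drift detected: stop extending the window
        window = w
    return obs[:window].mean(), window

# Toy stream: labeler accuracy drifts from ~0.9 down to ~0.6 halfway through.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.binomial(1, 0.9, 200), rng.binomial(1, 0.6, 200)])
print(adaptive_window_accuracy(stream))
```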
2024
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions. Reza Esfandiarpoor, Cristina Menghini, and Stephen H. Bach. EMNLP, 2024.
Recent works often assume that Vision-Language Model (VLM) representations are based on visual attributes like shape. However, it is unclear to what extent VLMs prioritize this information to represent concepts. We propose Extract and Explore (EX2), a novel approach to characterize textual features that are important for VLMs. EX2 uses reinforcement learning to align a large language model with VLM preferences and generates descriptions that incorporate features that are important for the VLM. Then, we inspect the descriptions to identify features that contribute to VLM representations. Using EX2, we find that spurious descriptions have a major role in VLM representations despite providing no helpful information, e.g., Click to enlarge photo of CONCEPT. More importantly, among informative descriptions, VLMs rely significantly on non-visual attributes like habitat (e.g., North America) to represent visual concepts. Also, our analysis reveals that different VLMs prioritize different attributes in their representations. Overall, we show that VLMs do not simply match images to scene descriptions and that non-visual or even spurious descriptions significantly influence their representations.
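As a purely hypothetical illustration of the alignment signal, a reward of the following form could score a generated description by how closely the VLM embeds it to images of the concept; the embedding helpers and the actual EX2 reward design are assumptions, not taken from the paper.

```python
import numpy as np

def vlm_preference_reward(description_emb, concept_image_embs):
    """Hypothetical reward for the RL step: average similarity between the
    VLM's embedding of a generated description and the VLM's embeddings of
    images of the concept.  Embeddings are assumed L2-normalized; the
    embedding helpers are not shown."""
    return float((concept_image_embs @ description_emb).mean())

# Toy example in a 4-d embedding space.
rng = np.random.default_rng(0)
images = rng.normal(size=(5, 4))
images /= np.linalg.norm(images, axis=1, keepdims=True)
desc = images.mean(axis=0)
desc /= np.linalg.norm(desc)
print(vlm_preference_reward(desc, images))
```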
- Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification. Reza Esfandiarpoor and Stephen H. Bach. ICLR, 2024.
A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new class descriptions that differentiate between them. The new class descriptions resolve the initial ambiguity and help predict the correct label. In our experiments, FuDD consistently outperforms generic description ensembles and naive LLM-generated descriptions on 12 datasets. We show that differential descriptions are an effective tool to resolve class ambiguities, which otherwise significantly degrade the performance. We also show that high quality natural language class descriptions produced by FuDD result in comparable performance to few-shot adaptation methods.
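The first step of FuDD, identifying the ambiguous classes for an image, can be sketched as keeping every class whose similarity falls within a margin of the top score; the margin value and the toy embeddings below are assumptions for illustration, and an LLM would then be asked for descriptions that differentiate exactly these classes.

```python
import numpy as np

def ambiguous_classes(image_emb, class_embs, class_names, margin=0.05):
    """Return the classes whose similarity to the image is within `margin`
    of the best class; these are the candidates that need differential
    descriptions.  Inputs are assumed to be L2-normalized embeddings."""
    sims = class_embs @ image_emb                  # cosine similarities
    keep = sims >= sims.max() - margin
    return [name for name, k in zip(class_names, keep) if k]

# Toy example: "sparrow" and "wren" embeddings are close, "eagle" is not.
class_embs = np.array([
    [1.0, 0.00, 0.0, 0.0],     # sparrow
    [0.9, 0.10, 0.0, 0.0],     # wren
    [0.0, 0.00, 1.0, 0.0],     # eagle
])
class_embs /= np.linalg.norm(class_embs, axis=1, keepdims=True)
image_emb = np.array([1.0, 0.05, 0.0, 0.0])
image_emb /= np.linalg.norm(image_emb)

print(ambiguous_classes(image_emb, class_embs, ["sparrow", "wren", "eagle"],
                        margin=0.1))   # -> ['sparrow', 'wren']
```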
2021
- Extended Few-Shot Learning: Exploiting Existing Resources for Novel Tasks. Reza Esfandiarpoor, Amy Pu, Mohsen Hajabdollahi, and Stephen H. Bach. arXiv, 2021.
In many practical few-shot learning problems, even though labeled examples are scarce, there are abundant auxiliary datasets that potentially contain useful information. We propose the problem of extended few-shot learning to study these scenarios. We then introduce a framework to address the challenges of efficiently selecting and effectively using auxiliary data in few-shot image classification. Given a large auxiliary dataset and a notion of semantic similarity among classes, we automatically select pseudo shots, which are labeled examples from other classes related to the target task. We show that naive approaches, such as (1) modeling these additional examples the same as the target task examples or (2) using them to learn features via transfer learning, only increase accuracy by a modest amount. Instead, we propose a masking module that adjusts the features of auxiliary data to be more similar to those of the target classes. We show that this masking module performs better than naively modeling the support examples and transfer learning by 4.68 and 6.03 percentage points, respectively.
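The abstract does not specify the masking module's exact form, so the following is only a plausible sketch: a small network produces a per-dimension soft gate from a target-class prototype and applies it to the auxiliary (pseudo shot) features.

```python
import torch
import torch.nn as nn

class FeatureMask(nn.Module):
    """Illustrative masking module: given a target-class prototype, produce a
    per-dimension gate in (0, 1) and apply it to auxiliary-example features so
    they better resemble the target classes.  A sketch, not the paper's exact
    architecture."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Sigmoid(),
        )

    def forward(self, aux_features, target_prototype):
        mask = self.gate(target_prototype)          # (dim,) soft mask
        return aux_features * mask                  # broadcast over examples

# Toy usage: 10 auxiliary examples with 512-d features, one target prototype.
dim = 512
module = FeatureMask(dim)
masked = module(torch.randn(10, dim), torch.randn(dim))
print(masked.shape)   # torch.Size([10, 512])
```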
2020
- Simplification of neural networks for skin lesion image segmentation using color channel pruning. Mohsen Hajabdollahi, Reza Esfandiarpoor, Pejman Khadivi, Sayed Mohammad Reza Soroushmehr, Nader Karimi, and Shadrokh Samavi. Computerized Medical Imaging and Graphics, 2020.
Automatic analysis of skin abnormalities is an effective way for medical experts to facilitate diagnosis procedures and improve their capabilities. However, accurate methods for analyzing skin abnormalities, such as convolutional neural networks (CNNs), are typically complex, so implementing such structures in portable medical instruments is not feasible due to power and resource limitations. CNNs can extract features from skin abnormality images automatically. To reduce the feature-extraction burden on the network, and thereby simplify it, proper input color channels can be selected. In this paper, a pruning framework is proposed to simplify these complex structures by selecting the most informative color channels and simplifying the network. Moreover, the hardware requirements of different network structures are identified to analyze their complexity. Experiments are conducted on segmentation of images from two publicly available datasets of dermoscopy and non-dermoscopy images. Simulation results show that, using the proposed color channel selection method, simple and efficient neural network structures can be applied for segmentation of skin abnormalities.
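A minimal sketch of the channel-selection idea, under assumptions much simpler than the paper's analysis: score each color channel by how well it separates lesion from background pixels, then keep only the top-scoring channels as network inputs.

```python
import numpy as np

def channel_scores(image, mask):
    """Score each color channel by the absolute difference between mean
    intensity inside and outside the lesion mask.  Illustrative criterion."""
    scores = []
    for c in range(image.shape[-1]):
        ch = image[..., c].astype(float)
        scores.append(abs(ch[mask > 0].mean() - ch[mask == 0].mean()))
    return np.array(scores)

def select_channels(images, masks, keep=2):
    """Average the per-channel scores over a dataset and keep the best ones."""
    avg = np.mean([channel_scores(im, m) for im, m in zip(images, masks)], axis=0)
    return np.argsort(avg)[::-1][:keep]

# Toy data: 5 RGB "images" where only the first channel separates the lesion.
rng = np.random.default_rng(0)
masks = [rng.integers(0, 2, size=(32, 32)) for _ in range(5)]
images = [np.stack([m * 120 + rng.normal(60, 5, m.shape),   # informative
                    rng.normal(100, 5, m.shape),             # noise
                    rng.normal(100, 5, m.shape)], axis=-1)    # noise
          for m in masks]
print(select_channels(images, masks, keep=2))   # channel 0 ranked first
```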
- Multiple abnormality detection for automatic medical image diagnosis using bifurcated convolutional neural network. Mohsen Hajabdollahi, Reza Esfandiarpoor, Elyas Sabeti, Nader Karimi, SM Reza Soroushmehr, and Shadrokh Samavi. Biomedical Signal Processing and Control, 2020.
Automating the classification and segmentation of abnormal regions in different body organs has a crucial role in most medical imaging applications, such as funduscopy, endoscopy, and dermoscopy. Detecting multiple abnormalities in each type of image is necessary for a better and more accurate diagnosis procedure and medical decisions. In recent years, portable medical imaging devices such as capsule endoscopy and digital dermatoscopes have been introduced and have made the diagnosis procedure easier and more efficient. However, these portable devices have constrained power resources and limited computational capability. To address this problem, we propose a bifurcated structure for convolutional neural networks that performs both classification and segmentation of multiple abnormalities simultaneously. The proposed network is first trained on each abnormality separately, and then on all abnormalities together. To reduce the computational complexity, the network is redesigned to share features that are common among all abnormalities. These shared features are then used in the two branches (directions) to segment and classify the abnormal region of the image, and the results of the classification and segmentation branches are fused to obtain the classified segmentation map. The proposed framework is evaluated on four frequent gastrointestinal abnormalities as well as three dermoscopic lesions, and the results are compared with the corresponding ground-truth maps. Properties of the bifurcated network, such as low complexity and resource sharing, make it suitable for implementation as part of portable medical imaging devices.
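A rough sketch of the bifurcated design, with placeholder layer sizes and fusion rule: a shared convolutional trunk feeds both a segmentation branch and a classification branch, and the class probabilities weight the per-abnormality segmentation maps into a classified segmentation map.

```python
import torch
import torch.nn as nn

class BifurcatedNet(nn.Module):
    """Shared trunk plus segmentation and classification branches, with a
    simple fusion of the two outputs.  Illustrative only, not the paper's
    exact design."""

    def __init__(self, num_abnormalities=4):
        super().__init__()
        self.trunk = nn.Sequential(                 # shared features
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(32, num_abnormalities, 1)   # per-pixel logits
        self.cls_head = nn.Sequential(                          # image-level logits
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_abnormalities),
        )

    def forward(self, x):
        feats = self.trunk(x)
        seg = self.seg_head(feats)                   # (B, A, H, W)
        cls = self.cls_head(feats)                   # (B, A)
        # Fuse: weight each abnormality's segmentation map by its class prob.
        fused = torch.sigmoid(seg) * torch.softmax(cls, dim=1)[..., None, None]
        return seg, cls, fused

net = BifurcatedNet()
seg, cls, fused = net(torch.randn(2, 3, 64, 64))
print(seg.shape, cls.shape, fused.shape)
```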
2019
- Segmentation of bleeding regions in wireless capsule endoscopy for detection of informative frames. Mohsen Hajabdollahi, Reza Esfandiarpoor, Pejman Khadivi, SM Reza Soroushmehr, Nader Karimi, Kayvan Najarian, and Shadrokh Samavi. Biomedical Signal Processing and Control, 2019.
Wireless capsule endoscopy (WCE) is an effective means for diagnosis of gastrointestinal disorders. Detecting informative scenes in WCE video could reduce the length of transmitted videos and help the diagnosis procedure. In this paper, we investigate the problem of simplifying neural networks for automatic bleeding region segmentation inside the capsule endoscopy device. Suitable color channels are selected as the networks' inputs, and image classification is conducted using a multi-layer perceptron (MLP) and a convolutional neural network (CNN) separately. Both the CNN and MLP structures are simplified to reduce the number of computational operations. The performance of the two simplified networks is evaluated on a WCE bleeding image dataset using the DICE score. Simulation results show that applying the simplification methods to both the MLP and CNN structures reduces the number of computational operations significantly, with AUC-ROC greater than 0.97. Although the CNN performs better than the simplified MLP, the simplified MLP segments bleeding regions with a significantly smaller number of computational operations. Depending on whether a simpler structure or a more accurate model is more important, either of the designed structures could be selected for in-capsule implementation.
- Low complexity CNN structure for automatic bleeding zone detection in wireless capsule endoscopy imaging. Mohsen Hajabdollahi, Reza Esfandiarpoor, Kayvan Najarian, Nader Karimi, Shadrokh Samavi, and SM Reza Soroushmehr. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019.
Wireless capsule endoscopy (WCE) is a swallowable device used for screening different parts of the human digestive system. Automatic WCE image analysis methods reduce the duration of the screening procedure and alleviate the burden of manual screening by medical experts. Recent studies widely employ convolutional neural networks (CNNs) for automatic analysis of WCE images; however, these studies do not consider the CNN's structural and computational complexity. In this paper, we address the problem of simplifying the CNN's structure. A low complexity CNN structure for bleeding zone detection is proposed, which takes a single patch as input and outputs a segmented patch of the same size. The proposed network is inspired by the FCN paradigm with a simplified structure. Since it operates on image patches, the resulting network benefits from moderate-sized intermediate feature maps. Moreover, the problem of redundant computations in patch-based methods is circumvented by non-overlapping patch processing. The proposed method is evaluated using the publicly available KID dataset for WCE image analysis. Experimental results show that the proposed network has better accuracy and AUC than previous structures while requiring fewer computational operations.
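The patch-in, patch-out design can be sketched as follows, with placeholder layer sizes: a small fully convolutional network maps each patch to a same-size segmentation map, and the image is covered by non-overlapping patches so no pixels are recomputed.

```python
import torch
import torch.nn as nn

class PatchSegNet(nn.Module):
    """Small fully convolutional net: a patch goes in, a same-size per-pixel
    bleeding probability map comes out.  Illustrative layer sizes only."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid(),
        )

    def forward(self, patch):
        return self.net(patch)

def segment_image(model, image, patch=32):
    """Cover the image with non-overlapping patches and stitch the outputs,
    so intermediate feature maps stay small and nothing is recomputed."""
    _, h, w = image.shape
    out = torch.zeros(1, h, w)
    with torch.no_grad():
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                tile = image[:, y:y + patch, x:x + patch].unsqueeze(0)
                out[:, y:y + patch, x:x + patch] = model(tile)[0]
    return out

model = PatchSegNet()
print(segment_image(model, torch.randn(3, 128, 128)).shape)  # (1, 128, 128)
```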
- Hierarchical pruning for simplification of convolutional neural networks in diabetic retinopathy classification. Mohsen Hajabdollahi, Reza Esfandiarpoor, Kayvan Najarian, Nader Karimi, Shadrokh Samavi, and SM Reza Soroushmehr. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019.
Convolutional neural networks (CNNs) are widely used in automatic detection and analysis of diabetic retinopathy (DR). Although CNNs have adequate detection performance, their structural and computational complexity is troublesome. In this study, the problem of reducing a CNN's structural complexity for DR analysis is addressed by proposing a hierarchical pruning method. The original VGG16-Net is modified to have fewer parameters and is employed for DR classification. To obtain appropriate feature extraction, parameters pre-trained on the ImageNet dataset are used. Hierarchical pruning gradually eliminates connections, filter channels, and filters to simplify the network structure. The proposed pruning method is evaluated using the Messidor image dataset, a public dataset for DR classification. Simulation results show that by applying the proposed simplification method, 35% of the feature maps are pruned, resulting in only a 1.89% accuracy drop. This simplification could make the CNN suitable for implementation inside medical diagnostic devices.
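The three pruning granularities mentioned in the abstract can be illustrated with generic magnitude-based masking at the connection, channel, and filter levels; the schedule, criteria, and ratios used in the paper may differ.

```python
import torch
import torch.nn as nn

def prune_weights(conv, ratio):
    """Zero the individual weights with smallest magnitude (connection level)."""
    w = conv.weight.data
    k = int(w.numel() * ratio)
    if k > 0:
        thresh = w.abs().flatten().kthvalue(k).values
        w[w.abs() <= thresh] = 0.0

def prune_channels(conv, ratio):
    """Zero the input channels with smallest L1 norm (filter-channel level)."""
    w = conv.weight.data                           # (out, in, kH, kW)
    norms = w.abs().sum(dim=(0, 2, 3))             # one norm per input channel
    k = int(norms.numel() * ratio)
    if k > 0:
        w[:, norms.argsort()[:k]] = 0.0

def prune_filters(conv, ratio):
    """Zero whole output filters with smallest L1 norm (filter level)."""
    w = conv.weight.data
    norms = w.abs().sum(dim=(1, 2, 3))             # one norm per filter
    k = int(norms.numel() * ratio)
    if k > 0:
        w[norms.argsort()[:k]] = 0.0

conv = nn.Conv2d(64, 128, 3)
prune_weights(conv, 0.2)     # hierarchical order: finest granularity first,
prune_channels(conv, 0.2)    # then filter channels,
prune_filters(conv, 0.2)     # then whole filters (with fine-tuning in between).
print(float((conv.weight == 0).float().mean()))   # resulting sparsity
```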
2018
- Low complexity convolutional neural network for vessel segmentation in portable retinal diagnostic devices. Mohsen Hajabdollahi, Reza Esfandiarpoor, Kayvan Najarian, Nader Karimi, Shadrokh Samavi, and SM Reza Soroushmehr. 2018 25th IEEE International Conference on Image Processing (ICIP), 2018.
Retinal vessel information is helpful in retinal disease screening and diagnosis. Retinal vessel segmentation provides useful information about vessels and can be used by physicians during intraocular surgery and retinal diagnostic operations. Convolutional neural networks (CNNs) are powerful tools for classification and segmentation of medical images, but their complexity makes it difficult to implement them in portable devices such as binocular indirect ophthalmoscopes. In this paper, a simplification approach is proposed for CNNs based on a combination of quantization and pruning: fully connected layers are quantized and convolutional layers are pruned to obtain a simple and efficient network structure. Experiments on images of the STARE dataset show that our simplified network is able to segment retinal vessels with acceptable accuracy and low complexity.
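A minimal sketch of the described combination, with placeholder bit-width and pruning ratio: fully connected weights are quantized onto a small set of uniformly spaced levels, and convolutional weights are magnitude-pruned.

```python
import torch
import torch.nn as nn

def quantize_linear(layer, bits=4):
    """Uniformly quantize fully connected weights onto a small set of levels
    determined by `bits`."""
    w = layer.weight.data
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    layer.weight.data = torch.round(w / scale) * scale

def prune_conv(layer, ratio=0.5):
    """Zero the smallest-magnitude convolutional weights."""
    w = layer.weight.data
    thresh = w.abs().flatten().kthvalue(int(w.numel() * ratio)).values
    w[w.abs() <= thresh] = 0.0

# Toy network: one convolutional layer (pruned) and one FC layer (quantized).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 32 * 32, 2),
)
prune_conv(model[0], ratio=0.5)
quantize_linear(model[3], bits=4)
print(float((model[0].weight == 0).float().mean()),   # conv sparsity
      model[3].weight.unique().numel())                # distinct FC weight values
```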
- Simplified neural network based on auxiliary layer and adaptive pruning rate. Reza Esfandiarpoor, Mohsen Hajabdollahi, and Nader Karimi. Iranian Conference on Electrical Engineering (ICEE), 2018.
The use of neural networks has increased in many applications because of their growing power to model, analyze, and solve problems in many different areas. As the performance of neural networks grows, so does the complexity of their structure. Although pruning and similar methods have been introduced in recent years as an appropriate way to decrease this complexity, the resulting networks are still very complex. In this paper, a method is proposed to reduce the complexity of neural networks by inserting an auxiliary layer and employing an adaptive pruning rate. By inserting an auxiliary layer after pruning, the network can be fitted to the given problem with fewer computational nodes. Simulation results on the MNIST dataset show a 22% reduction in the number of network weights compared with the base pruning method. An energy analysis on this dataset also shows that the method can be implemented with low energy consumption.
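The two ingredients can be sketched with placeholder rules: each layer is pruned at a rate adapted to its weight statistics, and a small auxiliary layer is appended after pruning so the slimmer network can be re-fitted; the actual adaptive rule and auxiliary-layer placement in the paper may differ.

```python
import torch
import torch.nn as nn

def adaptive_prune(layer, base_rate=0.3):
    """Prune a layer at a rate adapted to its weight statistics: layers with
    many near-zero weights are pruned more aggressively.  Illustrative rule."""
    w = layer.weight.data
    small_frac = (w.abs() < 0.25 * w.abs().max()).float().mean().item()
    rate = min(0.9, base_rate * (1.0 + small_frac))      # adaptive pruning rate
    thresh = w.abs().flatten().kthvalue(max(1, int(w.numel() * rate))).values
    w[w.abs() <= thresh] = 0.0
    return rate

# Prune a small MNIST-style MLP, then append a small auxiliary layer that
# would be trained afterwards so the simplified network can re-fit the task.
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
rates = [adaptive_prune(m) for m in net if isinstance(m, nn.Linear)]
aux_net = nn.Sequential(*net, nn.ReLU(), nn.Linear(10, 10))   # auxiliary layer
print(rates, aux_net(torch.randn(1, 784)).shape)
```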