Cross-Modal Retrieval for Alzheimer's Disease Diagnosis Using CLIP-Enhanced Dual Deep Hashing
Presentation Time: 09:00 AM - 09:15 AM
Abstract Keywords: Clinical Decision Support, Information Retrieval, Machine Learning, Deep Learning
Working Group: Clinical Decision Support Working Group
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Cross-modal data retrieval is crucial for effectively utilizing the vast amount of multimodal data available in healthcare. However, existing methods often fail to capture the intricate semantic relationships between modalities, limiting their retrieval accuracy. In this study, we propose an Unsupervised Dual Deep Hashing (UDDH) method with a CLIP (Contrastive Language-Image Pre-training) mechanism that aligns semantic meanings across modalities for enhanced cross-modal retrieval. The UDDH framework employs a dual hashing scheme consisting of a semantic index (head code) and modality-specific content codes (tail codes) to capture both high-level semantics and modality-specific details. We also apply a CLIP loss directly between the modality-specific content codes to improve semantic coherence and retrieval precision. The proposed model is evaluated on the Wikipedia image-text dataset for an object retrieval task and on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for a patient diagnosis task. By incorporating a category contrastive loss during supervised fine-tuning, our UDDH-clip-ccl model achieves an mAP@50 of 0.7984 and an accuracy of 0.7958. The retrieved patient exemplars are fed into a weighted K-nearest-neighbors classifier to provide interpretable diagnostic insights based on similar cases. Our approach demonstrates the importance of semantic alignment in cross-modal retrieval and its potential to enhance patient diagnosis, outcome prediction, and treatment planning by leveraging multimodal data.
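For illustration, the sketch below shows how the two ingredients described in the abstract might be wired together: a symmetric CLIP-style (InfoNCE) contrastive loss applied between paired image and text content codes, and a distance-weighted K-nearest-neighbors vote over retrieved patient exemplars. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the binary/Hamming treatment of the tail codes, the temperature value, and the helper names clip_style_loss and weighted_knn_diagnosis are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_codes: torch.Tensor, txt_codes: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style (InfoNCE) loss between paired image and text
    content codes: matched rows are positives, all other pairs negatives."""
    img = F.normalize(img_codes, dim=-1)          # cosine similarity via L2 norm
    txt = F.normalize(txt_codes, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropy terms
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def weighted_knn_diagnosis(query_code: torch.Tensor,
                           gallery_codes: torch.Tensor,
                           gallery_labels: torch.Tensor,
                           k: int = 5) -> int:
    """Distance-weighted KNN vote over retrieved exemplars (hypothetical
    helper): closer exemplars contribute more to the predicted diagnosis."""
    # Hamming distance between the binary query code and every gallery code
    dists = (query_code.unsqueeze(0) != gallery_codes).sum(dim=1).float()
    knn = torch.topk(dists, k, largest=False)     # k nearest exemplars
    weights = 1.0 / (1.0 + knn.values)            # inverse-distance weighting
    votes: dict[int, float] = {}
    for w, idx in zip(weights.tolist(), knn.indices.tolist()):
        label = int(gallery_labels[idx])
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

The category contrastive loss used during supervised fine-tuning would operate on the same code space; it is omitted here for brevity.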
Speaker(s):
Xiang Li, PhD
Massachusetts General Hospital and Harvard Medical School
Author(s):
Xiaoke Huang, Master's Degree; Bin Zhang, PhD - Guangdong Institute of Intelligence Science and Technology, Zhuhai, Guangdong, China; Wenxiong Liao, PhD - South China University of Technology; Fang Zeng, PhD - Massachusetts General Hospital, Boston, MA, USA; Hui Ren, MD PhD MPH - Massachusetts General Hospital; Zhengliang Liu, PhD - University of Georgia, Athens, GA, USA; Haixing Dai, PhD - University of Georgia, Athens, GA, USA; Zihao Wu, PhD - University of Georgia, Athens, GA, USA; Tianming Liu, PhD - University of Georgia, Athens, GA, USA; Hongmin Cai, PhD - South China University of Technology, Guangzhou, Guangdong, China; Xiang Li, PhD - Massachusetts General Hospital and Harvard Medical School
Category: Podium Abstract