LaVCa: LLM-assisted Visual Cortex Captioning

📅 2025-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Interpreting the semantic meaning of voxel-wise responses in the visual cortex remains challenging because conventional fMRI encoding models are opaque and carry no explicit semantic labels. Method: We propose a cross-modal framework that integrates large language models (LLMs), such as GPT or LLaMA, with image encoders and customized prompt engineering to generate natural-language descriptions of the images that activate fMRI voxels. Contribution/Results: Our approach enables fine-grained, multi-concept semantic characterization of neural selectivity at both the intra-voxel and inter-voxel levels, addressing the limitations of black-box encoding models. Experiments show that the generated captions describe voxel selectivity more accurately, and in more detail, than the previously proposed method. Moreover, the analysis reveals fine-grained functional differentiation within visual cortical regions of interest (ROIs) and voxels that represent multiple distinct semantic concepts simultaneously. These findings advance the understanding of human perceptual mechanisms and support interpretable, brain-inspired modeling.

📝 Abstract

Understanding the properties of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing capabilities and contribute to developing brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. As a solution, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that uses large language models (LLMs) to generate natural-language captions for images to which voxels are selective. By applying LaVCa to image-evoked brain activity, we demonstrate that LaVCa generates captions that describe voxel selectivity more accurately than the previously proposed method. Furthermore, the captions generated by LaVCa quantitatively capture more detailed properties than the existing method at both the inter-voxel and intra-voxel levels. A more detailed analysis of the voxel-specific properties generated by LaVCa also reveals fine-grained functional differentiation within regions of interest (ROIs) in the visual cortex and voxels that simultaneously represent multiple distinct concepts. These findings offer profound insights into human visual representations by assigning detailed captions throughout the visual cortex while highlighting the potential of LLM-based methods in understanding brain representations. Please check out our webpage at https://sites.google.com/view/lavca-llm/
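
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea it describes: fit a voxel-wise encoding model, then find the images to which each voxel is most selective. Ridge regression, the feature dimensions, the function names `fit_ridge` and `top_images_per_voxel`, and the synthetic data are all assumptions made for illustration and are not taken from LaVCa itself. In a full pipeline the selected images would then be captioned and summarized by an LLM, a step this sketch omits.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression, giving one weight vector per voxel.

    features:  (n_images, n_features) image embeddings (assumed given)
    responses: (n_images, n_voxels) measured voxel activity
    returns:   (n_features, n_voxels) encoding weights
    """
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


def top_images_per_voxel(weights, candidate_features, k=5):
    """Indices of the k candidate images with the highest predicted response.

    returns: (n_voxels, k) array, best image first for each voxel
    """
    predicted = candidate_features @ weights      # (n_candidates, n_voxels)
    order = np.argsort(-predicted, axis=0)        # descending per voxel
    return order[:k].T


# Synthetic stand-ins for image features and voxel responses (illustrative only).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 16))
true_weights = rng.normal(size=(16, 3))
train_responses = train_features @ true_weights + 0.1 * rng.normal(size=(200, 3))

weights = fit_ridge(train_features, train_responses, alpha=1.0)

candidate_features = rng.normal(size=(50, 16))
top_idx = top_images_per_voxel(weights, candidate_features, k=5)
predicted = candidate_features @ weights
```

Each row of `top_idx` lists the candidate images that the fitted model predicts will drive that voxel most strongly; these are the images a captioning step would describe.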

Problem

Research questions and friction points this paper is trying to address.

Interpret voxel responses in the brain

Generate accurate image captions for voxels

Reveal fine-grained functional differentiation in visual cortex

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted captioning

Voxel selectivity analysis

Fine-grained functional differentiation
