Understanding Large Language Models' Ability on Interdisciplinary Research

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks a dedicated benchmark for evaluating large language models’ (LLMs) ability to generate high-quality, interdisciplinary research (IDR) ideas. Method: We introduce IDRBench—the first benchmark specifically designed for IDR idea generation—comprising a multidimensional, expert-annotated dataset spanning six disciplines and a progressive three-stage task framework (identify–integrate–recommend) that mirrors real-world interdisciplinary inquiry. Our semantic integration evaluation framework is built upon ArXiv’s multilingual, cross-domain scholarly corpus and incorporates expert-defined dimensions: novelty, feasibility, and cross-domain depth. Contribution/Results: Experiments across ten state-of-the-art LLMs reveal that while current models exhibit rudimentary interdisciplinary awareness, they remain substantially limited in generating ideas that simultaneously satisfy originality and practical feasibility. IDRBench establishes a novel, scalable paradigm for assessing LLMs’ IDR-oriented ideation capabilities.

📝 Abstract
Recent advancements in Large Language Models (LLMs) have revealed their impressive ability to perform multi-step, logic-driven reasoning across complex domains, positioning them as powerful tools and collaborators in scientific discovery while challenging the long-held view that inspiration-driven ideation is uniquely human. However, the lack of a dedicated benchmark that evaluates LLMs' ability to develop ideas in Interdisciplinary Research (IDR) settings poses a critical barrier to fully understanding their strengths and limitations. To address this gap, we introduce IDRBench -- a pioneering benchmark featuring an expert-annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities in proposing valuable research ideas from different scientific domains for interdisciplinary research. This benchmark aims to provide a systematic framework for assessing LLM performance in complex, cross-domain scientific research. Our dataset consists of scientific publications sourced from the ArXiv platform covering six distinct disciplines, and is annotated by domain experts with diverse academic backgrounds. To ensure high-quality annotations, we emphasize clearly defined dimensions that characterize authentic interdisciplinary research. The design of evaluation tasks in IDRBench follows a progressive, real-world perspective, reflecting the natural stages of interdisciplinary research development: 1) IDR Paper Identification, 2) IDR Idea Integration, and 3) IDR Idea Recommendation. Using IDRBench, we construct baselines across 10 LLMs and observe that, despite showing some level of IDR awareness, LLMs still struggle to produce quality IDR ideas. These findings could not only spark new research directions, but also help to develop next-generation LLMs that excel in interdisciplinary research.
Problem

Research questions and friction points this paper is trying to address.

Lack of a dedicated benchmark for evaluating LLMs in interdisciplinary research
Need to assess LLMs' ability to generate cross-domain research ideas
Current LLMs struggle to produce quality interdisciplinary research ideas
Innovation

Methods, ideas, or system contributions that make the work stand out.

IDRBench: the first benchmark for evaluating LLMs on interdisciplinary research ideation
Expert-annotated dataset spanning six ArXiv disciplines
Progressive three-stage tasks: IDR paper identification, idea integration, and idea recommendation