🤖 AI Summary
To address the challenge of discovering and integrating numerous fragmented, heterogeneous biological databases that are not compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles, this paper introduces FAIRBridge, the first framework to use generative AI to interpret natural-language query intent autonomously and to construct cross-source semantic access paths dynamically. Methodologically, it combines large language model (LLM)-driven intent understanding, automated mapping of database relationships from the literature, lightweight semantic scheduling, and quality-controlled crowdsourced optimization. Its key contribution is replacing labor-intensive manual curation with end-to-end automation, from natural-language queries to executable data-access plans. Experiments demonstrate significant improvements: a 32.7% gain in data discovery accuracy and a 58% reduction in query latency. FAIRBridge lets researchers test hypotheses rapidly through natural-language queries and substantially improves the usability and reusability of non-FAIR biological data.
📝 Abstract
Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, because the quality of query responses depends heavily on how data sources are selected and semantically integrated, processes that are often labor-intensive, error-prone, and costly. Although the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles were adopted to address these challenges, barriers to efficient and accurate scientific data processing persist.
In this paper, we introduce FAIRBridge, an experimental natural-language query processing system designed to help scientists discover, access, and query biological databases, even those that are not FAIR-compliant. FAIRBridge uses AI to interpret query intent, map it to relevant databases described in the scientific literature, and generate executable queries through intelligent resource access plans. The system also includes tools for mitigating low-quality query processing, so that the information it delivers is both high-fidelity and timely.
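The pipeline described above (interpret intent, map it to databases, produce an access plan) can be pictured with a deliberately simple sketch. It is not the authors' method: the abstract describes AI-based intent interpretation and literature-derived database mappings, whereas here a hypothetical keyword table stands in for intent interpretation and a greedy set-cover heuristic stands in for plan construction. The catalog entries (`protein_db`, `pathway_db`, `structure_db`), the keyword table, and the function names are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical catalog entry: a database and the concepts it covers."""
    name: str
    concepts: set[str]


# Hypothetical vocabulary mapping surface keywords to concept labels.
KEYWORDS = {
    "protein": "protein", "proteins": "protein",
    "gene": "gene", "genes": "gene",
    "pathway": "pathway", "pathways": "pathway",
    "structure": "structure", "structures": "structure",
    "disease": "disease", "diseases": "disease",
}

# Hypothetical catalog; a real system would derive this from literature.
CATALOG = [
    Source("protein_db", {"protein", "gene"}),
    Source("pathway_db", {"pathway", "gene"}),
    Source("structure_db", {"structure", "protein"}),
]


def interpret_intent(query: str) -> set[str]:
    """Keyword stand-in for AI intent interpretation: concepts named in the query."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return {KEYWORDS[t] for t in tokens if t in KEYWORDS}


def plan_access(concepts: set[str], catalog: list[Source]):
    """Greedy set cover: repeatedly pick the source covering the most
    still-uncovered concepts. Returns (plan, unresolved concepts)."""
    remaining = set(concepts)
    plan = []
    while remaining:
        # On ties, max() keeps the first source in catalog order.
        best = max(catalog, key=lambda s: len(s.concepts & remaining), default=None)
        if best is None or not (best.concepts & remaining):
            break  # no source covers what is left
        covered = best.concepts & remaining
        plan.append((best.name, covered))
        remaining -= covered
    return plan, remaining


plan, unresolved = plan_access(
    interpret_intent("Which genes in the pathway affect protein structure?"), CATALOG
)
for step, (name, covered) in enumerate(plan, start=1):
    print(step, name, sorted(covered))
```

Concepts left in `unresolved` mark the points where, in a system like the one described, a user might choose an alternative source or fall back on crowd curation; the greedy choice keeps plans short but is only a heuristic for the underlying set-cover problem.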
FAIRBridge's autonomous query processing framework lets users explore alternative data sources, make informed choices at every step, and draw on community-driven crowd curation when needed. By providing a user-friendly, automated hypothesis-testing platform that accepts queries in plain English, FAIRBridge significantly improves the integration and processing of scientific data and offers researchers a powerful new tool for advancing their inquiries.