A GenAI System for Improved FAIR Independent Biological Database Integration

📅 2025-06-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of discovering and integrating numerous fragmented, heterogeneous biological databases that lack FAIR (Findable, Accessible, Interoperable, Reusable) compliance, this paper introduces FAIRBridge—the first framework leveraging generative AI to autonomously interpret natural-language query intent and dynamically construct cross-source semantic access paths. Methodologically, it integrates large language model–driven intent understanding, literature-based automated database relationship mapping, lightweight semantic scheduling, and quality-controlled crowdsourced collaborative optimization. Its key contribution lies in replacing labor-intensive manual curation with end-to-end automation—from natural-language queries to executable data-access plans. Experiments demonstrate significant improvements: +32.7% in data discovery accuracy and 58% reduction in query latency. FAIRBridge enables researchers to rapidly test hypotheses via natural-language queries and substantially enhances the usability and reusability of non-FAIR biological data.

Technology Category

Application Category

📝 Abstract
Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, as the quality of query responses depends heavily on the selection and semantic integration of data sources --processes that are often labor-intensive, error-prone, and costly. While the adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles has aimed to address these challenges, barriers to efficient and accurate scientific data processing persist. In this paper, we introduce FAIRBridge, an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases, even when they are not FAIR-compliant. FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases described in scientific literature, and generate executable queries via intelligent resource access plans. The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered. FAIRBridge's autonomous query processing framework enables users to explore alternative data sources, make informed choices at every step, and leverage community-driven crowd curation when needed. By providing a user-friendly, automated hypothesis-testing platform in natural English, FAIRBridge significantly enhances the integration and processing of scientific data, offering researchers a powerful new tool for advancing their inquiries.
Problem

Research questions and friction points this paper is trying to address.

Enhancing FAIR-compliant integration of diverse biological databases
Reducing labor-intensive, error-prone data source selection processes
Improving query accuracy for non-FAIR biological data sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI interprets query intents dynamically
Generates executable queries via smart plans
User-friendly natural English hypothesis-testing platform
🔎 Similar Papers
No similar papers found.
S
Syed N. Sakib
Department of Computer Science, University of Idaho, USA
Kallol Naha
Kallol Naha
PhD student
BioinformaticsComputational BiologyBiomedical InformaticsDrug Discovery using AI
S
Sajratul Y. Rubaiat
Department of Computer Science, University of Idaho, USA
H
Hasan M. Jamil
Department of Computer Science, University of Idaho, USA