JOINT: Join Optimization and Inference via Network Traversal

📅 2025-09-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional relational databases rely on exact column-name and value matching, which makes them ill-suited to integrating heterogeneous databases where column names are inconsistent and data is fragmented. To address this, we propose an end-to-end fuzzy join framework. It builds a weighted graph whose edge weights combine column-name semantic embedding similarity with row-level fuzzy value overlap, the latter quantified by a negative-log-transformed Jaccard score. Multi-hop join paths are then discovered through graph traversal, enabling automated, indirect, and non-equi joins. Unlike conventional single-hop equi-joins, this approach supports semantics-aware linkage across heterogeneous schemas. In experiments on synthetic healthcare-style databases, the method recovers valid joins despite obfuscated column names and partial value mismatches.

📝 Abstract

Traditional relational databases require users to manually specify join keys and assume exact matches between column names and values. In practice, this limits joinability across fragmented or inconsistently named tables. We propose a fuzzy join framework that automatically identifies joinable column pairs and traverses indirect (multi-hop) join paths across multiple databases. Our method combines column name similarity with row-level fuzzy value overlap, computes edge weights using negative log-transformed Jaccard scores, and performs join path discovery via graph traversal. Experiments on synthetic healthcare-style databases demonstrate the system's ability to recover valid joins despite fuzzified column names and partial value mismatches. This research has direct applications in data integration.
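The edge-weight idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how name similarity and value overlap are combined, so the convex blend with `alpha` below is an assumption, as are the function names, the use of `difflib.SequenceMatcher` as a stand-in for the name-similarity measure, and the lowercase/strip normalization standing in for fuzzy value matching.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for the paper's name-similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize(value) -> str:
    """Crude fuzzy normalization (assumption): ignore case and surrounding whitespace."""
    return str(value).strip().lower()


def jaccard(values_a, values_b) -> float:
    """Jaccard overlap |A & B| / |A | B| of the normalized value sets."""
    set_a = {normalize(v) for v in values_a}
    set_b = {normalize(v) for v in values_b}
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def edge_weight(name_a, values_a, name_b, values_b, alpha=0.5) -> float:
    """Negative-log weight of a candidate join edge between two columns.

    The blend of name similarity and Jaccard overlap via `alpha` is an
    assumption of this sketch, not something the abstract specifies.
    """
    score = alpha * name_similarity(name_a, name_b) + (1 - alpha) * jaccard(values_a, values_b)
    if score <= 0.0:
        return math.inf  # no evidence of joinability: treat the edge as absent
    return -math.log(score)
```

With this transform a perfect match gets weight 0 and weaker matches get larger positive weights, so lower weight means a more plausible join.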

Problem

Research questions and friction points this paper is trying to address.

Automating join key identification across fragmented databases

Handling fuzzy column name and value mismatches in joins

Discovering multi-hop join paths via graph traversal methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuzzy join framework for automatic column matching

Combines name similarity with value overlap scoring

Graph traversal for multi-hop join path discovery
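Multi-hop join path discovery can be sketched as a shortest-path search over the weighted join graph. The summary only says "graph traversal", so the use of Dijkstra's algorithm here is an assumption, and the table names and match scores are hypothetical. Because each weight is the negative log of a match score, the path with the smallest total weight is the one with the largest product of scores.

```python
import heapq
import math


def best_join_path(graph, source, target):
    """Dijkstra over a dict-of-dicts graph: graph[u][v] = non-negative edge weight.

    Returns (path, cost), or (None, inf) when target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


def build_graph(scored_edges):
    """Turn (table_a, table_b, score) triples into an undirected -log(score) graph."""
    graph = {}
    for a, b, score in scored_edges:
        w = -math.log(score)
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


# Hypothetical match scores between three tables.
graph = build_graph([
    ("patients", "visits", 0.9),
    ("visits", "billing", 0.8),
    ("patients", "billing", 0.3),
])
path, cost = best_join_path(graph, "patients", "billing")
print(path, math.exp(-cost))  # exp(-cost) recovers the product of scores along the path
```

In this toy graph the two-hop route through `visits` (score product 0.9 × 0.8 = 0.72) beats the weak direct edge (0.3), which is the situation in which an indirect join path is preferable to a direct one.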

🔎 Similar Papers

Color: A Framework for Applying Graph Coloring to Subgraph Cardinality Estimation