ALINC: Active Learning for Inductive Node Classification via Graph Sampling

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing active learning methods: they select nodes within one or a few large graphs and are therefore ill-suited to inductive node classification settings in which annotations must be acquired at the whole-graph level. The authors propose ALINC, a framework that extends active learning from node-level selection to graph-level sampling by lifting node-level utility measures to graph-level selection criteria through several aggregation mechanisms. A benchmark of ten strategies and three aggregation methods on four datasets identifies CoreSet, TypiClust, and BADGE as the top-performing graph sampling strategies and shows that the choice of aggregation method substantially affects model performance and annotation cost. Two use case studies, site-of-metabolism prediction in molecules and design automation of printed circuit board schematics, demonstrate the framework's practical utility. According to the authors, graph-level selection for inductive node classification had not previously been tackled in the literature.

📝 Abstract

Active learning (AL) for node classification typically focuses on selecting the most informative nodes for annotation within one or a few large graphs (e.g., in social network analysis). However, in other domains, such as molecular chemistry or electronic design automation, datasets consist of thousands of independent graphs. In many of these inductive settings, annotating an individual node requires a full-graph analysis, which effectively yields the remaining node labels on the fly. These scenarios therefore require AL strategies that select entire graphs instead of single nodes, a problem that has not been tackled in the literature so far. To address it, we introduce ALINC, an AL framework for inductive node classification via graph sampling. It bridges the existing methodological gap by elevating node-level utility measures to graph-level selection criteria through various aggregation mechanisms. In an extensive benchmark including ten strategies, three aggregation methods, and four datasets, we identify CoreSet, TypiClust, and BADGE as the top-performing graph sampling strategies. Our detailed analysis further reveals that the choice of the aggregation method is pivotal, as it substantially affects model performance and annotation costs. Finally, we demonstrate the effectiveness of ALINC in two use case studies: site-of-metabolism prediction in molecules and design automation of printed circuit board schematics.
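To make the idea of lifting node-level utility to a graph-level selection criterion concrete, here is a minimal sketch. It is not the ALINC implementation: the abstract does not name its three aggregation methods or its utility measures, so the use of predictive entropy as the node score, the `sum`/`mean`/`max` aggregators, and all function names (`node_entropy`, `graph_scores`, `select_graphs`) are assumptions made purely for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# One graph = a list of per-node class-probability vectors (assumed non-empty).
Graph = List[Sequence[float]]


def node_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a single node's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical aggregation choices; the source does not specify its methods.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,                         # favors large graphs
    "mean": lambda s: sum(s) / len(s),  # size-normalized
    "max": max,                         # driven by the single most uncertain node
}


def graph_scores(pool: List[Graph], how: str = "mean") -> List[float]:
    """Aggregate node-level entropies into one score per graph."""
    agg = AGGREGATORS[how]
    return [agg([node_entropy(p) for p in graph]) for graph in pool]


def select_graphs(pool: List[Graph], budget: int, how: str = "mean") -> List[int]:
    """Return indices of the `budget` highest-scoring graphs to annotate."""
    scores = graph_scores(pool, how)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy unlabeled pool with made-up predictions for a binary node task.
pool: List[Graph] = [
    [[0.99, 0.01], [0.98, 0.02]],      # graph 0: two confident nodes
    [[0.5, 0.5]],                      # graph 1: one maximally uncertain node
    [[0.7, 0.3]] * 4,                  # graph 2: four moderately uncertain nodes
]

picked_by_mean = select_graphs(pool, budget=1, how="mean")
picked_by_sum = select_graphs(pool, budget=1, how="sum")
```

On this toy pool the two aggregators disagree: averaging prefers the small, highly uncertain graph, while summing prefers the larger graph with more nodes to label. That is one simple way in which the aggregation choice can influence both what the model learns from and how much annotation effort a round consumes.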

Problem

Research questions and friction points this paper is trying to address.

active learning

inductive node classification

graph sampling

graph-level selection

annotation cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

active learning

inductive node classification

graph sampling