🤖 AI Summary
Existing methods struggle to quantify paper-level innovation comprehensively and accurately: they often neglect full-text context and suffer from limited generalizability and interpretability. This paper proposes HSPIM, a training-free hierarchical scientific paper innovation metric that enables context-aware, fine-grained assessment via a three-level "full paper → section → question-answer" decomposition. Methodologically, HSPIM integrates zero-shot LLM prompting, section segmentation and classification, QA generation, and confidence-weighted aggregation. Its core contributions are: (1) the first novelty-weighted scoring mechanism enhanced by section-level QA generation; and (2) a two-tier structured prompting template optimized via a genetic algorithm to balance domain-agnostic utility and field-specific adaptability. Evaluated on multiple top-tier conference paper datasets, HSPIM achieves state-of-the-art performance in effectiveness, generalizability, and interpretability.
📝 Abstract
Measuring scientific paper innovation is both important and challenging. Existing content-based methods often overlook the full-paper context, fail to capture the full scope of innovation, and generalize poorly. We propose HSPIM, a hierarchical and training-free framework based on large language models (LLMs). It introduces a Paper-to-Sections-to-QAs decomposition to assess innovation. We segment the text by section titles and use zero-shot LLM prompting to perform section classification, question-answering (QA) augmentation, and weighted novelty scoring. The generated QA pairs focus on section-level innovation and serve as additional context to improve LLM scoring. For each section, the LLM outputs a novelty score and a confidence score. We use the confidence scores as weights to aggregate section-level novelty scores into a paper-level innovation score. To further improve performance, we propose a two-layer question structure consisting of common and section-specific questions, and apply a genetic algorithm to optimize the question-prompt combinations. Comprehensive experiments on scientific conference paper datasets show that HSPIM outperforms baseline methods in effectiveness, generalization, and interpretability.
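The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `aggregate_innovation` and the sample scores are hypothetical, and it assumes confidence scores are simply normalized into weights over sections.

```python
def aggregate_innovation(section_scores):
    """Combine per-section (novelty, confidence) pairs into one paper score.

    section_scores: list of (novelty, confidence) tuples, one per section,
    where each confidence acts as the weight of its novelty score.
    """
    total_conf = sum(conf for _, conf in section_scores)
    if total_conf == 0:
        # Fall back to a plain average if the LLM reports zero confidence everywhere.
        return sum(nov for nov, _ in section_scores) / len(section_scores)
    # Confidence-weighted mean of the section-level novelty scores.
    return sum(nov * conf for nov, conf in section_scores) / total_conf

# Hypothetical example: three sections with novelty 6, 8, 4
# and confidences 0.9, 0.5, 0.6 yield a weighted score of 5.9.
paper_score = aggregate_innovation([(6, 0.9), (8, 0.5), (4, 0.6)])
```

Weighting by confidence lets sections the LLM scored uncertainly (e.g. boilerplate-heavy related-work sections) contribute less to the final paper-level score.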