MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

career value

157K/year

🤖 AI Summary

Identifying logical fallacies in scientific misinformation is both challenging and critical due to their subtle reasoning flaws and high societal impact; however, current large language models (LLMs) exhibit limited performance on the MISSCI benchmark. To address the scarcity of annotated fallacy data, this paper proposes MisSynth—a retrieval-augmented generation (RAG)-based framework that automatically constructs high-quality synthetic fallacy samples while preserving logical structure authenticity and diversity. MisSynth integrates RAG tightly with synthetic data generation and employs lightweight fine-tuning to endow LLaMA-3.1-8B with zero-shot fallacy classification capability. The key contribution lies in this deep coupling of RAG and synthetic data synthesis, enabling robust generalization without extensive human annotation. Experimental results show an absolute 35.2% improvement in F1 score on the MISSCI test set over strong baselines, demonstrating significant gains in both effectiveness and cross-domain generalizability for scientific misinformation detection.

Technology Category

Application Category

📝 Abstract

Health-related misinformation is very prevalent and potentially harmful. It is difficult to identify, especially when claims distort or misinterpret scientific findings. We investigate the impact of synthetic data generation and lightweight fine-tuning techniques on the ability of large language models (LLMs) to recognize fallacious arguments using the MISSCI dataset and framework. In this work, we propose MisSynth, a pipeline that applies retrieval-augmented generation (RAG) to produce synthetic fallacy samples, which are then used to fine-tune an LLM model. Our results show substantial accuracy gains with fine-tuned models compared to vanilla baselines. For instance, the LLaMA 3.1 8B fine-tuned model achieved an over 35% F1-score absolute improvement on the MISSCI test split over its vanilla baseline. We demonstrate that introducing synthetic fallacy data to augment limited annotated resources can significantly enhance zero-shot LLM classification performance on real-world scientific misinformation tasks, even with limited computational resources. The code and synthetic dataset are available on https://github.com/mxpoliakov/MisSynth.

Problem

Research questions and friction points this paper is trying to address.

Improving misinformation classification using synthetic data generation

Enhancing LLM fallacy detection with retrieval-augmented generation

Addressing limited annotated resources for scientific misinformation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating synthetic fallacy samples using retrieval-augmented generation

Fine-tuning LLMs with synthetic data for classification

Enhancing zero-shot performance on misinformation detection

🔎 Similar Papers

A Logical Fallacy-Informed Framework for Argument Generation