LoRA-BAM: Input Filtering for Fine-tuned LLMs via Boxed Abstraction Monitors over LoRA Layers

📅 2025-06-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Fine-tuning large language models (LLMs) often yields unreliable responses to out-of-distribution (OoD) queries. To address this, we propose LoRA-BAM—a lightweight, interpretable framework that embeds a box-based abstract monitor into LoRA adaptation layers for real-time input capability boundary assessment. Our key contributions are: (1) an OoD detection mechanism leveraging feature-space clustering and axis-aligned bounding boxes (AABBs); (2) semantic consistency regularization to enhance robustness against paraphrased inputs; and (3) an adaptive boundary expansion strategy incorporating intra-cluster variance to jointly optimize coverage and precision. Evaluated across multiple benchmarks, LoRA-BAM achieves >92% OoD identification accuracy, <3% false rejection rate, negligible inference overhead (<0.5% latency increase), and zero degradation in downstream task performance—significantly outperforming state-of-the-art approaches.

Technology Category

Application Category

📝 Abstract
Fine-tuning large language models (LLMs) improves performance on domain-specific tasks but can lead to overfitting, making them unreliable on out-of-distribution (OoD) queries. We propose LoRA-BAM - a method that adds OoD detection monitors to the LoRA layer using boxed abstraction to filter questions beyond the model's competence. Feature vectors from the fine-tuning data are extracted via the LLM and clustered. Clusters are enclosed in boxes; a question is flagged as OoD if its feature vector falls outside all boxes. To improve interpretability and robustness, we introduce a regularization loss during fine-tuning that encourages paraphrased questions to stay close in the feature space, and the enlargement of the decision boundary is based on the feature variance within a cluster. Our method complements existing defenses by providing lightweight and interpretable OoD detection.
Problem

Research questions and friction points this paper is trying to address.

Detects out-of-distribution queries in fine-tuned LLMs
Uses boxed abstraction monitors on LoRA layers
Improves interpretability with regularization and clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Boxed abstraction monitors for OoD detection
Regularization loss for feature space closeness
Decision boundary enlargement based on variance
🔎 Similar Papers
No similar papers found.