🤖 AI Summary
Fine-tuning large language models (LLMs) often yields unreliable responses to out-of-distribution (OoD) queries. To address this, we propose LoRA-BAM, a lightweight, interpretable framework that embeds a box-based abstract monitor into LoRA adaptation layers to assess, at inference time, whether an input lies within the model's competence. Our key contributions are: (1) an OoD detection mechanism that combines feature-space clustering with axis-aligned bounding boxes (AABBs); (2) a semantic-consistency regularization that improves robustness to paraphrased inputs; and (3) an adaptive boundary-expansion strategy that uses intra-cluster variance to jointly optimize coverage and precision. Evaluated across multiple benchmarks, LoRA-BAM achieves over 92% OoD identification accuracy, under 3% false rejection rate, negligible inference overhead (<0.5% latency increase), and no degradation in downstream task performance, significantly outperforming state-of-the-art approaches.
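The clustering-plus-AABB mechanism above can be sketched in a few lines. This is a minimal illustration, assuming feature vectors have already been extracted and assigned cluster labels (e.g., by k-means); the function names and the `margin_scale` parameter are illustrative assumptions, not the paper's API.

```python
import numpy as np

def build_boxes(features, labels, margin_scale=1.0):
    """Enclose each feature cluster in an axis-aligned bounding box (AABB),
    enlarged per dimension by a variance-aware margin (illustrative)."""
    boxes = []
    for k in np.unique(labels):
        pts = features[labels == k]
        # Intra-cluster standard deviation drives the boundary expansion.
        margin = margin_scale * pts.std(axis=0)
        boxes.append((pts.min(axis=0) - margin, pts.max(axis=0) + margin))
    return boxes

def is_ood(x, boxes):
    """A query is flagged OoD when its feature vector lies outside every box."""
    return not any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in boxes)
```

Because each box is just a pair of per-dimension bounds, a rejection is directly interpretable: one can report which dimensions of the query's feature vector fell outside every cluster's range.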
📝 Abstract
Fine-tuning large language models (LLMs) improves performance on domain-specific tasks but can lead to overfitting, making the models unreliable on out-of-distribution (OoD) queries. We propose LoRA-BAM, a method that adds an OoD-detection monitor to the LoRA layer using boxed abstraction to filter out questions beyond the model's competence. Feature vectors extracted by the LLM from the fine-tuning data are clustered, and each cluster is enclosed in a box; a question is flagged as OoD if its feature vector falls outside all boxes. To improve interpretability and robustness, we introduce a regularization loss during fine-tuning that encourages paraphrased questions to stay close in feature space, and we enlarge each box's decision boundary based on the feature variance within its cluster. Our method complements existing defenses by providing lightweight and interpretable OoD detection.
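The paraphrase-consistency regularizer described above can be sketched as a penalty on the feature-space distance between each question and its paraphrase. This is a simplified NumPy illustration; the function names and the weighting coefficient `lam` are hypothetical, and in practice the term would be added to the fine-tuning objective over the model's differentiable features.

```python
import numpy as np

def paraphrase_consistency_loss(feat_q, feat_para):
    """Mean squared feature-space distance between each question and its
    paraphrase; minimizing this pulls paraphrases together (illustrative)."""
    return float(np.mean(np.sum((feat_q - feat_para) ** 2, axis=-1)))

def total_loss(task_loss, feat_q, feat_para, lam=0.1):
    """Hypothetical combined objective: task loss plus the weighted
    consistency term, with lam as an assumed trade-off coefficient."""
    return task_loss + lam * paraphrase_consistency_loss(feat_q, feat_para)
```

Keeping paraphrases close in feature space matters for the monitor: if a rephrased in-domain question drifted far from its cluster, it would spuriously land outside all boxes and be rejected.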