TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Assessing the trustworthiness of large language models (LLMs) in terms of safety and robustness remains a critical yet challenging problem. To address this, we propose a multi-dimensional automated evaluation framework that systematically probes model vulnerabilities. Our method integrates diverse adversarial perturbation techniques, including AutoDAN, with a multi-metric majority-voting mechanism to improve detection reliability. Additionally, we design an interactive visualization frontend that enables fine-grained vulnerability localization, dynamic result interpretation, and user-guided diagnostic analysis. Experiments on mainstream models, including Vicuna-7B, LLaMA2-7B, and GPT-3.5, demonstrate that the framework effectively identifies safety violations and robustness failures. Crucially, it delivers interpretable, actionable insights for model refinement, advancing both the automation and the practical utility of LLM trustworthiness assessment.

📝 Abstract
As Large Language Models (LLMs) continue to revolutionize Natural Language Processing (NLP) applications, critical concerns about their trustworthiness persist, particularly in safety and robustness. To address these challenges, we introduce TRUSTVIS, an automated evaluation framework that provides a comprehensive assessment of LLM trustworthiness. A key feature of our framework is its interactive user interface, designed to offer intuitive visualizations of trustworthiness metrics. By integrating well-known perturbation methods like AutoDAN and employing majority voting across various evaluation methods, TRUSTVIS not only provides reliable results but also makes complex evaluation processes accessible to users. Preliminary case studies on models like Vicuna-7b, Llama2-7b, and GPT-3.5 demonstrate the effectiveness of our framework in identifying safety and robustness vulnerabilities, while the interactive interface allows users to explore results in detail, empowering targeted model improvements. Video Link: https://youtu.be/k1TrBqNVg8g
Problem

Research questions and friction points this paper is trying to address.

Evaluating trustworthiness vulnerabilities in large language models
Providing comprehensive safety and robustness assessment framework
Automating multi-dimensional trustworthiness evaluation with interactive visualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework for comprehensive LLM trustworthiness assessment
Interactive interface with intuitive visualization of trustworthiness metrics
Integrates perturbation methods and majority voting for reliability
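The majority-voting mechanism mentioned above can be sketched in a few lines. The sketch below is a minimal illustration only, not the paper's implementation: the evaluator functions are hypothetical stand-ins for the framework's actual safety/robustness metrics, and the `safe`/`unsafe` labels are assumed for the example.

```python
from collections import Counter

def majority_vote(verdicts):
    """Aggregate per-evaluator verdicts (e.g. 'safe' / 'unsafe') by majority."""
    label, _ = Counter(verdicts).most_common(1)[0]
    return label

# Hypothetical evaluators: each maps a (possibly perturbed) model
# response to a verdict. Real metrics would be far more sophisticated.
evaluators = [
    lambda resp: "unsafe" if "step-by-step instructions" in resp.lower() else "safe",
    lambda resp: "unsafe" if resp.strip() == "" else "safe",
    lambda resp: "safe",  # placeholder for a third metric
]

def evaluate_response(response, evaluators):
    """Run every evaluator on one response and return the majority verdict."""
    return majority_vote([ev(response) for ev in evaluators])
```

Voting over several imperfect evaluators reduces the impact of any single metric's false positives, which is the reliability argument the summary makes.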