Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical gap in evaluating autonomous AI agents in real-world interactions, where inadequate assessment methods can allow cross-domain risks such as sensitive-data leakage, fraud, and cybersecurity breaches to go undetected. To tackle this challenge, the work presents a jointly conducted agent safety evaluation exercise, developed through international collaboration, that explicitly incorporates cultural and linguistic diversity. Leveraging publicly available agent benchmark tasks, it systematically probes the risk-handling behaviour of diverse large language models, both open- and closed-weight, in realistic scenarios. The research identifies fundamental methodological shortcomings in current agent evaluation practice and advocates a shift from comparative performance metrics toward scientifically rigorous, standardized, and reproducible assessment protocols, laying the groundwork for a robust, cross-domain AI agent safety evaluation ecosystem.

📝 Abstract
The rapid rise of autonomous AI systems and advancements in agent capabilities are introducing new risks due to reduced oversight of real-world interactions, yet agent testing remains a nascent, still-developing science. As AI agents begin to be deployed globally, it is important that they handle different languages and cultures accurately and securely. To address this, participants from The International Network for Advanced AI Measurement, Evaluation and Science, including representatives from Singapore, Japan, Australia, Canada, the European Commission, France, Kenya, South Korea, and the United Kingdom, have come together to align approaches to agentic evaluations. This is the third such exercise, building on insights from two earlier joint testing exercises conducted by the Network in November 2024 and February 2025; the objective is to further refine best practices for testing advanced AI systems. The exercise was split into two strands: (1) common risks, including leakage of sensitive information and fraud, led by Singapore AISI; and (2) cybersecurity, led by UK AISI. A mix of open- and closed-weight models was evaluated against tasks from various public agentic benchmarks. Given the nascency of agentic testing, the primary focus was on understanding methodological issues in conducting such tests, rather than on test results or model capabilities. This collaboration marks an important step forward as participants work together to advance the science of agentic evaluations.
Problem

Research questions and friction points this paper is trying to address.

agentic evaluations
sensitive information leakage
fraud
cybersecurity threats
AI safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic evaluation
methodological framework
cross-national collaboration
sensitive information leakage
AI safety testing
Authors

Ee Wei Seah
Yongsen Zheng — Nanyang Technological University / Sun Yat-sen University (Recommender Systems, Human-AI Dialogue Systems, Natural Language Processing, Trustworthy AI, AI Safety)
Naga Nikshith
Mahran Morsidi
Gabriel Waikin Loh Matienzo
Nigel Gay
Akriti Vij
Benjamin Chua
En Qi Ng
Sharmini Johnson
Vanessa Wilfred
Wan Sie Lee
Anna Davidson
Catherine Devine
Erin Zorer
Gareth Holvey
Harry Coppock — Imperial College London (Deep Learning, Signal Processing, Audio, Representation Learning, Quantisation)
James Walpole
Jerome Wynee
Magda Dubois
Michael Schmatz
Patrick Keane
Sam Deverett
Bill Black
Bo Yan
Bushra Sabir — Research Scientist, CSIRO's Data61 (Adversarial Machine Learning, Deep Learning, Cybersecurity)
Frank Sun
Hao Zhang
Harriet Farlow
Helen Zhou
Li-ping Dong
Qinghua Lu — Group Leader, Software Systems Research Group, CSIRO's Data61 (AI Engineering, SE4AI, Software Architecture, AI Safety, Responsible AI)
Seung Jang
Sharif Abuadbba
Simon O'Callaghan — Principal Researcher, Gradient Institute (Ethical AI, Machine Learning, Robotics)
Suyu Ma — CSIRO's Data61 (Software Engineering)
Tom Howroyd
Cyrus Fung
Fatemeh Azadi
Isar Nejadgholi — National Research Council Canada, University of Ottawa (Natural Language Processing, AI for Social Impact, Responsible AI, AI Safety)
Krishnapriya Vishnubhotla — University of Toronto (Natural Language Processing)
Pulei Xiong
S. Lohrasbi
Scott Buffett — National Research Council Canada (Machine Learning, Data Mining, Multiagent Systems)
Shahrear Iqbal — Research Officer, National Research Council (NRC) Canada (Security and Privacy)
Sowmya Vajjala — National Research Council, Canada (Natural Language Processing)
Anna Safont-Andreu
L. Massarelli
O. V. D. Wal
Simon Moller
Agnès Delaborde
Joris Duguépéroux
Nicolas Rolin
Romane Gallienne
Sarah Behanzin
Tom Seimandi
Akiko Murakami — IBM Research Tokyo (Natural Language Processing, Social Analytics)
Takayuki Semitsu
Teresa Tsukiji
Angela Kinuthia
Michael Michie
Stephanie Kasaon
Jean Wangari
Hankyul Baek
Jaewon Noh
Kihyuk Nam
Sang Seo
Sungpil Shin
Taewhi Lee
Yongsu Kim