AI Summary
Existing closed-source multi-agent systems (MAS) inadequately address multimodal interaction barriers faced by users with disabilities in digital environments, lacking customization and fine-grained task parsing capabilities. Method: We propose ModCon-Task-Identifier, a novel open-source multimodal translation system integrating large language models (LLMs), lightweight machine learning classifiers, and an extensible multi-agent coordination framework to enable real-time image-text-speech conversion. The system supports local deployment, privacy-preserving operation, and integration with heterogeneous hardware and institutional infrastructures (e.g., digital healthcare platforms). Contribution/Results: Evaluated on a curated task identification benchmark, ModCon-Task-Identifier significantly outperforms state-of-the-art LLMs and statistical baselines. The end-to-end system demonstrates high real-time performance, robust adaptability across diverse modalities and domains, and strong security guarantees. All code and datasets are publicly released to facilitate reproducibility and deployment across real-world accessibility scenarios.
Abstract
Accessibility remains a critical concern in today's society, as many technologies are not developed to support the full range of user needs. Existing multi-agent systems (MAS) often cannot provide comprehensive assistance for users in need due to the lack of customization stemming from closed-source designs. Consequently, individuals with disabilities frequently encounter significant barriers when attempting to interact with digital environments. We introduce MATE, a multimodal accessibility MAS that performs modality conversions based on the user's needs. The system assists people with disabilities by ensuring that data is converted into an understandable format: for instance, if the user cannot see well and receives an image, the system converts the image into an audio description. MATE can be applied to a wide range of domains, industries, and areas, such as healthcare, and can serve as a useful assistant for various groups of users. The system supports multiple types of models, ranging from LLM API calls to custom machine learning (ML) classifiers. This flexibility ensures that the system can be adapted to diverse needs and is compatible with a wide variety of hardware. Since the system is expected to run locally, it preserves the privacy and security of sensitive information. In addition, the framework can be effectively integrated with institutional technologies (e.g., digital healthcare services) for real-time user assistance. Furthermore, we introduce ModCon-Task-Identifier, a model capable of extracting the precise modality conversion task from the user input. Extensive experiments show that ModCon-Task-Identifier consistently outperforms other LLMs and statistical models on our custom data. Our code and data are publicly available at https://github.com/AlgazinovAleksandr/Multi-Agent-MATE.
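To make the task-extraction step concrete, the sketch below shows one way a user request could be mapped to a (source, target) modality conversion task. Everything here is illustrative: the `Modality` enum, `ConversionTask` dataclass, keyword rules, and `identify_task` function are hypothetical stand-ins for ModCon-Task-Identifier, which is a trained model rather than a keyword matcher.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"

@dataclass(frozen=True)
class ConversionTask:
    source: Modality  # modality of the incoming data
    target: Modality  # modality the user can consume

# Illustrative keyword rules; the actual system uses a trained classifier.
_RULES = [
    ({"describe", "caption", "see"}, ConversionTask(Modality.IMAGE, Modality.TEXT)),
    ({"read", "aloud", "listen", "hear"}, ConversionTask(Modality.TEXT, Modality.AUDIO)),
    ({"transcribe", "subtitles"}, ConversionTask(Modality.AUDIO, Modality.TEXT)),
]

def identify_task(request: str) -> Optional[ConversionTask]:
    """Return the conversion task whose keywords best overlap the request."""
    words = set(request.lower().split())
    best, best_overlap = None, 0
    for keywords, task in _RULES:
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = task, overlap
    return best

if __name__ == "__main__":
    print(identify_task("please read this paragraph aloud"))
```

In the full system, the identified task would then be routed to the corresponding conversion agent (e.g., a text-to-speech agent for a TEXT-to-AUDIO task).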