Exploring the Zero-Shot Capabilities of LLMs Handling Multiple Problems at Once

📅 2024-06-16
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates the multiple-problem prompting (MPP) capability of large language models (LLMs), focusing on zero-shot multi-task performance. Methodologically, it evaluates various LLMs on 6 classification and 12 reasoning benchmarks using parallel multi-problem prompts, cross-benchmark performance analysis, and behavioral attribution to probe the underlying mechanisms. The study provides empirical evidence that mainstream LLMs possess a robust native MPP ability; identifies instruction tuning as a critical driver, yielding substantial quantitative gains in MPP performance; and reveals fundamental limitations in index localization and cross-source mixed reasoning. By establishing MPP as a paradigm for assessing LLMs' multi-task generalization, the work delineates its empirical validity boundaries and identifies concrete optimization pathways.

📝 Abstract
Recent studies have proposed placing multiple problems in a single prompt to improve input token utilization for more efficient LLM inference. We call this multiple-problem prompting (MPP), in contrast to conventional single-problem prompting (SPP), which prompts an LLM with a single problem at a time. While MPP has been shown to work comparably well or even better than SPP under few-shot settings, its zero-shot performance is underexplored, even though it better reveals the innate multiple-problem handling capabilities of LLMs. To address this, we study the zero-shot MPP performance of various LLMs on 6 classification and 12 reasoning benchmarks and confirm that LLMs are competent zero-shot multi-problem solvers. We also examine the conditions under which zero-shot MPP is effective and explore several model-level factors that may enable MPP. We observe that LLMs consistently perform worse when selecting indices of texts of a given class label and when handling multiple mixed-source reasoning problems, indicating a lack of true understanding. We also find that instruction tuning is an important factor that enhances MPP.
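The contrast between the two prompting regimes can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's actual prompt template: the numbering scheme, instruction wording, and function names here are assumptions.

```python
def build_mpp_prompt(problems, task_instruction):
    """Multiple-problem prompting (MPP): pack several problems
    into one prompt and ask for index-matched answers."""
    lines = [task_instruction, ""]
    for i, problem in enumerate(problems, start=1):
        lines.append(f"Problem {i}: {problem}")
    lines.append("")
    lines.append("Answer each problem in order, numbering your answers to match.")
    return "\n".join(lines)


def build_spp_prompts(problems, task_instruction):
    """Conventional single-problem prompting (SPP): one prompt per problem."""
    return [f"{task_instruction}\n\n{p}" for p in problems]
```

Under MPP, the shared task instruction is paid for once per batch rather than once per problem, which is the token-utilization saving the abstract refers to.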
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs with multiple problems simultaneously
Introducing ZeMPE benchmark for zero-shot multi-problem assessment
Analyzing conditions where LLMs fail in multi-problem handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-problem evaluation (MPE) paradigm
ZeMPE benchmark with 53,100 prompts
Analysis of 13 LLMs across 5 families