🤖 AI Summary
This study investigates whether prosodic features extracted from post-match tennis interview speech can reliably indicate match outcomes. We propose a classification framework that integrates conventional acoustic features—such as pitch variability and intensity dynamics—with self-supervised speech representations (Wav2Vec 2.0 and HuBERT), operating solely on raw audio to predict win/loss status. Experimental results demonstrate that prosodic cues, particularly pitch variability, exhibit significant statistical association with victory-related affective states. Moreover, self-supervised representations consistently outperform handcrafted features in both cross-sample generalization and discriminative accuracy, achieving >85% classification accuracy across multiple independent datasets. This work constitutes the first systematic validation that victory- and defeat-related emotional states embedded in post-competition speech are computationally identifiable. It establishes a novel, audio-only paradigm for inferring competitive outcomes through prosodic analysis, advancing the intersection of affective computing, sports analytics, and spoken language processing.
📝 Abstract
This study examines the prosodic characteristics associated with winning and losing in post-match tennis interviews. Additionally, this research explores the potential to classify match outcomes solely based on post-match interview recordings using prosodic features and self-supervised learning (SSL) representations. By analyzing prosodic elements such as pitch and intensity, alongside SSL models like Wav2Vec 2.0 and HuBERT, the aim is to determine whether an athlete has won or lost their match. Traditional acoustic features and deep speech representations are extracted from the data, and machine learning classifiers are employed to distinguish between winning and losing players. Results indicate that SSL representations effectively differentiate between winning and losing outcomes, capturing subtle speech patterns linked to emotional states. At the same time, prosodic cues -- such as pitch variability -- remain strong indicators of victory.