Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages

📅 2025-06-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the lack of interpretability in bias attribution for agglutinative languages—exemplified by Filipino—within language models. We adapt information-theoretic bias attribution scores to accommodate agglutinative morphology, integrating Filipino tokenization and morphological analysis, inter-layer gradient computation, and cross-lingual model comparisons. We systematically evaluate bias sources across monolingual Filipino and three multilingual models. Results reveal that bias in Filipino models stems predominantly from entity-type tokens (persons, objects, relations), contrasting sharply with English models where action-oriented themes dominate—highlighting fundamental cross-linguistic differences in bias mechanisms. This work extends the linguistic applicability of interpretable bias attribution methods and introduces an entity-driven paradigm for non-English bias analysis. It further provides a reusable methodological framework for fairness research in low-resource languages.
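The summary above describes attributing bias to individual tokens via an information-theoretic score. As a minimal illustrative sketch only (not the paper's actual metric), one can measure how ablating each token shifts the entropy of a model's output distribution; here `toy_lm`, its vocabulary, and the example token `babae` (Filipino for "woman") are all hypothetical stand-ins, not artifacts from the study:

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def toy_lm(tokens):
    """Stand-in for a language model: returns a next-token
    distribution over a 4-word vocabulary. Purely illustrative;
    the paper evaluates actual Filipino and multilingual models."""
    # Skew the distribution when a (hypothetical) identity token is present.
    if "babae" in tokens:  # example identity token only
        return [0.7, 0.1, 0.1, 0.1]
    return [0.25, 0.25, 0.25, 0.25]

def attribution_scores(tokens):
    """Entropy-based attribution: magnitude of the change in output
    entropy when each token is ablated from the input."""
    base_h = entropy(toy_lm(tokens))
    scores = {}
    for i, tok in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]
        scores[tok] = abs(entropy(toy_lm(ablated)) - base_h)
    return scores

scores = attribution_scores(["ang", "babae", "ay", "matalino"])
# The identity token receives a nonzero score; neutral tokens score ~0.
```

A real implementation would replace `toy_lm` with a Filipino model's output head and, per the summary, would first apply morphology-aware tokenization so that agglutinated affixes are scored as meaningful units.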

📝 Abstract
Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages, particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models: one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships, entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.
Problem

Research questions and friction points this paper is trying to address.

Extend bias attribution metric for agglutinative languages like Filipino
Analyze bias sources in Filipino vs. English language models
Compare bias themes in monolingual and multilingual Filipino models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapting bias attribution score for agglutinative languages
Testing method on Filipino and multilingual models
Identifying entity-based bias themes in Filipino models
Lance Calvin Lim Gamboa
School of Computer Science, University of Birmingham; Department of Information Systems and Computer Science, Ateneo de Manila University
Yue Feng
School of Computer Science, University of Birmingham
Mark Lee
University of Birmingham
Computer Science · Natural Language Processing