AI Summary
Ambiguity in item difficulty calibration undermines measurement validity and test reusability in data visualization literacy assessment.
Method: This paper proposes DRIVE-T, a methodology grounded in semiotic theory's three-layer framework (syntactic, semantic, pragmatic) to model latent literacy constructs. It integrates structured task annotation, independent multi-rater scoring with inter-rater consistency validation, and the Many-Facets Rasch Model (MFRM) to empirically calibrate item difficulty and precisely align items with examinee ability.
Contribution/Results: DRIVE-T enables automated selection of highly discriminating, representative items and supports scalable item bank development. Pilot validation demonstrates significant improvements in structural validity, expressive power, and cross-context reusability of assessments. The framework provides a generalizable methodological and technical foundation for formative literacy evaluation.
Abstract
The underspecification of progressive levels of difficulty in the design of measurement constructs and assessment tests for data visualization literacy may hinder the expressivity of measurements in both test design and test reuse. To mitigate this problem, this paper proposes DRIVE-T (Discriminating and Representative Items for Validating Expressive Tests), a methodology designed to drive the construction and evaluation of assessment items. Given a data visualization, DRIVE-T supports the identification of the discriminability and representativeness of task-based items for measuring levels of data visualization literacy. DRIVE-T consists of three steps: (1) tagging task-based items associated with a set of data visualizations; (2) having independent raters rate the items for difficulty; (3) analysing the raters' raw scores with a Many-Facet Rasch Measurement model. In this way, we can observe the emergence of difficulty levels of the measurement construct, derived from the discriminability and representativeness of task-based items for each data visualization, ordered into Many-Facets construct levels. In this study, we show and apply each step of the methodology to an item bank, which models the difficulty levels of a measurement construct approximating a latent construct for data visualization literacy. This measurement construct is drawn from semiotics, i.e., based on the syntactic, semantic, and pragmatic knowledge that each data visualization may require people to master. The DRIVE-T methodology operationalises an inductive approach, observable in a post-design phase of item preparation, for formative-style and practice-based measurement construct emergence. A pilot study with items selected through the application of DRIVE-T is also presented to test our approach.
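The third step, analysing raters' raw scores with a Many-Facet Rasch Measurement model, can be sketched in code. The sketch below is a minimal illustration under simplifying assumptions, not the calibration software used in the paper: it assumes dichotomous scores and fits the three facets (examinee ability θ, item difficulty δ, rater severity α) of the model logit P(x=1) = θ − δ − α by joint maximum-likelihood gradient ascent; the function name and data layout are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_mfrm(obs, n_persons, n_items, n_raters, lr=0.05, iters=2000):
    """Joint maximum-likelihood estimation for a dichotomous three-facet
    Rasch model: logit P(x=1) = theta[p] - delta[i] - alpha[r].
    obs: list of (person, item, rater, score) tuples, score in {0, 1}."""
    theta = [0.0] * n_persons   # examinee ability
    delta = [0.0] * n_items     # item difficulty
    alpha = [0.0] * n_raters    # rater severity
    for _ in range(iters):
        g_t = [0.0] * n_persons
        g_d = [0.0] * n_items
        g_a = [0.0] * n_raters
        for p, i, r, x in obs:
            # residual = observed score minus model-expected probability
            resid = x - sigmoid(theta[p] - delta[i] - alpha[r])
            g_t[p] += resid   # higher residual -> raise ability
            g_d[i] -= resid   # higher residual -> lower difficulty
            g_a[r] -= resid   # higher residual -> lower severity
        theta = [t + lr * g for t, g in zip(theta, g_t)]
        delta = [d + lr * g for d, g in zip(delta, g_d)]
        alpha = [a + lr * g for a, g in zip(alpha, g_a)]
        # centre item and rater facets at zero for identifiability
        md = sum(delta) / n_items
        delta = [d - md for d in delta]
        ma = sum(alpha) / n_raters
        alpha = [a - ma for a in alpha]
    return theta, delta, alpha
```

With a small score matrix in which item 1 is solved less often than item 0 and rater 1 awards fewer passes than rater 0, the fitted δ and α recover that ordering (δ₁ > δ₀, α₁ > α₀), which is exactly the empirical difficulty ordering DRIVE-T reads off to build construct levels.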