Fiducial Confidence Intervals for Agreement Measures Among Raters Under a Generalized Linear Mixed Effects Model

📅 2025-03-06
🤖 AI Summary
This paper addresses the problem of quantifying inter-rater agreement in three-level hierarchical data, such as longitudinal imaging assessments, in which multiple raters rate every subject at multiple time points, with repeated measurements at each time point. Within the generalized linear mixed-effects model (GLMM) framework, we propose an extended concordance correlation coefficient (CCC) and a method for its interval estimation. Rather than relying on the conventional Fisher Z-transformation, we adapt fiducial inference and combine it with a linearization of the model to obtain confidence intervals for the CCC. The resulting intervals have satisfactory coverage probabilities and shorter expected widths than the Fisher Z-based interval, even under moderate sample sizes. We discuss two applications from the literature: agreement between two raters' manual measurements and computer-generated measurements of joint space width on radiographs from a knee osteoarthritis trial, and agreement between a technologist and a neuroradiologist on fiber counts in the right and left corticospinal tracts. We also highlight other applications of the general approach, including artificial intelligence.

📝 Abstract
A generalization of the classical concordance correlation coefficient (CCC) is considered under a three-level design in which multiple raters rate every subject over time, and each rater rates every subject multiple times at each measurement time point. The ratings can be discrete or continuous. A methodology is developed for interval estimation of the CCC, based on a suitable linearization of the model along with an adaptation of the fiducial inference approach. The resulting confidence intervals have satisfactory coverage probabilities and shorter expected widths than the interval based on the Fisher Z-transformation, even under moderate sample sizes. Two real applications available in the literature are discussed. The first is based on a clinical trial to determine whether various treatments are more effective than a placebo for treating knee pain associated with osteoarthritis. The CCC was used to assess agreement among the manual measurements of joint space width on plain radiographs by two raters and the computer-generated measurements of digitized radiographs. The second example concerns corticospinal tractography, where the CCC was again applied to evaluate the agreement between a well-trained technologist and a neuroradiologist on measurements of fiber number in both the right and left corticospinal tracts. Other relevant applications of the general approach are highlighted in many areas, including artificial intelligence.
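For orientation, the classical two-rater CCC that the abstract generalizes can be sketched as below. This is Lin's textbook coefficient for one pair of raters, not the three-level GLMM-based coefficient developed in the paper. The function name `lin_ccc` and the toy ratings are illustrative assumptions.

```python
def lin_ccc(x, y):
    """Classical (Lin) concordance correlation coefficient for two raters.

    Illustrative sketch only: this is the two-rater coefficient, not the
    paper's three-level generalization. Moments use 1/n, as in Lin's
    original definition.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both raters are constant and equal")
    return 2 * cov_xy / denom


# Hypothetical toy ratings: rater B is rater A shifted by one unit.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 3, 4, 5, 6]
print(lin_ccc(rater_a, rater_b))  # approximately 0.8
```

In this toy example the Pearson correlation is exactly 1, while the CCC is about 0.8, because the CCC also penalizes the systematic one-unit shift between the raters.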
Problem

Research questions and friction points this paper is trying to address.

Estimating agreement measures among multiple raters over time.
Developing confidence intervals for the concordance correlation coefficient (CCC).
Assessing rater agreement in clinical and AI applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized linear mixed effects model adaptation
Fiducial inference for confidence intervals
Satisfactory coverage probabilities and shorter expected interval widths than the Fisher Z-transformation interval
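The Fisher Z-transformation interval used as the comparison baseline follows a generic recipe: transform the estimate with `atanh`, build a normal interval on that scale, and map back with `tanh`. The sketch below shows only that recipe. The function name `fisher_z_interval` is hypothetical, and the standard error on the Z scale (`se_z`) is assumed to be supplied by the user, since the page does not state how it is obtained under the mixed model.

```python
import math
from statistics import NormalDist


def fisher_z_interval(ccc_hat, se_z, level=0.95):
    """Generic Fisher Z-transformation interval for a correlation-type estimate.

    Assumptions (not taken from the paper): `ccc_hat` lies strictly in (-1, 1)
    and `se_z` is a user-supplied standard error of atanh(ccc_hat).
    """
    if not -1.0 < ccc_hat < 1.0:
        raise ValueError("ccc_hat must lie strictly between -1 and 1")
    if se_z < 0:
        raise ValueError("se_z must be non-negative")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    z = math.atanh(ccc_hat)                             # move to the Z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)    # two-sided normal quantile
    lower = math.tanh(z - crit * se_z)                  # map bounds back to (-1, 1)
    upper = math.tanh(z + crit * se_z)
    return lower, upper


# Hypothetical inputs for illustration only.
low, high = fisher_z_interval(0.8, se_z=0.1)
print(low, high)
```

Because `tanh` maps back into (-1, 1), the interval always stays within the admissible range of the CCC, but it is generally not symmetric around the point estimate.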
Soumya Sahu
University of Illinois at Chicago
biostatistics
Thomas Mathew
Professor of Statistics, University of Maryland Baltimore County
Statistics
D. Bhaumik
Department of Epidemiology and Biostatistics, Department of Psychiatry, University of Illinois Chicago