🤖 AI Summary
Social science survey data are vulnerable to tampering, and no mechanism verifies their integrity between collection and archival. To address this, we propose a lightweight, column-level hash authentication method for questionnaire data. The method computes a SHA-256 hash for each data column, binds the hashes with metadata for secure storage, and uses a differential comparison algorithm to detect inserted columns, deleted columns, and modified values. It works with data downloaded from Qualtrics and SurveyCTO, records the digest before privacy-preserving preprocessing is applied, and enables reproducible integrity verification. As the first platform-adapted column-level hashing scheme, it closes the integrity verification gap across the collection-to-archival pipeline. Operating in zero-trust environments, it supports independent third-party validation and achieves 100% detection accuracy for column-structure alterations. The method has been embedded into existing replication and archival workflows.
📝 Abstract
To safeguard against data fabrication and enhance trust in quantitative social science, we present Data Non-Manipulation Authentication Digest (Data-NoMAD). Data-NoMAD is a tool that allows researchers to certify, and others to verify, that a dataset has not been inappropriately manipulated between the point of data collection and the point at which a replication archive is made publicly available. Data-NoMAD creates and stores a column-level hash digest of a raw dataset upon initial download from a survey platform (the current version works with Qualtrics and SurveyCTO), before the dataset is subjected to appropriate manipulations such as anonymity-preserving redactions. Data-NoMAD can later be used to verify the integrity of a publicly archived dataset by identifying columns that have been deleted, added, or altered. Data-NoMAD complements existing efforts to ensure research integrity and integrates with existing replication practices.
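The column-digest idea described above can be illustrated with a minimal Python sketch. This is not the Data-NoMAD implementation: the function names `column_digest` and `compare_digests`, the in-memory row format, and the length-prefixed encoding of values are all assumptions made for illustration, and SHA-256 is used only because the summary names it. The sketch also assumes row order is preserved between the raw and the archived file.

```python
import hashlib


def column_digest(rows, columns):
    """Return {column name: SHA-256 hex digest of that column's values}.

    Hypothetical helper: rows is a list of lists of strings, in file order.
    """
    digest = {}
    for idx, name in enumerate(columns):
        h = hashlib.sha256()
        for row in rows:
            value = row[idx].encode("utf-8")
            # Length-prefix each value so ["ab", "c"] and ["a", "bc"] differ.
            h.update(len(value).to_bytes(8, "big"))
            h.update(value)
        digest[name] = h.hexdigest()
    return digest


def compare_digests(original, archived):
    """Classify columns as deleted, added, altered, or unchanged."""
    shared = set(original) & set(archived)
    return {
        "deleted": sorted(set(original) - set(archived)),
        "added": sorted(set(archived) - set(original)),
        "altered": sorted(c for c in shared if original[c] != archived[c]),
        "unchanged": sorted(c for c in shared if original[c] == archived[c]),
    }


# Raw download (illustrative data): digest taken before any redaction.
raw_columns = ["id", "age", "email"]
raw_rows = [["1", "34", "a@example.org"], ["2", "51", "b@example.org"]]

# Archived file: email redacted, a weight column added, one age value changed.
public_columns = ["id", "age", "weight"]
public_rows = [["1", "34", "1.0"], ["2", "15", "1.0"]]

report = compare_digests(
    column_digest(raw_rows, raw_columns),
    column_digest(public_rows, public_columns),
)
print(report)
```

In this sketch the redacted `email` column is reported as deleted, `weight` as added, and `age` as altered, while `id` verifies as unchanged. A verifier needs only the stored digest, not the raw values, which is what would allow a redacted column to be flagged without being disclosed.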