Data-NoMAD: A Tool for Boosting Confidence in the Integrity of Social Science Survey Data

📅 2025-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Social science survey data are vulnerable to tampering and lack integrity verification mechanisms between collection and archival. To address this, we propose a lightweight, column-level hash authentication method for questionnaire data. The method computes SHA-256 hashes for each data column, binds them with metadata for secure storage, and employs a differential comparison algorithm to detect column insertions, deletions, or value modifications. It integrates seamlessly with Qualtrics and SurveyCTO APIs, supports privacy-preserving preprocessing, and enables reproducible integrity verification. As the first platform-adapted column-level hashing scheme, it bridges the integrity verification gap across the “collection–archival” pipeline. Operating in zero-trust environments, it supports independent third-party validation and achieves 100% detection accuracy for column-structure alterations. The method has been embedded into existing replication and archival workflows.

📝 Abstract
To safeguard against data fabrication and enhance trust in quantitative social science, we present Data Non-Manipulation Authentication Digest (Data-NoMAD). Data-NoMAD is a tool that allows researchers to certify, and others to verify, that a dataset has not been inappropriately manipulated between the point of data collection and the point at which a replication archive is made publicly available. Data-NoMAD creates and stores a column hash digest of a raw dataset upon initial download from a survey platform (the current version works with Qualtrics and SurveyCTO), but before it is subject to appropriate manipulations such as anonymity-preserving redactions. Data-NoMAD can later be used to verify the integrity of a publicly archived dataset by identifying columns that have been deleted, added, or altered. Data-NoMAD complements existing efforts at ensuring research integrity and integrates seamlessly with extant replication practices.
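
The column-digest idea described above can be illustrated with a minimal sketch. This is not the Data-NoMAD implementation: the function names (`column_digest`, `compare_digests`), the in-memory dictionary layout, and the cell serialization are assumptions made for illustration. SHA-256 is used because the summary above names it. Values are hashed in row order per column, and the two digests are then compared by column name.

```python
import hashlib


def column_digest(columns):
    """Map each column name to a SHA-256 hex digest of its values (hypothetical helper)."""
    digest = {}
    for name, values in columns.items():
        h = hashlib.sha256()
        for v in values:
            cell = str(v).encode("utf-8")
            # Length-prefix each cell so ["ab", "c"] and ["a", "bc"] hash differently.
            h.update(len(cell).to_bytes(8, "big"))
            h.update(cell)
        digest[name] = h.hexdigest()
    return digest


def compare_digests(original, archived):
    """Report columns deleted from, added to, or altered relative to the original digest."""
    shared = set(original) & set(archived)
    return {
        "deleted": sorted(set(original) - set(archived)),
        "added": sorted(set(archived) - set(original)),
        "altered": sorted(n for n in shared if original[n] != archived[n]),
    }


# Toy data (invented for this example): raw download vs. archived file.
raw = {
    "id": [1, 2, 3],
    "age": [34, 51, 29],
    "email": ["a@x.org", "b@x.org", "c@x.org"],
}
archived = {
    "id": [1, 2, 3],
    "age": [34, 15, 29],          # one value changed
    "weight": [1.0, 0.8, 1.2],    # column added after collection
}

report = compare_digests(column_digest(raw), column_digest(archived))
print(report)
```

In this toy run, `email` is reported as deleted, `weight` as added, and `age` as altered. A deleted column is not necessarily a problem: the abstract treats anonymity-preserving redaction as an appropriate manipulation, so a report like this is a starting point for review rather than a verdict.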
Problem

Research questions and friction points this paper is trying to address.

Social Science Questionnaire Data
Data Integrity Verification
Trust Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-NoMAD
Social Science Data Integrity
Privacy Protection
Sanford C. Gordon
Wilf Family Department of Politics, New York University
Cyrus Samii
Professor of Politics, New York University
Methodology, Political Economy, Political Science
Zhihao Su
Center for Data Science, New York University