🤖 AI Summary
In highly regulated domains such as finance, data quality control (QC) is often fragmented into isolated preprocessing steps, undermining end-to-end trustworthy AI pipelines. To address this, we propose the first AI-driven DataOps framework that embeds QC as a system-level core component. Our framework deeply integrates rule-based engines, statistical analysis, and custom AI-powered anomaly detection across the entire data lifecycle, from ingestion and transformation to model deployment, enabling dynamic remediation, policy-configurable workflows, and end-to-end auditability. Technically, it unifies data profiling, stream processing, cloud-native storage interfaces, and a proprietary AI detection module. Evaluated in a real-world financial production environment, the framework achieves significantly higher anomaly-detection recall, reduces manual intervention by 42%, and ensures audit completeness and full data traceability under high-throughput conditions, fully satisfying regulatory compliance requirements.
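To make the layered QC idea concrete, here is a minimal sketch of how a rule-based check and a statistical check might be composed into a single gate. This is an illustrative assumption, not the framework's actual API: the function names, bounds, and the z-score threshold are ours.

```python
import statistics

def rule_check(price: float, lo: float = 0.0, hi: float = 1e6) -> bool:
    """Rule-based layer: reject values outside configured hard bounds."""
    return lo <= price <= hi

def zscore_check(price: float, history: list[float], max_z: float = 4.0) -> bool:
    """Statistical layer: flag values far from the recent distribution."""
    if len(history) < 2:
        return True  # not enough data to judge yet
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)
    if sigma == 0:
        return price == mu
    return abs(price - mu) / sigma <= max_z

def qc_gate(price: float, history: list[float]) -> bool:
    """A record passes QC only if every layer accepts it."""
    return rule_check(price) and zscore_check(price, history)

history = [100.0, 101.2, 99.8, 100.5, 100.1]
print(qc_gate(100.3, history))  # in-distribution value passes
print(qc_gate(250.0, history))  # statistical outlier is flagged
```

In the framework described above, an AI-based detector would sit alongside these two layers as a third check; composing layers with a strict AND is one design choice, voting or scoring being alternatives.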
📝 Abstract
In regulated domains such as finance, the integrity and governance of data pipelines are critical, yet existing systems treat data quality control (QC) as an isolated preprocessing step rather than as a first-class system component. We present a unified AI-driven Data QC and DataOps Management framework that embeds rule-based, statistical, and AI-based QC methods into a continuous, governed layer spanning ingestion, model pipelines, and downstream applications. Our architecture integrates open-source tools with custom modules for profiling, audit logging, breach handling, configuration-driven policies, and dynamic remediation. We demonstrate deployment in a production-grade financial setting: handling streaming and tabular data across multiple asset classes and transaction streams, with configurable thresholds, cloud-native storage interfaces, and automated alerts. We show empirical gains in anomaly-detection recall, reduced manual remediation effort, and improved auditability and traceability in high-throughput data workflows. By treating QC as a system concern rather than an afterthought, our framework provides a foundation for trustworthy, scalable, and compliant AI pipelines in regulated environments.
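The abstract's "configuration-driven policies" with breach handling and audit logging can be sketched roughly as follows. The policy schema, field names, and actions here are assumptions for illustration only; the paper's actual configuration format is not specified in this summary.

```python
import json
import datetime

# Assumed policy schema: one profiled metric, a threshold, and an action
# to take on breach. In the framework, such policies would be externally
# configured rather than hard-coded.
POLICY = json.loads("""
{
  "metric": "null_ratio",
  "threshold": 0.05,
  "action": "quarantine"
}
""")

AUDIT_LOG: list[dict] = []  # append-only trail for traceability

def evaluate(batch_id: str, null_ratio: float, policy: dict = POLICY) -> str:
    """Compare a profiled metric against its policy and log the outcome."""
    breached = null_ratio > policy["threshold"]
    outcome = policy["action"] if breached else "pass"
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "batch": batch_id,
        "metric": policy["metric"],
        "value": null_ratio,
        "threshold": policy["threshold"],
        "outcome": outcome,
    })
    return outcome

print(evaluate("trades-2024-01-02", 0.01))  # → pass
print(evaluate("trades-2024-01-03", 0.12))  # → quarantine
```

Because every evaluation, passing or breaching, is appended to the audit trail with its timestamp and threshold, the log alone can later reconstruct why a batch was quarantined, which is the kind of end-to-end auditability the abstract claims.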