Generating Privacy Stories From Software Documentation

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Privacy is frequently misconstrued as a security concern or post-hoc remediation, leading to the omission of privacy requirements during early development stages and escalating regulatory compliance risks. Method: This paper introduces the first privacy requirement elicitation framework spanning the entire software lifecycle. It uniquely integrates Chain-of-Thought (CoT) prompting with In-Context Learning (ICL) to guide large language models (LLMs)—specifically GPT-4o and Llama 3—to automatically identify privacy-related behaviors from early- and mid-phase artifacts (e.g., requirements documents, design specifications) and generate structured privacy requirements in user-story format. Contribution/Results: Unlike conventional compliance-auditing approaches, our framework embeds privacy engineering proactively. Empirical evaluation shows that mainstream LLMs achieve F1-scores exceeding 0.8 on both privacy behavior identification and user-story generation tasks; fine-tuning further improves performance, demonstrating the method’s effectiveness, robustness, and scalability.
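The CoT + ICL prompting strategy described above can be sketched as a single prompt that interleaves few-shot examples (in-context learning) with explicit reasoning steps (chain of thought). This is an illustrative sketch only: the example documents, labels, and the `build_prompt` helper are invented for this summary, not the paper's actual prompts.

```python
# Hypothetical sketch: combine few-shot examples (ICL) with step-by-step
# reasoning (CoT) in one prompt for privacy-behavior identification.
# The example documents and labels below are invented for illustration.

FEW_SHOT_EXAMPLES = [
    {
        "document": "The app stores the user's email address to send receipts.",
        "reasoning": "The text mentions collecting an email address, which is "
                     "personal data, and states a purpose (sending receipts).",
        "behavior": "collects contact information (email) for transactional use",
    },
    {
        "document": "Crash logs are uploaded to our servers automatically.",
        "reasoning": "Crash logs may contain device identifiers, so automatic "
                     "upload is a data-transmission behavior.",
        "behavior": "transmits diagnostic data to remote servers",
    },
]

def build_prompt(target_document: str) -> str:
    """Assemble a CoT + ICL prompt for privacy-behavior identification."""
    parts = ["Identify privacy-related behaviors in the document. "
             "Think step by step before answering.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Document: {ex['document']}")
        parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"Privacy behavior: {ex['behavior']}\n")
    parts.append(f"Document: {target_document}")
    parts.append("Reasoning:")
    return "\n".join(parts)

prompt = build_prompt("Location data is cached locally to speed up map loading.")
print(prompt)
```

The assembled string would then be sent to a model such as GPT-4o or Llama 3; ending the prompt at "Reasoning:" nudges the model to produce its chain of thought before the final behavior label.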

📝 Abstract
Research shows that analysts and developers consider privacy as a security concept or as an afterthought, which may lead to non-compliance and violation of users' privacy. Most current approaches, however, focus on extracting legal requirements from the regulations and evaluating the compliance of software and processes with them. In this paper, we develop a novel approach based on chain-of-thought prompting (CoT), in-context-learning (ICL), and Large Language Models (LLMs) to extract privacy behaviors from various software documents prior to and during software development, and then generate privacy requirements in the format of user stories. Our results show that most commonly used LLMs, such as GPT-4o and Llama 3, can identify privacy behaviors and generate privacy user stories with F1 scores exceeding 0.8. We also show that the performance of these models could be improved through parameter-tuning. Our findings provide insight into using and optimizing LLMs for generating privacy requirements given software documents created prior to or throughout the software development lifecycle.
Problem

Research questions and friction points this paper is trying to address.

Extracting privacy behaviors from software documentation
Generating privacy requirements as user stories
Improving LLM performance for privacy compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses chain-of-thought prompting for privacy extraction
Applies in-context-learning with Large Language Models
Generates privacy user stories from software documents
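The generated privacy requirements follow the standard user-story template, "As a [role], I want [goal], so that [benefit]." A minimal format check for model outputs could look like the sketch below; this is an illustrative example, not the paper's evaluation code, and the `is_valid_story` helper is an assumption.

```python
import re

# Illustrative check that a generated privacy requirement follows the
# standard user-story template: "As a <role>, I want <goal>, so that <benefit>."
STORY_PATTERN = re.compile(
    r"^As an? (?P<role>[^,]+), I want (?P<goal>.+?),? so that (?P<benefit>.+)$",
    re.IGNORECASE,
)

def is_valid_story(story: str) -> bool:
    """Return True if the text matches the user-story template."""
    return STORY_PATTERN.match(story.strip()) is not None

example = ("As a user, I want to opt out of location tracking, "
           "so that my movements are not recorded.")
print(is_valid_story(example))  # → True
```

A check like this only validates structure; judging whether the story captures the right privacy behavior still requires comparison against ground-truth annotations, as in the paper's F1-based evaluation.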
Wilder Baldwin
School of Computing and Information Science, University of Maine
Shashank Chintakuntla
School of Computing and Information Science, University of Maine
Shreyah Parajuli
School of Computing and Information Science, University of Maine
Ali Pourghasemi
School of Computing and Information Science, University of Maine
Ryan Shanz
School of Computing and Information Science, University of Maine
Sepideh Ghanavati
Associate Professor, University of Maine
Usable Privacy · Software Engineering · Requirements and Software · Deep Learning · AI Ethics