Supporting the Comprehension of Data Analysis Scripts

📅 2026-04-17

🤖 AI Summary
This work addresses the poor comprehensibility, reproducibility, and maintainability of R data analysis scripts, problems compounded by a lack of tool support. The authors present flowR, an extension for Positron and VS Code that incrementally analyzes R projects, intertwining interprocedural data-flow and control-flow analyses to build a comprehensive dataflow graph that accounts for R's dynamic and explorative features. Alongside a previously presented static backward program slicer, flowR offers script overviews, interactive graph visualizations, linting, and inline value annotations, and it exposes a plugin system and interfaces for integrating further analyses. Computing the full dataflow graph of real-world projects takes 576 ms on average, which the authors describe as enabling near real-time feedback.

📝 Abstract
Much research relies on data analysis scripts to process, clean, and visualize data. However, recent studies show that these scripts are often hard to comprehend and maintain, which hinders reproducibility and reuse, and that tool support for handling such scripts is lacking. In this work, we focus on the R programming language and address this problem by presenting flowR as an extension for the common data analysis IDEs Positron and VS Code. Alongside a previously presented static backward program slicer, flowR provides an overview of data analysis scripts, interactive graph visualizations, linting, and inline value annotations to support data analysts. flowR incrementally analyzes R projects by intertwining interprocedural data- and control-flow analyses to build a comprehensive dataflow graph that incorporates R's dynamic and explorative features. Additionally, flowR offers a plugin system and interfaces that allow the integration of further analyses, such as new linting rules or custom visualizations. Computing the full dataflow graph of real-world projects takes 576 ms on average, which enables near real-time feedback. The demonstration video is available at https://youtu.be/hJzr-r-NmMg . For the full source code and extensive documentation, refer to https://github.com/flowr-analysis/flowr . To try the Docker image, run `docker run --rm -it eagleoutice/flowr`.
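
To make the idea of static backward program slicing over a dataflow graph concrete, here is a minimal sketch in Python. It is not flowR's implementation or API: the R script, the `READS_FROM` edges, and the `backward_slice` helper are all hypothetical and written by hand for illustration. A real analysis would derive the edges from the code itself and would have to handle R's dynamic features, which this toy example ignores.

```python
from collections import deque

# Hypothetical R script, one statement per line (line number -> source text).
SCRIPT = {
    1: 'data <- read.csv("input.csv")',
    2: 'clean <- na.omit(data)',
    3: 'threshold <- 10',
    4: 'big <- clean[clean$value > threshold, ]',
    5: 'message("rows: ", nrow(data))',
    6: 'plot(big$value)',
}

# Hand-written dataflow edges (an assumption for this sketch):
# each line maps to the lines that define the values it reads.
READS_FROM = {
    1: [],
    2: [1],
    3: [],
    4: [2, 3],
    5: [1],
    6: [4],
}


def backward_slice(reads_from, criterion):
    """Return, sorted, every line the criterion line transitively depends on,
    including the criterion itself."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        line = queue.popleft()
        for dep in reads_from.get(line, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


if __name__ == "__main__":
    # Slice with respect to the plot call on line 6.
    for line in backward_slice(READS_FROM, 6):
        print(f"{line}: {SCRIPT[line]}")
```

Slicing on line 6 keeps lines 1, 2, 3, 4, and 6 and drops line 5, because the `message` call does not contribute to the plot. This is the kind of reduction that lets a reader focus only on the statements relevant to one result.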

Problem

Research questions and friction points this paper is trying to address.

data analysis scripts
comprehensibility
maintainability
reproducibility
tool support

Innovation

Methods, ideas, or system contributions that make the work stand out.

dataflow analysis
program slicing
interactive visualization
incremental analysis
R programming