When Are Reactive Notebooks Not Reactive?

📅 2025-11-26

🤖 AI Summary

Existing reactive notebook systems (e.g., Ipyflow, Marimo, Observable) do not agree on a definition of “reactivity”, and each can fail to keep notebook state synchronized with the code after common edits, which makes it hard for users to form a reliable mental model. This paper introduces Rex, a fine-grained test suite for reactive notebooks that integrates static program analysis with mutation testing to evaluate reactivity across representative editing patterns. Rex is applied to three existing systems and uncovers multiple reactivity breakage patterns, several of which recur across systems, allowing the systems to be compared on the same tests. The classified failures show where current designs break down and give implementers a concrete basis for designing, verifying, and improving reactive notebooks.

📝 Abstract

Computational notebooks are convenient for programmers, but can easily become confusing and inconsistent because a program can be edited incrementally while it is running. Recent reactive notebook systems, such as Ipyflow, Marimo and Observable, strive to keep notebook state in sync with the current cell code by re-executing a minimal set of cells upon modification. However, each system defines reactivity differently. Moreover, under each definition, we find simple notebook modifications that break the corresponding system. Overall, these inconsistencies make it difficult for users to construct a mental model of their reactive notebook's implementation. This paper proposes Rex, a fine-grained test suite to discuss and assess reactivity capabilities within reactive notebook systems. We evaluate Rex on three existing reactive notebook systems and classify their failures with the aims of (i) helping programmers understand when reactivity fails and (ii) helping notebook implementations improve.
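To make "re-executing a minimal set of cells" concrete, the sketch below shows one simple way such a set could be computed. It is an illustration only, not the algorithm of Ipyflow, Marimo, Observable, or Rex. It assumes a name-based static analysis over top-level cell source, and the helpers `defs_and_uses` and `cells_to_rerun` are hypothetical names introduced here.

```python
import ast


def defs_and_uses(source):
    """Return (names a cell assigns, names it reads but does not assign).

    Simplification: a name that is both read and assigned in the same
    cell (e.g. `x = x + 1`) is treated as a definition only.
    """
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                uses.add(node.id)
    return defs, uses - defs


def cells_to_rerun(cells, edited):
    """Return the edited cell plus every transitive dependent, in notebook order.

    `cells` maps a cell id to its source, in notebook order.
    """
    info = {cell_id: defs_and_uses(src) for cell_id, src in cells.items()}
    stale = {edited}
    changed = True
    while changed:  # iterate to a fixed point over the dependency relation
        changed = False
        stale_defs = set().union(*(info[c][0] for c in stale))
        for cell_id, (_, uses) in info.items():
            if cell_id not in stale and uses & stale_defs:
                stale.add(cell_id)
                changed = True
    return [cell_id for cell_id in cells if cell_id in stale]


notebook = {
    "c1": "a = 1",
    "c2": "b = a + 1",
    "c3": "c = 10",
    "c4": "d = b * c",
}
print(cells_to_rerun(notebook, "c1"))  # ['c1', 'c2', 'c4']
print(cells_to_rerun(notebook, "c3"))  # ['c3', 'c4']
```

Because the analysis only tracks names, it cannot see mutation through aliases or attribute updates. That is one plausible way for a system built on this kind of analysis to miss cells that should be re-executed.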

Problem

Research questions and friction points this paper is trying to address.

Reactive notebooks lack consistent reactivity definitions

Simple modifications can break existing reactive notebook systems

Users struggle to build mental models of reactive implementations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rex test suite assesses reactivity capabilities

Fine-grained evaluation of existing reactive notebook systems

Classifies failures to improve understanding and implementation
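One way to picture what a reactivity test might check is to compare the live notebook state after an edit against the state produced by running every cell from scratch. The sketch below assumes that oracle. The source does not describe how Rex decides pass or fail, so this is not a description of Rex itself. The helper names (`run_cells`, `visible_state`, `matches_fresh_run`) and the "rerun only the edited cell" policy are hypothetical.

```python
def run_cells(sources, namespace=None):
    """Execute cell sources in order, in a fresh or an existing namespace."""
    ns = {} if namespace is None else namespace
    for src in sources:
        exec(src, ns)
    return ns


def visible_state(ns):
    """Drop interpreter-internal entries such as __builtins__."""
    return {k: v for k, v in ns.items() if not k.startswith("__")}


def matches_fresh_run(live_ns, cells):
    """True if the live state equals the state of a from-scratch execution."""
    return visible_state(live_ns) == visible_state(run_cells(cells))


cells = ["xs = [1]", "ys = xs", "ys.append(2)", "total = sum(xs)"]
live = run_cells(cells)

# The user edits the third cell. A hypothetical system that tracks only
# assigned names sees no new definition here and reruns just this cell.
cells[2] = "ys.append(5)"
run_cells([cells[2]], live)

print(live["xs"], live["total"])       # [1, 2, 5] 3
print(matches_fresh_run(live, cells))  # False
```

Here the live state keeps the old `append(2)` and a stale `total`, while a fresh run would give `xs == [1, 5]` and `total == 6`. A single small edit that mutates through an alias is enough to desynchronize code and state under this simplified policy.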
