Why do Machine Learning Notebooks Crash? An Empirical Study on Public Python Jupyter Notebooks

📅 2024-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Machine learning (ML) Jupyter notebooks frequently crash, hindering developer productivity and reliability. Method: We conduct a large-scale empirical study on 64,031 publicly available Python ML notebooks, collecting 92,542 crash events to construct the first large-scale ML notebook crash dataset and manually analyzing a sample of 746 crashes. We propose an ML-specific error taxonomy—identifying tensor shape mismatches, data-value violations, and visualization errors as novel, domain-specific crash categories—and quantitatively assess root causes. Contribution/Results: We find that notebook semantics (e.g., out-of-order cell execution, residual state) combined with API misuse account for over 40% of crashes; more than 70% occur during data preparation, training, or evaluation; and TensorFlow/Keras and PyTorch are the most crash-prone libraries. This work provides foundational empirical evidence to guide the design of robust ML development environments and intelligent debugging tools.
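The tensor shape mismatches named above as a domain-specific crash category can be illustrated with a minimal sketch. This hypothetical example is not drawn from the paper's dataset and uses NumPy in place of TensorFlow/Keras or PyTorch, but the failure mode is the same: an operation's shape constraint is violated at run time and the cell crashes.

```python
import numpy as np

# Hypothetical illustration of a tensor shape mismatch crash.
# Matrix multiplication requires the inner dimensions to agree.
features = np.random.rand(32, 10)  # batch of 32 samples, 10 features each
weights = np.random.rand(8, 1)     # wrong: first dimension should be 10

try:
    out = features @ weights       # shapes (32, 10) x (8, 1) are misaligned
except ValueError as e:
    print("crash:", e)
```

Fixing the crash means reshaping `weights` to `(10, 1)` so the inner dimensions match.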

📝 Abstract
Jupyter notebooks have become central in data science, integrating code, text and output in a flexible environment. With the rise of machine learning (ML), notebooks are increasingly used for prototyping and data analysis. However, due to their dependence on complex ML libraries and the flexible notebook semantics that allow cells to be run in any order, notebooks are susceptible to software bugs that may lead to program crashes. This paper presents a comprehensive empirical study focusing on crashes in publicly available Python ML notebooks. We collect 64,031 notebooks containing 92,542 crashes from GitHub and Kaggle, and manually analyze a sample of 746 crashes across various aspects, including crash types and root causes. Our analysis identifies unique ML-specific crash types, such as tensor shape mismatches and dataset value errors that violate API constraints. Additionally, we highlight unique root causes tied to notebook semantics, including out-of-order execution and residual errors from previous cells, which have been largely overlooked in prior research. Furthermore, we identify the most error-prone ML libraries, and analyze crash distribution across ML pipeline stages. We find that over 40% of crashes stem from API misuse and notebook-specific issues. Crashes frequently occur when using ML libraries like TensorFlow/Keras and Torch. Additionally, over 70% of the crashes occur during data preparation, model training, and evaluation or prediction stages of the ML pipeline, while data visualization errors tend to be unique to ML notebooks.
Problem

Research questions and friction points this paper is trying to address.

Analyzing crash causes in Python ML Jupyter notebooks
Identifying ML-specific crash types like tensor mismatches
Examining notebook-specific issues like out-of-order execution
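The out-of-order execution issue in the last point can be sketched with a small simulation (a hypothetical example, not taken from the study's dataset). Each "cell" is a code string executed against a shared namespace, mirroring how Jupyter keeps state across cells: running the second cell before the first crashes with a NameError, while in-order execution succeeds.

```python
# Simulating out-of-order cell execution in a notebook-like shared namespace.
namespace = {}

cell_1 = "model_name = 'resnet'"
cell_2 = "print(model_name.upper())"

try:
    exec(cell_2, namespace)   # running cell 2 before cell 1
except NameError as e:
    print("crash:", e)        # name 'model_name' is not defined

exec(cell_1, namespace)
exec(cell_2, namespace)       # in-order execution works: prints RESNET
```

The same shared-namespace mechanism also explains residual-state crashes: a cell can keep working off a stale variable even after the cell that defined it has been edited or deleted.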
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes 64,031 ML notebooks for crash patterns
Identifies ML-specific crashes like tensor mismatches
Links over 40% of crashes to API misuse and notebook-specific issues
Yiran Wang
Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
Willem Meijer
Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
José Antonio Hernández López
Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
Ulf Nilsson
Professor emeritus, Computer Science, Linköping University
Computational Logic, Constraints, Abstract Interpretation, Verification
Dániel Varró
Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden