Discovering Software Parallelization Points Using Deep Neural Networks

šŸ“… 2025-09-05
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This paper addresses the difficulty of accurately identifying parallelization opportunities in complex loops, a task where static analysis often falls short. To tackle this, the authors propose a deep learning-based code parallelism prediction framework. Methodologically, they design a genetic algorithm to automatically generate diverse loop code samples, covering both clearly parallelizable cases and those with ambiguous data dependencies, and construct a manually annotated training dataset. They then employ both deep neural networks (DNNs) and convolutional neural networks (CNNs) to model and classify tokenized code sequences. Experimental results show that CNNs achieve marginally higher average accuracy, while both models demonstrate robust performance. The key contributions are threefold: (1) the first integration of generative genetic algorithms with deep learning for parallelism prediction; (2) effective mitigation of data scarcity and ambiguity in dependency analysis; and (3) empirical validation that training-data diversity critically enhances model generalization, establishing a novel paradigm for automated parallel optimization.
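To make the two loop classes concrete, a minimal sketch in Python (these snippets are hypothetical illustrations, not samples from the paper's dataset):

```python
# (i) Independent loop: each iteration writes a distinct element and reads
# only inputs, so iterations can safely run in parallel.
def independent_loop(a, b):
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]  # no cross-iteration dependency
    return c

# (ii) Ambiguous loop: iteration i reads element i + k, where k is a
# run-time offset (assumed 0 <= k < len(a) here). Whether the loop carries
# a dependency depends on the value of k, so a static analyzer cannot
# decide parallelizability from the source alone.
def ambiguous_loop(a, k):
    for i in range(len(a) - k):
        a[i] = a[i + k] * 2  # dependency hinges on the run-time value of k
    return a
```

For `k = 0` the second loop is trivially parallelizable; for `k > 0` it carries an anti-dependence across iterations, which is exactly the ambiguity the paper's classifiers are trained to flag.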

šŸ“ Abstract
This study proposes a deep learning-based approach for classifying loops in program code according to their parallelization potential. Two genetic algorithm-based code generators were developed to produce two distinct types of code: (i) independent loops, which are parallelizable, and (ii) ambiguous loops, whose dependencies are unclear, making it impossible to determine whether the loop is parallelizable. The generated code snippets were tokenized and preprocessed to ensure a robust dataset. Two deep learning models, a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN), were implemented to perform the classification. Based on 30 independent runs, a robust statistical analysis was employed to verify the expected performance of both models. The CNN showed a slightly higher mean performance, but the two models had similar variability. Experiments with varying dataset sizes highlighted the importance of data diversity for model performance. These results demonstrate the feasibility of using deep learning to automate the identification of parallelizable structures in code, offering a promising tool for software optimization and performance improvement.
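The tokenization and preprocessing step could be sketched as follows; the regex pattern, vocabulary scheme, and sequence length below are illustrative assumptions, not the paper's actual pipeline:

```python
import re

def tokenize(code):
    """Split a code snippet into identifiers, numbers, and operator symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def build_vocab(snippets):
    """Map each distinct token to an integer id, reserving 0 for padding/unknown."""
    vocab = {}
    for snippet in snippets:
        for tok in tokenize(snippet):
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(code, vocab, max_len=16):
    """Convert a snippet to a fixed-length integer sequence (padded or truncated)."""
    ids = [vocab.get(tok, 0) for tok in tokenize(code)]
    return (ids + [0] * max_len)[:max_len]
```

Fixed-length integer sequences like these are what a DNN (via a flattened embedding) or a CNN (via 1-D convolutions over the token dimension) would consume as input.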
Problem

Research questions and friction points this paper is trying to address.

Identifying parallelizable loops in code using deep learning
Classifying ambiguous loops with unclear dependencies automatically
Automating software optimization through neural network-based analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning classifies code loops for parallelization
Genetic algorithms generate parallelizable and ambiguous loops
Convolutional Neural Network slightly outperforms Deep Neural Network
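A toy sketch of how a genetic algorithm might evolve diverse loop snippets from a parameterized template; the template, gene set, mutation operator, and diversity fitness below are illustrative assumptions, not the paper's actual generators:

```python
import random

# Illustrative template grammar: {op} and {off} are the mutable genes.
TEMPLATE = "for i in range(n): c[i] = a[i{off}] {op} b[i]"
OPS = ["+", "-", "*"]
OFFSETS = ["", " + k", " - k"]  # "" yields an independent loop; the rest are ambiguous

def random_individual():
    return {"op": random.choice(OPS), "off": random.choice(OFFSETS)}

def mutate(ind):
    """Randomly reassign one gene of a copied individual."""
    child = dict(ind)
    gene = random.choice(["op", "off"])
    child[gene] = random.choice(OPS if gene == "op" else OFFSETS)
    return child

def render(ind):
    return TEMPLATE.format(op=ind["op"], off=ind["off"])

def fitness(population):
    """Reward diversity: count distinct rendered snippets."""
    return len({render(ind) for ind in population})

def evolve(pop_size=8, generations=20, seed=0):
    """Keep the more diverse of the current and mutated populations each step."""
    random.seed(seed)
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        candidate = [mutate(random.choice(population)) for _ in range(pop_size)]
        if fitness(candidate) >= fitness(population):
            population = candidate
    return [render(ind) for ind in population]
```

This mirrors the paper's reported finding at a small scale: optimizing the generator for sample diversity is what makes the resulting training set useful for generalization.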
Izavan dos S. Correia
Federal Rural University of Pernambuco, Recife, Brazil
Henrique C. T. Santos
Federal Institute of Pernambuco, Recife, Brazil
Tiago A. E. Ferreira
Full Professor, Department of Statistics and Informatics, Federal Rural University of Pernambuco
Research interests: Intelligent Computation, Time Series Analysis and Forecasting, Quantum Computation, Computational