🤖 AI Summary
Problem: Jointly recognizing the base and exponent in images of mathematical expressions remains challenging for conventional serial or single-task models under realistic degradations such as noise, font scaling, and blur.
Method: This paper proposes a lightweight multi-output convolutional neural network (CNN) that performs end-to-end simultaneous prediction of bases and exponents. It introduces a novel single-model dual-branch architecture: one branch jointly regresses bounding boxes and classifies base symbols, while the other does the same for exponents. To enhance generalization, synthetic data augmentation—including additive noise, multi-scale rendering, and Gaussian blur—is systematically applied.
Contribution/Results: Trained on 10,900 synthetically generated, degraded images, the method achieves high accuracy in predicting base and exponent values while keeping model complexity and training cost low. It is robust to common imaging degradations, which makes it practical for deployment in mathematical OCR systems.
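To make the multi-output idea concrete, here is a minimal sketch of a forward pass with a shared convolutional trunk and two classification heads, plus a joint loss that sums the two cross-entropies. It is not the paper's architecture. The class name `TwoHeadCNN`, the filter count, the 3x3 kernels, the choice of 10 classes per head, and the untrained random weights are all assumptions made for illustration. The sketch covers only the symbol predictions and omits bounding-box regression.

```python
import math
import random


def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_avg_pool(fmap):
    """Collapse one feature map to a single scalar feature."""
    return sum(sum(r) for r in fmap) / (len(fmap) * len(fmap[0]))


def linear(features, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TwoHeadCNN:
    """Hypothetical model: shared conv features, one head per output."""

    def __init__(self, n_filters=4, n_base=10, n_exp=10, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        # Shared trunk: n_filters random 3x3 kernels (assumed sizes).
        self.kernels = [[[rand() for _ in range(3)] for _ in range(3)]
                        for _ in range(n_filters)]
        # Head 1 predicts the base, head 2 predicts the exponent.
        self.w_base = [[rand() for _ in range(n_filters)] for _ in range(n_base)]
        self.b_base = [0.0] * n_base
        self.w_exp = [[rand() for _ in range(n_filters)] for _ in range(n_exp)]
        self.b_exp = [0.0] * n_exp

    def forward(self, image):
        # Both heads read the same shared feature vector.
        features = [global_avg_pool(conv2d_relu(image, k)) for k in self.kernels]
        p_base = softmax(linear(features, self.w_base, self.b_base))
        p_exp = softmax(linear(features, self.w_exp, self.b_exp))
        return p_base, p_exp


def joint_loss(p_base, p_exp, y_base, y_exp):
    """Sum of the two per-head cross-entropy losses."""
    return -math.log(p_base[y_base]) - math.log(p_exp[y_exp])
```

Because both heads share one trunk, a single forward pass yields both predictions, and minimizing the summed loss trains the shared features for both tasks at once. In practice a deep learning framework would supply the convolution, the gradients, and the optimizer.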
📝 Abstract
The use of neural networks and deep learning techniques in image processing has significantly advanced the field, enabling highly accurate recognition results. However, achieving high recognition rates often requires complex network models, which are difficult to train and demand substantial computational resources. This research presents a simplified yet effective approach to predicting both the base and the exponent from images of mathematical expressions using a multi-output Convolutional Neural Network (CNN). The model is trained on 10,900 synthetically generated images of exponent expressions, with random noise, varying font sizes, and varying blur intensity to simulate real-world conditions. The proposed CNN performs robustly and trains efficiently. The experimental results indicate that the model achieves high accuracy in predicting the base and exponent values, supporting the efficacy of this approach for noisy and varied input images.
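The degradations described above can be sketched roughly as follows. This is an illustrative pipeline, not the authors' generator. The parameter ranges, the dictionary keys, the use of Gaussian noise and a separable Gaussian blur, and the helper names `sample_spec` and `degrade` are all assumptions. Rendering the expression at the sampled font size is left out because it would need an imaging library. The functions operate on a grayscale image stored as nested lists of floats in [0, 1].

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping at the image edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def add_noise(image, sigma, rng):
    """Additive Gaussian noise, clipped back into [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def sample_spec(rng):
    """Draw labels and degradation parameters (all ranges are assumed)."""
    return {
        "base": rng.randint(0, 9),
        "exponent": rng.randint(0, 9),
        "font_size": rng.randint(20, 40),   # consumed by a renderer, not shown
        "noise_sigma": rng.uniform(0.0, 0.1),
        "blur_sigma": rng.uniform(0.5, 2.0),
    }


def degrade(image, spec, rng):
    """Blur first, then add noise, as a camera or scanner roughly would."""
    blurred = gaussian_blur(image, spec["blur_sigma"])
    return add_noise(blurred, spec["noise_sigma"], rng)
```

A dataset builder would call `sample_spec` once per image, render the expression with the sampled base, exponent, and font size, pass the result through `degrade`, and store the image alongside its two labels.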