🤖 AI Summary
Tire strategy optimization in Formula 1 (F1) suffers from poor interpretability and limited generalizability across circuits, which hinders race engineers' trust in, and adoption of, such systems.
Method: We propose RSRL—an interpretable reinforcement learning framework that integrates feature importance analysis, decision-tree surrogate modeling, and counterfactual reasoning to jointly optimize tire selection and pit-stop timing—without modifying vehicle hardware.
Contribution/Results: RSRL improves strategy transparency and interpretability for race engineers compared to black-box RL and heuristic baselines. In a high-fidelity simulation of the 2023 Bahrain Grand Prix, RSRL achieves an average finishing position of P5.33, outperforming the best baseline (P5.63) by 0.3 positions and demonstrating both efficacy and practical utility. To our knowledge, this is the first interpretable RL framework explicitly designed for real-world F1 operations, establishing a pathway for deploying trustworthy AI policies in safety-critical, high-stakes decision-making domains.
📝 Abstract
In Formula One, teams compete to develop their cars and achieve the highest possible finishing position in each race. During a race, however, teams cannot alter the car, so they must improve their finishing positions via race strategy, i.e. by optimising which tyre compounds to put on the car and when to do so. In this work, we introduce a reinforcement learning model, RSRL (Race Strategy Reinforcement Learning), to control race strategies in simulations, offering a faster alternative to the industry standard of hard-coded and Monte Carlo-based race strategies. Controlling cars with a pace equating to an expected finishing position of P5.5 (where P1 represents first place and P20 last place), RSRL achieves an average finishing position of P5.33 on our test race, the 2023 Bahrain Grand Prix, outperforming the best baseline of P5.63. In a generalisability study, we then demonstrate how performance on one track or across multiple tracks can be prioritised during training. Further, we supplement model predictions with feature importance analysis, decision tree-based surrogate models, and decision tree counterfactuals to improve user trust in the model. Finally, we provide illustrations which exemplify our approach in real-world situations, drawing parallels between simulations and reality.
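The pit-stop timing problem described above can be cast as a small Markov decision process. The sketch below is purely illustrative and is not the paper's simulator or the RSRL model: it runs tabular Q-learning over a toy race in which tyre wear slows each lap and pitting trades a fixed time loss for fresh tyres. All constants (base lap time, wear rate, pit-lane loss, race length) are invented for illustration.

```python
import random

# Toy race-strategy MDP (illustrative only; not the paper's simulator).
# State: (lap, tyre_age). Actions: 0 = stay out, 1 = pit for fresh tyres.
LAPS = 20
PIT_LOSS = 20.0   # assumed pit-lane time loss, seconds
BASE_LAP = 90.0   # assumed base lap time, seconds
WEAR = 0.8        # assumed extra seconds per lap of tyre age

def step(lap, tyre_age, action):
    """Advance one lap; return (next_state, reward). Reward is negative elapsed time."""
    cost = PIT_LOSS if action == 1 else 0.0
    if action == 1:
        tyre_age = 0  # fresh tyres after the stop
    cost += BASE_LAP + WEAR * tyre_age
    return (lap + 1, tyre_age + 1), -cost

def train(episodes=5000, alpha=0.2, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    random.seed(seed)
    q = {}
    for _ in range(episodes):
        lap, age = 0, 0
        while lap < LAPS:
            s = (lap, age)
            if random.random() < eps:
                a = random.randint(0, 1)
            else:
                a = max((0, 1), key=lambda x: q.get((s, x), 0.0))
            (lap, age), r = step(*s, a)
            # Bootstrap from the best next action (zero at the finish line).
            nxt = 0.0 if lap == LAPS else max(q.get(((lap, age), x), 0.0) for x in (0, 1))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + nxt - q.get((s, a), 0.0))
    return q

def race_time(q):
    """Greedy rollout of the learned policy; returns total race time in seconds."""
    lap, age, total = 0, 0, 0.0
    while lap < LAPS:
        s = (lap, age)
        a = max((0, 1), key=lambda x: q.get((s, x), 0.0))
        (lap, age), r = step(*s, a)
        total -= r
    return total

q = train()
no_stop = sum(BASE_LAP + WEAR * a for a in range(LAPS))  # never pitting
```

In this toy model the learned policy should beat the no-stop strategy by pitting before tyre wear dominates. In the same spirit as the interpretability components in the abstract, the greedy policy's (state, action) pairs could then be fit with a shallow decision tree as a surrogate, exposing human-readable pit/stay rules.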