Bitboard version of Tetris AI

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiencies of traditional Tetris simulations—such as suboptimal state representations, inadequate evaluation strategies, and high training overhead—that hinder large-scale reinforcement learning research. To overcome these limitations, the authors propose an efficient Tetris AI framework that restructures core game logic using bitboards and bitwise operations for accelerated computation, introduces a lightweight afterstate-based policy network, and refines the sampling and update mechanisms of the Proximal Policy Optimization (PPO) algorithm. Experimental results demonstrate that the proposed approach achieves an average score of 3,829 within three minutes on a 10×10 grid, yielding a 53-fold speedup over the OpenAI Gym-Tetris implementation while substantially improving sample efficiency and reducing computational cost.
📝 Abstract
The efficiency of game engines and policy optimization algorithms is crucial for training reinforcement learning (RL) agents in complex sequential decision-making tasks, such as Tetris. Existing Tetris implementations suffer from low simulation speeds, suboptimal state evaluation, and inefficient training paradigms, limiting their utility for large-scale RL research. To address these limitations, this paper proposes a high-performance Tetris AI framework based on bitboard optimization and improved RL algorithms. First, we redesign the Tetris game board and tetrominoes using bitboard representations, leveraging bitwise operations to accelerate core processes (e.g., collision detection, line clearing, and Dellacherie–Thiery feature extraction) and achieve a 53-fold speedup compared to OpenAI Gym-Tetris. Second, we introduce an afterstate-evaluating actor network that simplifies state value estimation by leveraging Tetris's afterstate property, outperforming traditional action-value networks with fewer parameters. Third, we propose a buffer-optimized Proximal Policy Optimization (PPO) algorithm that balances sampling and update efficiency, achieving an average score of 3,829 on 10×10 grids within 3 minutes. Additionally, we develop a Python-Java interface compliant with the OpenAI Gym standard, enabling seamless integration with modern RL frameworks. Experimental results demonstrate that our framework enhances Tetris's utility as an RL benchmark by bridging low-level bitboard optimizations with high-level AI strategies, providing a sample-efficient and computationally lightweight solution for scalable sequential decision-making research.
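To illustrate the bitboard idea the abstract describes, here is a minimal sketch (not the authors' code) of collision detection and line clearing with bitwise operations. It assumes one integer bitmask per board row on a 10-wide grid; the names `collides` and `lock_and_clear` are hypothetical:

```python
# Hypothetical bitboard sketch: each row of a 10-wide Tetris board is
# packed into the low 10 bits of an integer; the board is a list of row
# masks with index 0 = top row. A piece is likewise a list of row masks.

WIDTH = 10
FULL_ROW = (1 << WIDTH) - 1  # 0b1111111111: every cell in the row occupied

def collides(board, piece_rows, top):
    """True if placing the piece with its topmost row at board row `top`
    overlaps occupied cells or falls below the floor. Overlap is a single
    AND per row instead of a per-cell loop."""
    for i, mask in enumerate(piece_rows):
        r = top + i
        if r >= len(board):      # below the floor
            return True
        if board[r] & mask:      # any shared set bit = collision
            return True
    return False

def lock_and_clear(board, piece_rows, top):
    """Merge the piece into the board with OR, then drop rows whose mask
    equals FULL_ROW, prepending empty rows to keep the height constant.
    Returns (new_board, number_of_cleared_lines)."""
    new = list(board)
    for i, mask in enumerate(piece_rows):
        new[top + i] |= mask
    kept = [row for row in new if row != FULL_ROW]
    cleared = len(new) - len(kept)
    return [0] * cleared + kept, cleared
```

Because each row is one machine word, both checks cost a handful of integer operations per row, which is the kind of saving behind the reported 53-fold speedup.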
Problem

Research questions and friction points this paper is trying to address.

Tetris
reinforcement learning
simulation speed
state evaluation
training efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

bitboard
afterstate
Proximal Policy Optimization
Tetris AI
reinforcement learning
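The afterstate idea listed above can be sketched as follows. In Tetris, each placement leads deterministically to a known successor board (the afterstate), so a policy can score boards with a single value function v(s′) instead of an action-value Q(s, a). This is an illustrative sketch, not the authors' network; `select_placement` and its softmax policy are assumptions:

```python
import math

def select_placement(afterstates, value_fn, temperature=1.0):
    """Afterstate-based policy sketch: score each candidate successor
    board with v(s') and form a softmax distribution over placements.
    Returns (index of the greedy placement, softmax probabilities)."""
    vals = [value_fn(s) / temperature for s in afterstates]
    m = max(vals)                         # subtract max for numerical stability
    exp = [math.exp(v - m) for v in vals]
    z = sum(exp)
    probs = [e / z for e in exp]
    best = max(range(len(afterstates)), key=lambda i: vals[i])
    return best, probs
```

The value function here takes only the afterstate, so the network needs no action input and no per-action output head, which is why such a model can match action-value networks with fewer parameters.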
Xingguo Chen
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, 210023, China
Pingshou Xiong
Zhenyu Luo
Mengfei Hu
Xinwen Li
Yongzhou Lü
Guang Yang
Chao Li
Shangdong Yang
Nanjing University of Posts and Telecommunications
Reinforcement Learning · Multi-agent Systems · Multi-armed Bandits