Browse publications on Google Scholar.
Resume (English only)
Academic Achievements
Selected publications include:
'APTQ+: Attention-FFN-aware Post Quantization for Layerwise LLM Accelerator on FPGA' (submitted to IEEE TCAD)
'KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation' (EMNLP 2025 Main Conference)
'LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models' (DAC 2025 poster)
'A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator' (DAC 2025 poster)
'APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models' (DAC 2024 Oral)
'An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design' (DATE 2024)
'A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices' (DATE 2021)
Research Experience
AI Infra Researcher at ByteDance — Seed Infra, Heterogeneous Computing Group (since Oct 2025). Previously worked at Huawei Hong Kong Research Center (Nov 2024 – Sep 2025) on KG-RAG GUI Test Agents, enhancing multi-platform mobile app testing via retrieval-augmented reasoning.
Education
Ph.D. candidate at The University of Hong Kong (HKU), expected to graduate in Nov 2025, supervised by Dr. Ngai Wong and Prof. Graziano Chesi. Bachelor's degree from the School of Microelectronics at Southern University of Science and Technology, graduated in 2021, supervised by Prof. Hao Yu.
Background
My research interests include LLM compression and optimization, LLM agents, and hardware-efficient neural networks. My current work focuses on end-to-end acceleration for domestic AI chips, covering KV-cache compression; post-training quantization, pruning, and sparsity; training-time acceleration; and RL-friendly quantized inference.
Miscellany
You can contact me via email or WeChat (ID: Wx555328778).