
Ziyi Guan

Google Scholar ID: TtFFJL8AAAAJ
The University of Hong Kong
Large Language Model (LLM) · Efficient LLMs · AI Infra · Quantization · LLM Agent
Citations & Impact (all-time)
Citations: 78
H-index: 5
i10-index: 2
Publications: 11
Co-authors: 11
Resume (English only)
Academic Achievements
  • Selected publications:
    • 'APTQ+: Attention-FFN-aware Post Quantization for Layerwise LLM Accelerator on FPGA' (submitted to IEEE TCAD)
    • 'KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation' (EMNLP 2025 Main Conference)
    • 'LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models' (DAC 2025 poster)
    • 'A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator' (DAC 2025 poster)
    • 'APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models' (DAC 2024 Oral)
    • 'An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design' (DATE 2024)
    • 'A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices' (DATE 2021)
Research Experience
  • AI Infra Researcher at ByteDance, Seed Infra, Heterogeneous Computing Group (since Oct 2025).
  • Huawei Hong Kong Research Center (Nov 2024 – Sep 2025): worked on KG-RAG GUI test agents, enhancing multi-platform mobile app testing via retrieval-augmented reasoning.
Education
  • Ph.D. candidate at The University of Hong Kong (HKU), expected to graduate in Nov 2025; supervised by Dr. Ngai Wong and Prof. Graziano Chesi.
  • Bachelor's degree from the School of Microelectronics, Southern University of Science and Technology (graduated 2021); supervised by Prof. Hao Yu.
Background
  • Research interests include LLM compression & optimization, LLM agents, and hardware-efficient neural networks, with a focus on end-to-end acceleration for domestic AI chips: KV-cache compression, post-training quantization/pruning/sparsity, training-time acceleration, and RL-friendly quantized inference.
Miscellany
  • You can contact me via email or WeChat: Wx555328778.