
Ziyi Guan

Google Scholar ID: TtFFJL8AAAAJ
The University of Hong Kong
Large Language Model (LLM) · Efficient LLMs · AI Infra · Quantization · LLM Agent
Citations & Impact (all-time)
Citations: 78
H-index: 5
i10-index: 2
Publications: 11
Co-authors: 11
Resume (English only)
Academic Achievements
  • Selected publications:
    • 'APTQ+: Attention-FFN-aware Post Quantization for Layerwise LLM Accelerator on FPGA' (submitted to IEEE TCAD)
    • 'KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation' (EMNLP 2025 Main Conference)
    • 'LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models' (DAC 2025 poster)
    • 'A Highly Energy-Efficient Binary BERT Model on Group Vector Systolic CIM Accelerator' (DAC 2025 poster)
    • 'APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models' (DAC 2024 Oral)
    • 'An Isotropic Shift-Pointwise Network for Crossbar-Efficient Neural Network Design' (DATE 2024)
    • 'A Video-based Fall Detection Network by Spatio-temporal Joint-point Model on Edge Devices' (DATE 2021)
Research Experience
  • AI Infra Researcher at ByteDance, Seed Infra, Heterogeneous Computing Group (since Oct 2025).
  • Huawei Hong Kong Research Center (Nov 2024 – Sep 2025): worked on KG-RAG GUI test agents, enhancing multi-platform mobile app testing via retrieval-augmented reasoning.
Education
  • Ph.D. candidate at The University of Hong Kong (HKU), expected to graduate in Nov 2025; supervised by Dr. Ngai Wong and Prof. Graziano Chesi.
  • Bachelor's degree from the School of Microelectronics, Southern University of Science and Technology (graduated 2021); supervised by Prof. Hao Yu.
Background
  • Research interests include LLM compression & optimization, LLM agents, and hardware-efficient neural networks, with a focus on end-to-end acceleration for domestic AI chips: KV-cache compression, post-training quantization/pruning/sparsity, training-time acceleration, and RL-friendly quantized inference.
Miscellany
  • You can contact me via email or WeChat: Wx555328778.