Trung Thanh Nguyen
Google Scholar ID: QSV452QAAAAJ
Nagoya University
Deep Learning · Computer Vision · Multimodal Recognition · Federated Learning
Citations & Impact (all-time)
  • Citations: 86
  • h-index: 5
  • i10-index: 2
  • Publications: 15
  • Co-authors: 6
Resume (English only)
Academic Achievements
  • Our paper “Hierarchical Global-Local Fusion for One-stage Open-vocabulary Temporal Action Detection” has been accepted to the ACM TOMM journal (IF: 6.0).
  • I was selected as a Rising Star for the Freiburg Rising Stars Academy, Universität Freiburg, Germany.
  • I was selected to present my PhD research at the Doctoral Symposium of ACM MMAsia, Malaysia.
  • Our paper “Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning” has been accepted to ACM MMAsia, Malaysia.
  • Our paper “Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation” has been accepted to NeurIPS, United States.
  • I was awarded a research grant from Murata Foundation (est. 1970), Japan.
  • I was awarded a research grant from THERS (National University Corporation), Japan.
  • We presented 2 papers (IS3-038, IS3-148) at MIRU2025, Japan.
  • I received a Letter of Appreciation from RIKEN in recognition of outstanding research achievements.
  • I received a Certificate of Achievement from Academia Sinica, Taiwan.
Research Experience
  • Currently a student researcher at RIKEN, a national research institute in Japan, working on the Guardian Robot Project with a focus on open-world action detection and multi-view, multi-modal action recognition. Also affiliated with the Center for Artificial Intelligence, Mathematical and Data Science, collaborating with Japanese corporations to develop practical AI solutions.
Education
  • PhD Candidate @ Nagoya University | Student Researcher @ RIKEN
Background
  • PhD candidate in the Department of Intelligent Systems at Nagoya University. Research focuses on vision-language models, multimodal recognition, and video captioning, with applications to real-world problems.