Has published papers on topics including programmatic VLM evaluation, language model faithfulness, contextually faithful LLMs, high-fidelity text-to-video synthesis, open large multimodal models, zero-shot personalized image generation, diffusion model alignment, 3D generation from a single image, the challenges of continual self-supervised learning, the effectiveness of pre-trained vision models for control, the functional correspondence problem, audio-visual floorplan reconstruction, demystifying contrastive self-supervised learning, aligning videos in space and time, and task-driven modular networks for zero-shot compositional learning.
Research Experience
Member of Technical Staff at OpenAI, with research experience spanning Computer Vision, Robotics, and NLP.
Education
Ph.D. in Robotics from Carnegie Mellon University, advised by Prof. Abhinav Gupta.
Background
Research interests include Computer Vision and NLP. Recent work focuses on large multimodal models, improving the fidelity and controllability of visual generation models, improving NLP models for retrieval-augmented generation (RAG) systems, reducing hallucinations in LLMs and multimodal models, and building tailored language and multimodal solutions for enterprise applications.
Miscellany
No personal interests or other information provided.