Senior Applied Scientist, Agentic AI

Zillow Group
Remote / U.S. employees may live in any of the 50 United States, with limited exceptions
2026-03-13
Full time

About the job

Zillow is looking for a Senior Applied Scientist on the Applied Reasoning team within the Agentic AI organization to research and develop advanced LLM-powered reasoning systems specialized in the real estate domain. In this role, you'll take a hands-on approach to prototyping, evaluating, and deploying agents capable of deep, multi-step reasoning, accurate use of tools/skills, and contextual decision-making tailored to real estate use cases.

Responsibilities

Design, prototype, and build advanced agentic systems capable of highly autonomous, context-aware, and adaptive interactions across diverse real estate use cases.

Apply test-time scaling and post-training techniques to develop agents that can reason, collaborate, compete, or negotiate in dynamic, goal-driven environments to fulfill user needs.

Define and refine evaluation and experimentation processes for LLM-driven applications.

Stay at the forefront of agentic AI research and innovation, bringing emerging techniques into practical application to shape product direction.

Contribute to the broader scientific community through publications, conference presentations, and internal knowledge sharing.

Mentor and guide junior scientists and engineers, promoting best practices in applied research, scalable agentic architecture, and responsible AI development.

Qualifications

Minimum

A Ph.D. or equivalent experience in Computer Science or a related field, with a focus on LLMs, agentic systems, or machine learning

2+ years of experience building large-scale, high-impact ML solutions, particularly in NLP, agent-based systems, multi-agent collaboration, or similar paradigms

Experience developing and working with LLM reasoning models or AI agents capable of multi-step reasoning and context-rich decision-making

Strong background in rigorous experimental design and evaluation, including benchmark creation, ablation studies, qualitative and quantitative analysis, and principled measurement of reasoning quality, generalization, and safety

Preferred

Track record of publishing high-impact research in top AI/ML venues