About the job
Join the elite team behind AWS Neuron, the software stack powering AWS's next-generation AI accelerators, Inferentia and Trainium. As a Senior Software Engineer on our Machine Learning Applications team, you'll be at the forefront of deploying and optimizing some of the world's most sophisticated AI models at unprecedented scale.
Responsibilities
Pioneer distributed inference solutions for industry-leading LLMs such as GPT, Llama, and Qwen
Optimize breakthrough language and vision generative AI models
Collaborate directly with silicon architects and compiler teams to push the boundaries of AI acceleration
Drive performance benchmarking and tuning that directly impacts millions of inference calls globally
Spearhead distributed inference architecture for PyTorch and JAX using XLA
Engineer breakthrough performance optimizations for AWS Trainium and Inferentia
Develop ML tools to enhance LLM accuracy and efficiency
Transform complex tensor operations into highly optimized hardware implementations
Pioneer benchmarking methodologies that shape next-gen AI accelerator design
Qualifications
Minimum
Deep expertise in Python and ML framework internals
Strong understanding of distributed systems and ML optimization
Passion for performance tuning and system architecture
Preferred
Master's degree in computer science, machine learning, or equivalent
Experience with accuracy debugging and tooling, and with performance benchmarking of AI accelerators
Experience developing CUDA kernels, and with HPC, inference optimization, and tensor operations