About the Role
The ML Observability team develops state-of-the-art tools for monitoring, explaining, and enhancing AI systems in production, with a specific focus on Large Language Models (LLMs) and generative AI. We provide robust, scalable observability for AI workloads, including drift detection, model evaluation, and behavior tracing, empowering customers to deploy AI with confidence. As a Staff Engineer, you will lead the development of new features and foundational capabilities within Datadog’s LLM Observability product. You will influence product direction, drive experimentation, and leverage your deep understanding of AI systems and software engineering to address complex challenges in the rapidly evolving AI landscape. Your contributions will directly impact how our customers monitor, troubleshoot, and optimize LLM-based applications in production.
Join us in building the foundational tools that make AI systems observable, understandable, and reliable in the real world.
What You’ll Do
- Drive the design and implementation of LLM observability features.
- Ideate, prototype, and scale new product features to provide insights and drive improvements for generative AI systems.
- Collaborate cross-functionally with engineering teams, product management, UX, and applied science to iterate quickly and achieve product-market fit.
- Develop and extend tools for tracing, evaluating, and debugging LLMs.
- Influence architectural decisions and mentor engineers to build resilient, high-performance systems.
- Stay attuned to customer pain points and use this feedback to guide product and engineering priorities.
- Keep up to date with industry trends and advancements in machine learning and observability, fostering innovation within the team.
Who You Are
- BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent experience.
- Deep understanding of distributed systems and scalable backend architectures.
- Hands-on experience building and shipping LLM-powered or GenAI applications.
- Familiarity with model internals, inference pipelines, evaluation techniques, and prompt engineering.
- Ability to thrive in ambiguous, fast-changing environments with a product-oriented mindset.
- Eagerness to shape the next generation of AI observability tools from the ground up.
- Strong communication skills, rigorous thinking, and a commitment to clean, maintainable code.
- Experience with observability tools and platforms.
Benefits and Growth
- Build tools that help software engineers develop faster, and use them yourself.
- Significant influence on product direction and business impact.
- Work with skilled, knowledgeable, and kind teammates who are happy to teach and learn.
- Competitive global benefits.
- Continuous professional development.
Salary Range
$234,000 - $300,000 USD
Datadog operates as a hybrid workplace, valuing office culture while ensuring work-life harmony. We encourage applications from individuals of all backgrounds and experiences. Benefits and Growth details may vary by location. Please note that the salary range provided is an estimate, and actual compensation will depend on various factors.