Senior ML Infrastructure Engineer
About Reddit
Reddit is a community of communities, built on shared interests, passion, and trust. It hosts the most open and authentic conversations online, with users submitting, voting, and commenting on topics they care about daily. With over 100,000 active communities and approximately 126 million daily active unique visitors, Reddit is a significant source of information online. Learn more at www.redditinc.com.
Who We Are: The Machine Learning Platform Team
The Machine Learning Platform team at Reddit is a high-impact group responsible for the infrastructure powering recommendations, content discovery, and user/content quantification. We directly impact teams such as Growth, Ads, Feeds, and Core Machine Learning.
What You’ll Do
As a Senior ML Infrastructure Engineer, you will lead the development of a platform for large-scale ML models at Reddit.
- MLOps Design: Design end-to-end model lifecycle patterns (MLOps) that speed up development for ML engineers, covering data preparation, model management, experiment tracking, and more.
- Graph ML Platform: Lead zero-to-one development and support of a graph ML codebase and platform that abstracts common patterns, enabling greater model scalability and iteration.
- Performance Tuning: Collaborate with ML engineers to optimize performance, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment.
- Data Processing Optimization: Optimize batch data processing within a data warehouse using tools like Apache Beam, Apache Spark, and Ray Data.
- Graph Data Architecture: Architect pipelines to build and maintain massive graph data structures, potentially on the order of billions of nodes and tens of billions of edges.
Who You Might Be
- 5+ years of experience in ML infrastructure, including model training and deployments.
- Hands-on experience with ML optimization, including memory and GPU profiling.
- Deep experience with cloud-based technologies for ML platforms, including GCP BigQuery, Google Cloud Storage, and infrastructure-as-code (Terraform).
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow, Weights & Biases).
- Proficiency with common ML programming languages and frameworks like Python, PyTorch, and TensorFlow.
- Deep experience working with distributed training and orchestration technologies such as Ray and Kubernetes.
- A strong focus on scalability, reliability, performance, and ease of use. You are a dedicated advocate for platform users and possess a deep intuition for the machine learning development lifecycle.
- Strong organizational and communication skills.
- Experience with graph databases (Neo4j, JanusGraph, TigerGraph) is a significant plus.
- Experience with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a significant plus.
Pay Transparency
This job posting may span more than one career level. In addition to base salary, this job is eligible to receive equity and potentially a commission. Reddit offers comprehensive benefits, including medical, dental, and vision insurance, a 401(k) with employer match, and generous time off.
The base salary range for this position in the US is $216,700 - $303,400 USD. Final offer amounts depend on various factors including skills, experience, and credentials.
Additional Information
In select roles and locations, interviews may be recorded, transcribed, and summarized by AI. Candidates will have the option to opt out prior to interviews. Personal information collected (including audio/video recordings) will be used solely for evaluating employment applications and will be deleted promptly after a hiring decision. Please refer to our Candidate Privacy Policy for more details.
Reddit is an equal opportunity employer committed to diversity and inclusion. Reasonable accommodations are provided for qualified individuals with disabilities. Please inform your recruiter if accommodation is needed during the interview process.