Is the Staff Research Engineer, Post-training & Evaluation position remote, hybrid, or on-site?

The Staff Research Engineer, Post-training & Evaluation role at Reddit is a remote opportunity. The location specified by the employer is Remote - United States.

What is the salary range for the Staff Research Engineer, Post-training & Evaluation role at Reddit?

Reddit does not state a salary range for the Staff Research Engineer, Post-training & Evaluation role; compensation is described only as competitive and based on experience.

How do I apply for the Staff Research Engineer, Post-training & Evaluation position?

You can apply directly through the application link on FutureTalent: https://www.futuretalent.online/jobs/8774-staff-research-engineer-post-training-and-evaluation-reddit.

✨ AI Insights & Summary

Join Reddit's groundbreaking AI Engineering team and shape the future of large language models. This is a unique opportunity to define and own the core evaluation science for Reddit-native LLMs, establishing the internal benchmark that drives model quality and alignment. If you are a seasoned ML researcher with deep expertise in LLM evaluation and a passion for building robust, scalable AI systems, this role offers the chance to make a significant impact on a platform used by millions daily.

Staff Research Engineer, Post-Training & Evaluation Science

Reddit is a vibrant community of communities, built on shared interests, passion, and trust. A growing global company, Reddit offers remote-friendly roles within the United States, with optional office access in San Francisco, Los Angeles, New York City, and Chicago.

About the Role

The AI Engineering team at Reddit is at the forefront of building proprietary, Reddit-native foundational Large Language Models (LLMs). This role is crucial for defining the "feedback loop" in our model development, focusing on how we measure and improve model safety, intelligence, and cultural relevance. You will be instrumental in establishing the "Reddit Benchmark," our internal standard for rigorous model quality, and leading the evaluation science that underpins all subsequent iterations.

Responsibilities

Define the "Reddit Benchmark" evaluation standard: Own the methodology for measuring model quality across safety, reasoning, representation/retrieval, and Reddit-specific knowledge. Set the standard for what "Reddit-native" means in measurable terms.
Own evaluation reliability and statistical rigor: Establish the scientific basis for trustworthy evaluations, addressing variance, multi-sample scoring, inter-rater agreement, sampling effects, and automated judge calibration. Drive evaluation as a release gate.
Design model-as-a-judge methodology: Own judge selection, prompt design, calibration, and reliability for automated evaluations using frontier models.
Set post-training recipes and strategy: Design supervised fine-tuning (SFT) recipes that turn base models into helpful, well-aligned endpoints, and partner with engineering to scale them.
Evaluate base and CPT checkpoints: Design checkpoint-selection methodology to identify the optimal base models before committing significant compute.
Drive synthetic data generation strategy: Define and curate high-quality instruction and evaluation sets to improve generalization where human data is scarce.
Partner with Safety Engineering: Translate safety policy into concrete classification metrics, probe sets, and CI/CD unit tests.
Diagnose post-training instability: Analyze loss curves and eval logs to identify alignment tax and capability degradation, and recommend fixes.
Lead research direction: Set technical direction for evaluation and post-training, mentor team members, and represent the work internally and externally.

Required Qualifications

6+ years of professional ML experience (or a PhD plus 4+ years) with a direct focus on LLM post-training and evaluation.
PhD or MS in CS, ML, NLP, IR, or a related quantitative field, or equivalent industry research experience.
Deep expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, statistical significance, and automated evaluation failure modes.
Strong experience building custom, domain-specific evaluation harnesses (e.g., with lm-eval-harness, Inspect AI, or LightEval), and an understanding of the strengths and limitations of standard benchmarks.
Experience evaluating both generation and representation/classification, including model-as-a-judge for generative quality and metrics like precision/recall, PR-AUC, retrieval metrics, and label-noise handling.
Deep understanding of Continuous Pre-training (CPT), Instruction Tuning (SFT), and the impact of data quality on model behavior.
Fluency in Python; strong data-pipeline and eval-harness engineering experience (e.g., Hugging Face Transformers, vLLM, lm-eval-harness). Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed ZeRO-3).

Nice to Have

Experience with MLflow or similar experiment-tracking frameworks.
Familiarity with modern fine-tuning frameworks (Axolotl, TorchTune) and PyTorch-native training stacks (TorchTitan).
Experience with synthetic data generation techniques (e.g., Self-Instruct).
Experience with preference optimization (DPO, RLHF, RLAIF, GRPO).
Publications in NLP/ML/FAccT or related venues, or other evidence of research leadership.
Experience evaluating multimodal models.

Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global Benefit programs tailored to lifestyle needs (workspace, professional development, caregiving support)
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave

Similar Opportunities

AI Engineer

Full Stack Senior Developer (AI‑Native)

Computer Vision & AI Engineer - N3XT Interceptor C‑UAS (m/f/d)