Staff Research Engineer, Post-training & Evaluation

Reddit · 📍 Remote - United States · Estimated: $80,000 - $120,000

✨ AI Insights & Summary

Join Reddit's groundbreaking AI Engineering team and shape the future of large language models. This is a unique opportunity to define and own the core evaluation science for Reddit-native LLMs, establishing the internal benchmark that drives model quality and alignment. If you are a seasoned ML researcher with deep expertise in LLM evaluation and a passion for building robust, scalable AI systems, this role offers the chance to make a significant impact on a platform used by millions daily.

Staff Research Engineer, Post-Training & Evaluation Science

Reddit is a vibrant community of communities, built on shared interests, passion, and trust. Reddit is growing globally and offers remote-friendly roles within the United States, with optional office access in San Francisco, Los Angeles, New York City, and Chicago.

About the Role

The AI Engineering team at Reddit is at the forefront of building proprietary, Reddit-native foundation Large Language Models (LLMs). This role is crucial for defining the "feedback loop" in our model development, focusing on how we measure and improve model safety, intelligence, and cultural relevance. You will be instrumental in establishing the "Reddit Benchmark," our internal standard for rigorous model quality, and in leading the evaluation science that underpins every subsequent model iteration.

Responsibilities

  • Define the "Reddit Benchmark" evaluation standard: Own the methodology for measuring model quality across safety, reasoning, representation/retrieval, and Reddit-specific knowledge. Set the standard for what "Reddit-native" means in measurable terms.
  • Own evaluation reliability and statistical rigor: Establish the scientific basis for trustworthy evaluations, addressing variance, multi-sample scoring, inter-rater agreement, sampling effects, and automated judge calibration. Drive evaluation as a release gate.
  • Design model-as-a-judge methodology: Own judge selection, prompt design, calibration, and reliability for automated evaluations using frontier models.
  • Set post-training recipes and strategy: Design SFT recipes that turn base models into helpful, well-aligned endpoints, and partner with engineering to scale them.
  • Evaluate base and CPT checkpoints: Design checkpoint-selection methodology to identify the optimal base models before committing significant compute.
  • Drive synthetic data generation strategy: Define and curate high-quality instruction and evaluation sets to improve generalization where human data is scarce.
  • Partner with Safety Engineering: Translate safety policy into concrete classification metrics, probe sets, and CI/CD unit tests.
  • Diagnose post-training instability: Analyze loss curves and eval logs to identify alignment tax and capability degradation, and recommend fixes.
  • Lead research direction: Set technical direction for evaluation and post-training, mentor team members, and represent the work internally and externally.

Required Qualifications

  • 6+ years of professional ML experience (or a PhD plus 4+ years) with a direct focus on LLM post-training and evaluation.
  • PhD or MS in CS, ML, NLP, IR, or a related quantitative field, or equivalent industry research experience.
  • Deep expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, statistical significance, and automated evaluation failure modes.
  • Strong experience building custom, domain-specific evaluation harnesses (e.g., with lm-eval-harness, Inspect AI, or LightEval), and an understanding of the strengths and limitations of standard benchmarks.
  • Experience evaluating both generation and representation/classification, including model-as-a-judge for generative quality and metrics like precision/recall, PR-AUC, retrieval metrics, and label-noise handling.
  • Deep understanding of Continuous Pre-training (CPT), Instruction Tuning (SFT), and the impact of data quality on model behavior.
  • Fluency in Python; strong data-pipeline and eval-harness engineering experience (e.g., Hugging Face Transformers, vLLM, lm-eval-harness). Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed ZeRO-3).

Nice to Have

  • Experience with MLflow or similar experiment-tracking frameworks.
  • Familiarity with modern fine-tuning frameworks (Axolotl, TorchTune) and PyTorch-native training stacks (TorchTitan).
  • Synthetic data generation techniques (e.g., Self-Instruct).
  • Experience with preference optimization (DPO, RLHF, RLAIF, GRPO).
  • Publications at NLP, ML, or FAccT venues or related venues, or other evidence of research leadership.
  • Experience evaluating multimodal models.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k with Employer Match
  • Global Benefit programs tailored to lifestyle needs (workspace, professional development, caregiving support)
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave

Apply Now

Click below to apply on the company's website.

Apply for this role ↗

Job Overview

Posted: 6/12/2026
Category: AI & Machine Learning
Source: Greenhouse

FAQ

Is this position remote?

Yes. The Staff Research Engineer, Post-training & Evaluation role is remote, and the specified location is Remote - United States.

What is the salary?

The posting does not state a salary. The $80,000 - $120,000 range shown above is an estimate, not a figure published in the posting.

How do I apply?

You can apply by clicking the "Apply for this role" button above to submit your application on the hiring website.

Similar Opportunities

AI Engineer

SAP Fioneer · Remote Worldwide · 🏠 Remote
Competitive
AI & Machine Learning

Full Stack Senior Developer (AI‑Native)

Talan · Remote Worldwide · 🏠 Remote
Competitive
AI & Machine Learning

Computer Vision & AI Engineer - N3XT Interceptor C‑UAS (m/f/d)

Quantum-Systems GmbH · Gilching · 🏠 Remote
Competitive
AI & Machine Learning