Senior Manager System Engineering

GoDaddy | 🌍 Remote Worldwide | Estimated: $80,000 - $120,000

About

At GoDaddy, the future of work looks different for each team. Some teams work in the office full-time; others have a hybrid arrangement (they work remotely some days and in the office some days), and some work entirely remotely.

This is a remote position, so you’ll be working remotely from your home. You may occasionally visit a GoDaddy office to meet with your team for events or meetings.

Join GoDaddy's Forge Ops team at the intersection of Data, Infrastructure, and AI-driven operations. As Senior Manager, Systems Engineering, you will lead the reliability, cost efficiency, and agentic operation of the Data & AI ecosystem that serves GoDaddy. This is a deeply technical leadership role, not a hands-off manager position. You will operate as GoDaddy’s L1/L2 authority over critical analytics and data platforms while advancing Forge Operations: a structured operating model designed to transition platform operations from hero-based, expert-dependent support to system-based, agent-assisted, self-improving operations. If you can translate a business problem into a technical architecture and that architecture into team execution, and you want to build the AI Ops pattern for a large-scale data organization, this role is for you.

Responsibilities

  • Own and operate GoDaddy’s analytical and data intelligence platforms (Redshift, QuickSight, FeedDB, Protegrity, Alation) as the authoritative L1/L2 platform owner — driving reliability, deployment standards, cost optimization, and user enablement across an ecosystem with a 50PB+ data lake and thousands of consumers.
  • Lead 24/7 incident management and production operations across 10+ Data & AI platforms, owning MTTR/MTTD targets, AAR rigor, and a root-cause-to-control loop that converts every incident into a runbook, monitoring improvement, or automation — not just a resolved ticket.
  • Architect and advance Forge Ops OS, the team’s agent-based operating model. This model uses history-informed early warning, auto-recovery agents, runbook intelligence, and bounded agentic orchestration. The goal is to move the team from operating systems directly to leading the agents that operate those systems.
  • Drive data platform cost efficiency through unit economics (cost per query, cost per workload, cost per dashboard visit), translating AWS spend into measurable business metrics and continuous optimization across Redshift, QuickSight, DPaaS, and ML infrastructure.
  • Manage operational planning and executive reporting on weekly, monthly, and quarterly cadences. Run a sprint-based improvement program with a strategic allocation of nearly 70%. Provide clear traceability from team execution to company goals and landmark outcomes.

Requirements

  • 5+ years of proven 24/7 production operations leadership: running incident response end-to-end, owning MTTR performance, leading post-mortems (AARs) that produce controls, and driving the systemic fixes that reduce incident recurrence.
  • Hands-on AWS architecture/platform expertise — Redshift, EMR/Airflow, Lambda, EKS, S3, IAM/RBAC, and CDK/CloudFormation — with end-to-end operational and cost ownership of at least two production data or analytics platforms.
  • Systems and software architecture fluency: able to translate business requirements into scalable technical designs, reason about architectural trade-offs, and decompose solutions into actionable engineering tasks without deferring all technical judgment to individual contributors.
  • Data platform operations at scale: ETL/ELT pipelines, data lakes, orchestration frameworks (Airflow, EMR), and BI tooling, with deep understanding of data quality, SLAs, lineage, and the dependency chains that connect producers to executive-facing consumers.
  • Technical team leadership with operational rigor: proven ability to lead engineers through sprint-based planning, capacity management, and cross-functional delivery, while maintaining the hands-on technical credibility to unblock, review, and elevate the team’s output.
  • Experience with AI/agentic operations — building or operating LLM-based tools such as automated runbooks, incident response agents, AAR generation systems, or bounded auto-recovery workflows.
  • Familiarity with graph databases or lineage/observability architectures (e.g., Neptune or equivalent) for dependency mapping, early warning, and blast-radius analysis in large data ecosystems.
  • Hands-on experience with Databricks or analytical compute platforms (Lakehouse, feature stores, ML infrastructure) in a production operations context.
  • Experience with data protection platforms (e.g., Protegrity) and PII/tokenization workflows in large-scale data lake or analytics environments.
  • Familiarity with ServiceNow/CMDB or equivalent incident management systems (Jira, PagerDuty) as operational systems of record — including MTTR/MTTD tracking and CI/lineage integration.

Job Overview

Posted: 6/3/2026
Category: Fullstack Development
Source: JobsCollider

FAQ

Is this position remote?

Yes. The Senior Manager System Engineering role is a remote position, and the listed location is Remote Worldwide. You may occasionally visit a GoDaddy office for team events or meetings.

What is the salary?

The listing gives an estimated range of $80,000 - $120,000. An exact salary is not stated.

How do I apply?

You can apply by submitting your application on the company's hiring website.
