Senior Infra Engineer: Observability
About the Role
Our core mission at Railway is to make software engineers more effective. We believe in providing powerful tools that allow engineers to spend less time on setup and more time on doing. While many infrastructure platforms focus solely on deploying individual applications, Railway aims to be an all-encompassing solution for how these applications function in concert. Questions about zero-downtime deployments and service-to-service communications are handled by our platform, allowing engineers to focus on their core tasks. We take special care in defining our networking infrastructure, ensuring that "things just work and are self-managing."
For this role, you will:
- Build and operate ingestion pipelines that consume streams of logs, metrics, and other telemetry at over 1 million requests per second (RPS).
- Develop scalable, fault-tolerant alerting engines that notify users in real time when thresholds are breached.
- Craft rich backend observability APIs, collaborating with product teams to build amazing experiences for instantly understanding application performance.
- Provide APIs to access real-time log/metrics streams for consumption by Dashboard and Product Teams.
- Extend and build Golang/Rust gRPC services capable of supporting millions of users today and tens of millions to come.
- Define infrastructure that can be torn down, failed over, and reconstituted from scratch using the principles of immutable infrastructure with Terraform and Ansible.
- Write Engineering Requirement Documents (ERDs) to take an idea from conception to implementation and monitoring of its success.
- Interface with our TypeScript and GraphQL edge to expose your microservice APIs for both internal and potential external consumption.
Today, this role leans heavily operational, focusing on keeping our existing observability stack fast, reliable, and scaling under load, with focused stretches of service-building work as we extend the platform. This is a high-impact, high-agency role with a direct effect on company culture, trajectory, and outcomes.
About You
- Strong understanding of distributed systems. You enjoy building and operating fault-tolerant, resilient, and scalable services.
- Experience operating and extending Observability stacks (e.g., ClickHouse, VictoriaMetrics) at scale.
- A solid intuition about the longevity of your solutions. All systems age; in startups, we aim for 2-3 orders of magnitude improvement or a 12-18 month lifespan. You possess the tact to implement your solution, create monitors for its error boundaries, and document requirements for when you're not around.
- A great sense of direction and prioritization when dealing with the ambiguity of an early-stage startup.
- Grit to dive into a problem, implement a solution, scale it, and replace it when necessary.
- Excellent communication skills: you get your point across and follow through to ensure solutions are implemented.
We value and love to work with diverse individuals from all backgrounds.
Things to Know
For better or worse, we're a startup. Our team dynamics differ from companies of other sizes and stages. We're globally distributed and growing more so. Things are always happening somewhere. We don't expect you to be online all the time, but you'll need to be diligent about your boundaries – your end of day will overlap with someone else's start.
We are a small, high-ownership team that cares deeply about doing exceptional work. We're scaling quickly, which means we rely on leverage – systems over coordination, judgment over process. Expect ambiguity and a fast-moving environment. You'll own real outcomes, meaning you'll make decisions, not just execute, and own the success or failure that comes with them.
Benefits and Perks
At Railway, we provide best-in-class benefits, including a great salary, full health benefits for dependents, strong equity grants, an equipment stipend, and much more. For more details, please refer to our main careers page.
Beyond compensation, here are a few things that make working at Railway unique:
- Autonomy: We have very few meetings, just a Monday and Friday sync for the Company Board. We believe your time is sacred, both at work and outside of it.
- Ownership: We foster a high-ownership, high-autonomy culture. We hope you'll join us, contribute significantly, and do the best work of your life over many years. When we bring you onboard, we expect you to help change the company.
- Novel Problems/Solutions: As a well-funded startup, we tackle cool problems that allow us to implement novel solutions. We abhor "busywork" and believe there are always opportunities for creative and high-leverage solutions in community, engineering, operations, etc.
- Growth: We want you to grow with us, but we know talent is temporary. When you identify your next growth area, whether at Railway or elsewhere, we'll ensure you land there.
How We Hire
No tricks, no surprises. Here's our entire process:
- Talk with us about the role: This is completely open-ended; we aim to understand who you are, what you want to do, and where you want to go.
- Work on a small project to discuss in the interview: Asynchronously implement the following: Imagine a theoretical or actual system like Railway that can manage stateless and stateful compute workloads. Design the engine for managing observability.
  - Interview Structure (60 Minutes):
    - Pre-work (before your interview): Complete your solution (advised).
    - 0-5m: Introduction.
    - 5-50m: Building (or expanding) your solution.
    - 50-60m: Questions about Railway/Tech/etc. You can and SHOULD ask us questions ahead of time.
- Review your solution with the Team: You'll meet with a team member to go over your solution. We'll explore your problem-solving approach and introduce you to two more team members.
  - Looking for: Your problem-solving skills, how you break down a problem, and how you present a solution.
- Meet the Team: You'll meet four people from vastly different sections of the company.
  - Looking for: How you work with the rest of the team and communicate.
- Chat with CEO: A 30-minute, 1:1, open-ended conversation with our founder and CEO.
- Offer call: We'll present the offer, finalize details about your position, set up onboarding, and begin our journey together.
Final Note: The interview goes both ways. Please ask us many questions, even the hard ones. That's what we're here for.