✨ AI Insights & Summary
This Senior Engineer role at Stack presents a unique chance to shape the future of autonomous systems in the trucking industry. You'll be instrumental in developing and optimizing Stack's cutting-edge AI inference platform, working with advanced technologies like GPUs, ML frameworks, and high-performance networking. If you thrive on tackling complex backend distributed systems challenges and mentoring engineers, this position offers significant impact and growth opportunities in a rapidly evolving field.
Senior Engineer, Inference Platform
About Stack
Stack is at the forefront of developing revolutionary AI and advanced autonomous systems to enhance safety, reliability, and efficiency in modern operations. Our technology integrates artificial intelligence, robotics, machine learning, and cloud technologies to create innovative solutions tailored for the dynamic trucking industry. With extensive experience in deploying real-world systems, the Stack team is dedicated to building an autonomous solution ecosystem specifically for trucking.
About the Role
As a Senior Engineer, you will own critical subsystems within Stack AV's inference platform, guiding them from initial design through to production. You will be the key technical expert for areas such as model onboarding, serving APIs, metering, observability, performance optimization, and tenant isolation. This role demands strong hands-on implementation skills, proficiency in production debugging, thoughtful system design, and the ability to mentor other engineers while ensuring timely delivery.
Responsibilities
- Own the technical design and delivery of subsystems within a high-throughput, low-latency inference platform designed for multi-tenant, enterprise-grade workloads.
- Develop robust API layers (gRPC, WebSockets, REST, etc.) and developer SDKs to abstract complex distributed inference orchestration into seamless, reliable token streams.
- Build and harden a multi-tenant control plane for accurate metering, rate limiting, quotas, tenant isolation, and fair resource allocation across the platform.
- Optimize inference performance across the entire system stack, including the model engine layer.
- Implement observability and define SLOs to gain insights into system economics, cache-hit rates, GPU utilization, and cost accounting per model and tenant.
- Collaborate with product and infrastructure teams on model onboarding, capacity planning, external API contracts, and customer adoption.
- Decompose ambiguous tasks, drive issues to resolution, and elevate engineering standards through code quality, reviews, testing, and mentorship.
Qualifications
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Experience: 4+ years of experience building and operating backend distributed systems end-to-end.
- Strong Data & ML Systems Fundamentals: Expertise in data-intensive distributed systems, concurrency, networking, and performance profiling.
- Inference Services Experience: Hands-on experience with large-scale inference services on GPUs, including KV caches, prefill/decode stages, and throughput/latency trade-offs.
- Inference Engines/Serving Frameworks: Direct experience with inference engines (e.g., TensorRT, vLLM) or serving frameworks (e.g., Dynamo, Triton).
- Technical Skills:
  - Proficiency in C++, Go, Rust, or Python.
  - Familiarity with deep learning frameworks (e.g., PyTorch) and model parallelism.
  - Knowledge of GPU computing primitives like CUDA, NCCL, NVLink, and hardware-specific optimizations.
  - Practical understanding of high-performance networking architectures (InfiniBand, RoCE, low-latency cluster communication).
- Problem-Solving: Strong analytical and problem-solving capabilities.
- Bonus: Experience with autonomous vehicles (AV).
We are proud to be an equal opportunity workplace. We believe that diverse teams produce the best ideas and outcomes. We are committed to building a culture of inclusion, entrepreneurship, and innovation across gender, race, age, sexual orientation, religion, disability, and identity.