Maxana is seeking an experienced Infrastructure Engineer for a confidential client — a fast-growing AI company. In this role you will build and maintain the platform layer supporting large-scale ML training, inference, and deployment. This is a high-impact role at the intersection of cloud infrastructure and ML systems.
Key Responsibilities
- Build and maintain infrastructure supporting large-scale ML training and inference workloads
- Work with GPU and compute infrastructure, distributed systems, and cloud-native platforms
- Improve reliability, observability, and performance across the platform layer
- Collaborate directly with senior engineers and product teams on architecture decisions
- Own production reliability: monitoring, incident response, and proactive risk reduction
- Develop and maintain internal tooling and automation to support engineering operations
Requirements
- 5+ years of infrastructure or platform engineering experience in a production environment
- Strong distributed systems background; experience with large-scale compute workloads preferred
- Cloud-native infrastructure experience (AWS, GCP, or Azure); Docker and Kubernetes required
- Familiarity with ML infrastructure (training pipelines, inference serving, GPU workloads) is a strong plus
- Experience owning production reliability end to end
Benefits
- Competitive base salary ($130,000-$240,000) + equity
- Medical, dental, and vision
- Flexible paid time off
- Learning and development stipend
- Working at the forefront of AI infrastructure at scale