Senior Infrastructure Engineer
Judgment Labs - San Francisco, CA
- Full-time
- Azure
- System design
- Office
Senior Software Engineer, Front-End Infrastructure
Discord - San Francisco, CA
- $196,000 - $220,500 a year
- Full-time
- Remote
- Office
- Senior level
- Relocation assistance
Senior Software Engineer, Infrastructure
Hayden AI - San Francisco, CA
- $175,579 - $228,253 a year
- Full-time
- Hybrid work
- Computer science
- 6 years
- Health insurance
- Dental insurance
Senior Support Engineer
Karbon - San Francisco, CA
- $107,000 - $120,000 a year
- Remote
- Azure
- 5 years
- SQL
- Paid parental leave
- Dental insurance
2d
Sr. Project Manager, FEED – On Prem Power
Prologis - San Francisco, CA
- $157,000 - $216,000 a year
- Full-time
- Bachelor of Science
- Office
- Analysis skills
- Paid holidays
- Health insurance
Senior Software Developer (AI)
Procom - Richmond, CA
- $75 - $100 an hour
- Contract
- Computer science
- Office
Manager Digital Workplace Engineering
Delta Dental - Oakland, CA
- $122,400 - $265,100 a year
- Hybrid work
- Background check
- Azure
- Maintenance
- Paid holidays
- Health insurance
Manager Digital Workplace Collaboration
Delta Dental - Oakland, CA
- $122,400 - $265,100 a year
- Full-time
- Hybrid work
- Background check
- Maintenance
- Research
- Paid holidays
- Health insurance
AI Solutions Architect (Senior)
Procom - Richmond, CA
- $78.75 - $105.00 an hour
- Contract
- Computer science
- SQL
4d
Senior Salesforce Technical Architect
Crowe - San Francisco, CA
- $108,700 - $222,200 a year
- Full-time
- Remote
- 5 years
- Writing skills
- Java
Senior Engineering Manager, AI Developer Experience
Rippling - San Francisco, CA
- $207,000 - $362,250 a year
- Full-time
- Hybrid work
- Office
- Senior level
- 4 years
Senior Software Engineer, Infrastructure
Parafin - San Francisco, CA
- $230,000 - $265,000 a year
- Full-time
- Hybrid work
- System design
- AWS
- Mentoring
- Commuter assistance
- Paid parental leave
Senior IT Associate, Audio & Video
Discord - San Francisco Bay Area, CA
- $128,000 - $144,000 a year
- Full-time
- Hybrid work
- Google Workspace
- 5 years
- System design
- Relocation assistance
11d
Hyperbolic Labs - Senior GPU Infrastructure Engineer
deCircle - San Francisco, CA
- Full-time
- Remote
- DevOps
- Senior level
- Communication skills
Software Engineer, Infrastructure
Middesk - San Francisco, CA
- $148,000 - $230,000 a year
- Full-time
- Hybrid work
- Office
- 3 years
- Senior level
Senior Developer Relations Engineer, AI for Builders
Google - San Francisco, CA
- $163,000 - $237,000 a year
- Full-time
- Computer science
- 5 years
2d
Senior Engineering Manager, AI Platform
Rippling - San Francisco, CA
- $207,000 - $345,000 a year
- Full-time
- Hybrid work
- System design
- Office
- Mentoring
4d
Senior Software Engineer, Cloud Infrastructure
Altruist Corp - San Francisco, CA
- $200,000 - $250,000 a year
- On call
- Hybrid work
- 5 years
- Office
- Mentoring
- Paid parental leave
- Health insurance
Senior Developer Advocate
Vast.ai - San Francisco, CA
- $160,000 - $200,000 a year
- Full-time
- Office
- Senior level
- Communication skills
- Health insurance
- Dental insurance
2d
Senior Salesforce Workflow Automation Developer
GLIDE - San Francisco, CA
- $110,000 - $120,000 a year
- Full-time
- System design
- Office
- Analysis skills

Senior Infrastructure Engineer

Judgment Labs
San Francisco, CA

Job Details

Full-time

Qualifications

Customer communication
Programming languages
Technical writing
DevOps automation
Failure analysis

Full Job Description

Senior Cloud Infrastructure Engineer

San Francisco · On Site · Full Time

Judgment Labs is building the infrastructure for continual learning in long-horizon AI agents.

The next generation of agents will not improve from prompts alone. They will improve from experience: the tasks they attempt, the tools they use, the mistakes they make, the edge cases they encounter, and the outcomes they produce in production. The hard part is turning that raw experience into high-quality data that can actually improve the system.

Judgment builds the infrastructure to do that. We turn long agent trajectories into clean, structured data for evals, labeling, rubric generation, context engineering, and RL workflows. Instead of only showing teams what happened, Judgment helps decide what matters, what should be learned from, and how that learning should flow back into the agent.

Databricks built the data infrastructure for analytics. Judgment is building the learning infrastructure for agents.

We’ve raised $30M+ from Lightspeed, SV Angel, Valor Equity Partners, and others.

The Role

We’re looking for a Senior Cloud Infrastructure Engineer to own the infrastructure that lets Judgment run reliably across our cloud, customer environments, and enterprise deployments.

This role focuses on cloud/platform infrastructure: Terraform, EKS, ArgoCD/Kargo, IAM, DNS, observability, CI/CD, multi-region architecture, BYOC, self-hosted deployments, private connectivity, and enterprise-grade reliability.

You’ll work on the systems that keep high-throughput telemetry ingestion, ClickHouse, RabbitMQ, Temporal, evaluation workers, and customer-facing services running under real production load. You’ll also make Judgment deployable for customers with strict security and infrastructure requirements: multi-region, data residency, private networking, self-hosted, air-gapped, and Bring Your Own Cloud environments.

Interesting Technical Challenges

- Enterprise-grade deployment architecture. Run Judgment in customer environments (self-hosted, air-gapped, or BYOC) while keeping operations, upgrades, observability, and reliability sane.
- Multi-region reliability. Design failover, disaster recovery, data residency, and deployment patterns for customers that cannot tolerate downtime or ambiguous data movement.
- Infrastructure for high-throughput telemetry. Support ingestion systems parsing and persisting hundreds of thousands of spans per second, with graceful backpressure and clear failure modes.
- Operating stateful systems at scale. Keep ClickHouse, RabbitMQ, Temporal, evaluation workers, and supporting services healthy as workloads grow and customer traffic becomes spiky.
- Private and secure connectivity. Build secure paths into customer environments using network isolation, IAM, SSO/SAML/SCIM, encryption, private connectivity, and restricted-network deployment patterns.
- A single operational story across many deployment modes. Cloud, multi-region, BYOC, and self-hosted deployments should not become four totally different products to operate.
- Safe production rollouts. Build deployment automation, environment parity, feature-flag discipline, CI/e2e reliability, monitoring, and rollback mechanisms so the team can move fast without breaking customer trust.

What You’ll Do

- Own cloud infrastructure for production services across Terraform, EKS, ArgoCD/Kargo, IAM, DNS, networking, metrics, CI/CD, and deployment automation.
- Build and operate infrastructure for trace ingestion, evaluation workers, RabbitMQ, Temporal, ClickHouse, and the systems that support Judgment’s core product.
- Design multi-region and enterprise deployment architectures, including data residency, automatic failover, disaster recovery, and customer-managed environments.
- Build secure deployment patterns for BYOC, self-hosted, private-network, and restricted environments.
- Implement private connectivity, identity integrations, network isolation, encryption patterns, and enterprise security requirements.
- Improve observability, alerting, runbooks, incident response, and operational tooling so the team can debug root causes quickly rather than chase symptoms.
- Partner with backend engineers on reliability, scaling limits, queue behavior, storage growth, ingestion throughput, and production incidents.
- Make deployments safer and faster through automation, rollout strategies, environment parity, CI reliability, e2e test health, and better internal tooling.
- Work directly with customers when deployment, networking, security, or production environment constraints are the blocker.
- Raise the bar for infrastructure quality through design docs, code reviews, operational rigor, and clean abstractions.

What We’re Looking For

- Strong experience designing, building, and operating production cloud infrastructure for real customer-facing systems.
- Deep understanding of distributed systems failure modes, especially around stateful services, queues, networking, storage, degraded networks, partial outages, and regional failures.
- Strong programming ability in a modern language and a bias toward automating repeated operational work.
- Experience with Kubernetes / EKS or similar orchestration systems, infrastructure-as-code, CI/CD, cloud networking, IAM, DNS, and production observability.
- Ability to reason about reliability, security, deployment ergonomics, and developer velocity at the same time.
- Experience owning infrastructure systems from design through implementation, rollout, incident response, and long-term maintenance.
- Comfort working directly with customers on enterprise deployment, networking, compliance, or security constraints.
- Clear written communication. You can write architecture proposals, operational runbooks, incident notes, and crisp tradeoff docs.

Nice to Have

- Experience with Terraform, EKS, ArgoCD, Kargo, AWS networking, IAM, DNS, and production metrics/logging systems.
- Experience operating ClickHouse, RabbitMQ, Temporal, Kafka, or other stateful production infrastructure.
- Experience with private connectivity such as AWS PrivateLink, Azure Private Link, or GCP Private Service Connect.
- Experience building BYOC, self-hosted, air-gapped, hybrid-cloud, or enterprise SaaS deployment models.
- Experience with SSO, SAML, SCIM, secrets management, encryption, network isolation, and enterprise security reviews.
- Experience with observability infrastructure, telemetry ingestion, or platforms like Datadog, Honeycomb, Sentry, or similar systems.
- Experience supporting AI infrastructure, LLM evaluation workloads, or high-throughput event pipelines.

Why Judgment?

- We’re building the learning infrastructure for agents. As agents move from demos to production, the bottleneck is no longer just better prompts. It is turning real production experience into high-quality data for evals, labeling, rubric generation, context engineering, and RL workflows.
- Infrastructure is a product requirement here. Customers need Judgment to run reliably across our cloud, enterprise environments, and customer-managed deployments. Deployment quality directly affects whether they can use us.
- The systems are real. High-throughput ingestion, stateful services, workflow orchestration, ClickHouse, LLM scoring, multi-region reliability, and BYOC all show up early.
- This is a Databricks-scale infrastructure opportunity. Databricks built the data infrastructure for analytics. Judgment is building the learning infrastructure for agents.
- You’ll have broad ownership. This is a small team, so infrastructure engineers own architecture, implementation, operations, and customer deployment outcomes.
- In person in San Francisco. We work together in person because the problems are hard, the product is moving fast, and the feedback loops matter.