Senior Developer and Operations Engineer (AI Infrastructure)

NVIDIA - Santa Clara, CA


We are now looking for a Senior Developer and Operations Engineer (AI Infrastructure):

NVIDIA is hiring engineers to scale up its AI infrastructure. You will need strong scripting, systems, and development skills; a deep understanding of continuous integration and delivery (CI/CD) systems; extensive experience with orchestration and automation systems; hands-on experience managing a data center environment; and excellent communication and planning skills. You and other specialists on this team will help advance NVIDIA's capacity to build and deploy leading solutions for a broad range of GPU-based applications, including directly supporting open-source initiatives.

What you'll be doing:
Collaborate with multiple internal and external teams to understand their orchestration, scaling and deployment requirements

Maintain and manage expansions of both cloud and local data-center based lab environments

Deploy and operate our orchestration layer over both bare metal and cloud service providers, including future expansions and upgrades

Collaborate with internal and external researchers and leaders to build scale-out CI/CD solutions for our internal teams and partners in the open-source community

Build automation and tools that will increase the productivity of internal teams and external developers who use CUDA & open-source software

What we need to see:
You have a BS or MS in Computer Engineering, Computer Science, Information Systems, or a related field, with 4+ years of work or research experience in the areas below

Proven scripting/development skills in Python and Bash (experience with Conda/virtualenv and pip/PyPI necessary), as well as custom plugin and app development (experience with Java preferred)

Solid technical foundation and experience with at least one container orchestration system (Kubernetes, Swarm, Mesos, Marathon, Aurora, etc.)

Considerable experience with at least one automation tool (Ansible, Puppet, Terraform, etc.) to manage a cluster/cloud environment, ensuring uptime and ease of deployment and management

Strong skills in building, managing, and deploying Docker images at scale, including converting existing processes to run in Docker containers

Extensive CI/CD experience using tools such as Jenkins CI and GitLab CI, along with the necessary plugins, to build and maintain a fully automated build, test, and deploy pipeline

First-hand experience managing Git code repositories, including branching and merge strategies for teams of all sizes

Proven ability to effectively plan, manage, troubleshoot, and remediate issues for a Linux lab environment in areas including hardware, networking, and software

Self-starter with an interest in and desire to improve existing methods, and a history of expanding your skills through continued learning

Highly motivated, with strong communication skills and the ability to work successfully with multi-functional teams, principals, and architects, and to coordinate effectively across organizational boundaries and geographies

With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most brilliant and talented people in the world working with us, and our engineering teams are growing fast in some of the hottest state-of-the-art fields: Deep Learning, Artificial Intelligence, and Autonomous Vehicles. If you're a creative and autonomous computer scientist with a real passion for distributed systems and parallel computing, we want to hear from you.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.