We are now looking for a Developer and Operations Engineer (AI Infrastructure):
NVIDIA is hiring engineers to scale up its AI infrastructure. You will need strong scripting, systems, and development skills; a deep understanding of continuous integration and delivery (CI/CD) systems; extensive experience with orchestration and automation systems; hands-on skills in managing a data center environment; and excellent communication and planning skills. You and the other specialists on this team will help advance NVIDIA's capacity to build and deploy leading solutions for a broad range of GPU-based applications, including directly supporting open-source initiatives.
What you'll be doing:
Collaborate with multiple internal and external teams to understand their orchestration, scaling and deployment requirements
Maintain both cloud-based and local data-center lab environments, and manage their expansion
Deploy and operate our orchestration layer over both bare metal and cloud service providers, including future expansions and upgrades
Collaborate with internal and external researchers and leaders to build scale-out CI/CD solutions for our internal teams and partners in the open-source community
Build automation and tools that will increase the productivity of internal teams and external developers who use CUDA & open-source software
What we need to see:
You have a BS or MS in Computer Engineering, Computer Science, Information Systems, or a related field, with 4+ years of work or research experience in the areas below
Proven scripting and development skills in Python and Bash (experience with Conda/virtualenv and pip/PyPI required), as well as custom plugin and app development (experience with Java preferred)
Solid technical foundation and experience with at least one container orchestration system (Kubernetes, Swarm, Mesos, Marathon, Aurora, etc.)
Considerable experience using at least one automation tool (Ansible, Puppet, Terraform, etc.) to manage cluster and cloud environments, ensuring uptime and ease of deployment and management
Strong skills in building, managing, and deploying Docker images at scale, including containerizing existing processes
Extensive CI/CD experience using tools such as Jenkins CI or GitLab CI, along with the necessary plugins, to build and maintain fully automated build, test, and deploy pipelines
First-hand experience managing Git code repositories, including branching and merge strategies for teams of all sizes
Proven ability to effectively plan, manage, troubleshoot, and remediate issues for a Linux lab environment in areas including hardware, networking, and software
A self-starter with an interest in and desire to improve existing methods, and a history of expanding your skills through continued learning
Highly motivated with strong communication skills, you can work successfully with multi-functional teams, principals, and architects, and coordinate effectively across organizational boundaries and geographies.
With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most brilliant and talented people in the world working with us, and our engineering teams are growing fast in some of the hottest state-of-the-art fields: Deep Learning, Artificial Intelligence, and Autonomous Vehicles. If you're a creative and autonomous computer scientist with a real passion for distributed systems and parallel computing, we want to hear from you.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.