Site Reliability Engineer

JFrog - Sunnyvale, CA


JFrog is a fast-growing startup building the next generation of software development and distribution tools. We are shaping the future of software development and the DevOps toolchain. Our top-notch development talent, open source community, award-winning products, cool technology, paying customers, and solid funding make JFrog a great place to join. We are looking for a DevOps Ninja to join our globally expanding Site Reliability Engineering team and establish a 24/7 production reliability routine.

Responsibilities

In this role you will:
Work with cutting-edge technology in the cloud and hardware computing space
Assume responsibility for production SaaS services with sensitivity to uptime based on customer SLAs
Install, configure, update and troubleshoot services such as web servers, relational and nonrelational databases, CM and CSM tools, and application servers, and work with Docker, Kubernetes and much more
Monitor, troubleshoot and resolve production-grade issues, and configure the system and application aspects of our SaaS platform and applications
Troubleshoot developer support cases and provide 24/7 emergency response to JFrog’s SaaS customers
Collaborate in a “DevOps” environment where you will work closely with our Support, Solution Engineering, R&D, QA and DevOps teams worldwide
Maintain a knowledge base of known issues and solutions
Desired Skills and Experience

The ideal candidate would have:
Excellent problem-solving skills with a desire to take on responsibility
Excellent written and verbal communication skills, with the ability to communicate technical issues to both technical and nontechnical audiences

A deep understanding and familiarity with:
Linux - CentOS, Ubuntu, Other
Networking - firewalls, VPNs, proxies and load balancers
Web/Application servers - Apache, Nginx, Tomcat, JVM environments
Monitoring and logging systems - experience with tools like Graphite, LogicMonitor, Logentries, SumoLogic, ELK stack - Advantage
Experience with public clouds (AWS, GCP etc.) - Great Advantage
Virtualization and containers - Xen, KVM, QEMU, Docker etc. - Advantage
Storage, any of the following - NFS, SANs, RAID, LVM - Advantage
Customer-facing experience - Great Advantage
2-3 years of relevant work experience, including hands-on Linux experience, preferably with languages such as shell/Bash, Ruby, Python, Java, Perl
Background in NOC / SOC operations - Great Advantage
Experience using and administering software version control systems (SVN, Git etc.) - Advantage
Familiarity with Atlassian Suite (Confluence, JIRA, Bitbucket etc.) - Advantage
Knowledge of the following is a big plus: Artifactory & Bintray, Build tools, CI servers
Knowledge of Docker & Kubernetes - Great Advantage
Ability to work independently, learn quickly and be proactive
Ability to join a 24x7 on-call roster (follow-the-sun model)
Ability to occasionally work outside routine hours