Site Reliability Engineer

JFrog - Sunnyvale, CA

JFrog is a fast-growing startup building the next generation of software development and distribution tools. We are shaping the future of the software development and DevOps toolchain. Our top-notch development talent, open source community, award-winning products, cool technology, paying customers and solid funding make JFrog a really great place to join! We are looking for a DevOps Ninja to join our globally expanding Site Reliability Engineering team and establish a 24/7 production reliability routine.

Responsibilities
In this role you will:
  • Work with cutting-edge technology in the cloud and hardware computing space
  • Assume responsibility for production SaaS services with sensitivity to uptime based on customer SLAs
  • Install, configure, update and troubleshoot services such as web servers, relational and non-relational databases, CM and CSM tools and application servers, and work with Docker, Kubernetes and much more
  • Monitor, troubleshoot and resolve production-grade issues, and troubleshoot and configure system- and application-level aspects of our SaaS platform and applications
  • Troubleshoot developer support cases and provide 24/7 emergency response to JFrog’s SaaS customers
  • Collaborate in a “DevOps” environment, working closely with our Support, Solution Engineering, R&D, QA and DevOps teams worldwide
  • Maintain a knowledge base of known issues and solutions
Desired Skills and Experience
The ideal candidate would have:
  • Excellent problem-solving skills and a desire to take on responsibility
  • Excellent written and verbal communication skills, with the ability to explain technical issues to both technical and non-technical audiences
  • A deep understanding of and familiarity with:
      • Linux (CentOS, Ubuntu and other distributions)
      • Networking: firewalls, VPNs, proxies and load balancers
      • Web/application servers: Apache, Nginx, Tomcat, JVM environments
  • Monitoring and logging systems, e.g. experience with tools like Graphite, LogicMonitor, Logentries, SumoLogic or the ELK stack - advantage
  • Experience with public clouds (AWS, GCP, etc.) - great advantage
  • Virtualization and containers (Xen, KVM, QEMU, Docker, etc.) - advantage
  • Storage, any of the following: NFS, SANs, RAID, LVM - advantage
  • Customer-facing experience - great advantage
  • 2-3 years of relevant work experience, including hands-on Linux experience, preferably using languages such as shell/Bash, Ruby, Python, Java or Perl
  • Background in NOC/SOC operations - great advantage
  • Experience using and administering version control systems (SVN, Git, etc.) - advantage
  • Familiarity with the Atlassian suite (Confluence, JIRA, Bitbucket, etc.) - advantage
  • Knowledge of Artifactory and Bintray, build tools and CI servers - big plus
  • Knowledge of Docker and Kubernetes - great advantage
  • Ability to work independently, learn quickly and be proactive
  • Ability to join a 24/7 on-call roster (follow-the-sun model)
  • Ability to occasionally work outside regular hours