As a Staff Site Reliability Engineer, you will be responsible for building and supporting the application infrastructure of one of the largest eCommerce sites in the world. This will require you to maintain high site uptime while embracing rapid change and growth, using a strong DevOps mindset of continuous delivery and site automation. This is a demanding role that requires deep technical knowledge, adaptability, hands-on execution, and a ruthless drive toward higher levels of availability.
In this role, you will:
Maintain a maniacal focus on site uptime
Engineer application infrastructure that is reliable, efficient, and maintainable
Partner closely with software engineering teams using a strong DevOps mindset
Constantly improve operational processes and efficiency
Automate, automate, automate
MAJOR TASKS, RESPONSIBILITIES AND KEY ACCOUNTABILITIES
70% - Delivery & Execution:
Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
Documents, reviews and ensures that all quality and change control standards are met
Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
Writes custom code or scripts to do "destructive testing" to ensure adequate resiliency in production
Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
Contributes to enterprise-wide tools to drive destructive testing, automation, or engineering empowerment
Identifies product enhancements (client-facing or technical) to create a better experience for the end users
Identifies insecure areas of code, with or without tooling, and implements fixes as they are discovered
Contributes to foundational code elements that can be reused many times by a product
Contributes to meaningful architecture diagrams and other documentation needed for security reviews or other interested parties
Defines Service Level Objectives for product(s) to constantly measure their reliability in production and help prioritize backlog work
20% - Support & Enablement:
Fields questions from other product teams or support teams
Monitors tools and participates in conversations to encourage collaboration across product teams
Provides application support for software running in production
Proactively monitors production Service Level Objectives for product(s)
Proactively reviews the performance and capacity of all aspects of production: code, infrastructure, data, and message processing
Triages high priority issues and outages as they arise
10% - Learning:
Participates in and leads learning activities around modern software design and development core practices (communities of practice)
Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
Attends conferences and learns how to apply new technologies where appropriate
NATURE AND SCOPE
Typically reports to the Software Engineering Manager or Sr. Manager.
ENVIRONMENTAL JOB REQUIREMENTS
Located in a comfortable indoor area. Any unpleasant conditions would be infrequent and not objectionable.
Typically requires overnight travel less than 10% of the time.
Must be eighteen years of age or older.
Must be legally permitted to work in the United States.
Additional Minimum Qualifications:
Proficient in an object-oriented programming language (preferably Java)
The knowledge, skills and abilities typically acquired through the completion of a bachelor's degree program or equivalent degree in a field of study related to the job.
Years of Relevant Work Experience: 5 years
Most of the time is spent sitting in a comfortable position and there is frequent opportunity to move about. On rare occasions there may be a need to move or lift light articles.
5+ years of experience building and supporting large-scale, business-critical systems
Public cloud experience (AWS or Google Cloud)
Expert knowledge of at least one web application platform, such as WebSphere, JBoss, Tomcat, Apache, NGINX, Varnish, or Endeca
Expert knowledge of application performance monitoring tools, such as Dynatrace, Splunk, Gomez, Coradiant, and Tealeaf
Experience with continuous integration platforms such as Jenkins
Experience with infrastructure configuration management tools such as Puppet and Chef
Mastery of at least one scripting language, such as Python, Perl, Ruby, or shell
Unix/Linux power user
Bachelor’s degree in Computer Science or related field
Knowledge, Skills, Abilities and Competencies:
Cultivates Innovation: Creating new and better ways for the organization to be successful
Action Oriented: Taking on new opportunities and tough challenges with a sense of urgency, high energy and enthusiasm
Business Insight: Applying knowledge of business and the marketplace to advance the organization's goals
Collaborates: Building partnerships and working collaboratively with others to meet shared objectives
Communicates Effectively: Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences
Drives Results: Consistently achieving results, even under tough circumstances
Global Perspective: Taking a broad view when approaching issues; using a global lens
Interpersonal Savvy: Relating openly and comfortably with diverse groups of people
Manages Ambiguity: Operating effectively, even when things are not certain or the way forward is not clear
Optimizes Work Processes: Knowing the most effective and efficient processes to get things done, with a focus on continuous improvement
Self-Development: Actively seeking new ways to grow and be challenged using both formal and informal development channels
Situational Adaptability: Adapting approach and demeanor in real time to match the shifting demands of different situations