Full Job Description
With Discover, you’ll have the chance to make a difference at one of the world’s leading digital banking and payments companies. From Day 1, you’ll do meaningful work you’re passionate about, with the support and resources you need for success. We value what makes each employee unique and provide a collaborative, team-based culture that gives everyone an opportunity to shine. Be the reason millions of people find a brighter financial future, while building the future you want, here at Discover.
Responsible for the technical design, deployment, monitoring, and ongoing support and maintenance of a diverse set of cloud technologies. The role is a technical, hands-on opportunity with a heavy focus on automation, resilient design and deployment of cloud-ready systems and services. This role collaborates with Product teams internal and external to IS to provide world-class products and services in support of our application development community, and our business as a whole. This is a 'DevOps' position, responsible for the full-stack engineering and support of products that support our hybrid cloud capabilities.
A Site Reliability Engineer at Discover is someone who likes to take responsibility for new applications going into production to ensure operational excellence (availability, latency, performance, efficiency, problem management, monitoring, emergency response and capacity planning). You will participate in addressing anything that prevents a system or app from serving its intended purpose, whether slowness or an outage, to understand how we can improve Time to Detect, Time to Fix, and Time to Mitigate issues. You will improve our monitoring solutions and define SLIs/SLOs. You will develop automated solutions using a variety of coding languages. We are organized as a Chapter organization, so you will be expected to lead the SRE mindset across the organization.
“In general, an SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.” Site reliability engineers create a bridge between development and operations by applying a software engineering mindset to system administration topics.
Leads the design, build and maintenance of modern cloud platforms that support agile teams.
Partners with key stakeholders as a platform champion for cloud-native systems, and coaches on how to use platform capabilities effectively through appropriate venues.
Drives continuous improvement of cloud products & capabilities through internal user groups and external market research.
Driving innovation and platform evolution
Scaling cloud infrastructure to support our growing ecosystem
Providing reliable, predictable deployment and maintenance of distributed systems
Adhering to security best practices
Writing and designing automation, monitoring, diagnostics and debug tooling to improve troubleshooting and recovery
Participating in production support and on-call rotations
Conducting incident management and contributing to associated retrospectives/postmortems as needed
Responsible for the stability and performance of critical business services
Participating in Agile Sprints and associated ceremonies
Bachelor’s Degree in Information Technology
6+ years of experience in application or platform development, consulting, or a related field
In lieu of education, 8+ years of experience in application or platform development, consulting, or a related field
3+ years in an SRE role
Well versed in the entire software development lifecycle, DevOps, and SRE practices
Experience with operational monitoring tools with a mindset towards predictive analysis
Working knowledge of automation tools such as Ansible, Terraform, or Chef
Familiar with Pivotal Cloud Foundry (PCF), OpenShift (OCP), Amazon Web Services (AWS), and Google Cloud Platform (GCP)
A solid understanding of working with Git
Experience with troubleshooting and debugging issues at any level
Strong knowledge and understanding of microservices-based architectures, APIs, etc.
Good understanding of networking, including L2 and L3 concepts, firewalls, load balancing, routing, and switching
A working knowledge of Linux-based systems and virtual machine (VM) technology
Strong scripting skills, including the ability to write scripts from scratch using Python and/or Bash
Can identify and mitigate reliability risks
Excellent communication and troubleshooting skills
Strong analytical and problem-solving skills
Basic knowledge and understanding of security (CIA model and PCI compliance) is a plus
Experience with Continuous Integration and Continuous Delivery models including Blue/Green and Canary release models is a plus
Experience with continuous integration/deployment frameworks such as Jenkins
What are you waiting for? Apply today!
The same way we treat our employees is how we treat all applicants – with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.
24-Hour Nurse Hotline & Telehealth Services
7 Paid Holidays
Annual Flu Shots
Employee Assistance Program
Flexible Work Environment
Group Auto, Home and Pet Insurance
Healthy Eating Program
Legal Assistance Plan
Onsite Emotional Health Counselors
Onsite Fitness Centers
Onsite Weight Watchers at Work
Paid Parental Leave
Professional and Leadership Development Programs
Service Anniversary Awards
Annual Health Evaluation and Health Coaching
Critical Illness Insurance
Health Savings Account, Health Reimbursement Account and Flexible Spending Accounts
Health, Vision and Dental Insurance
Life and Accident Insurance
Long-term and Short-term Disability Insurance
Onsite Health Services Center with Nurse Practitioner
401(k) Savings Plan with Fixed and Matching Contributions
Employee Stock Purchase Plan
“Financial Wellness for You” Learning Programs