Responsible for major IT systems incident management from initiation until an acceptable workaround is in place or the incident is resolved. Candidates must have experience in areas including, but not limited to, support operations, escalation management and critical incident response. Candidates must also demonstrate excellent communication skills and have experience leading in a matrix management organization.
- Responsible for major IT systems incident management from initiation until an acceptable workaround is in place or the incident is resolved. Coordinate appropriate resources to resolve critical incidents in accordance with service level agreements and operational level agreements. Own all communication during a major system outage, ensuring IT management and the businesses are kept updated until the incident is resolved. Coordinate and manage incident management conference calls, and keep a chronology of events during them.
- With a thorough understanding of technology assets/environments/services, business needs and SLAs, lead the creation, revision and implementation of monitoring tools, processes and reports. Regularly review and identify process improvement opportunities and implement changes in collaboration with the process owner and other technology functions. Champion and provide oversight to ensure adherence to established processes, tools and methodologies.
- Help establish availability, reliability and maintainability requirements for environments, technical assets and services. Review availability information and identify developing issues and opportunities for improvement. Ensure effective hand-offs with appropriate technology function(s). Provide input into and drive availability improvement plans.
- Lead (with the Process Owner) the definition and creation of data collection/tracking tools and methods and standardized reports to understand and optimize technology systems and/or services. Execute and/or lead the use of a variety of techniques and systems to collect and understand performance data; design and implement reporting templates and dashboards to convey trends, consumption, and performance; and monitor compliance with SLAs. Manage ad hoc reporting needs and requests.
- Monitor assigned environments, technical assets and/or services for behavior or performance outside of standards or SLAs. Identify potential causes and evaluate the impact on infrastructure, delivery or services. Determine appropriate next steps (e.g. closer monitoring, further review or immediate action). Alert the appropriate team (per process) when a threshold has been reached or a change/failure has occurred. Provide advice and guidance to others in monitoring and analysis of assets, systems and services.
- Provide oversight, technical direction, and expertise to the operations support teams as it relates to data analysis, monitoring tools and processes, and event detection...
- Document concerns and findings, collecting all pertinent data (including a comparison of exception data with normal data). Ensure incident/event tracking tools are current (per established guidelines and procedures). Review, improve and champion the accuracy and maintenance of knowledge base content and the known error database.
- Bachelor’s degree in Computer Science, IT, MIS, Math or related field; or equivalent work experience.
- 3-5 years of relevant experience.
- 5+ years of technical operations/support experience with proven knowledge of and experience working with the ITIL framework.
- 3+ years of experience with event monitoring and/or incident/problem management, including setting up monitoring thresholds and views.
- 5+ years of broad technical experience with proven expertise in a majority of the following areas: servers, networks, hardware, operating systems (Windows, UNIX, Solaris, Linux, AIX), virtualization software, middleware and related base build infrastructure and software.
- Experience and subject matter expertise in the web and distributed computing environment, as well as mainframe experience.
- ITIL Foundations certification preferred.
- Demonstrated experience following the ITIL® Service Lifecycle disciplines.
- Excellent problem detection and determination skills in multiple functional infrastructure/application areas.
- Proven organizational and leadership skills to successfully lead and influence cross-functional teams without a direct line of authority.
- Proven ability to identify opportunities for improvement to configurations, procedures and processes, enabling greater availability and capability.
- Strong written and verbal communication skills with experience creating, championing and maintaining processes, procedures and policies.
- Experience working in the financial services industry or other similar, highly regulated environment.