About CareView:
CareView Communications (OTC: CRVW) is a leading information technology provider serving the healthcare industry, focused on improving patient safety and reducing operational costs. Our solutions, installed in over 200 hospitals nationwide, include advanced patient safety products that we develop and manufacture to enhance care delivery. We deploy a proprietary high-speed data network within healthcare facilities. This network powers the CareView Patient Safety System® and a suite of integrated software applications that enable real-time bedside patient monitoring and virtual care.
Job Overview:
We are seeking an experienced Site Reliability Engineer (SRE) to own the reliability, resilience, and operational health of our server infrastructure. This role bridges software engineering and infrastructure operations, applying engineering discipline to availability challenges across both on-premises and cloud-connected environments. The primary focus is designing and implementing high-availability (HA) architecture for our clinical platform, managing a distributed fleet of servers deployed at customer sites, and maintaining a structured operating system update and patch management program.
Essential Duties & Responsibilities:
- High Availability Architecture and Implementation
  - Design, implement, and maintain high availability architecture across on-premises and cloud-connected environments, ensuring continuous uptime for clinical systems
  - Identify single points of failure across the infrastructure stack and develop mitigation strategies spanning network, application, and data layers
  - Implement zero-downtime deployment strategies, including blue/green and rolling updates, to eliminate planned maintenance windows
  - Collaborate with Development and Technical Operations teams to ensure HA considerations are addressed at every layer of the platform
- Fielded Server and Infrastructure Management
  - Own the health, stability, and performance of a fleet of on-premises servers deployed at customer clinical sites
  - Establish and maintain remote monitoring and telemetry systems to provide real-time visibility into the behavioral state of fielded hardware
  - Develop and execute remote diagnostic and remediation procedures that minimize disruption to live clinical environments
- Operating System and Patch Management
  - Develop and maintain a structured OS update and patch management program covering all fielded and cloud-based server infrastructure
  - Design and execute staged rollout strategies for OS updates that account for the constraints of live clinical environments, including customer change control windows and regulatory requirements
  - Maintain hardened OS baseline configurations in compliance with applicable security and regulatory frameworks
- Monitoring, Alerting, and Incident Response
  - Design and operate a comprehensive monitoring and alerting platform covering system health, performance metrics, error rates, and availability across all infrastructure
  - Build automated health checks and self-healing mechanisms to detect and recover from failures before they impact end users
  - Develop and maintain operational runbooks enabling consistent incident response across the team
- Reliability Engineering and Automation
  - Write code and automation to eliminate recurring manual operational tasks, improving team scalability and reducing toil
  - Develop infrastructure-as-code for consistent, repeatable provisioning and configuration of both cloud and on-premises resources
  - Build and maintain disaster recovery procedures, including regular DR testing with documented RTO and RPO targets
  - Lead capacity planning efforts, forecasting infrastructure needs ahead of product growth and customer site expansion
- Perform other related duties as assigned
Minimum Qualifications (Knowledge, Skills & Abilities):
- 5+ years of experience in site reliability engineering or infrastructure engineering
- Demonstrated experience managing on-premises server infrastructure in addition to cloud environments
- Hands-on experience designing and implementing high availability architecture, including load balancing, clustering, and automated failover, in an on-premises environment
- Strong Linux administration skills, including command-line proficiency; experience hardening and maintaining Linux server environments in production
- Proficiency with containerization and orchestration technologies, including Docker and Kubernetes
- Experience with monitoring and observability platforms such as Prometheus and Grafana
Preferred (not required) experience:
- Experience with VMware, Hyper-V, or other hypervisor platforms in production environments
Education:
- Bachelor’s degree in computer science or a related field, or equivalent practical experience
Work Environment:
- Must pass a criminal background check and drug test
- Must pass a HIPAA compliance test
- Must be authorized to work in the United States
- Working Hours: 8:00 AM to 5:00 PM
- Occasional evening and weekend work may be required as job duties demand
- Hybrid
- Location: 405 State Hwy 121 Bypass, Suite B240, Lewisville, TX 75067
Benefits:
- Health insurance
- Dental insurance
- Vision insurance
- Life insurance
- Employee assistance program
- Paid time off
- 7 paid holidays
Equal Opportunity Statement:
CareView is committed to the principles of equal employment. We are committed to complying with all federal, state, and local laws providing equal employment opportunities, and all other employment laws and regulations. It is our intent to maintain a work environment that is free of harassment, discrimination, or retaliation based on the following protected classes: age (40 and older), race, color, national origin, ancestry, religion, sex, sexual orientation (including transgender status, gender identity or expression), pregnancy (including childbirth, lactation, and related medical conditions), physical or mental disability, genetic information (including testing and characteristics), veteran status, uniformed servicemember status, or any other status protected by federal, state, or local laws. The Company is dedicated to the fulfillment of this policy in regard to all aspects of employment, including, but not limited to, recruiting, hiring, placement, transfer, training, promotion, rates of pay, other compensation, termination, and all other terms, conditions, and privileges of employment.
Pay: $95,000.00 - $110,000.00 per year