Site Reliability Engineer (SRE) – Vulnerability Management, Observability & Server Patching | Careers

The Site Reliability Engineer is responsible for ensuring the security, reliability, and operational excellence of server infrastructure. The role focuses on proactive vulnerability management, server patching, and robust observability practices using industry-standard platforms.

Responsibilities

● Manage and improve the enterprise vulnerability management program, including vulnerability aggregation, prioritization, and reporting
● Identify, analyze, and assess vulnerabilities across server infrastructure including operating systems and applications
● Partner with security, infrastructure, and application teams to prioritize remediation efforts
● Ensure adherence to corporate security policies and regulatory requirements
● Plan, schedule, and execute server patching activities for operating systems and third-party software
● Track patch compliance and remediation metrics, including mean time to patch (MTTP)
● Develop and maintain automation scripts and tooling to streamline patching workflows
● Maintain and enhance observability of supported services using monitoring and alerting solutions
● Design and implement effective monitors, alerts, and dashboards
● Define and measure service-level indicators (SLIs) and service-level objectives (SLOs)
● Analyze incidents and trends to drive continuous improvement
● Collaborate with application owners and platform teams to support SRE objectives
● Support incident response, root cause analysis, and post-incident reviews

Requirements

Experience with the following:

● Windows Server and Linux operating systems
● Server patching methodologies
● Vulnerability management frameworks and risk-based prioritization
● Vulnerability management tools (Brinqa, Qualys, or similar)
● Datadog implementation and monitoring
● On-premises and Microsoft Azure environments
● Containerization and orchestration with Docker and Kubernetes
● CI/CD pipelines and GitOps deployments (Argo CD)
● Python, PowerShell, or Bash scripting
● On-call rotations and incident response
● Networking concepts (TCP/IP, DNS, load balancing, firewall rules)
● ITIL concepts and operational best practices
