Site Reliability Engineer (SRE) – Vulnerability Management, Observability & Server Patching
Contract · Seattle, WA (on-site, 4 days/week)
The Site Reliability Engineer is responsible for ensuring the security, reliability, and operational excellence of server infrastructure. The role focuses on proactive vulnerability management, server patching, and robust observability practices using industry-standard platforms.
Responsibilities
- Manage and improve enterprise vulnerability management program for aggregation, prioritization, and reporting
- Identify, analyze, and assess vulnerabilities across server infrastructure including operating systems and applications
- Partner with security, infrastructure, and application teams to prioritize remediation efforts
- Ensure adherence to corporate security policies and regulatory requirements
- Plan, schedule, and execute server patching activities for operating systems and third-party software
- Track patch compliance and remediation metrics including mean time to patch
- Develop and maintain automation scripts and tooling to streamline patching workflows
- Maintain and enhance observability of supported services using monitoring and alerting solutions
- Design and implement effective monitoring, alerting, and dashboards
- Define and measure service-level indicators and service-level objectives
- Analyze incidents and trends to drive continuous improvement
- Collaborate with application owners and platform teams to support SRE objectives
- Support incident response, root cause analysis, and post-incident reviews
Requirements
- Windows Server and Linux operating systems
- Server patching methodologies
- Vulnerability management frameworks and risk-based prioritization
- Vulnerability management tools (Brinqa, Qualys, or similar)
- Datadog implementation and monitoring
- On-premises and Microsoft Azure environments
- Docker and Kubernetes containerization
- CI/CD pipelines and GitOps deployments (ArgoCD)
- Python, PowerShell, or Bash scripting
- On-call rotations and incident response
- Networking concepts (TCP/IP, DNS, load balancing, firewall rules)
- ITIL concepts and operational best practices