
Site Reliability Engineer (SRE) – Vulnerability Management, Observability & Server Patching

Contract · Seattle, WA (on-site, 4 days/week)

The Site Reliability Engineer is responsible for ensuring the security, reliability, and operational excellence of server infrastructure. The role focuses on proactive vulnerability management, server patching, and robust observability practices using industry-standard platforms.

Responsibilities

  • Manage and improve the enterprise vulnerability management program, including aggregation, prioritization, and reporting of vulnerabilities
  • Identify, analyze, and assess vulnerabilities across server infrastructure, including operating systems and applications
  • Partner with security, infrastructure, and application teams to prioritize remediation efforts
  • Ensure adherence to corporate security policies and regulatory requirements
  • Plan, schedule, and execute server patching activities for operating systems and third-party software
  • Track patch compliance and remediation metrics, including mean time to patch
  • Develop and maintain automation scripts and tooling to streamline patching workflows
  • Maintain and enhance observability of supported services using monitoring and alerting solutions
  • Design and implement effective monitoring, alerting, and dashboards
  • Define and measure service-level indicators and service-level objectives
  • Analyze incidents and trends to drive continuous improvement
  • Collaborate with application owners and platform teams to support SRE objectives
  • Support incident response, root cause analysis, and post-incident reviews

Requirements

  • Windows Server and Linux operating systems
  • Server patching methodologies
  • Vulnerability management frameworks and risk-based prioritization
  • Vulnerability management tools (Brinqa, Qualys, or similar)
  • Datadog implementation and monitoring
  • On-premises and Microsoft Azure environments
  • Docker and Kubernetes containerization
  • CI/CD pipelines and GitOps deployments (ArgoCD)
  • Python, PowerShell, or Bash scripting
  • On-call rotations and incident response
  • Networking concepts (TCP/IP, DNS, load balancing, firewall rules)
  • ITIL concepts and operational best practices