SRE Team Leader & Escalation Manager

Perimeter 81

Tel Aviv District, Israel · Tel Aviv-Yafo, Israel

Posted on May 12, 2026

Why Join Us?

We are looking for a technically strong and AI-savvy SRE Team Leader & Escalation Manager to own production reliability, incident management, and cross-functional prioritization. This role leads our AI-driven automation strategy, drives self-healing infrastructure development, and sets a new standard for modern reliability engineering.

Key Responsibilities

  • Lead and mentor the SRE team; improve monitoring, alerting, and observability.
  • Own production incidents and escalations end-to-end — from mitigation to RCA to corrective action.
  • Lead the design and development of self-healing systems capable of detecting, diagnosing, and remediating incidents autonomously.
  • Drive automation of repetitive operational workflows using AI/ML-based solutions to reduce toil and MTTR.
  • Manage the cross-functional Squad handling customer and production issues; align priorities across Support, QA, R&D, and Sources.
  • Track key operational metrics and lead long-term reliability improvements.

Qualifications

  • 3-5 years of experience in SRE or incident management.
  • Mandatory: Hands-on experience applying AI/ML to operational challenges (AIOps, anomaly detection, LLM-based automation, or auto-remediation).
  • Proven track record of automating workflows and reducing manual toil at scale.
  • Strong cloud background (AWS/Azure/GCP) and experience with Kubernetes, Docker, and CI/CD.
  • Proficiency with observability tools (Grafana, Prometheus, ELK) and scripting (Python, Bash).
  • Demonstrated leadership in high-pressure, cross-functional environments.

Advantages

  • Background in cybersecurity or SaaS platforms.
  • Experience with LLMOps, AI agents, or orchestration platforms (e.g., n8n, Temporal).

Key Attributes

  • Strong ownership, accountability, and composure under pressure.
  • Passionate about leveraging AI to automate workflows, reduce toil, and accelerate incident resolution.
  • Visionary about self-healing operations — able to both define the strategy and drive its implementation.
  • Collaborative leader with the ability to align cross-functional stakeholders.
  • Technically hands-on, systems-level thinker with the drive to engineer scalable, long-term solutions.