SaaS Operations Team Lead
Buffalo Grove, IL
Monday – Friday 8:00 am – 5:00 pm
Remote
We’re seeking a talented, motivated, hands-on Team Lead to run our SaaS Operations team with a strong Site Reliability Engineering (SRE) mindset. You’ll own and improve the reliability of our SaaS platform, treating availability, performance, and operational excellence as core product features. This role is Azure-first and cloud-forward, and operates in a hybrid environment (Microsoft Azure plus private infrastructure).
Some of the things you’ll be doing:
- Lead the SaaS Operations/SRE team: prioritize work, mentor engineers, set standards, and act as the primary escalation point
- Own reliability outcomes: define and improve service health, availability, latency, and operational readiness
- Operate and optimize Azure services including Azure Front Door, Azure Container Apps, virtual networking, PaaS databases, and Key Vault
- Lead incident response end-to-end: triage, coordination, clear communications, and follow-through
- Drive root cause analysis and postmortems; ensure corrective actions are implemented and tracked
- Reduce operational toil through automation, self-service, and repeatable runbooks
- Build and refine observability: monitoring, logging, dashboards, and actionable alerting
- Manage day-to-day operational tickets and change activity following defined controls (incident/problem/change)
- Partner with Engineering, Infrastructure, and Security to improve operability and safe delivery (release readiness, rollout/rollback planning)
- Participate in an on-call rotation and in planned maintenance windows, including after hours and weekends when needed
What technical skills, experience and qualifications do you need?
- 5+ years in production operations (SRE, platform engineering, DevOps, SaaS operations, systems engineering, or similar)
- Demonstrated technical leadership (team lead responsibilities, mentoring, ownership of operational standards)
- Strong troubleshooting across distributed systems: web platform, networking, containers, identity, certificates/secrets, and performance bottlenecks
- Azure production experience with:
  - Azure Front Door
  - Azure Container Apps
  - Azure virtual networking (VNets, private endpoints, DNS patterns, hybrid connectivity concepts)
  - Azure Key Vault
  - PaaS databases
- Automation and scripting: PowerShell, Bash, Azure CLI, and YAML-based pipelines/workflows
- DevOps toolchain experience (GitHub and/or Azure DevOps); automation/config tooling such as Ansible (or equivalent)
- ITSM/process discipline and tools (e.g., ServiceNow): incident, problem, change management
Hybrid environment requirements
This position supports a hybrid platform. You must be able to operate and troubleshoot components running in private infrastructure, including:
- Enterprise identity systems (e.g., Active Directory, Group Policy)
- Web platform (IIS)
- Microsoft server-based platforms and related operational practices (patching/maintenance, certificate lifecycle, file services such as DFS)
- Virtualization/hypervisor platforms (Nutanix AHV, VMware, or similar)
Nice to have
- Infrastructure as Code experience (Bicep preferred; Terraform/ARM also valuable)
- Experience implementing SLOs and improving alerting hygiene (noise reduction, paging policies)
- Experience improving incident response practices (runbooks, escalation paths, reliability reviews)