We are seeking a Cloud Site Reliability Engineer (SRE) to join our client’s Business Services organization. This role requires close collaboration with Development, QA, IT Operations, and Customer Operations teams to support and maintain a FedRAMP-compliant cloud infrastructure. The ideal candidate will be responsible for ensuring system reliability, performance, and scalability while adhering to compliance controls both during the migration and in the cloud environment.
Day-to-Day Responsibilities:
- Provide 75% Azure Cloud support and 25% Linux system administration.
- Support and maintain FedRAMP-compliant infrastructure, ensuring compliance in all cloud operations.
- Work with prebuilt modules and develop new ones as needed to meet compliance standards.
- Collaborate with various teams to troubleshoot issues, improve system reliability, and optimize cloud infrastructure.
- Utilize Kubernetes (AKS) for container orchestration, including Helm charts and monitoring solutions like Grafana/Prometheus.
- Implement and maintain CI/CD pipelines using GitHub Actions, Jenkins, or Argo CD.
- Develop and maintain automation scripts in Bash, Shell, or PowerShell.
- Support Azure services such as AKS, Blob Storage, Redis, Linkerd, and Azure Egress Proxy.
Must-Have Skills:
- 5+ years of experience in SRE, Cloud Engineering, or DevOps roles.
- 5+ years supporting applications on Linux OS, with expertise in:
  - User & permission management, system processes, and service administration.
  - Bash/Python scripting for automation and system management.
  - TCP/IP, DNS, firewalls, load balancing, and network troubleshooting.
  - System performance monitoring using tools like top, htop, vmstat, iostat.
  - Log analysis with journalctl, syslog, dmesg.
- Linux security best practices (SSH hardening, firewalls, vulnerability patching).
- 3-5+ years of experience with Azure Cloud (AKS, Blob Storage, Redis, etc.).
- 2-3+ years of experience with Docker & Kubernetes:
  - Experience managing Kubernetes clusters and deploying Helm charts.
  - Integrating Grafana/Prometheus for monitoring Kubernetes clusters.
- Hands-on experience with CI/CD pipelines (GitHub Actions, Jenkins, Argo CD).
- 2-3+ years of scripting experience (Bash, Shell, or PowerShell).
- Experience designing Cloud Architecture solutions to address business needs.
Top Skills:
- Kubernetes (AKS)
- Docker
- Azure Cloud Services
- Linux System Administration
- CI/CD Pipelines (GitHub Actions, Jenkins, Argo CD)
Nice to Have:
- FedRAMP Compliance Experience
- Experience using JFrog Artifactory
- Cloud Certifications (Azure, Kubernetes, DevOps, etc.)
- Large-scale on-prem to Azure migration experience
This is a high-impact, fast-paced role where you will be at the forefront of cloud automation, compliance, and infrastructure reliability. If you’re an experienced Cloud SRE looking to take on a challenging opportunity, apply today!