Sultan Gillani

About

Senior Site Reliability / DevOps Engineer with 12+ years of experience designing, automating, and operating reliable cloud infrastructure across Azure, AWS, and GCP. Strong background in Kubernetes, Terraform, CI/CD, observability, incident management, and infrastructure automation. Proven ability to lead SRE initiatives, improve production reliability, implement scalable monitoring platforms, and partner with engineering teams to strengthen operational excellence.

Experience

Site Reliability Engineer, AMH

February 2024 — Present (2 years)
- Established and led AMH's first Site Reliability Engineering team, defining reliability practices, incident response processes, and service ownership standards across critical systems.
- Spearheaded the migration from Microsoft Application Insights to OpenTelemetry, standardizing distributed tracing and improving application performance visibility across services.
- Led the migration from Azure Monitor to Grafana Cloud, implementing a unified observability platform using Loki, Tempo, Prometheus, dashboards, and alerting.
- Implemented Infrastructure as Code with Terraform to automate provisioning and management of Grafana dashboards, alerts, and observability infrastructure.
- Defined and implemented Service Level Indicators and Service Level Objectives to measure reliability, improve operational transparency, and align service performance with business expectations.
- Designed proactive monitoring and alerting strategies to improve production issue detection, reduce response times, and support faster incident resolution.
- Led production incident investigations and Root Cause Analysis, documenting findings and driving preventative actions to reduce repeat incidents.
- Partnered with software engineering, DevOps, QA, SecOps, and IT operations teams to improve reliability, telemetry instrumentation, deployment practices, and operational visibility across Azure-hosted services.

Senior DevOps Engineer, Evisort

February 2022 — September 2023 (2 years)
- Led the migration from GitLab CI to GitHub Actions, reducing operational costs and standardizing CI/CD workflows across engineering teams.
- Supported AWS infrastructure for data science, application development, and engineering workloads.
- Implemented Infrastructure as Code using Terraform, Ansible, GitLab, and Rundeck to automate provisioning, configuration management, and deployments.
- Migrated manually managed GitLab CI runner infrastructure to Amazon EKS, improving scalability, reliability, and operational maintainability.
- Automated Kubernetes deployments using Flux and ArgoCD, enabling GitOps-based delivery for containerized services.
- Developed Python and Bash automation to streamline DevOps workflows and support a modular Terraform infrastructure stack.
- Implemented Consul health checks and service monitoring to improve application reliability and operational visibility.

Site Reliability Engineer, Nolan Transportation Group

December 2019 — February 2022 (2 years)
- Played a key role in transitioning infrastructure from on-premises operations to Google Cloud Platform and later to Azure, supporting scalable and reliable cloud adoption.
- Configured and deployed GCP Kubernetes environments, establishing deployment processes for 50+ applications.
- Created and managed multiple GCP projects across development, staging, and production environments, automating infrastructure provisioning with Terraform.
- Used Terraform, GitLab CI, and Ansible to automate infrastructure provisioning, configuration management, and deployments across Azure and GCP.
- Spearheaded the transition from Azure DevOps to GitLab, implementing Git-based workflows and modern version control practices across engineering teams.
- Implemented automated testing in GitLab CI and integrated static and dynamic code analysis tools, reducing security vulnerabilities by 85%.
- Configured application performance monitoring with New Relic, improving performance visibility, troubleshooting, and user experience.

DevOps Engineer, Veracity Industrial Networks

November 2018 — December 2019 (1 year)
- Maintained and enhanced a product demo platform hosted on AWS, ensuring a reliable and seamless experience for prospective customers.
- Implemented DevOps best practices, including Infrastructure as Code, CI/CD pipelines, and configuration management using Jenkins and Groovy.
- Created a Jenkins Shared Library to standardize and streamline build and deployment processes across projects.
- Led the migration from OpenBSD to Linux, enabling broader DevOps tooling, improved maintainability, and modern configuration practices.
- Developed front-end and back-end components for a security appliance UI using React, Redux, Ruby, Sinatra, RSpec, and Capybara.

DevOps Engineer, Yardi Systems Inc.

January 2014 — October 2018 (5 years)
- Maintained 99.99% availability for production environments across hybrid infrastructure.
- Created Docker images, Docker Compose configurations, Kubernetes manifests, and ECS parameter files to support containerized deployments.
- Managed DevOps tools and platforms including Jenkins, GitHub, TeamCity, Consul, and Ansible.
- Containerized Nginx reverse proxy servers and implemented dynamic container provisioning to improve scalability, reliability, and security.
- Implemented AWS infrastructure components including VPCs, subnets, Elastic Load Balancers, CloudFront, Cloudflare, S3, and EC2.
- Developed automation and operational tooling using C#, VB.NET, and shell scripting.

Skills

Cloud Technologies: AWS, GCP, Kubernetes, Azure, Amazon EKS, ECS — Intermediate
- AWS
- GCP
- Azure
- Kubernetes
- Amazon EKS
- ECS
Programming Languages: Python, Ruby, C#, VB.NET — Intermediate
- Python
- Ruby
- C#
- VB.NET
- JavaScript
Infrastructure as Code & Automation: Terraform, Ansible, Rundeck, Consul, Packer — Expert
- Configuration Management
- Terraform
- Ansible
- Rundeck
- Consul
- Packer
Scripting Languages: UNIX shell scripting (bash, sh, zsh), Groovy, and Python — Intermediate
- Python Scripting
- Shell Scripting
- Groovy Scripting
Databases: PostgreSQL, SQL Server, MySQL, SQLite, Amazon RDS/Aurora — Expert
- Databases
- PostgreSQL
- MySQL
- SQL Server
- SQLite
- Amazon RDS
- Aurora
Networking & Cloud Infrastructure: VPC, NAT, Subnets, Load Balancers, TCP/IP, CDNs, Firewalls, CloudFront, Cloudflare — Expert
- Networking
- Firewalls
- NAT
- VPC
- Load Balancers
- CDNs
- CloudFront
- Cloudflare
Web Server: Nginx, Apache, IIS — Expert
- Nginx
- Ingress
- Apache
- Load Balancer
- IIS
Version Control System: Git, Subversion, Mercurial — Expert
- Git
- VCS
- SVN
- HG
Observability & Monitoring: OpenTelemetry, Grafana, Prometheus, Loki, Tempo, New Relic, Dynatrace, Honeycomb, Azure Monitor, Application Insights — Intermediate
- Monitoring
- OpenTelemetry
- Grafana
- Prometheus
- Loki
- Tempo
- New Relic
- Dynatrace
- Honeycomb
- Azure Monitor
- Application Insights
CI/CD & GitOps: GitHub Actions, GitLab CI, Jenkins, TeamCity, ArgoCD, Flux, Azure DevOps — Expert
- Automation
- CI/CD
- Jenkins
- GitLab CI
- GitHub Actions
- TeamCity
- ArgoCD
- Flux
- Azure DevOps
Platforms: Linux, Windows, OSX, OpenBSD/Unix — Expert
- Linux
- OSX
- Debian
- Ubuntu
- CentOS
- OpenBSD
Leadership — Expert
- Team Leadership
- Engineering Excellence
- Mentoring

Awards

Scrum Master Accredited Certification
Scrum Product Owner Accredited Certification

Languages

English, Native speaker
