Gideon
DevOps & SRE engineer building observable, resilient infrastructure. I specialize in monitoring pipelines, synthetic checks, and alert engineering using Prometheus, Grafana, Splunk, and PagerDuty.
End-to-end SFTP lifecycle check: connect → authenticate → upload → download → SHA-256 verify → cleanup. Writes Prometheus metrics to node_exporter's textfile collector. Covers service endpoints across stacks.
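A minimal sketch of the upload → download → SHA-256 verify → cleanup steps and the textfile write, not the project's actual code. It assumes a `client` object exposing `put`, `get`, and `remove` (the shape of paramiko's `SFTPClient`) that has already connected and authenticated. The metric names, the `target` label, and the probe file name are hypothetical.

```python
import hashlib
import os
import time


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_lifecycle(client, workdir, remote_name="synthetic_probe.bin"):
    """Upload, download, verify, and clean up; return (success, seconds).

    `client` is an assumed interface: put(local, remote),
    get(remote, local), remove(remote).
    """
    local_up = os.path.join(workdir, "up.bin")
    local_down = os.path.join(workdir, "down.bin")
    with open(local_up, "wb") as fh:
        fh.write(os.urandom(4096))  # random payload for the round trip

    start = time.monotonic()
    ok = False
    try:
        client.put(local_up, remote_name)
        client.get(remote_name, local_down)
        ok = sha256_of(local_up) == sha256_of(local_down)
    except Exception:
        ok = False  # any transport error counts as a failed check
    finally:
        try:
            client.remove(remote_name)  # cleanup runs even after a failure
        except Exception:
            pass
    return ok, time.monotonic() - start


def write_textfile(path, target, ok, seconds):
    """Write gauges in Prometheus text format for the textfile collector."""
    lines = [
        "# HELP sftp_check_success 1 if the last lifecycle check passed.",
        "# TYPE sftp_check_success gauge",
        f'sftp_check_success{{target="{target}"}} {1 if ok else 0}',
        "# HELP sftp_check_duration_seconds Duration of the last check.",
        "# TYPE sftp_check_duration_seconds gauge",
        f'sftp_check_duration_seconds{{target="{target}"}} {seconds:.3f}',
    ]
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # Rename into place so node_exporter never scrapes a half-written file.
    os.replace(tmp, path)
```

The output file should end in `.prom` and live in the directory passed to node_exporter's `--collector.textfile.directory` flag. Writing to a temporary name first and renaming keeps each scrape consistent.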
Built CI/CD pipelines that integrate Terraform for infrastructure automation and Ansible for configuration management. Designed GitOps workflows to deploy a three-tier application stack and its monitoring stack (Prometheus, Grafana, Loki). Implemented branching strategies that automate infrastructure validation, planning, and application deployment, delivered as modular, reusable workflows for the validate, plan, apply, and monitor stages.
Containerized a Flask service using Docker (Python 3.10 Alpine), served via Gunicorn, and built a full CI/CD pipeline around it. GitHub Actions builds and pushes versioned images to Docker Hub on every push to main, with the image tag tied to github.run_number. Kubernetes deployment is fully automated via Terraform using the Kubernetes provider — provisioning a configurable-replica Deployment and a NodePort Service. The workflow closes the CI/IaC loop by automatically updating the Terraform image variable to reference the latest build, ensuring no manual tag bumping between pipeline and infrastructure.
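One way the tag hand-off between the pipeline and Terraform could work is sketched below. This is an illustration under assumptions, not the repository's actual workflow step: it assumes the image tag is stored as a quoted string assignment in a tfvars-style file, and the variable name `image_tag` and the function `bump_image_tag` are hypothetical.

```python
import re


def bump_image_tag(tfvars_text, variable, new_tag):
    """Return tfvars_text with `variable = "..."` pointing at new_tag.

    Raises ValueError unless exactly one assignment is found, so a typo
    in the variable name cannot silently leave the old tag in place.
    """
    pattern = re.compile(
        rf'^([ \t]*{re.escape(variable)}[ \t]*=[ \t]*)"[^"]*"',
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in new_tag.
    updated, count = pattern.subn(
        lambda m: f'{m.group(1)}"{new_tag}"', tfvars_text
    )
    if count != 1:
        raise ValueError(
            f"expected one assignment to {variable!r}, found {count}"
        )
    return updated


if __name__ == "__main__":
    sample = 'replicas = 2\nimage_tag = "41"\n'
    print(bump_image_tag(sample, "image_tag", "42"))
```

In a workflow, the new tag would come from the run number that was used to tag the pushed image, and the updated file would be committed or passed to the Terraform apply step.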
- Migrated 1,200+ SCOM alerts to Splunk, automated NOC forwarding, and diagnosed high-severity Windows Server incidents, reducing response latency by 45%
- Developed Python and PowerShell remediation scripts integrated into operational playbooks, automating recovery for 70% of repetitive alerts and cutting intervention time by 60%
- Implemented multi-layer validity checks (Layers 1–6) for critical services, improving MTTD by 35% and MTTR by 42% across high-priority incidents
- Designed and deployed Grafana dashboards for 4XX and 5XX error monitoring, giving leadership real-time reliability insights and reducing undetected service degradations by 50%
- Developed SFTP synthetic monitoring via Python lifecycle checks (connect → auth → upload → download → verify → cleanup) writing to node_exporter textfile collector
- Participated in ORCA (Operational Root Cause Analysis) reviews for core production systems, preventing recurrence of repeat error patterns and reducing post-incident
- Engineered complex, real-world application scenarios to demonstrate the functional depth of Gemini AI agents across workplace automation, business operations, and enterprise systems integration
- Translated multifaceted business workflows into executable AI logic, designing modular, testable structures that mirrored real-world processes such as sales enablement, lead generation, and automated contact intelligence
- Developed structured, code-like datasets for AI scenario modeling—simulating key automation tasks including business data extraction, contact enrichment, personalized outreach generation, and intelligent follow-up prioritization
- Applied software-engineering principles such as iterative refinement, logic debugging, and functional validation to enhance the precision and adaptability of synthetic datasets across multiple industry verticals
- Collaborated with AI models and human reviewers to identify logic gaps, improve output consistency, and establish scalable frameworks for reproducible testing and agent training
- Contributed to the development of a reusable AI scenario library, enabling accelerated deployment, benchmarking, and product QA for enterprise-grade automation tools
- Automated user and group creation, home directory setup, and password management using Bash scripts with error handling and logging for system administration tasks
- Containerized full-stack web applications (React, FastAPI, PostgreSQL) using Docker and Nginx, ensuring proper proxy configurations and cloud deployment on AWS EC2 with domain setup and HTTPS redirection
- Developed and deployed an email queue management and logging system in Python, running behind Nginx with RabbitMQ and Celery, to automate email tasks and activity logging
- Developed and documented a GitHub bot that automates pull requests and deployments, provides real-time status updates, and cleans up resources
- Automated deployment of full-stack applications using Ansible, managing PostgreSQL databases, messaging queues, and application configuration on cloud servers