About Me
Hello! I'm Dilhan, a results-driven Cloud/DevOps Architect and an Engineering Manager with experience in pre-sales and leading high-performing teams to innovate and implement scalable cloud solutions.
I am skilled in orchestrating the development and deployment of applications using modern DevOps methodologies and cloud architectures, and adept at fostering a culture of continuous integration and continuous deployment (CI/CD) while maintaining robust security and compliance standards. I excel at automating workflows, optimizing infrastructure, and managing cross-functional teams to increase efficiency and drive project deliverables.
With a strong emphasis on collaboration and communication, I seamlessly bridge the gap between technical teams and business stakeholders to deliver results that align with organizational objectives and customer needs.
I thrive on solving complex technical challenges and building robust systems that empower development teams. I am proficient in tools like Kubernetes, Docker, Terraform, Ansible, Jenkins, and more. I believe in continuous learning and always staying ahead of the curve in the rapidly evolving cloud landscape.
Core Competencies: Cloud Architecture (AWS, Azure, GCP, Alibaba, Singapore Government Clouds) | DevOps Automation | Kubernetes | Terraform | CI/CD | FinOps | Infrastructure as Code | Pre-Sales Engineering | Solution Architecture | Governance & Compliance | Agile Delivery | Engineering Management | Stakeholder Management | Performance Optimization | Team Management
KPIs
Years of Experience
Team Members Led
Lead Time Reduction
Team Certification Growth
Cloud Cost Reduction
SGD Project Portfolio
Core Expertise
My career is built on a foundation of technical architecture, team leadership, and strategic pre-sales. This chart illustrates the synergy between these core areas, which enables me to deliver comprehensive, business-aligned solutions.
Skills
Cloud Platforms
AWS, Azure, Google Cloud Platform (GCP), Alibaba Cloud, G-Cloud (Singapore), GPC (Singapore), GCC (Singapore)
Containerization & Orchestration
Docker, Kubernetes, ECS, EKS, AKS, GKE, OpenShift, Karpenter
Infrastructure as Code (IaC)
Terraform, Bicep, ARM, AWS CDK, CloudFormation, OpenTofu, Pulumi, Ansible, Puppet, Chef
Databases & Distributed Systems
SQL (PostgreSQL, MySQL), NoSQL (CosmosDB, DynamoDB, MongoDB), Kafka, Azure Service Bus, SQS, Redis
Integration Tools
Dell Boomi, Pentaho, AWS Step Functions, Azure Logic Apps, REST, GraphQL
CI/CD & Automation
Jenkins, GitLab CI, GitHub Actions, Azure DevOps, CircleCI, ArgoCD
Security & Compliance
IAM, Network Security, Azure Policies, SonarQube, Checkmarx, Compliance Audits
Monitoring & Logging
Prometheus, Grafana, ELK Stack, CloudWatch, Azure Monitor, SumoLogic, Dynatrace, New Relic
Enterprise Architecture
SAFe, TOGAF, Domain-Driven Design, Microservice Architecture, Event-Driven Architecture
Latest Missions
SIT-CIT AWS Landing Zone Migration
Orchestrating an enterprise-grade cloud migration and governance framework for SIT-CIT using AWS Control Tower.
Secure IaC Deployment Engine
Architected a secure, Docker-based Terraform deployment host for isolated and auditable infrastructure delivery.
FIND - Evaluation Platform
Built a cloud-native platform for validating AI software for clinical evaluations in a secure AWS environment.
Gembaa - Cloud-Native SaaS Platform
Led the design and deployment of a multi-tenant, highly available Kubernetes-based SaaS solution on AWS.
Octais Platform – Enterprise Azure Cloud Platform
Led cloud infrastructure transformation for a large-scale multi-tenant platform on Azure.
Governance.com (Consulting Engagement)
A process improvement initiative to enhance client support, operational efficiency, and development workflows.
Direct Asia - Portal/CRM/Cloud & DevOps
A technology transformation mission to improve data pipelines and modernize cloud architecture in Azure.
PUB - 1BUP QP & LP Migration to GPC
Led the pre-sales and architectural migration of a critical government portal to GPC 2.0.
Resume
Download my full resume from LinkedIn for a detailed overview of my experience, skills, and certifications.
Summary
Highly accomplished Cloud and DevOps Architect with 18+ years of progressive experience in designing, deploying, and managing robust, scalable, and secure cloud infrastructure. Proven expertise in automating complex workflows, optimizing system performance, and implementing best practices for continuous integration and delivery.
Experience
Associate Director - Lead Cloud Consultant | Singapore Institute of Technology (SIT)
October 2025 – Present | Singapore
- Directed the design and deployment of secure, scalable multi-cloud infrastructures across AWS and Azure, guiding teams in adopting IaC practices with Terraform and CloudFormation to deliver cost efficiency and reliability.
- Partnered with C-level stakeholders and IT Security leadership to establish cloud security frameworks, enforce compliance with regulatory standards, and implement proactive vulnerability management strategies.
- Championed the adoption of Kubernetes, Docker, and serverless architectures, mentoring engineering teams to deliver cloud-native platforms that improved agility, resilience, and time-to-market.
- Oversaw the implementation of enterprise-grade CI/CD pipelines, disaster recovery frameworks, and API management platforms, ensuring business continuity, operational excellence, and seamless integration across complex ecosystems.
Cloud Solution Architect & Engineering Manager | Staizen
August 2020 – August 2025 | Singapore
- Defined technical vision for a 30+ member team, leading cloud transformations.
- Managed complex infrastructure and led pre-sales engagements, securing multi-million-dollar contracts.
- Directed OKR-based goal setting for engineering teams, improving delivery predictability and alignment with business objectives.
Senior Consultant & Cloud Solution Architect | Mantu
June 2019 – August 2020 | Singapore
- Acted as primary cloud architect for large-scale projects, translating business needs into secure, scalable solutions.
- Provided technical leadership during pre-sales cycles and conducted cloud readiness assessments for enterprise clients.
Senior Solution Architect | Total eBiz Solution
March 2018 – June 2019 | Singapore
- Owned architectural design for public sector projects with Singapore Government Agencies.
- Developed response strategies for government tenders, ensuring compliance with IM8 and MAS standards.
Chief Technology Officer | Ascentic
May 2017 – March 2018 | Sri Lanka
- Set the company's technology roadmap, led pre-sales for digital transformation, and oversaw end-to-end delivery.
- Introduced DevOps practices, reducing delivery lead times by 35%.
Technical Architect | Inexis Consulting
December 2014 – April 2017 | Sri Lanka
- Provided technical leadership during pre-sales cycles.
- Designed cloud-first, API-driven solutions for healthcare and insurance domains, emphasizing performance and compliance.
Technical Lead | Virtusa
September 2008 – November 2014 | Sri Lanka
- Directed full-stack development teams on enterprise-grade financial solutions.
- Oversaw SharePoint administration and custom module development.
Education
Doctorate in Computer Science (in progress) | Aspen University, USA
September 2020 – Present
MSc in Enterprise Application Development (Distinction) | Sheffield Hallam University
August 2015 – December 2017
Certifications
- AWS Certified Solutions Architect
- AWS Certified Developer
- AWS Data Warehousing
- Microsoft Azure Administrator
- Microsoft Azure Architect
- Dell Boomi Associate Developer
- Certified Scrum Master
Blog & Insights
Architecting the “Agentic Mesh” for the Autonomous Enterprise
We have officially crossed the Rubicon. 2024 and 2025 were defined by the race to integrate Large Language Models (LLMs) into business workflows, predominantly through Retrieval-Augmented Generation (RAG) chatbots and deterministic tool calling. While valuable, these implementations remain largely reactive and human-in-the-loop, each triggering a single action. As we settle into 2026, the paradigm is shifting rapidly from single-model implementations to multi-agent autonomous systems. We are moving from building systems that answer questions to building systems that achieve goals.
Published: January 05, 2026
The Drumbeat of Change
As a Cloud & DevOps Architect, I’ve been hearing the whispers, then the murmurs, and now the increasingly loud conversations about the future of Nginx. “Retiring” might be too strong a word, since Nginx is still a powerful and widely used tool, but the shift in ownership and the evolving open-source landscape have many organizations pondering alternatives and future strategies. This isn’t a doomsday prediction, but rather a call to proactive planning.
Published: November 15, 2025
Beyond the Pair Programmer
This post looks at "Hybrid Teams," where humans and AI agents operate as a single unit. In this vision, human developers move away from repetitive implementation tasks to focus on high-level architecture and the "why" behind a product, while specialized agents handle the technical "how," including coding, security reviews with CodeQL, and maintenance.
Published: November 11, 2025
A Senior Architect’s Playbook for Migrating to KRaft
The general availability of Apache Kafka 4.0 officially removes the ZooKeeper dependency, mandating a one-way migration to the internal KRaft protocol for all self-managed clusters. This fundamental architectural shift delivers significant benefits, including operational simplicity, massive scalability to millions of partitions, and near-instant controller failover. The post outlines a zero-downtime, phased migration playbook, starting with a "bridge" release upgrade, followed by deploying a new KRaft controller quorum, and then performing a "dual-write" rolling restart to safely migrate metadata. This process concludes with an irreversible final cut-over, which necessitates critical pre-migration validation, implementing new KRaft-specific monitoring, and updating all admin tools to no longer connect to ZooKeeper.
Published: October 26, 2025
The Strategic Imperative for Scalable Cloud Delivery and Optimized Developer Experience
Platform Engineering (PE) is the strategic architectural solution necessary to overcome the Cloud Complexity Crisis, which has resulted in unsustainable cognitive overload for application developers under decentralized DevOps models. PE addresses this by creating an Internal Developer Platform (IDP)—a productized, self-service interface that abstracts infrastructure complexity through standardized, automated Golden Paths. This approach centralizes high-skill infrastructure expertise and embeds security, governance, and cost efficiency into the architecture, enabling developers to focus exclusively on business logic. The article concludes that PE is a mandatory shift that significantly boosts organizational velocity, minimizes developer toil, and delivers measurable business value through improvements in key DORA metrics, fundamentally reshaping cloud delivery towards a highly automated and optimized future.
Published: October 23, 2025
Cloud Ops is Still Too Hard. LLMs are About to Fix It.
Let’s face it: our cloud environments are an absolute monster. We’re all juggling sprawling microservices, endless multi-cloud configurations, and a thousand tiny details. Yeah, Infrastructure as Code (IaC) and our CI/CD pipelines have helped, but they haven't solved the core problem. So much of DevOps is still just repetitive, exhausting toil: debugging incidents that take hours, endlessly tuning policies, and making sure documentation isn't instantly outdated. It’s draining. Here’s the game changer: Large Language Models (LLMs). They're injecting genuine contextual reasoning and flexibility into operations. Instead of us having to write every single script, the LLMs can now generate configs, intelligently analyze logs, and even propose the fix.
Published: October 02, 2025
A Deep Dive into Declarative Side-Effects
Terraform has always excelled at managing infrastructure, but handling side-effects like cache invalidations or notifications often meant hacks and external scripts. With the introduction of Terraform Actions in v1.14, these tasks now have a native, declarative home. Actions let you trigger operations either directly from the CLI or bound to resource lifecycle events, with support for scaling, conditions, and long-running workflows. Real-world examples include automatically invalidating a CDN when content changes or stopping EC2 instances immediately after creation to save costs. While provider support is still limited and action results cannot yet be dependencies, the approach significantly reduces complexity and brings more operational workflows into Terraform itself. This is a major step forward in unifying infrastructure and side-effects under Infrastructure as Code.
Published: September 28, 2025
Azure AKS Automatic
Azure Kubernetes Service (AKS) Automatic is a major leap forward, offering a fully managed, "production-ready" Kubernetes experience that eliminates much of the traditional operational burden. By baking in best practices for security and automation, the service streamlines everything from initial setup to day-two operations. It intelligently scales workloads with pre-configured tools like HPA, VPA, and Karpenter, and provides built-in safeguards for security and reliability. This approach makes Kubernetes far more accessible to lean teams and startups, while also providing a standardized, efficient platform for larger enterprises, ultimately allowing all teams to accelerate application delivery and focus on innovation.
Published: September 21, 2025
AWS Launches 8th-Gen Memory-Optimized EC2
The new Amazon EC2 R8i and R8i-flex instances, powered by custom Intel Xeon 6 processors, are now generally available and offer significant performance and price-performance improvements over the previous generation. Designed for memory-intensive workloads, these instances provide up to 20% higher overall performance and feature the latest AWS Nitro Cards to double network and EBS bandwidth. The R8i-flex is a great option for workloads that don't need full compute utilization, while the R8i is built for more demanding applications.
Published: September 19, 2025
What’s really inside Terraform Enterprise v1.0.1?
Scheduled for release next week, Terraform Enterprise v1.0.1 signals a new era of maturity for the platform, moving beyond flashy features to solidify its core enterprise capabilities. This foundational update is expected to enhance governance with more granular, context-aware policy controls, shift cost management left with proactive budget and drift detection, and streamline Day-2 operations by introducing intelligent, automated drift remediation workflows. While seemingly an incremental update, v1.0.1 represents a deliberate step toward embedding smarter, more autonomous controls for security, cost, and compliance directly into the infrastructure lifecycle, reinforcing TFE's role as a comprehensive solution for managing IaC at scale.
Published: September 10, 2025
Supercharging GuardDuty with Custom Entity Lists
Amazon GuardDuty has long served as a crucial intelligent threat detection service within AWS, leveraging machine learning and global threat intelligence to identify malicious activity. However, a significant limitation was its inability to natively incorporate an organization's unique, context-specific threat intelligence, forcing teams to rely on separate, often complex, workarounds. The introduction of new custom entity lists fundamentally changes this dynamic.
Published: September 08, 2025
Azure App Service Premium v4 Release
Microsoft quietly but significantly raised the bar for application hosting with the General Availability of Azure App Service Premium v4 on September 1, 2025. This update is not just another SKU refresh. It represents a step forward in performance, scalability, and flexibility for developers and organizations running critical workloads on Azure.
Published: September 04, 2025
Azure Landing Zone with Terraform
An Azure landing zone uses subscriptions to isolate and scale application resources and platform resources. Subscriptions for application resources are called application landing zones, and subscriptions for platform resources are called platform landing zones.
Published: September 02, 2025
AWS Landing Zone using Terraform and Sentinel
This post provides a comprehensive overview of the architectural principles, design patterns, and core components used in this AWS Landing Zone. The goal of this architecture is to establish a secure, scalable, cost-efficient, and operationally excellent foundation for deploying all workloads on AWS.
Published: September 01, 2025
Demystifying Kubernetes Networking
Kubernetes networking can feel like a black box until you know exactly how packets move inside your cluster. In this blog post, I break down how networking really works in Kubernetes and share practical insights for running production-grade clusters.
Published: August 15, 2025
Terraform best practices for large-scale deployments
Terraform is a fantastic tool for Infrastructure as Code (IaC). It makes provisioning and managing resources a breeze... until it doesn't. Managing Terraform at scale is very different from running a few `.tf` files for a single project.
Published: August 09, 2025
AWS Egress Cost Optimizer
In the dynamic world of cloud computing, managing and optimizing data transfer costs, especially egress (data leaving AWS), is a persistent challenge. These costs can be hazy, unpredictable, and often a significant portion of the AWS bill.
Published: August 01, 2025
Azure Governance Guardian
In today's dynamic cloud environments, maintaining control over Azure resources isn't just a "nice-to-have"; it's absolutely critical for security, cost optimization, and compliance. This solution provides a robust, automated framework to ensure Azure deployments consistently adhere to the organization's governance policies in real time.
Published: July 10, 2025
EFS Backup Consistency Verification
AWS Backup is fantastic, but what happens when your EFS is constantly being written to during the backup window? You might end up with backups that aren't truly consistent, leading to potential data integrity nightmares during a restore. 😱
Published: July 04, 2025
Azure Intelligent Data Governance Classification
Automate sensitive data discovery, classification, and governance across your Azure Data Lake. This solution leverages AI/ML (Azure ML, OpenAI, Cognitive Services) to bring clarity, compliance, and control to vast datasets.
Published: In Progress
Get in Touch
Have a project in mind or just want to chat about Cloud/DevOps? Feel free to reach out!
GitHub: github.com/chathushka-dilhan