Multi-Cloud Governance -- Architecture

Component breakdown, canonical topology patterns, and reference architectures for enterprises running across AWS, GCP, Alibaba Cloud, and Tencent Cloud.

Networking Patterns

Hub-and-Spoke

The most common multi-cloud topology. A central network hub (often an on-premises data center or a dedicated transit VPC/VNet) routes all inter-cloud traffic. Each cloud environment is a "spoke" connected via dedicated interconnect.

                    [ On-Prem DC / Transit Hub ]
                     /        |          \
                    /         |           \
            [ AWS TGW ]  [ GCP NCC ]  [ Alibaba CEN ]
                |            |              |
            [ VPCs ]     [ VPCs ]       [ VPCs ]

Key services per cloud:

| Cloud | Hub Service | Interconnect Service |
| --- | --- | --- |
| AWS | Transit Gateway | Direct Connect (1/10/100 Gbps) |
| GCP | Network Connectivity Center (NCC) | Dedicated / Partner Interconnect |
| Alibaba Cloud | Cloud Enterprise Network (CEN) | Express Connect (physical dedicated line) |
| Tencent Cloud | Cloud Connect Network (CCN) | Direct Connect |

Advantages: Centralized routing policy, single audit point, clear blast-radius boundary. Disadvantages: Hub is a single point of failure; all cross-cloud traffic incurs hub hairpinning latency.

Full Mesh

Every cloud peer connects directly to every other cloud peer, typically via dedicated point-to-point circuits or an SD-WAN overlay. No central transit hub.

            [ AWS TGW ] ------ [ GCP NCC ]
                |    \          /     |
                |     \        /      |
                |    [ Alibaba CEN ]   |
                |         |            |
                +---[ Tencent CCN ]----+

Advantages: Lowest inter-cloud latency (direct paths); no single point of failure. Disadvantages: O(n^2) circuit management (n(n-1)/2 point-to-point links for n clouds); complex routing tables; higher cost at scale.

Transit (Backbone)

A provider-agnostic backbone -- often a third-party fabric like Equinix Fabric, Megaport, or PacketFabric -- acts as the Layer 2/3 transit layer. Each cloud connects to the nearest fabric node via its dedicated interconnect service. The backbone provides any-to-any reachability with per-circuit QoS and bandwidth policies.

            [ Equinix Fabric / Megaport Backbone ]
              /         |            |          \
        [ AWS DC ]  [ GCP DC ]  [ Alibaba PoP ] [ Tencent PoP ]
             |           |            |              |
          [ VPCs ]    [ VPCs ]     [ VPCs ]       [ VPCs ]

Advantages: Centralized bandwidth management; rapid self-service provisioning of virtual circuits; any-to-any reachability without full-mesh circuits. Disadvantages: Added cost of the third-party fabric; dependency on the fabric provider's SLA.

SD-WAN Overlay

For enterprises that cannot justify dedicated circuits at every edge, SD-WAN solutions (Cisco Viptela, VMware VeloCloud, Palo Alto Prisma SD-WAN, Fortinet Secure SD-WAN) create encrypted IPsec/GRE tunnels over the public internet between cloud VPCs and branch offices.

Advantages: Rapid provisioning; internet-based (no colo requirement); built-in WAN optimization. Disadvantages: Higher and variable latency; bandwidth limited by internet path; not suitable for latency-sensitive workloads.

Decision Matrix

| Factor | Hub-Spoke | Full Mesh | Transit Backbone | SD-WAN Overlay |
| --- | --- | --- | --- | --- |
| Latency | Medium (hairpin) | Low (direct) | Low-Medium | High-Variable |
| Complexity | Low | High | Medium | Low-Medium |
| Cost | Medium | High | Medium-High | Low |
| Blast Radius | Hub is SPOF | Isolated per link | Fabric is SPOF | Per-tunnel |
| Use Case | Regulated enterprise | Low-latency apps | Global scale | Branch / DR |

Key Interconnect Services Reference

| Service | Cloud | Bandwidth | Protocol |
| --- | --- | --- | --- |
| AWS Direct Connect | AWS | 1/10/100 Gbps | 802.1Q VLAN, BGP |
| AWS Transit Gateway Connect | AWS | Up to 20 Gbps per attachment (4 x 5 Gbps GRE tunnels) | GRE/BGP |
| GCP Dedicated Interconnect | GCP | 10/100 Gbps per link | 802.1Q, BGP4 |
| GCP Partner Interconnect | GCP | 50 Mbps -- 50 Gbps | Varies by partner |
| GCP Cross-Cloud Interconnect | GCP | 10/100 Gbps | Dedicated link to other clouds |
| Alibaba Cloud Express Connect | Alibaba | 1/10/100 Gbps | Physical dedicated line, BGP |
| Alibaba Cloud CEN | Alibaba | Bandwidth packages | Transit routing |
| Tencent Cloud Direct Connect | Tencent | 1/10/100 Gbps | Physical dedicated line, BGP |
| Tencent Cloud CCN | Tencent | Bandwidth limits per instance | Transit routing |
| Equinix Fabric | Vendor-neutral | 50 Mbps -- 100 Gbps | Software-defined L2/L3 |
| Megaport | Vendor-neutral | 1 Mbps -- 100 Gbps | Software-defined L2 |

DNS and Traffic Management

Provider DNS Services

| Service | Cloud | Key Routing Policies |
| --- | --- | --- |
| Amazon Route 53 | AWS | Latency, geolocation, weighted, failover, multivalue, IP-based |
| Google Cloud DNS | GCP | Geolocation, weighted round-robin, failover (native routing policies) |
| Google Cloud Global LB | GCP | L7 load balancing with cross-region failover |
| Alibaba Cloud DNS (Alidns) | Alibaba | Geolocation, weighted, ISP-line routing (China Telecom / Unicom / Mobile) |
| Tencent DNSPod | Tencent | Geolocation, weighted, ISP-line routing, search-engine lines |
| Cloudflare DNS | Vendor-neutral | Geolocation, weighted, failover, load balancing |
| NS1 (IBM) | Vendor-neutral | Filter chains, regional targeting, Pulsar active telemetry |

Multi-Cloud DNS Strategy Patterns

1. Delegated Subdomain per Cloud

Each cloud owns a subdomain zone (e.g., aws.example.com, gcp.example.com, cn.example.com). A global apex zone delegates NS records per subdomain to the respective cloud DNS service.

example.com (Route 53 / Cloudflare)
  ├── aws.example.com   NS → Route 53 hosted zone
  ├── gcp.example.com   NS → Cloud DNS zone
  ├── cn.example.com    NS → Alidns zone
  └── global.example.com → GSLB (weighted/latency routing across clouds)

2. GSLB Overlay with Health Checks

A global DNS layer (Route 53, Cloudflare, NS1) resolves global.example.com by evaluating health-check endpoints in each cloud. On failure, traffic shifts to the next healthy cloud within TTL convergence time.

  • Route 53 health checks: HTTP/HTTPS/TCP, configurable intervals (10s-30s), failure threshold.
  • Cloudflare Load Balancing: Health checks per pool, failover steering, session affinity.
  • NS1 Filter Chain: Programmatic DNS decisions based on telemetry feeds (Pulsar), availability, and geolocation.

3. China-Specific DNS Consideration

Mainland China DNS resolution for .cn domains served from within China requires an ICP filing. Alidns and DNSPod both support ISP-line resolution (routing queries from China Telecom, China Unicom, and China Mobile users to the nearest endpoint). For multi-cloud within China, DNSPod or Alidns serve as the apex; for global-with-China architectures, a split-horizon DNS configuration is typical: international queries resolve via Route 53/Cloudflare, while China queries resolve via Alidns/DNSPod.

Traffic-Management Tools Beyond DNS

| Tool | Layer | Multi-Cloud Capability |
| --- | --- | --- |
| Istio | L7 (service mesh) | Cross-cluster multi-primary on different clouds |
| Cilium | L3-L7 (CNI + service mesh) | Cluster mesh across clouds via tunnel or direct routing |
| Google Cloud Traffic Director | L7 | Can manage Envoy proxies outside GCP |
| AWS App Mesh | L7 | Tied to AWS; can manage Envoys on EKS/ECS |
| Kong Gateway / Traefik | L7 API gateway | Cloud-agnostic; deploy anywhere |

Infrastructure-as-Code

Terraform / OpenTofu

The dominant multi-cloud IaC tool. HCL configurations declare resources across providers in a single state or split across workspaces. OpenTofu is the Linux Foundation-governed fork created after HashiCorp's BSL license change in 2023.

Multi-provider configuration example:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    alicloud = {
      source  = "aliyun/alicloud"
      version = "~> 1.220"
    }
    tencentcloud = {
      source  = "tencentcloudstack/tencentcloud"
      version = "~> 1.80"
    }
  }
}

provider "aws" {
  region = "ap-southeast-1"
}

provider "google" {
  project = "my-project"
  region  = "asia-southeast1"
}

provider "alicloud" {
  region = "cn-hangzhou"
}

provider "tencentcloud" {
  region = "ap-guangzhou"
}

Multi-cloud deployment with identity tokens (Terraform Stacks):

identity_token "aws" {
  audience = ["aws.workload.identity"]
}

identity_token "gcp" {
  audience = ["gcp.workload.identity"]
}

deployment "multi_cloud" {
  inputs = {
    aws_token = identity_token.aws.jwt
    gcp_token = identity_token.gcp.jwt
    aws_role  = "arn:aws:iam::123456789012:role/terraform-role"
    gcp_sa    = "[email protected]"
  }
}

State isolation strategy: Use separate workspaces or separate state backends per cloud to limit blast radius. Common backend choices: S3 (AWS), GCS (GCP), OSS (Alibaba), COS (Tencent).

Pulumi

General-purpose IaC using TypeScript, Python, Go, C#, or Java. Same provider ecosystem as Terraform (bridged providers) plus native providers (AWS Native, Azure Native) that offer same-day support for new cloud services.

Multi-cloud provider configuration (TypeScript):

import * as aws from "@pulumi/aws";
import * as gcp from "@pulumi/gcp";
import * as alicloud from "@pulumi/alicloud";

const awsProvider = new aws.Provider("aws-provider", {
  region: "ap-southeast-1",
});

const gcpProvider = new gcp.Provider("gcp-provider", {
  project: "my-project",
  region: "asia-southeast1",
});

const aliProvider = new alicloud.Provider("ali-provider", {
  region: "cn-hangzhou",
});

Pulumi ESC (Environments, Secrets, and Configuration) provides dynamic OIDC credentials for AWS, Azure, and GCP, eliminating static access keys. Stack references allow one stack to consume outputs from another, enabling dependency tracking across cloud boundaries.
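As an illustrative sketch of the ESC approach (the role ARN and names below are placeholders, and the exact `fn::open::aws-login` schema should be verified against current ESC documentation), an environment that mints short-lived AWS credentials via OIDC might look like:

```yaml
# Pulumi ESC environment (illustrative; identifiers are placeholders)
values:
  aws:
    login:
      fn::open::aws-login:
        oidc:
          roleArn: arn:aws:iam::123456789012:role/pulumi-esc   # placeholder
          sessionName: pulumi-esc-session
          duration: 1h
  # Expose the dynamic credentials to `pulumi up` as environment variables
  environmentVariables:
    AWS_ACCESS_KEY_ID: ${aws.login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${aws.login.secretAccessKey}
    AWS_SESSION_TOKEN: ${aws.login.sessionToken}
```

A stack opts in by listing the environment in its stack configuration file, so no static cloud keys are stored in CI or in stack config.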

Crossplane

Kubernetes-native IaC. Infrastructure is declared as CRDs and reconciled by provider controllers running inside the cluster. Crossplane Compositions allow platform teams to create abstract, cloud-agnostic APIs (XRDs) that map to cloud-specific managed resources underneath.

Multi-cloud composition concept:

# Abstract XRD -- cloud-agnostic
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatastores.example.org
spec:
  group: example.org
  names:
    kind: XDataStore
    plural: xdatastores
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
                enum: ["mysql", "postgresql"]
              size:
                type: string
              cloudProvider:
                type: string
                enum: ["aws", "gcp", "alibaba"]
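The XRD above only defines the abstract API; a Composition maps it to concrete managed resources per cloud. A sketch in the classic (non-pipeline) resources style, targeting AWS RDS via the Upbound provider (region, sizes, and names are illustrative, and field names should be checked against the provider version in use):

```yaml
# Illustrative Composition backing XDataStore with AWS RDS
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatastores-aws
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: XDataStore
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: ap-southeast-1
            instanceClass: db.t3.medium   # illustrative sizing
            allocatedStorage: 20
            skipFinalSnapshot: true
      patches:
        # Map the abstract engine field onto the RDS engine field
        - fromFieldPath: spec.engine
          toFieldPath: spec.forProvider.engine
```

Parallel Compositions for GCP or Alibaba Cloud can then be selected at claim time via the `spec.cloudProvider` field and a composition selector or labels.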

Provider ecosystem (2025-2026):

| Provider | Maturity | Notes |
| --- | --- | --- |
| provider-upjet-aws | High | Full AWS coverage |
| provider-upjet-gcp | High | Full GCP coverage |
| provider-upjet-azure | High | Full Azure coverage |
| provider-alicloud | Medium | Community-maintained, growing coverage |
| provider-tencentcloud | Low-Medium | Community contribution, limited resources |

Crossplane is strongest when the organization has already standardized on Kubernetes as its platform layer. It integrates natively with GitOps tools (ArgoCD, Flux) since CRDs are just Kubernetes objects.
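Consumers then declare the abstract resource in Git like any other Kubernetes object. For the XDataStore API defined earlier, a minimal instance (names and values are illustrative) is just:

```yaml
# Developer-facing composite resource, reconciled by Crossplane
apiVersion: example.org/v1alpha1
kind: XDataStore
metadata:
  name: orders-db
spec:
  engine: postgresql
  size: small
  cloudProvider: aws   # selects the AWS-backed Composition
```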

Decision Matrix

| Factor | Terraform/OpenTofu | Pulumi | Crossplane |
| --- | --- | --- | --- |
| Learning curve | Low (HCL) | Medium (general-purpose code) | High (K8s CRDs) |
| Multi-cloud maturity | Highest | High | Growing (AWS/GCP/Azure strong; Alibaba/Tencent developing) |
| GitOps integration | Via Atlantis | Via Pulumi Deployments | Native (CRDs) |
| State management | External backend | Pulumi Cloud or self-managed | Kubernetes etcd |
| Platform abstraction | Modules | ComponentResource | Compositions + XRDs |
| Best fit | General infra teams | Developer-heavy teams | Platform engineering / K8s-first orgs |

CI/CD and GitOps

GitOps Patterns for Multi-Cloud

GitOps uses Git as the single source of truth for declarative infrastructure and application state. Automated controllers (ArgoCD, Flux) continuously reconcile the live state with the desired state in Git.

Hub-and-Spoke GitOps:

A central management cluster runs ArgoCD/Flux, which deploys to remote target clusters across clouds via ApplicationSets or Flux's Kustomization resources.

[ Management Cluster (ArgoCD/Flux) ]
        |                |               \
   [ AWS EKS ]     [ GCP GKE ]    [ Alibaba ACK ]
  • ArgoCD ApplicationSets with git-directory or matrix generators create one Application per target cluster.
  • Flux Kustomization resources with spec.kubeConfig reference kubeconfigs for remote clusters.
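The ApplicationSet pattern above can be sketched as follows, assuming clusters registered in ArgoCD carry a `cloud` label (repo URL, labels, and paths are placeholders):

```yaml
# Illustrative ApplicationSet: one Application per registered cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-services
  namespace: argocd
spec:
  generators:
    # Cluster generator: matches clusters registered with this label
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'platform-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/platform-manifests  # placeholder
        targetRevision: main
        # Per-cloud overlay selected via a `cloud` label on the cluster secret
        path: 'overlays/{{metadata.labels.cloud}}'
      destination:
        server: '{{server}}'
        namespace: platform
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```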

Per-Cluster GitOps:

Each cluster runs its own ArgoCD/Flux instance. Blast radius stays contained per cluster, but global policy is harder to enforce. Useful for regulated environments where clusters must be autonomous.

Progressive Delivery:

  • Argo Rollouts: Canary / blue-green / experiment strategies with metric analysis.
  • Flagger (with Flux): Automated canary deployments with Prometheus metric analysis.
  • ArgoCD Image Updater / Flux Image Automation: Automatically update image tags in Git when new images are published.
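As an example of the first bullet, an Argo Rollouts canary that shifts traffic in two steps before full promotion might look like this (image and names are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: checkout
spec:
  replicas: 4
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:v1.2.0   # placeholder image
  strategy:
    canary:
      steps:
        - setWeight: 20            # send 20% of traffic to the canary
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }  # then promote to 100%
```

In practice each step is typically paired with an AnalysisTemplate so promotion gates on Prometheus metrics rather than fixed pauses.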

Multi-Cloud CI Pipeline Structure

[ Git Push ]
     |
[ CI Pipeline (GitHub Actions / GitLab CI) ]
  ├── Build container image
  ├── Run tests
  ├── Push image to multi-cloud registries
  │   ├── ECR (AWS)
  │   ├── Artifact Registry (GCP)
  │   └── ACR / Alibaba Container Registry
  ├── Update GitOps manifest (image tag)
  └── GitOps controller detects change and rolls out
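The build-and-push stage can be sketched as a GitHub Actions job. Only the AWS/ECR leg is shown (account ID, role, and repository names are placeholders); the GCP and Alibaba legs follow the same OIDC-based pattern:

```yaml
# Illustrative CI workflow: build once, push to per-cloud registries
name: build-and-push
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # enables OIDC federation instead of static keys
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3

      # AWS leg: assume a role via OIDC, log in to ECR, push
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-image-push  # placeholder
          aws-region: ap-southeast-1
      - uses: aws-actions/amazon-ecr-login@v2
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: 123456789012.dkr.ecr.ap-southeast-1.amazonaws.com/app:${{ github.sha }}
```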

Secret management in GitOps: Sealed Secrets (Bitnami), SOPS (originally Mozilla, now a CNCF project) with a KMS per cloud, or External Secrets Operator syncing from HashiCorp Vault or cloud-native secret stores (AWS Secrets Manager, GCP Secret Manager, Alibaba KMS).
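For example, an External Secrets Operator resource that projects a secret from AWS Secrets Manager into the cluster could look like this (the store and key names are placeholders, and the ClusterSecretStore is configured separately):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager   # placeholder store
  target:
    name: db-credentials        # resulting Kubernetes Secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/orders/db     # placeholder Secrets Manager entry
        property: password
```

Because only the reference lives in Git, the same manifest works unchanged across clouds by swapping the secret store backend.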

Tool Reference

| Tool | Type | Multi-Cluster Support | License |
| --- | --- | --- | --- |
| ArgoCD | Pull-based GitOps | ApplicationSets, sync waves | Apache 2.0 |
| Flux | Pull-based GitOps | Remote Kustomization, kubeConfig refs | Apache 2.0 |
| Argo Rollouts | Progressive delivery | Multi-cluster rollouts | Apache 2.0 |
| Flagger | Progressive delivery | Per-cluster; pairs with Flux | Apache 2.0 |
| External Secrets Operator | Secret syncing | Multi-cloud secret store backends | Apache 2.0 |
| Sealed Secrets | In-cluster encryption | Cloud-agnostic | Apache 2.0 |
| SOPS | File-level encryption | KMS per cloud | MPL 2.0 |

Reference Architecture

A vendor-neutral enterprise multi-cloud architecture built on CNCF and open-source components.

graph TB
    subgraph "Identity Layer"
        IdP[IdP: Okta / Entra ID / Keycloak]
        SPIFFE[SPIFFE / SPIRE]
    end

    subgraph "Control Plane"
        Git[Git Repository]
        CI[CI Pipeline]
        ArgoCD[ArgoCD / Flux]
        IaC[Terraform / Crossplane]
    end

    subgraph "Networking Layer"
        Fabric[Equinix Fabric / Megaport]
        DNS[Global DNS: Route 53 / Cloudflare]
    end

    subgraph "Workload Plane"
        AWS[AWS EKS]
        GCP[GCP GKE]
        ALI[Alibaba ACK]
        TENCENT[Tencent TKE]
    end

    subgraph "Observability Plane"
        OTel[OTel Collectors]
        Backend[Grafana LGTM / Datadog]
    end

    subgraph "Policy Layer"
        OPA[OPA / Gatekeeper]
        Sentinel[Sentinel / Cloud Org Policies]
    end

    IdP -->|SAML/OIDC| AWS
    IdP -->|SAML/OIDC| GCP
    IdP -->|SAML/OIDC| ALI
    IdP -->|SAML/OIDC| TENCENT

    Git --> CI --> ArgoCD
    ArgoCD --> AWS
    ArgoCD --> GCP
    ArgoCD --> ALI
    ArgoCD --> TENCENT

    IaC --> AWS
    IaC --> GCP
    IaC --> ALI
    IaC --> TENCENT

    Fabric --> AWS
    Fabric --> GCP
    Fabric --> ALI
    Fabric --> TENCENT

    DNS --> AWS
    DNS --> GCP
    DNS --> ALI
    DNS --> TENCENT

    OTel --> Backend
    AWS --> OTel
    GCP --> OTel
    ALI --> OTel
    TENCENT --> OTel

    OPA --> AWS
    OPA --> GCP
    OPA --> ALI
    OPA --> TENCENT

Layer Descriptions

| Layer | Components | Purpose |
| --- | --- | --- |
| Identity | IdP (SAML/OIDC), SPIFFE/SPIRE | Unified human + workload identity |
| Control Plane | Git, CI, GitOps controller, IaC engine | Declarative intent and reconciliation |
| Networking | Interconnect fabric, global DNS | Cross-cloud connectivity and traffic steering |
| Workload Plane | Managed K8s clusters per cloud | Application runtime |
| Observability | OTel Collectors, central backend | Traces, metrics, logs, profiles |
| Policy | OPA/Gatekeeper, Sentinel, org policies | Governance, compliance, cost guardrails |

How It Works

  1. Identity federation: The IdP issues SAML assertions (human SSO) and OIDC tokens (workload identity) to each cloud. SPIRE agents on each workload node provide mTLS workload identity independent of cloud IAM.

  2. Infrastructure provisioning: Terraform or Crossplane declares VPCs, databases, queues, and other cloud resources. Changes land in Git, CI validates them (terraform plan, or a server-side dry-run / crossplane beta validate for Crossplane manifests), and they are applied after approval.

  3. Application delivery: CI builds images, pushes to per-cloud registries, and updates GitOps manifests. ArgoCD/Flux reconciles manifests to clusters across all clouds.

  4. Networking: Equinix Fabric or Megaport provides any-to-any private connectivity. Global DNS routes users to the nearest healthy cloud endpoint.

  5. Observability: OTel Collectors in each cluster collect traces, metrics, and logs. They export to a central backend (Grafana LGTM stack or a SaaS observability platform) via OTLP.

  6. Policy enforcement: OPA/Gatekeeper enforces Kubernetes-level policies (resource limits, allowed images, labeling). Cloud-native org policies (AWS SCPs, Azure Policy, GCP Org Policy, Alibaba RAM policies) enforce cloud-resource-level governance. HashiCorp Sentinel adds policy-as-code guardrails for Terraform operations.
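As a sketch of the Kubernetes-level enforcement in step 6, assuming the stock k8srequiredlabels ConstraintTemplate from the Gatekeeper getting-started demo is installed (parameter shape varies between the demo template and the gatekeeper-library version), a constraint requiring cost-attribution labels on every namespace across all clusters might look like:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-center
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["cost-center"]   # every Namespace must carry this label
```

Delivered through the same GitOps pipeline, one constraint file in Git yields identical guardrails on EKS, GKE, ACK, and TKE.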