
GCP Landing Zone -- Architecture

This note covers the detailed architecture for each major GCP project setup pattern, with component breakdowns, Mermaid diagrams, and key design decisions.

1. Single Project with Single VPC

Architecture Summary

The simplest GCP deployment: all resources live in one project with one VPC network. Subnets are regional, each with its own primary (and optionally secondary) CIDR range. All internal communication happens over internal IP addresses.

graph TD
    Internet[Internet] --> LB[External Load Balancer]

    subgraph VPCNet[VPC Network - custom mode]
        subgraph Region1[Region: us-central1]
            Sub1[Subnet: 10.0.1.0/24 - App Tier]
            Sub2[Subnet: 10.0.2.0/24 - Data Tier]
        end
        Sub1 --> Sub2
    end

    LB --> VPCNet

    Sub1 --> NAT[Cloud NAT]
    Sub1 --> PGA[Private Google Access]
    PGA --> APIs[Google APIs]

Key Services

  • VPC (custom mode) -- define your own subnet CIDR ranges
  • Cloud NAT -- outbound internet for private VMs
  • Private Google Access -- reach GCP APIs without external IPs
  • VPC Firewall Rules -- tier-based traffic control via network tags
  • Cloud Load Balancing -- distribute incoming traffic

Design Decisions

  • Use custom-mode VPC (not auto-mode) for explicit subnet control
  • Separate subnets per tier (app, data, management) within each region
  • Enable VPC Flow Logs for all subnets
  • Use Regional Managed Instance Groups for cross-zone HA
  • Apply firewall rules using network tags, not IP ranges
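The design decisions above map to a few gcloud commands. This is an illustrative sketch, not a complete deployment; the network, subnet, router, and tag names are placeholders:

```shell
# Custom-mode VPC: no auto-created subnets, CIDR ranges are chosen explicitly
gcloud compute networks create myapp-vpc --subnet-mode=custom

# App-tier subnet with VPC Flow Logs and Private Google Access enabled
gcloud compute networks subnets create app-tier \
    --network=myapp-vpc --region=us-central1 --range=10.0.1.0/24 \
    --enable-private-ip-google-access --enable-flow-logs

# Cloud NAT requires a Cloud Router in the same region
gcloud compute routers create myapp-router --network=myapp-vpc --region=us-central1
gcloud compute routers nats create myapp-nat \
    --router=myapp-router --region=us-central1 \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges

# Tier-based firewall: allow app tier to reach data tier on 5432 via network tags
gcloud compute firewall-rules create allow-app-to-data \
    --network=myapp-vpc --direction=INGRESS --action=ALLOW --rules=tcp:5432 \
    --source-tags=app-tier --target-tags=data-tier
```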

Real-World Example

A startup running a Django web application with a Cloud SQL PostgreSQL backend. Everything lives in myapp-prod. Two subnets in us-central1: one for Compute Engine MIGs serving traffic, one for Cloud SQL private IP. Cloud NAT handles package updates. A Global HTTP(S) LB serves traffic from the MIG backends.

2. Multi-VPC Architecture

2A. Shared VPC

Architecture Summary

A centralized networking model where a host project owns the VPC networks, subnets, firewall rules, and routes. Multiple service projects attach to the host and deploy compute resources (VMs, GKE clusters) into shared subnets. Network administration is separated from service administration.

graph TD
    Org[Organization] --> HostProj[Host Project: shared-net-prod]
    Org --> SP1[Service Project: payments-team]
    Org --> SP2[Service Project: orders-team]

    HostProj --> VPC[Shared VPC Network]
    VPC --> Sub1[Subnet: 10.0.1.0/24 - us-central1]
    VPC --> Sub2[Subnet: 10.0.2.0/24 - us-east1]

    SP1 -->|compute.networkUser| Sub1
    SP2 -->|compute.networkUser| Sub2

    VPC --> VPN[Cloud VPN / Interconnect]
    VPN --> OnPrem[On-Premises Network]

Key Services

  • Shared VPC (compute.xpnAdmin role required)
  • IAM -- compute.networkUser for subnet-level delegation
  • Cloud DNS -- private zones in host project, accessible by service projects
  • Cloud Interconnect / Cloud VPN -- centralized hybrid connectivity in host project

Design Decisions

  • One host project per environment (prod, non-prod) to isolate network policies
  • Grant compute.networkUser at subnet level (not project level) for least privilege
  • Use organization policy constraints:
    • constraints/compute.restrictSharedVpcHostProjects -- limit which projects can be hosts
    • constraints/compute.restrictSharedVpcSubnetworks -- limit which subnets service projects can use
  • Keep Cloud VPN / Interconnect gateways in the host project for centralized routing
  • Use Cloud Router with custom advertisement mode to advertise service project subnets to on-premises
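A minimal Shared VPC setup along these lines might look as follows; the project, subnet, and group names are placeholders:

```shell
# Designate the host project, then attach a service project to it
gcloud compute shared-vpc enable shared-net-prod
gcloud compute shared-vpc associated-projects add payments-team \
    --host-project=shared-net-prod

# Delegate a single subnet (not the whole host project) to the team's group,
# following the subnet-level least-privilege guidance above
gcloud compute networks subnets add-iam-policy-binding payments-subnet \
    --project=shared-net-prod --region=us-central1 \
    --member=group:payments-devs@example.com \
    --role=roles/compute.networkUser
```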

Real-World Example

A financial services company with 8 application teams. The networking-host-prod project owns the production VPC. Each team has its own service project (payments-prod, trading-prod, etc.). The networking team controls firewall rules and routing centrally. Billing is attributed to each team's service project.

2B. VPC Network Peering

Architecture Summary

A peer-to-peer connectivity model where two VPC networks exchange routes to enable internal IP communication. Peered networks remain administratively separate. Peering is not transitive -- if A peers with B and B peers with C, A cannot reach C through B.

graph LR
    VPCA[VPC A: 10.0.0.0/16] <-->|Peering| VPCB[VPC B: 10.1.0.0/16]
    VPCB <-->|Peering| VPCC[VPC C: 10.2.0.0/16]
    VPCB -->|Export custom routes| OnPrem[On-Premises via Cloud VPN in B]

    noteA[VPC A cannot reach VPC C\nPeering is not transitive]

Key Services

  • VPC Network Peering -- managed by compute.networkAdmin role on each side
  • Cloud Router -- for exporting/importing custom routes through peering

Design Decisions

  • Plan CIDR ranges carefully -- subnet ranges cannot overlap across peered VPCs
  • Enable --export-custom-routes and --import-custom-routes for on-premises transit scenarios
  • Use a central "transit" VPC that peers with all others for hub-and-spoke topology
  • Remember: firewall rules, network tags, and service accounts do not cross peering boundaries
  • Use Cloud DNS peering zones or authorize the same private zone to all peered VPCs for DNS resolution
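Peering is established by creating one peering resource on each side; the connection only becomes active once both halves exist. The project and network names below are placeholders, and the custom-route flags match the on-premises transit scenario above:

```shell
# Side A: peer vpc-a with vpc-b in another project
gcloud compute networks peerings create peer-a-to-b \
    --network=vpc-a --peer-project=project-b --peer-network=vpc-b \
    --export-custom-routes --import-custom-routes

# Side B: the matching half, created by project-b's network admin
gcloud compute networks peerings create peer-b-to-a \
    --network=vpc-b --peer-project=project-a --peer-network=vpc-a \
    --export-custom-routes --import-custom-routes
```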

Real-World Example

A SaaS provider offers a managed analytics platform. Their VPC (saas-prod) peers with each customer's VPC. Customers can access the analytics service using internal IP addresses. Each customer's VPC remains fully isolated from other customers.

2C. Private Service Connect

Architecture Summary

A service-oriented connectivity model where consumers access managed services (Google APIs, third-party SaaS, or internal services) using internal IP addresses in their own VPC. Unlike peering, PSC provides unidirectional, service-level access without exposing the full VPC. Traffic uses NAT so no IP coordination is needed between consumer and producer.

graph TD
    ConsumerVPC[Consumer VPC] -->|PSC Endpoint\ninternal IP| PSC[Private Service Connect]
    PSC -->|Service Attachment| ProducerLB[Producer Load Balancer]
    ProducerLB --> ProducerVPC[Producer VPC: Managed Service]

    ConsumerVPC -->|PSC Endpoint\nAPI bundle| GoogleAPIs[Google APIs\nCloud Storage, BigQuery, etc.]

Key Services

  • Private Service Connect endpoints -- forwarding rules mapping internal IPs to service attachments or Google API bundles
  • Private Service Connect backends -- NEGs behind consumer load balancers for advanced traffic management
  • Private Service Connect interfaces -- producer-initiated connections (bidirectional)

Design Decisions

  • Use PSC endpoints for Google API access instead of Private Google Access when you need multiple internal IPs or per-service routing control
  • Use PSC backends when you need custom URLs, TLS certificates, or failover between regional service endpoints
  • Service producers should use --consumer-accept-lists to control which projects can connect
  • Combine PSC with VPC Service Controls for defense-in-depth
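On the consumer side, a PSC endpoint is a reserved internal address plus a forwarding rule that targets the producer's service attachment. A sketch with placeholder resource names:

```shell
# Reserve an internal IP in the consumer subnet for the endpoint
gcloud compute addresses create psc-vendor-ip \
    --region=us-central1 --subnet=app-tier --addresses=10.0.1.100

# Map that IP to the producer's service attachment
gcloud compute forwarding-rules create psc-vendor-endpoint \
    --region=us-central1 --network=consumer-vpc --address=psc-vendor-ip \
    --target-service-attachment=projects/vendor-prod/regions/us-central1/serviceAttachments/enrichment-api
```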

Real-World Example

A company consumes a third-party data enrichment API hosted on GCP. Instead of routing traffic over the internet, they create a PSC endpoint in their VPC pointing to the vendor's service attachment. All traffic stays within Google's network. They also use PSC to access Cloud Storage and BigQuery APIs via internal IPs.

3. Multi-Project Strategy

Architecture Summary

The GCP resource hierarchy (Organization > Folders > Projects) provides policy inheritance and delegation. A well-structured hierarchy is the foundation of enterprise cloud governance.

graph TD
    Org[Organization: example.com] --> FProd[Folder: Production]
    Org --> FNonProd[Folder: Non-Production]
    Org --> FShared[Folder: Shared-Infrastructure]
    Org --> FSandbox[Folder: Sandboxes]

    FProd --> FTeamA[Folder: Team-A]
    FProd --> FTeamB[Folder: Team-B]
    FTeamA --> ProdProjA[Project: prod-team-a-app]
    FTeamB --> ProdProjB[Project: prod-team-b-app]

    FNonProd --> FDev[Folder: Development]
    FNonProd --> FStaging[Folder: Staging]
    FDev --> DevProj[Project: dev-team-a]

    FShared --> NetProj[Project: shared-networking]
    FShared --> SecProj[Project: shared-security]
    FShared --> LogProj[Project: shared-logging]

Key Services

  • Resource Manager -- manages the org/folder/project hierarchy
  • Organization Policies -- constraints applied at org, folder, or project level
  • IAM -- roles granted at any hierarchy level, inherited downward
  • Cloud Billing -- billing accounts linked to projects; labels for cost attribution

Hierarchy Design:

  • Top-level folders by environment (Production, Non-Production, Shared-Infrastructure)
  • Second-level folders by team or business unit
  • Limit depth to 3-4 levels for manageable policy inheritance
  • Use Terraform (Cloud Foundation Toolkit) for all hierarchy management

Essential Organization Policies (set at org level):

| Constraint | Purpose |
|---|---|
| constraints/compute.requireOsLogin | Enforce OS Login for SSH |
| constraints/compute.disableSerialPortAccess | Disable serial port access |
| constraints/iam.disableServiceAccountKeyCreation | Prefer Workload Identity |
| constraints/compute.vmExternalIpAccess | Restrict external IPs |
| constraints/gcp.resourceLocations | Restrict resource locations |
| constraints/compute.requireShieldedVm | Enforce Shielded VMs |
| constraints/iam.allowedPolicyMemberDomains | Restrict IAM to org domains |
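Boolean constraints from this set can be enforced directly at the organization node; list constraints take allowed values instead. The organization ID below is a placeholder:

```shell
# Enforce a boolean constraint at the org level
gcloud resource-manager org-policies enable-enforce \
    constraints/iam.disableServiceAccountKeyCreation \
    --organization=123456789012

# List constraints take allowed values rather than a boolean
gcloud resource-manager org-policies allow \
    constraints/gcp.resourceLocations in:us-locations in:eu-locations \
    --organization=123456789012
```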

IAM Best Practices:

  • Grant roles at the highest appropriate level (org > folder > project)
  • Use Google Groups (not individual users) for role bindings
  • Use custom roles for least privilege when predefined roles are too broad
  • Separate duties: the project creator, project owner, and billing admin should be different principals

Real-World Example

A global retailer adopts GCP. They create top-level folders for Production, Non-Production, and Platform. The Platform folder contains shared networking (Shared VPC host project), security tooling (SCC, VPC Service Controls), and centralized logging. Each business unit (ecommerce, supply chain, retail analytics) has a sub-folder under Production with its own projects. Org policies enforce resourceLocations to US/EU regions and disable service account key creation.

4. Multi-Zone and Multi-Region Deployment

Architecture Summary

GCP regions contain three or more zones. Zonal resources (Compute Engine VMs, Persistent Disks) are scoped to a single zone. Regional resources (Cloud SQL HA, Regional MIGs, GKE regional clusters) automatically span zones. Multi-region resources (Cloud Spanner, multi-region Cloud Storage) span regions.

graph TD
    GLB[Global HTTP/S Load Balancer] --> Region1[Region: us-central1]
    GLB --> Region2[Region: europe-west1]

    subgraph Region1
        MIG1[Regional MIG: 3 zones]
        SQL1[Cloud SQL HA: regional]
        GCS1[Cloud Storage: regional]
    end

    subgraph Region2
        MIG2[Regional MIG: 3 zones]
        SQL2[Cloud SQL HA: regional]
        GCS2[Cloud Storage: regional]
    end

    SQL1 -->|Cross-region replica| SQL2
    GCS1 -->|Dual-region bucket| GCS2

Key Services

  • Regional MIGs -- auto-distribute VMs across 3+ zones with autoscaling
  • Regional Persistent Disks -- synchronously replicate data across zones
  • GKE (multi-zone node pools) -- spread nodes across zones for pod HA
  • Global External Application Load Balancer -- L7 routing with health-check-driven failover
  • Global external proxy Network Load Balancer -- L4 (TCP proxy) for non-HTTP workloads
  • Cloud DNS -- failover routing policies based on health checks

Data Layer Strategy:

| Service | Multi-Zone | Multi-Region Strategy |
|---|---|---|
| Cloud SQL | Automatic with HA config | Cross-region read replicas; manual promote for failover |
| Cloud Spanner | Automatic | Built-in multi-region (TrueTime API, 99.999% SLA) |
| AlloyDB | Automatic | Cross-region read replicas |
| Cloud Bigtable | Automatic | Replication with configurable failover policies |
| Cloud Storage | Automatic (regional) | Dual-region or multi-region buckets |
| Firestore | Automatic | Built-in multi-region (native mode) |

Compute Strategy:

  • Use Regional MIGs, not zonal MIGs, for all production workloads
  • GKE: use multi-zone node pools; consider GKE Enterprise fleets for multi-cluster management
  • Cloud Run: deploy to multiple regions; use a global LB for routing
  • Configure health checks with aggressive thresholds for fast failover
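A regional MIG with autoscaling can be sketched as follows; the group and template names are placeholders:

```shell
# Regional MIG spreads instances across zones in the region
gcloud compute instance-groups managed create api-mig \
    --region=us-central1 --template=api-template --size=3

# Autoscale on CPU utilization between a floor and a ceiling
gcloud compute instance-groups managed set-autoscaling api-mig \
    --region=us-central1 --min-num-replicas=3 --max-num-replicas=12 \
    --target-cpu-utilization=0.6
```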

Real-World Example

A media streaming service deploys in us-central1 (primary) and europe-west1 (secondary). Regional MIGs in each region run the API tier. A Global HTTP(S) LB routes users to the nearest healthy region. Cloud Spanner in a multi-region configuration (such as nam6 or eur3) handles the global data layer. Cloud Storage uses dual-region buckets for media assets.

5. Disaster Recovery Patterns

5A. Pilot Light

Architecture Summary

A minimal footprint of critical infrastructure runs continuously in a secondary region. Only core components (database replicas, IaC templates) stay active; the application tier is provisioned on demand during a disaster.

graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[Full MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end

    subgraph DR[DR Region: us-east1]
        SQL_R[Cloud SQL Read Replica - active]
        IaC[Terraform config - stored]
        MIG_D[MIG: 0 instances\nscale on failover]
    end

    SQL_P -->|Async replication| SQL_R
    GLB[Global LB] -->|Healthy| Primary
    GLB -.->|Failover| DR

Key Services

  • Cloud SQL cross-region read replicas -- async replication, manual promote
  • Persistent Disk snapshots -- scheduled cross-region snapshots
  • Terraform / Cloud Build -- automated provisioning of compute tier on failover
  • Global HTTP(S) LB -- health-check-driven traffic rerouting

Configuration

  • RPO: minutes to hours (depends on replication lag)
  • RTO: hours (includes provisioning time for compute tier)
  • Cost: low -- only database replication and storage snapshots are continuously billed
  • Test failover at least quarterly
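The database side of a pilot-light setup reduces to two commands; the instance names are placeholders:

```shell
# Steady state: a continuously replicating cross-region read replica
gcloud sql instances create myapp-db-replica \
    --master-instance-name=myapp-db --region=us-east1

# During a disaster: promote the replica to a standalone primary
# (one-way operation -- replication to the old primary stops)
gcloud sql instances promote-replica myapp-db-replica
```

After promotion, the Terraform-provisioned compute tier is pointed at the promoted instance.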

5B. Warm Standby

Architecture Summary

A fully running but scaled-down copy of the production environment operates in the secondary region. During failover, resources scale up to production capacity.

graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end

    subgraph Warm[Warm Standby: us-east1]
        MIG_W[MIG: 2 instances - scaled down]
        SQL_W[Cloud SQL Read Replica - active]
    end

    SQL_P -->|Async replication| SQL_W
    GLB[Global LB] -->|100% traffic| Primary
    GLB -.->|Failover| Warm

Key Services

  • Regional MIGs -- running at reduced capacity, scale up on failover trigger
  • Cloud SQL cross-region replicas -- continuously replicating
  • Global HTTP(S) LB -- automatic failover with health checks
  • Cloud Monitoring + Cloud Functions -- automated scale-up triggers

Configuration

  • RPO: minutes (async replication lag)
  • RTO: minutes (resources already running, just need scale-up)
  • Cost: moderate -- paying for reduced compute continuously plus full database replication
  • Automate scale-up via Cloud Monitoring alerting + Cloud Functions
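The scale-up step of the failover runbook is a single resize; the group name and target size are placeholders. A Cloud Function triggered by a Cloud Monitoring alert could issue the equivalent API call:

```shell
# Scale the standby MIG from its reduced footprint to production capacity
gcloud compute instance-groups managed resize standby-mig \
    --region=us-east1 --size=10
```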

5C. Active-Active Multi-Region

Architecture Summary

Full production workloads run simultaneously in two or more regions. A global load balancer distributes traffic to the nearest healthy region. Data is replicated synchronously or asynchronously depending on the service.

graph TD
    Users[Users Global] --> GLB[Global HTTP/S LB]
    GLB -->|Nearest region| Region1[Region 1: us-central1]
    GLB -->|Nearest region| Region2[Region 2: europe-west1]

    subgraph Region1
        MIG1[MIG: 10 instances]
        Spanner1[Spanner: Multi-Region Config]
    end

    subgraph Region2
        MIG2[MIG: 10 instances]
        Spanner2[Spanner: Multi-Region Config]
    end

    Spanner1 <-->|Synchronous replication| Spanner2

Key Services

  • Global External Application Load Balancer -- routes to nearest healthy region
  • Cloud Spanner -- globally distributed DB with synchronous multi-region replication
  • Cloud Bigtable -- replicated with configurable consistency
  • Multi-region Cloud Storage -- dual-region buckets
  • GKE Enterprise -- fleet-level multi-cluster management
  • Cloud DNS -- geo-based or latency-based routing

Configuration

  • RPO: near-zero (synchronous replication for critical data)
  • RTO: near-zero (traffic automatically rerouted by global LB)
  • Cost: highest -- full duplicate infrastructure in multiple regions
  • Use Cloud Spanner or Bigtable for data layers requiring synchronous cross-region replication
  • Use committed use discounts for base capacity; spot VMs for non-critical batch workloads
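Provisioning the multi-region Spanner instance is a single command; the instance name is a placeholder, and nam6 is one of the available multi-region configurations:

```shell
# Multi-region Spanner instance; nam6 spans multiple North American regions
gcloud spanner instances create global-db \
    --config=nam6 --nodes=3 --description="Active-active data layer"
```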

DR Comparison Matrix

| Strategy | RPO | RTO | Relative Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Pilot Light | Hours | Hours | Low | Low | Cost-conscious orgs tolerating downtime |
| Warm Standby | Minutes | Minutes | Medium | Medium | Most enterprises needing fast recovery |
| Active-Active | Near-zero | Near-zero | High | Very High | Mission-critical: payments, ecommerce |

6. DMZ / Network Perimeter Patterns

Architecture Summary

GCP does not require a traditional DMZ subnet. External load balancers are Google-managed resources that sit outside your VPC, and backend instances live in private subnets with no external IPs. The result is a "DMZ-less" architecture, which is Google's recommended pattern.

graph TD
    Internet[Internet] --> ExtLB[External HTTP/S LB\nGoogle-managed, outside VPC]
    Internet --> IAP[Identity-Aware Proxy]

    ExtLB --> FW1[Firewall: allow LB health check ranges]
    IAP --> FW2[Firewall: allow IAP ranges\n35.235.240.0/20]

    subgraph VPC[VPC Network]
        FW1 --> AppSubnet[Private Subnet: App Tier\nNo external IPs]
        FW2 --> AppSubnet
        AppSubnet --> FW3[Firewall: allow from app tier only]
        FW3 --> DataSubnet[Private Subnet: Data Tier\nNo external IPs]
    end

    AppSubnet --> NAT[Cloud NAT\nfor outbound egress]
    AppSubnet --> PGA[Private Google Access\nfor GCP API access]

Key Services

  • External Load Balancers (HTTP(S), SSL Proxy, TCP Proxy, Network LB) -- sit outside VPC, distribute to private backends
  • Cloud NAT -- outbound internet for private VMs (no inbound)
  • Identity-Aware Proxy (IAP) -- zero-trust SSH/RDP and web app access, replaces bastion hosts
  • Private Google Access -- reach GCP APIs without external IPs
  • VPC Firewall Rules -- tier-based access control using network tags
  • Cloud Armor -- WAF and DDoS protection on external LBs

Design Decisions

  • Place all workloads in private subnets (no external IPs)
  • Use External LBs for all inbound traffic (they are outside VPC, no DMZ subnet needed)
  • Use Cloud NAT for all outbound traffic (updates, external API calls)
  • Use IAP tunneling for SSH/RDP access (no bastion host required)
  • Use Private Google Access for reaching GCP APIs from private subnets
  • Allow only LB health-check ranges and IAP ranges (35.235.240.0/20) through firewall
  • Attach Cloud Armor policies to external LBs for WAF protection
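The IAP pieces of this pattern can be sketched as follows; the network and VM names are placeholders:

```shell
# Allow SSH/RDP only from IAP's TCP forwarding range
gcloud compute firewall-rules create allow-iap-ingress \
    --network=myapp-vpc --direction=INGRESS --action=ALLOW \
    --rules=tcp:22,tcp:3389 --source-ranges=35.235.240.0/20

# SSH to a VM with no external IP, tunneled through IAP (no bastion host)
gcloud compute ssh app-vm-1 --zone=us-central1-a --tunnel-through-iap
```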

7. GCP Landing Zone

Architecture Summary

A Landing Zone is the foundational setup that establishes secure, scalable, well-governed baselines for an entire GCP organization. It covers resource hierarchy, IAM, networking, security, logging, and billing.

graph TD
    Org[Organization] --> LZ[Landing Zone Components]

    LZ --> Hierarchy[Resource Hierarchy\nOrg > Folders > Projects]
    LZ --> IAM[IAM & Groups\nLeast privilege via groups]
    LZ --> Network[Networking\nShared VPC + Cloud NAT + Hybrid]
    LZ --> Security[Security\nSCC + VPC SC + Binary Auth]
    LZ --> Logging[Logging & Monitoring\nAggregated sinks + dashboards]
    LZ --> Billing[Billing\nBudgets + labels + BigQuery export]
    LZ --> IaC[Infrastructure as Code\nTerraform + CFT modules]

Key Components

| Component | Service | Purpose |
|---|---|---|
| Resource Hierarchy | Resource Manager | Org > Folders > Projects for policy inheritance |
| IAM | Cloud IAM + Groups | Role-based access via groups, custom roles for least privilege |
| Networking | Shared VPC + Cloud NAT | Centralized networking with private egress |
| Hybrid Connectivity | Cloud Interconnect / VPN | On-premises connectivity |
| Security Posture | Security Command Center Premium | Threat detection, vulnerability scanning |
| Data Protection | VPC Service Controls | API perimeter security, data exfiltration prevention |
| Container Security | Binary Authorization | Enforce signed container images in GKE |
| Web Security | Cloud Armor | DDoS and WAF for external-facing services |
| Logging | Cloud Logging | Aggregated log sinks to central project |
| Monitoring | Cloud Monitoring | Custom dashboards, alerting policies |
| Cost Management | Cloud Billing + BigQuery | Billing export, budgets, label-based attribution |
| Automation | Terraform + CFT + Cloud Build | IaC for all infrastructure; CI/CD pipelines |
| Access | Identity-Aware Proxy | Zero-trust access without VPN/bastion |

Deployment Sequence

  1. Set up Google Workspace / Cloud Identity and enable the Organization resource
  2. Create the folder hierarchy (Production, Non-Production, Shared-Infrastructure)
  3. Apply baseline Organization Policies
  4. Create shared infrastructure projects (networking, security, logging)
  5. Configure Shared VPC with subnets per environment and region
  6. Set up IAM groups and roles at folder level
  7. Enable Security Command Center, VPC Service Controls
  8. Configure Cloud Interconnect / VPN for hybrid connectivity
  9. Set up centralized logging sinks and monitoring dashboards
  10. Implement Terraform modules (Cloud Foundation Toolkit) for project factory and guardrails
  11. Create CI/CD pipelines for infrastructure changes
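Step 2 of the sequence, for example, reduces to folder-creation commands (the organization ID is a placeholder); in practice these would be driven from Terraform rather than run by hand:

```shell
# Create top-level folders under the organization
gcloud resource-manager folders create \
    --display-name="Production" --organization=123456789012
gcloud resource-manager folders create \
    --display-name="Non-Production" --organization=123456789012
gcloud resource-manager folders create \
    --display-name="Shared-Infrastructure" --organization=123456789012
```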

Real-World Example

A regulated financial institution adopts GCP. They deploy the full Landing Zone using the GCP Landing Zone Accelerator (open-source Terraform). The hierarchy separates Production and Non-Production at the top level. Shared VPC host projects provide networking. VPC Service Controls protect BigQuery and Cloud Storage perimeters. Security Command Center Premium detects misconfigurations. All infrastructure is managed through Terraform in a GitOps pipeline via Cloud Build. Project provisioning is self-service through a "project factory" module that enforces naming, labeling, budget, and org policy defaults.
