# GCP Landing Zone -- Architecture
This note covers the detailed architecture for each major GCP project setup pattern, with component breakdowns, Mermaid diagrams, and key design decisions.
## 1. Single Project with Single VPC

### Architecture Summary
The simplest GCP deployment: all resources live in one project with one VPC network. Subnets are regional, each defined by its own primary CIDR range and optional secondary ranges; the VPC itself has no overall address range. All internal communication happens via internal IP addresses.
```mermaid
graph TD
    Internet[Internet] --> LB[External Load Balancer]
    LB --> VPC
    subgraph VPC[VPC Network - custom mode]
        subgraph Region1[Region: us-central1]
            Sub1[Subnet: 10.0.1.0/24 - App Tier]
            Sub2[Subnet: 10.0.2.0/24 - Data Tier]
        end
        Sub1 --> Sub2
    end
    Sub1 --> NAT[Cloud NAT]
    Sub1 --> PGA[Private Google Access]
    PGA --> APIs[Google APIs]
```
### Key Services
- VPC (custom mode) -- define your own subnet CIDR ranges
- Cloud NAT -- outbound internet for private VMs
- Private Google Access -- reach GCP APIs without external IPs
- VPC Firewall Rules -- tier-based traffic control via network tags
- Cloud Load Balancing -- distribute incoming traffic
### Recommended Configuration
- Use custom-mode VPC (not auto-mode) for explicit subnet control
- Separate subnets per tier (app, data, management) within each region
- Enable VPC Flow Logs for all subnets
- Use Regional Managed Instance Groups for cross-zone HA
- Apply firewall rules using network tags, not IP ranges
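As a quick sanity check on an address plan like this, the per-tier ranges can be carved and verified for overlap with Python's standard `ipaddress` module. This is a sketch: the /16 planning block and /24 tier sizes are illustrative conventions, not GCP requirements (custom-mode VPCs impose no network-wide range).

```python
import ipaddress

# Illustrative planning block reserved for this VPC's region.
REGION_BLOCK = ipaddress.ip_network("10.0.0.0/16")

def plan_subnets(block, tiers, prefix=24):
    """Assign one subnet per tier from the block, in order."""
    subnets = iter(block.subnets(new_prefix=prefix))
    return {tier: next(subnets) for tier in tiers}

def overlaps(plan):
    """Return True if any two planned subnets overlap."""
    nets = list(plan.values())
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])

plan = plan_subnets(REGION_BLOCK, ["app", "data", "management"])
```

Running the same check across every VPC you plan (including peered ones) catches overlap problems before any subnet is created.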
### Real-World Example
A startup running a Django web application with a Cloud SQL PostgreSQL backend. Everything lives in myapp-prod. Two subnets in us-central1: one for Compute Engine MIGs serving traffic, one for Cloud SQL private IP. Cloud NAT handles package updates. A Global HTTP(S) LB serves traffic from the MIG backends.
## 2. Multi-VPC Architecture

### 2A. Shared VPC

#### Architecture Summary
A centralized networking model where a host project owns the VPC networks, subnets, firewall rules, and routes. Multiple service projects attach to the host and deploy compute resources (VMs, GKE clusters) into shared subnets. Network administration is separated from service administration.
```mermaid
graph TD
    Org[Organization] --> HostProj[Host Project: shared-net-prod]
    Org --> SP1[Service Project: payments-team]
    Org --> SP2[Service Project: orders-team]
    HostProj --> VPC[Shared VPC Network]
    VPC --> Sub1[Subnet: 10.0.1.0/24 - us-central1]
    VPC --> Sub2[Subnet: 10.0.2.0/24 - us-east1]
    SP1 -->|compute.networkUser| Sub1
    SP2 -->|compute.networkUser| Sub2
    VPC --> VPN[Cloud VPN / Interconnect]
    VPN --> OnPrem[On-Premises Network]
```
#### Key Services

- Shared VPC (`compute.xpnAdmin` role required)
- IAM -- `compute.networkUser` for subnet-level delegation
- Cloud DNS -- private zones in host project, accessible by service projects
- Cloud Interconnect / Cloud VPN -- centralized hybrid connectivity in host project
#### Recommended Configuration

- One host project per environment (prod, non-prod) to isolate network policies
- Grant `compute.networkUser` at subnet level (not project level) for least privilege
- Use organization policy constraints:
  - `constraints/compute.restrictSharedVpcHostProjects` -- limit which projects can be hosts
  - `constraints/compute.restrictSharedVpcSubnetworks` -- limit which subnets service projects can use
- Keep Cloud VPN / Interconnect gateways in the host project for centralized routing
- Use Cloud Router with custom advertisement mode to advertise service project subnets to on-premises
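The subnet-level delegation recommended above amounts to a lookup from subnet to the service projects holding `compute.networkUser` on it. A toy Python model of that admission check (project and subnet names are hypothetical; real grants are IAM bindings on subnetwork resources in the host project):

```python
# Hypothetical subnet-level networkUser grants: subnet -> service projects.
SUBNET_GRANTS = {
    "us-central1/payments-subnet": {"payments-team"},
    "us-east1/orders-subnet": {"orders-team"},
}

def can_use_subnet(service_project, subnet):
    """True if the service project holds compute.networkUser on that subnet."""
    return service_project in SUBNET_GRANTS.get(subnet, set())
```

Granting at project level would instead make every subnet usable by every attached team, which is why the subnet-level form is the least-privilege choice.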
#### Real-World Example
A financial services company with 8 application teams. The networking-host-prod project owns the production VPC. Each team has its own service project (payments-prod, trading-prod, etc.). The networking team controls firewall rules and routing centrally. Billing is attributed to each team's service project.
### 2B. VPC Network Peering

#### Architecture Summary
A peer-to-peer connectivity model where two VPC networks exchange routes to enable internal IP communication. Peered networks remain administratively separate. Peering is not transitive -- if A peers with B and B peers with C, A cannot reach C through B.
```mermaid
graph LR
    VPCA[VPC A: 10.0.0.0/16] <-->|Peering| VPCB[VPC B: 10.1.0.0/16]
    VPCB <-->|Peering| VPCC[VPC C: 10.2.0.0/16]
    VPCB -->|Export custom routes| OnPrem[On-Premises via Cloud VPN in B]
    noteA[VPC A cannot reach VPC C<br/>Peering is not transitive]
```
#### Key Services

- VPC Network Peering -- managed via the `compute.networkAdmin` role on each side
- Cloud Router -- for exporting/importing custom routes through peering
#### Recommended Configuration

- Plan CIDR ranges carefully -- subnet ranges cannot overlap across peered VPCs
- Enable `--export-custom-routes` and `--import-custom-routes` for on-premises transit scenarios
- Use a central "transit" VPC that peers with all others for a hub-and-spoke topology
- Remember: firewall rules, network tags, and service accounts do not cross peering boundaries
- Use Cloud DNS peering zones or authorize the same private zone to all peered VPCs for DNS resolution
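The non-transitivity rule is easy to encode: internal-IP reachability over peering is exactly one hop, never through an intermediate VPC. A minimal sketch (VPC names are hypothetical):

```python
# Established peerings, as unordered pairs of hypothetical VPC names.
PEERINGS = {("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")}

def can_reach(src, dst):
    """Reachability over VPC peering: direct peers only, no transit."""
    return (src, dst) in PEERINGS or (dst, src) in PEERINGS
```

Note the deliberate absence of any graph traversal: even though vpc-b peers with both, vpc-a never learns routes to vpc-c. A full mesh (or the hub VPC terminating traffic itself) is required if every network must reach every other.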
#### Real-World Example
A SaaS provider offers a managed analytics platform. Their VPC (saas-prod) peers with each customer's VPC. Customers can access the analytics service using internal IP addresses. Each customer's VPC remains fully isolated from other customers.
### 2C. Private Service Connect

#### Architecture Summary
A service-oriented connectivity model where consumers access managed services (Google APIs, third-party SaaS, or internal services) using internal IP addresses in their own VPC. Unlike peering, PSC provides unidirectional, service-level access without exposing the full VPC. Traffic uses NAT so no IP coordination is needed between consumer and producer.
```mermaid
graph TD
    ConsumerVPC[Consumer VPC] -->|PSC Endpoint<br/>internal IP| PSC[Private Service Connect]
    PSC -->|Service Attachment| ProducerLB[Producer Load Balancer]
    ProducerLB --> ProducerVPC[Producer VPC: Managed Service]
    ConsumerVPC -->|PSC Endpoint<br/>API bundle| GoogleAPIs[Google APIs<br/>Cloud Storage, BigQuery, etc.]
```
#### Key Services
- Private Service Connect endpoints -- forwarding rules mapping internal IPs to service attachments or Google API bundles
- Private Service Connect backends -- NEGs behind consumer load balancers for advanced traffic management
- Private Service Connect interfaces -- producer-initiated connections (bidirectional)
#### Recommended Configuration
- Use PSC endpoints for Google API access instead of Private Google Access when you need multiple internal IPs or per-service routing control
- Use PSC backends when you need custom URLs, TLS certificates, or failover between regional service endpoints
- Service producers should use `--consumer-accept-list` to control which projects can connect
- Combine PSC with VPC Service Controls for defense-in-depth
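Producer-side admission control can be sketched as a lookup against the accept list, which in GCP also carries a per-project connection limit. A toy model (project IDs and limits are hypothetical):

```python
# Hypothetical consumer accept list for a service attachment:
# consumer project -> maximum number of PSC endpoints allowed.
ACCEPT_LIST = {"analytics-consumer-prod": 10}

def admit(consumer_project, current_endpoints):
    """True if this consumer may create one more PSC endpoint."""
    limit = ACCEPT_LIST.get(consumer_project)
    return limit is not None and current_endpoints < limit
```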
#### Real-World Example
A company consumes a third-party data enrichment API hosted on GCP. Instead of routing traffic over the internet, they create a PSC endpoint in their VPC pointing to the vendor's service attachment. All traffic stays within Google's network. They also use PSC to access Cloud Storage and BigQuery APIs via internal IPs.
## 3. Multi-Project Strategy

### Architecture Summary
The GCP resource hierarchy (Organization > Folders > Projects) provides policy inheritance and delegation. A well-structured hierarchy is the foundation of enterprise cloud governance.
```mermaid
graph TD
    Org[Organization: example.com] --> FProd[Folder: Production]
    Org --> FNonProd[Folder: Non-Production]
    Org --> FShared[Folder: Shared-Infrastructure]
    Org --> FSandbox[Folder: Sandboxes]
    FProd --> FTeamA[Folder: Team-A]
    FProd --> FTeamB[Folder: Team-B]
    FTeamA --> ProdProjA[Project: prod-team-a-app]
    FTeamB --> ProdProjB[Project: prod-team-b-app]
    FNonProd --> FDev[Folder: Development]
    FNonProd --> FStaging[Folder: Staging]
    FDev --> DevProj[Project: dev-team-a]
    FShared --> NetProj[Project: shared-networking]
    FShared --> SecProj[Project: shared-security]
    FShared --> LogProj[Project: shared-logging]
```
### Key Services
- Resource Manager -- manages the org/folder/project hierarchy
- Organization Policies -- constraints applied at org, folder, or project level
- IAM -- roles granted at any hierarchy level, inherited downward
- Cloud Billing -- billing accounts linked to projects; labels for cost attribution
### Recommended Configuration

Hierarchy Design:

- Top-level folders by environment (Production, Non-Production, Shared-Infrastructure)
- Second-level folders by team or business unit
- Limit depth to 3-4 levels for manageable policy inheritance
- Use Terraform (Cloud Foundation Toolkit) for all hierarchy management
Essential Organization Policies (set at org level):
| Constraint | Purpose |
|---|---|
| constraints/compute.requireOsLogin | Enforce OS Login for SSH |
| constraints/compute.disableSerialPortAccess | Disable serial port |
| constraints/iam.disableServiceAccountKeyCreation | Prefer Workload Identity |
| constraints/compute.vmExternalIpAccess | Restrict external IPs |
| constraints/gcp.resourceLocations | Restrict resource locations |
| constraints/compute.requireShieldedVm | Enforce Shielded VMs |
| constraints/iam.allowedPolicyMemberDomains | Restrict IAM to org domains |
IAM Best Practices:

- Grant roles at the highest appropriate level (org > folder > project)
- Use Google Groups (not individual users) for role bindings
- Use custom roles for least privilege when predefined roles are too broad
- Separate duties: the project creator, project owner, and billing admin should be different principals
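IAM inheritance down the hierarchy can be modeled as a walk from a resource up to the organization, collecting bindings along the way. A toy sketch using the example hierarchy above (group names are hypothetical):

```python
# Child -> parent edges for part of the example hierarchy.
PARENT = {
    "prod-team-a-app": "Team-A",
    "Team-A": "Production",
    "Production": "org",
}

# (resource, member) -> roles granted directly at that level.
BINDINGS = {
    ("org", "group:security@example.com"): {"roles/iam.securityReviewer"},
    ("Production", "group:team-a@example.com"): {"roles/viewer"},
}

def effective_roles(resource, member):
    """Union of roles granted on the resource and all its ancestors."""
    roles, node = set(), resource
    while node:
        roles |= BINDINGS.get((node, member), set())
        node = PARENT.get(node)
    return roles
```

The one-way walk captures the key property: grants flow downward only, so a folder-level grant never widens access to sibling folders or to the org itself.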
### Real-World Example
A global retailer adopts GCP. They create top-level folders for Production, Non-Production, and Platform. The Platform folder contains shared networking (Shared VPC host project), security tooling (SCC, VPC Service Controls), and centralized logging. Each business unit (ecommerce, supply chain, retail analytics) has a sub-folder under Production with its own projects. Org policies enforce resourceLocations to US/EU regions and disable service account key creation.
## 4. Multi-Zone and Multi-Region Deployment

### Architecture Summary
GCP regions contain 3-4 zones. Zonal resources (Compute Engine VMs, Persistent Disks) are scoped to a single zone. Regional resources (Cloud SQL HA, Regional MIGs, GKE multi-zone clusters) automatically span zones. Multi-region resources (Cloud Spanner, multi-region Cloud Storage) span regions.
```mermaid
graph TD
    GLB[Global HTTP/S Load Balancer] --> Region1
    GLB --> Region2
    subgraph Region1[Region: us-central1]
        MIG1[Regional MIG: 3 zones]
        SQL1[Cloud SQL HA: regional]
        GCS1[Cloud Storage: regional]
    end
    subgraph Region2[Region: europe-west1]
        MIG2[Regional MIG: 3 zones]
        SQL2[Cloud SQL HA: regional]
        GCS2[Cloud Storage: regional]
    end
    SQL1 -->|Cross-region replica| SQL2
    GCS1 -->|Dual-region bucket| GCS2
```
### Key Services
- Regional MIGs -- auto-distribute VMs across 3+ zones with autoscaling
- Regional Persistent Disks -- synchronously replicate data across zones
- GKE (multi-zone node pools) -- spread nodes across zones for pod HA
- Global External Application Load Balancer -- L7 routing with health-check-driven failover
- Global External Proxy Network Load Balancer -- L4 (TCP/SSL proxy) for non-HTTP workloads
- Cloud DNS -- failover routing policies based on health checks
### Recommended Configuration
Data Layer Strategy:
| Service | Multi-Zone | Multi-Region Strategy |
|---|---|---|
| Cloud SQL | Automatic with HA config | Cross-region read replicas; manual promote for failover |
| Cloud Spanner | Automatic | Built-in multi-region (TrueTime API, 99.999% SLA) |
| AlloyDB | Automatic | Cross-region read replicas |
| Cloud Bigtable | Automatic | Replication with configurable failover policies |
| Cloud Storage | Automatic (regional) | Dual-region or multi-region buckets |
| Firestore | Automatic (multi-region native mode) | Built-in |
Compute Strategy:

- Use Regional MIGs, not zonal MIGs, for all production workloads
- GKE: use multi-zone node pools; consider GKE Enterprise fleets for multi-cluster management
- Cloud Run: deploy to multiple regions; use a global LB for routing
- Configure health checks with aggressive thresholds for fast failover
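A regional MIG with an even target distribution spreads instances across zones as evenly as the target size allows. The effect can be sketched in a few lines (zone names are illustrative):

```python
def distribute(target_size, zones):
    """Spread target_size instances across zones as evenly as possible."""
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones get one instance more than the rest.
    return {z: base + (1 if i < extra else 0) for i, z in enumerate(zones)}

layout = distribute(10, ["us-central1-a", "us-central1-b", "us-central1-c"])
```

The max/min difference is never more than one instance, which is what bounds the capacity loss if a single zone fails.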
### Real-World Example
A media streaming service deploys in us-central1 (primary) and europe-west1 (secondary). Regional MIGs in each region run the API tier. A Global HTTP(S) LB routes users to the nearest healthy region. Cloud Spanner in a multi-region configuration (such as nam6 or eur3) handles the global data layer. Cloud Storage uses dual-region buckets for media assets.
## 5. Disaster Recovery Patterns

### 5A. Pilot Light

#### Architecture Summary
A minimal footprint of critical infrastructure runs continuously in a secondary region. Only core components (database replicas, IaC templates) are active. Application tier is provisioned on demand during a disaster.
```mermaid
graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[Full MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end
    subgraph DR[DR Region: us-east1]
        SQL_R[Cloud SQL Read Replica - active]
        IaC[Terraform config - stored]
        MIG_D[MIG: 0 instances<br/>scale on failover]
    end
    SQL_P -->|Async replication| SQL_R
    GLB[Global LB] -->|Healthy| Primary
    GLB -.->|Failover| DR
```
#### Key Services
- Cloud SQL cross-region read replicas -- async replication, manual promote
- Persistent Disk snapshots -- scheduled cross-region snapshots
- Terraform / Cloud Build -- automated provisioning of compute tier on failover
- Global HTTP(S) LB -- health-check-driven traffic rerouting
#### Configuration
- RPO: minutes to hours (depends on replication lag)
- RTO: hours (includes provisioning time for compute tier)
- Cost: low -- only database replication and storage snapshots are continuously billed
- Test failover at least quarterly
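A rough way to reason about pilot-light recovery targets: RTO is roughly the sum of the failover steps, and RPO is bounded by the replication lag. The step durations below are illustrative assumptions for a back-of-the-envelope model, not measured values or GCP guarantees:

```python
# Assumed failover step durations in minutes (illustrative only).
FAILOVER_STEPS_MIN = {
    "detect_and_decide": 15,
    "promote_sql_replica": 10,
    "terraform_apply_compute": 45,
    "warmup_and_health_checks": 20,
}

def estimated_rto_minutes(steps=FAILOVER_STEPS_MIN):
    """Sequential failover: total recovery time is the sum of steps."""
    return sum(steps.values())

def estimated_rpo_minutes(replication_lag_min):
    """With async replication, data newer than the lag is at risk."""
    return replication_lag_min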
### 5B. Warm Standby

#### Architecture Summary
A fully running but scaled-down copy of the production environment operates in the secondary region. During failover, resources scale up to production capacity.
```mermaid
graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end
    subgraph Warm[Warm Standby: us-east1]
        MIG_W[MIG: 2 instances - scaled down]
        SQL_W[Cloud SQL Read Replica - active]
    end
    SQL_P -->|Async replication| SQL_W
    GLB[Global LB] -->|100% traffic| Primary
    GLB -.->|Failover| Warm
```
#### Key Services
- Regional MIGs -- running at reduced capacity, scale up on failover trigger
- Cloud SQL cross-region replicas -- continuously replicating
- Global HTTP(S) LB -- automatic failover with health checks
- Cloud Monitoring + Cloud Functions -- automated scale-up triggers
#### Configuration
- RPO: minutes (async replication lag)
- RTO: minutes (resources already running, just need scale-up)
- Cost: moderate -- paying for reduced compute continuously plus full database replication
- Automate scale-up via Cloud Monitoring alerting + Cloud Functions
### 5C. Active-Active Multi-Region

#### Architecture Summary
Full production workloads run simultaneously in two or more regions. A global load balancer distributes traffic to the nearest healthy region. Data is replicated synchronously or asynchronously depending on the service.
```mermaid
graph TD
    Users[Users Global] --> GLB[Global HTTP/S LB]
    GLB -->|Nearest region| Region1
    GLB -->|Nearest region| Region2
    subgraph Region1[Region 1: us-central1]
        MIG1[MIG: 10 instances]
        Spanner1[Spanner: Multi-Region Config]
    end
    subgraph Region2[Region 2: europe-west1]
        MIG2[MIG: 10 instances]
        Spanner2[Spanner: Multi-Region Config]
    end
    Spanner1 <-->|Synchronous replication| Spanner2
```
#### Key Services
- Global External Application Load Balancer -- routes to nearest healthy region
- Cloud Spanner -- globally distributed DB with synchronous multi-region replication
- Cloud Bigtable -- replicated with configurable consistency
- Multi-region Cloud Storage -- dual-region buckets
- GKE Enterprise -- fleet-level multi-cluster management
- Cloud DNS -- geo-based or latency-based routing
#### Configuration
- RPO: near-zero (synchronous replication for critical data)
- RTO: near-zero (traffic automatically rerouted by global LB)
- Cost: highest -- full duplicate infrastructure in multiple regions
- Use Cloud Spanner or Bigtable for data layers requiring synchronous cross-region replication
- Use committed use discounts for base capacity; spot VMs for non-critical batch workloads
### DR Comparison Matrix
| Strategy | RPO | RTO | Relative Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Pilot Light | Hours | Hours | Low | Low | Cost-conscious orgs tolerating downtime |
| Warm Standby | Minutes | Minutes | Medium | Medium | Most enterprises needing fast recovery |
| Active-Active | Near-zero | Near-zero | High | Very High | Mission-critical: payments, ecommerce |
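The matrix above can be turned into a simple selector: pick the cheapest strategy whose RPO and RTO meet the business targets. A sketch with illustrative thresholds expressed in minutes (the exact numbers for your services will differ):

```python
# (name, achievable RPO in minutes, achievable RTO in minutes, relative cost)
# Thresholds are illustrative stand-ins for the matrix's "hours"/"minutes"/"near-zero".
STRATEGIES = [
    ("pilot-light", 240, 240, 1),
    ("warm-standby", 15, 15, 2),
    ("active-active", 0, 0, 3),
]

def choose_strategy(rpo_target_min, rto_target_min):
    """Cheapest strategy meeting both targets, or None if none does."""
    viable = [s for s in STRATEGIES
              if s[1] <= rpo_target_min and s[2] <= rto_target_min]
    return min(viable, key=lambda s: s[3])[0] if viable else None
```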
## 6. DMZ / Network Perimeter Patterns

### Architecture Summary

GCP does not require a traditional DMZ subnet. External load balancers are Google-managed resources that sit outside your VPC, and backend instances live in private subnets with no external IPs. This "DMZ-less" architecture is the pattern Google recommends.
```mermaid
graph TD
    Internet[Internet] --> ExtLB[External HTTP/S LB<br/>Google-managed, outside VPC]
    Internet --> IAP[Identity-Aware Proxy]
    ExtLB --> FW1[Firewall: allow LB health check ranges]
    IAP --> FW2[Firewall: allow IAP ranges<br/>35.235.240.0/20]
    subgraph VPC[VPC Network]
        FW1 --> AppSubnet[Private Subnet: App Tier<br/>No external IPs]
        FW2 --> AppSubnet
        AppSubnet --> FW3[Firewall: allow from app tier only]
        FW3 --> DataSubnet[Private Subnet: Data Tier<br/>No external IPs]
    end
    AppSubnet --> NAT[Cloud NAT<br/>for outbound egress]
    AppSubnet --> PGA[Private Google Access<br/>for GCP API access]
```
### Key Services
- External Load Balancers (HTTP(S), SSL Proxy, TCP Proxy, Network LB) -- sit outside VPC, distribute to private backends
- Cloud NAT -- outbound internet for private VMs (no inbound)
- Identity-Aware Proxy (IAP) -- zero-trust SSH/RDP and web app access, replaces bastion hosts
- Private Google Access -- reach GCP APIs without external IPs
- VPC Firewall Rules -- tier-based access control using network tags
- Cloud Armor -- WAF and DDoS protection on external LBs
### Recommended Configuration
- Place all workloads in private subnets (no external IPs)
- Use External LBs for all inbound traffic (they are outside VPC, no DMZ subnet needed)
- Use Cloud NAT for all outbound traffic (updates, external API calls)
- Use IAP tunneling for SSH/RDP access (no bastion host required)
- Use Private Google Access for reaching GCP APIs from private subnets
- Allow only LB health-check ranges and the IAP range (`35.235.240.0/20`) through the firewall
- Attach Cloud Armor policies to external LBs for WAF protection
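A firewall rule scoped to the IAP range admits only sources inside 35.235.240.0/20; the membership check behind such a rule is a one-liner with Python's standard `ipaddress` module:

```python
import ipaddress

# The documented IAP TCP-forwarding source range.
IAP_RANGE = ipaddress.ip_network("35.235.240.0/20")

def allowed_by_iap_rule(source_ip):
    """True if the source address falls inside the IAP range."""
    return ipaddress.ip_address(source_ip) in IAP_RANGE
```

Because all IAP-tunneled SSH/RDP traffic originates from this range, an ingress allow rule for it (plus denying everything else) replaces the bastion host entirely.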
## 7. GCP Landing Zone

### Architecture Summary
A Landing Zone is the foundational setup that establishes secure, scalable, well-governed baselines for an entire GCP organization. It covers resource hierarchy, IAM, networking, security, logging, and billing.
```mermaid
graph TD
    Org[Organization] --> LZ[Landing Zone Components]
    LZ --> Hierarchy["Resource Hierarchy<br/>Org > Folders > Projects"]
    LZ --> IAM["IAM & Groups<br/>Least privilege via groups"]
    LZ --> Network[Networking<br/>Shared VPC + Cloud NAT + Hybrid]
    LZ --> Security[Security<br/>SCC + VPC SC + Binary Auth]
    LZ --> Logging["Logging & Monitoring<br/>Aggregated sinks + dashboards"]
    LZ --> Billing[Billing<br/>Budgets + labels + BigQuery export]
    LZ --> IaC[Infrastructure as Code<br/>Terraform + CFT modules]
```
### Key Components
| Component | Service | Purpose |
|---|---|---|
| Resource Hierarchy | Resource Manager | Org > Folders > Projects for policy inheritance |
| IAM | Cloud IAM + Groups | Role-based access via groups, custom roles for least privilege |
| Networking | Shared VPC + Cloud NAT | Centralized networking with private egress |
| Hybrid Connectivity | Cloud Interconnect / VPN | On-premises connectivity |
| Security Posture | Security Command Center Premium | Threat detection, vulnerability scanning |
| Data Protection | VPC Service Controls | API perimeter security, data exfiltration prevention |
| Container Security | Binary Authorization | Enforce signed container images in GKE |
| Web Security | Cloud Armor | DDoS and WAF for external-facing services |
| Logging | Cloud Logging | Aggregated log sinks to central project |
| Monitoring | Cloud Monitoring | Custom dashboards, alerting policies |
| Cost Management | Cloud Billing + BigQuery | Billing export, budgets, label-based attribution |
| Automation | Terraform + CFT + Cloud Build | IaC for all infrastructure; CI/CD pipelines |
| Access | Identity-Aware Proxy | Zero-trust access without VPN/bastion |
### Recommended Implementation Order

1. Set up Google Workspace / Cloud Identity and enable the Organization resource
2. Create the folder hierarchy (Production, Non-Production, Shared-Infrastructure)
3. Apply baseline Organization Policies
4. Create shared infrastructure projects (networking, security, logging)
5. Configure Shared VPC with subnets per environment and region
6. Set up IAM groups and roles at folder level
7. Enable Security Command Center and VPC Service Controls
8. Configure Cloud Interconnect / VPN for hybrid connectivity
9. Set up centralized logging sinks and monitoring dashboards
10. Implement Terraform modules (Cloud Foundation Toolkit) for project factory and guardrails
11. Create CI/CD pipelines for infrastructure changes
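The project-factory guardrails mentioned in the final steps can be sketched as a pre-provisioning validator. The naming pattern and required labels below are hypothetical conventions an organization might choose, not GCP requirements:

```python
import re

# Hypothetical org conventions enforced by the project factory.
NAME_RE = re.compile(r"^(prod|dev|stg)-[a-z0-9-]{4,20}$")
REQUIRED_LABELS = {"team", "cost-center", "environment"}

def validate_project_request(name, labels):
    """Return a list of guardrail violations (empty means the request passes)."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("name violates convention")
    missing = REQUIRED_LABELS - labels.keys()
    if missing:
        errors.append(f"missing labels: {sorted(missing)}")
    return errors
```

Running checks like this in the Terraform pipeline (rather than after creation) is what makes project provisioning safely self-service.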
### Real-World Example
A regulated financial institution adopts GCP. They deploy the full Landing Zone using the GCP Landing Zone Accelerator (open-source Terraform). The hierarchy separates Production and Non-Production at the top level. Shared VPC host projects provide networking. VPC Service Controls protect BigQuery and Cloud Storage perimeters. Security Command Center Premium detects misconfigurations. All infrastructure is managed through Terraform in a GitOps pipeline via Cloud Build. Project provisioning is self-service through a "project factory" module that enforces naming, labeling, budget, and org policy defaults.