# GCP Landing Zone -- Architecture
This note covers the detailed architecture for each major GCP project setup pattern, with component breakdowns, Mermaid diagrams, and key design decisions.
## 1. Single Project with Single VPC

### Architecture Summary
The simplest GCP deployment: all resources live in one project with one VPC network. Subnets are regional, each defined by its own primary CIDR range and optional secondary ranges; the VPC itself has no overall address range. All internal communication happens via internal IP addresses.
```mermaid
graph TD
    Internet[Internet] --> LB[External Load Balancer]
    LB --> VPC
    subgraph VPC[VPC Network - custom mode]
        subgraph Region1[Region: us-central1]
            Sub1[Subnet: 10.0.1.0/24 - App Tier]
            Sub2[Subnet: 10.0.2.0/24 - Data Tier]
        end
        Sub1 --> Sub2
    end
    Sub1 --> NAT[Cloud NAT]
    Sub1 --> PGA[Private Google Access]
    PGA --> APIs[Google APIs]
```
### Key Services
- VPC (custom mode) -- define your own subnet CIDR ranges
- Cloud NAT -- outbound internet for private VMs
- Private Google Access -- reach GCP APIs without external IPs
- VPC Firewall Rules -- tier-based traffic control via network tags
- Cloud Load Balancing -- distribute incoming traffic
### Recommended Configuration
- Use custom-mode VPC (not auto-mode) for explicit subnet control
- Separate subnets per tier (app, data, management) within each region
- Enable VPC Flow Logs for all subnets
- Use Regional Managed Instance Groups for cross-zone HA
- Apply firewall rules using network tags, not IP ranges
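As a quick sanity check on an address plan like this, the per-tier ranges can be carved and verified for overlap with Python's standard `ipaddress` module. This is a sketch: the /16 planning block and /24 tier sizes are illustrative conventions, not GCP requirements (custom-mode VPCs impose no network-wide range).

```python
import ipaddress

# Illustrative planning block reserved for this VPC's region.
REGION_BLOCK = ipaddress.ip_network("10.0.0.0/16")

def plan_subnets(block, tiers, prefix=24):
    """Assign one subnet per tier from the block, in order."""
    subnets = iter(block.subnets(new_prefix=prefix))
    return {tier: next(subnets) for tier in tiers}

def overlaps(plan):
    """Return True if any two planned subnets overlap."""
    nets = list(plan.values())
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])

plan = plan_subnets(REGION_BLOCK, ["app", "data", "management"])
```

Running the same check across every VPC you plan (including peered ones) catches overlap problems before any subnet is created.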
### Real-World Example
A startup running a Django web application with a Cloud SQL PostgreSQL backend. Everything lives in myapp-prod. Two subnets in us-central1: one for Compute Engine MIGs serving traffic, one for Cloud SQL private IP. Cloud NAT handles package updates. A Global HTTP(S) LB serves traffic from the MIG backends.
## 2. Multi-VPC Architecture

### 2A. Shared VPC

#### Architecture Summary
A centralized networking model where a host project owns the VPC networks, subnets, firewall rules, and routes. Multiple service projects attach to the host and deploy compute resources (VMs, GKE clusters) into shared subnets. Network administration is separated from service administration.
```mermaid
graph TD
    Org[Organization] --> HostProj[Host Project: shared-net-prod]
    Org --> SP1[Service Project: payments-team]
    Org --> SP2[Service Project: orders-team]
    HostProj --> VPC[Shared VPC Network]
    VPC --> Sub1[Subnet: 10.0.1.0/24 - us-central1]
    VPC --> Sub2[Subnet: 10.0.2.0/24 - us-east1]
    SP1 -->|compute.networkUser| Sub1
    SP2 -->|compute.networkUser| Sub2
    VPC --> VPN[Cloud VPN / Interconnect]
    VPN --> OnPrem[On-Premises Network]
```
#### Key Services

- Shared VPC (`compute.xpnAdmin` role required)
- IAM -- `compute.networkUser` for subnet-level delegation
- Cloud DNS -- private zones in host project, accessible by service projects
- Cloud Interconnect / Cloud VPN -- centralized hybrid connectivity in host project
#### Recommended Configuration

- One host project per environment (prod, non-prod) to isolate network policies
- Grant `compute.networkUser` at subnet level (not project level) for least privilege
- Use organization policy constraints:
  - `constraints/compute.restrictSharedVpcHostProjects` -- limit which projects can be hosts
  - `constraints/compute.restrictSharedVpcSubnetworks` -- limit which subnets service projects can use
- Keep Cloud VPN / Interconnect gateways in the host project for centralized routing
- Use Cloud Router with custom advertisement mode to advertise service project subnets to on-premises
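The subnet-level delegation recommended above amounts to a lookup from subnet to the service projects holding `compute.networkUser` on it. A toy Python model of that admission check (project and subnet names are hypothetical; real grants are IAM bindings on subnetwork resources in the host project):

```python
# Hypothetical subnet-level networkUser grants: subnet -> service projects.
SUBNET_GRANTS = {
    "us-central1/payments-subnet": {"payments-team"},
    "us-east1/orders-subnet": {"orders-team"},
}

def can_use_subnet(service_project, subnet):
    """True if the service project holds compute.networkUser on that subnet."""
    return service_project in SUBNET_GRANTS.get(subnet, set())
```

Granting at project level would instead make every subnet usable by every attached team, which is why the subnet-level form is the least-privilege choice.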
#### Real-World Example
A financial services company with 8 application teams. The networking-host-prod project owns the production VPC. Each team has its own service project (payments-prod, trading-prod, etc.). The networking team controls firewall rules and routing centrally. Billing is attributed to each team's service project.
### 2B. VPC Network Peering

#### Architecture Summary
A peer-to-peer connectivity model where two VPC networks exchange routes to enable internal IP communication. Peered networks remain administratively separate. Peering is not transitive -- if A peers with B and B peers with C, A cannot reach C through B.
```mermaid
graph LR
    VPCA[VPC A: 10.0.0.0/16] <-->|Peering| VPCB[VPC B: 10.1.0.0/16]
    VPCB <-->|Peering| VPCC[VPC C: 10.2.0.0/16]
    VPCB -->|Export custom routes| OnPrem[On-Premises via Cloud VPN in B]
    noteA[VPC A cannot reach VPC C<br/>Peering is not transitive]
```
#### Key Services

- VPC Network Peering -- managed via the `compute.networkAdmin` role on each side
- Cloud Router -- for exporting/importing custom routes through peering
#### Recommended Configuration

- Plan CIDR ranges carefully -- subnet ranges cannot overlap across peered VPCs
- Enable `--export-custom-routes` and `--import-custom-routes` for on-premises transit scenarios
- Use a central "transit" VPC that peers with all others for a hub-and-spoke topology
- Remember: firewall rules, network tags, and service accounts do not cross peering boundaries
- Use Cloud DNS peering zones or authorize the same private zone to all peered VPCs for DNS resolution
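The non-transitivity rule is easy to encode: internal-IP reachability over peering is exactly one hop, never through an intermediate VPC. A minimal sketch (VPC names are hypothetical):

```python
# Established peerings, as unordered pairs of hypothetical VPC names.
PEERINGS = {("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")}

def can_reach(src, dst):
    """Reachability over VPC peering: direct peers only, no transit."""
    return (src, dst) in PEERINGS or (dst, src) in PEERINGS
```

Note the deliberate absence of any graph traversal: even though vpc-b peers with both, vpc-a never learns routes to vpc-c. A full mesh (or the hub VPC terminating traffic itself) is required if every network must reach every other.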
#### Real-World Example
A SaaS provider offers a managed analytics platform. Their VPC (saas-prod) peers with each customer's VPC. Customers can access the analytics service using internal IP addresses. Each customer's VPC remains fully isolated from other customers.
### 2C. Private Service Connect

#### Architecture Summary
A service-oriented connectivity model where consumers access managed services (Google APIs, third-party SaaS, or internal services) using internal IP addresses in their own VPC. Unlike peering, PSC provides unidirectional, service-level access without exposing the full VPC. Traffic uses NAT so no IP coordination is needed between consumer and producer.
```mermaid
graph TD
    ConsumerVPC[Consumer VPC] -->|PSC Endpoint<br/>internal IP| PSC[Private Service Connect]
    PSC -->|Service Attachment| ProducerLB[Producer Load Balancer]
    ProducerLB --> ProducerVPC[Producer VPC: Managed Service]
    ConsumerVPC -->|PSC Endpoint<br/>API bundle| GoogleAPIs[Google APIs<br/>Cloud Storage, BigQuery, etc.]
```
#### Key Services
- Private Service Connect endpoints -- forwarding rules mapping internal IPs to service attachments or Google API bundles
- Private Service Connect backends -- NEGs behind consumer load balancers for advanced traffic management
- Private Service Connect interfaces -- producer-initiated connections (bidirectional)
#### Recommended Configuration
- Use PSC endpoints for Google API access instead of Private Google Access when you need multiple internal IPs or per-service routing control
- Use PSC backends when you need custom URLs, TLS certificates, or failover between regional service endpoints
- Service producers should use `--consumer-accept-list` to control which projects can connect
- Combine PSC with VPC Service Controls for defense-in-depth
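Producer-side admission control can be sketched as a lookup against the accept list, which in GCP also carries a per-project connection limit. A toy model (project IDs and limits are hypothetical):

```python
# Hypothetical consumer accept list for a service attachment:
# consumer project -> maximum number of PSC endpoints allowed.
ACCEPT_LIST = {"analytics-consumer-prod": 10}

def admit(consumer_project, current_endpoints):
    """True if this consumer may create one more PSC endpoint."""
    limit = ACCEPT_LIST.get(consumer_project)
    return limit is not None and current_endpoints < limit
```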
#### Real-World Example
A company consumes a third-party data enrichment API hosted on GCP. Instead of routing traffic over the internet, they create a PSC endpoint in their VPC pointing to the vendor's service attachment. All traffic stays within Google's network. They also use PSC to access Cloud Storage and BigQuery APIs via internal IPs.
## 3. Multi-Project Strategy

### Architecture Summary
The GCP resource hierarchy (Organization > Folders > Projects) provides policy inheritance and delegation. A well-structured hierarchy is the foundation of enterprise cloud governance.
```mermaid
graph TD
    Org[Organization: example.com] --> FProd[Folder: Production]
    Org --> FNonProd[Folder: Non-Production]
    Org --> FShared[Folder: Shared-Infrastructure]
    Org --> FSandbox[Folder: Sandboxes]
    FProd --> FTeamA[Folder: Team-A]
    FProd --> FTeamB[Folder: Team-B]
    FTeamA --> ProdProjA[Project: prod-team-a-app]
    FTeamB --> ProdProjB[Project: prod-team-b-app]
    FNonProd --> FDev[Folder: Development]
    FNonProd --> FStaging[Folder: Staging]
    FDev --> DevProj[Project: dev-team-a]
    FShared --> NetProj[Project: shared-networking]
    FShared --> SecProj[Project: shared-security]
    FShared --> LogProj[Project: shared-logging]
```
### Key Services
- Resource Manager -- manages the org/folder/project hierarchy
- Organization Policies -- constraints applied at org, folder, or project level
- IAM -- roles granted at any hierarchy level, inherited downward
- Cloud Billing -- billing accounts linked to projects; labels for cost attribution
### Recommended Configuration

Hierarchy Design:

- Top-level folders by environment (Production, Non-Production, Shared-Infrastructure)
- Second-level folders by team or business unit
- Limit depth to 3-4 levels for manageable policy inheritance
- Use Terraform (Cloud Foundation Toolkit) for all hierarchy management
Essential Organization Policies (set at org level):
| Constraint | Purpose |
|---|---|
| constraints/compute.requireOsLogin | Enforce OS Login for SSH |
| constraints/compute.disableSerialPortAccess | Disable serial port |
| constraints/iam.disableServiceAccountKeyCreation | Prefer Workload Identity |
| constraints/compute.vmExternalIpAccess | Restrict external IPs |
| constraints/gcp.resourceLocations | Restrict resource locations |
| constraints/compute.requireShieldedVm | Enforce Shielded VMs |
| constraints/iam.allowedPolicyMemberDomains | Restrict IAM to org domains |
IAM Best Practices:

- Grant roles at the highest appropriate level (org > folder > project)
- Use Google Groups (not individual users) for role bindings
- Use custom roles for least privilege when predefined roles are too broad
- Separate duties: the project creator, project owner, and billing admin should be different principals
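IAM inheritance down the hierarchy can be modeled as a walk from a resource up to the organization, collecting bindings along the way. A toy sketch using the example hierarchy above (group names are hypothetical):

```python
# Child -> parent edges for part of the example hierarchy.
PARENT = {
    "prod-team-a-app": "Team-A",
    "Team-A": "Production",
    "Production": "org",
}

# (resource, member) -> roles granted directly at that level.
BINDINGS = {
    ("org", "group:security@example.com"): {"roles/iam.securityReviewer"},
    ("Production", "group:team-a@example.com"): {"roles/viewer"},
}

def effective_roles(resource, member):
    """Union of roles granted on the resource and all its ancestors."""
    roles, node = set(), resource
    while node:
        roles |= BINDINGS.get((node, member), set())
        node = PARENT.get(node)
    return roles
```

The one-way walk captures the key property: grants flow downward only, so a folder-level grant never widens access to sibling folders or to the org itself.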
### Real-World Example
A global retailer adopts GCP. They create top-level folders for Production, Non-Production, and Platform. The Platform folder contains shared networking (Shared VPC host project), security tooling (SCC, VPC Service Controls), and centralized logging. Each business unit (ecommerce, supply chain, retail analytics) has a sub-folder under Production with its own projects. Org policies enforce resourceLocations to US/EU regions and disable service account key creation.
## 4. Multi-Zone and Multi-Region Deployment

### Architecture Summary
GCP regions contain 3-4 zones. Zonal resources (Compute Engine VMs, Persistent Disks) are scoped to a single zone. Regional resources (Cloud SQL HA, Regional MIGs, GKE multi-zone clusters) automatically span zones. Multi-region resources (Cloud Spanner, multi-region Cloud Storage) span regions.
```mermaid
graph TD
    GLB[Global HTTP/S Load Balancer] --> Region1
    GLB --> Region2
    subgraph Region1[Region: us-central1]
        MIG1[Regional MIG: 3 zones]
        SQL1[Cloud SQL HA: regional]
        GCS1[Cloud Storage: regional]
    end
    subgraph Region2[Region: europe-west1]
        MIG2[Regional MIG: 3 zones]
        SQL2[Cloud SQL HA: regional]
        GCS2[Cloud Storage: regional]
    end
    SQL1 -->|Cross-region replica| SQL2
    GCS1 -->|Dual-region bucket| GCS2
```
### Key Services
- Regional MIGs -- auto-distribute VMs across 3+ zones with autoscaling
- Regional Persistent Disks -- synchronously replicate data across zones
- GKE (multi-zone node pools) -- spread nodes across zones for pod HA
- Global External Application Load Balancer -- L7 routing with health-check-driven failover
- Global External Proxy Network Load Balancer -- L4 (TCP/SSL proxy) for non-HTTP workloads
- Cloud DNS -- failover routing policies based on health checks
### Recommended Configuration
Data Layer Strategy:
| Service | Multi-Zone | Multi-Region Strategy |
|---|---|---|
| Cloud SQL | Automatic with HA config | Cross-region read replicas; manual promote for failover |
| Cloud Spanner | Automatic | Built-in multi-region (TrueTime API, 99.999% SLA) |
| AlloyDB | Automatic | Cross-region read replicas |
| Cloud Bigtable | Automatic | Replication with configurable failover policies |
| Cloud Storage | Automatic (regional) | Dual-region or multi-region buckets |
| Firestore | Automatic (multi-region native mode) | Built-in |
Compute Strategy:

- Use Regional MIGs, not zonal MIGs, for all production workloads
- GKE: use multi-zone node pools; consider GKE Enterprise fleets for multi-cluster management
- Cloud Run: deploy to multiple regions; use a global LB for routing
- Configure health checks with aggressive thresholds for fast failover
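A regional MIG with an even target distribution spreads instances across zones as evenly as the target size allows. The effect can be sketched in a few lines (zone names are illustrative):

```python
def distribute(target_size, zones):
    """Spread target_size instances across zones as evenly as possible."""
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones get one instance more than the rest.
    return {z: base + (1 if i < extra else 0) for i, z in enumerate(zones)}

layout = distribute(10, ["us-central1-a", "us-central1-b", "us-central1-c"])
```

The max/min difference is never more than one instance, which is what bounds the capacity loss if a single zone fails.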
### Real-World Example
A media streaming service deploys in us-central1 (primary) and europe-west1 (secondary). Regional MIGs in each region run the API tier. A Global HTTP(S) LB routes users to the nearest healthy region. Cloud Spanner in a multi-region configuration (such as nam6 or eur3) handles the global data layer. Cloud Storage uses dual-region buckets for media assets.
## 5. Disaster Recovery Patterns

### 5A. Pilot Light

#### Architecture Summary
A minimal footprint of critical infrastructure runs continuously in a secondary region. Only core components (database replicas, IaC templates) are active. Application tier is provisioned on demand during a disaster.
```mermaid
graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[Full MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end
    subgraph DR[DR Region: us-east1]
        SQL_R[Cloud SQL Read Replica - active]
        IaC[Terraform config - stored]
        MIG_D[MIG: 0 instances<br/>scale on failover]
    end
    SQL_P -->|Async replication| SQL_R
    GLB[Global LB] -->|Healthy| Primary
    GLB -.->|Failover| DR
```
#### Key Services
- Cloud SQL cross-region read replicas -- async replication, manual promote
- Persistent Disk snapshots -- scheduled cross-region snapshots
- Terraform / Cloud Build -- automated provisioning of compute tier on failover
- Global HTTP(S) LB -- health-check-driven traffic rerouting
#### Configuration
- RPO: minutes to hours (depends on replication lag)
- RTO: hours (includes provisioning time for compute tier)
- Cost: low -- only database replication and storage snapshots are continuously billed
- Test failover at least quarterly
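A rough way to reason about pilot-light recovery targets: RTO is roughly the sum of the failover steps, and RPO is bounded by the replication lag. The step durations below are illustrative assumptions for a back-of-the-envelope model, not measured values or GCP guarantees:

```python
# Assumed failover step durations in minutes (illustrative only).
FAILOVER_STEPS_MIN = {
    "detect_and_decide": 15,
    "promote_sql_replica": 10,
    "terraform_apply_compute": 45,
    "warmup_and_health_checks": 20,
}

def estimated_rto_minutes(steps=FAILOVER_STEPS_MIN):
    """Sequential failover: total recovery time is the sum of steps."""
    return sum(steps.values())

def estimated_rpo_minutes(replication_lag_min):
    """With async replication, data newer than the lag is at risk."""
    return replication_lag_min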
### 5B. Warm Standby

#### Architecture Summary
A fully running but scaled-down copy of the production environment operates in the secondary region. During failover, resources scale up to production capacity.
```mermaid
graph LR
    subgraph Primary[Primary Region: us-central1]
        MIG_P[MIG: 10 instances]
        SQL_P[Cloud SQL Primary]
    end
    subgraph Warm[Warm Standby: us-east1]
        MIG_W[MIG: 2 instances - scaled down]
        SQL_W[Cloud SQL Read Replica - active]
    end
    SQL_P -->|Async replication| SQL_W
    GLB[Global LB] -->|100% traffic| Primary
    GLB -.->|Failover| Warm
```
#### Key Services
- Regional MIGs -- running at reduced capacity, scale up on failover trigger
- Cloud SQL cross-region replicas -- continuously replicating
- Global HTTP(S) LB -- automatic failover with health checks
- Cloud Monitoring + Cloud Functions -- automated scale-up triggers
#### Configuration
- RPO: minutes (async replication lag)
- RTO: minutes (resources already running, just need scale-up)
- Cost: moderate -- paying for reduced compute continuously plus full database replication
- Automate scale-up via Cloud Monitoring alerting + Cloud Functions
### 5C. Active-Active Multi-Region

#### Architecture Summary
Full production workloads run simultaneously in two or more regions. A global load balancer distributes traffic to the nearest healthy region. Data is replicated synchronously or asynchronously depending on the service.
```mermaid
graph TD
    Users[Users Global] --> GLB[Global HTTP/S LB]
    GLB -->|Nearest region| Region1
    GLB -->|Nearest region| Region2
    subgraph Region1[Region 1: us-central1]
        MIG1[MIG: 10 instances]
        Spanner1[Spanner: Multi-Region Config]
    end
    subgraph Region2[Region 2: europe-west1]
        MIG2[MIG: 10 instances]
        Spanner2[Spanner: Multi-Region Config]
    end
    Spanner1 <-->|Synchronous replication| Spanner2
```
#### Key Services
- Global External Application Load Balancer -- routes to nearest healthy region
- Cloud Spanner -- globally distributed DB with synchronous multi-region replication
- Cloud Bigtable -- replicated with configurable consistency
- Multi-region Cloud Storage -- dual-region buckets
- GKE Enterprise -- fleet-level multi-cluster management
- Cloud DNS -- geo-based or latency-based routing
#### Configuration
- RPO: near-zero (synchronous replication for critical data)
- RTO: near-zero (traffic automatically rerouted by global LB)
- Cost: highest -- full duplicate infrastructure in multiple regions
- Use Cloud Spanner or Bigtable for data layers requiring synchronous cross-region replication
- Use committed use discounts for base capacity; spot VMs for non-critical batch workloads
### DR Comparison Matrix
| Strategy | RPO | RTO | Relative Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Pilot Light | Hours | Hours | Low | Low | Cost-conscious orgs tolerating downtime |
| Warm Standby | Minutes | Minutes | Medium | Medium | Most enterprises needing fast recovery |
| Active-Active | Near-zero | Near-zero | High | Very High | Mission-critical: payments, ecommerce |
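The matrix above can be turned into a simple selector: pick the cheapest strategy whose RPO and RTO meet the business targets. A sketch with illustrative thresholds expressed in minutes (the exact numbers for your services will differ):

```python
# (name, achievable RPO in minutes, achievable RTO in minutes, relative cost)
# Thresholds are illustrative stand-ins for the matrix's "hours"/"minutes"/"near-zero".
STRATEGIES = [
    ("pilot-light", 240, 240, 1),
    ("warm-standby", 15, 15, 2),
    ("active-active", 0, 0, 3),
]

def choose_strategy(rpo_target_min, rto_target_min):
    """Cheapest strategy meeting both targets, or None if none does."""
    viable = [s for s in STRATEGIES
              if s[1] <= rpo_target_min and s[2] <= rto_target_min]
    return min(viable, key=lambda s: s[3])[0] if viable else None
```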
## 6. DMZ / Network Perimeter Patterns

### Architecture Summary

GCP does not require a traditional DMZ subnet. External load balancers are Google-managed resources that sit outside your VPC, and backend instances live in private subnets with no external IPs. This "DMZ-less" architecture is the pattern Google recommends.
```mermaid
graph TD
    Internet[Internet] --> ExtLB[External HTTP/S LB<br/>Google-managed, outside VPC]
    Internet --> IAP[Identity-Aware Proxy]
    ExtLB --> FW1[Firewall: allow LB health check ranges]
    IAP --> FW2[Firewall: allow IAP ranges<br/>35.235.240.0/20]
    subgraph VPC[VPC Network]
        FW1 --> AppSubnet[Private Subnet: App Tier<br/>No external IPs]
        FW2 --> AppSubnet
        AppSubnet --> FW3[Firewall: allow from app tier only]
        FW3 --> DataSubnet[Private Subnet: Data Tier<br/>No external IPs]
    end
    AppSubnet --> NAT[Cloud NAT<br/>for outbound egress]
    AppSubnet --> PGA[Private Google Access<br/>for GCP API access]
```
### Key Services
- External Load Balancers (HTTP(S), SSL Proxy, TCP Proxy, Network LB) -- sit outside VPC, distribute to private backends
- Cloud NAT -- outbound internet for private VMs (no inbound)
- Identity-Aware Proxy (IAP) -- zero-trust SSH/RDP and web app access, replaces bastion hosts
- Private Google Access -- reach GCP APIs without external IPs
- VPC Firewall Rules -- tier-based access control using network tags
- Cloud Armor -- WAF and DDoS protection on external LBs
### Recommended Configuration
- Place all workloads in private subnets (no external IPs)
- Use External LBs for all inbound traffic (they are outside VPC, no DMZ subnet needed)
- Use Cloud NAT for all outbound traffic (updates, external API calls)
- Use IAP tunneling for SSH/RDP access (no bastion host required)
- Use Private Google Access for reaching GCP APIs from private subnets
- Allow only LB health-check ranges and the IAP range (`35.235.240.0/20`) through the firewall
- Attach Cloud Armor policies to external LBs for WAF protection
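A firewall rule scoped to the IAP range admits only sources inside 35.235.240.0/20; the membership check behind such a rule is a one-liner with Python's standard `ipaddress` module:

```python
import ipaddress

# The documented IAP TCP-forwarding source range.
IAP_RANGE = ipaddress.ip_network("35.235.240.0/20")

def allowed_by_iap_rule(source_ip):
    """True if the source address falls inside the IAP range."""
    return ipaddress.ip_address(source_ip) in IAP_RANGE
```

Because all IAP-tunneled SSH/RDP traffic originates from this range, an ingress allow rule for it (plus denying everything else) replaces the bastion host entirely.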
## 7. GCP Landing Zone

### Architecture Summary
A Landing Zone is the foundational setup that establishes secure, scalable, well-governed baselines for an entire GCP organization. It covers resource hierarchy, IAM, networking, security, logging, and billing.
```mermaid
graph TD
    Org[Organization] --> LZ[Landing Zone Components]
    LZ --> Hierarchy["Resource Hierarchy<br/>Org > Folders > Projects"]
    LZ --> IAM["IAM & Groups<br/>Least privilege via groups"]
    LZ --> Network[Networking<br/>Shared VPC + Cloud NAT + Hybrid]
    LZ --> Security[Security<br/>SCC + VPC SC + Binary Auth]
    LZ --> Logging["Logging & Monitoring<br/>Aggregated sinks + dashboards"]
    LZ --> Billing[Billing<br/>Budgets + labels + BigQuery export]
    LZ --> IaC[Infrastructure as Code<br/>Terraform + CFT modules]
```
### Key Components
| Component | Service | Purpose |
|---|---|---|
| Resource Hierarchy | Resource Manager | Org > Folders > Projects for policy inheritance |
| IAM | Cloud IAM + Groups | Role-based access via groups, custom roles for least privilege |
| Networking | Shared VPC + Cloud NAT | Centralized networking with private egress |
| Hybrid Connectivity | Cloud Interconnect / VPN | On-premises connectivity |
| Security Posture | Security Command Center Premium | Threat detection, vulnerability scanning |
| Data Protection | VPC Service Controls | API perimeter security, data exfiltration prevention |
| Container Security | Binary Authorization | Enforce signed container images in GKE |
| Web Security | Cloud Armor | DDoS and WAF for external-facing services |
| Logging | Cloud Logging | Aggregated log sinks to central project |
| Monitoring | Cloud Monitoring | Custom dashboards, alerting policies |
| Cost Management | Cloud Billing + BigQuery | Billing export, budgets, label-based attribution |
| Automation | Terraform + CFT + Cloud Build | IaC for all infrastructure; CI/CD pipelines |
| Access | Identity-Aware Proxy | Zero-trust access without VPN/bastion |
### Recommended Implementation Order

1. Set up Google Workspace / Cloud Identity and enable the Organization resource
2. Create the folder hierarchy (Production, Non-Production, Shared-Infrastructure)
3. Apply baseline Organization Policies
4. Create shared infrastructure projects (networking, security, logging)
5. Configure Shared VPC with subnets per environment and region
6. Set up IAM groups and roles at folder level
7. Enable Security Command Center and VPC Service Controls
8. Configure Cloud Interconnect / VPN for hybrid connectivity
9. Set up centralized logging sinks and monitoring dashboards
10. Implement Terraform modules (Cloud Foundation Toolkit) for project factory and guardrails
11. Create CI/CD pipelines for infrastructure changes
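The project-factory guardrails mentioned in the final steps can be sketched as a pre-provisioning validator. The naming pattern and required labels below are hypothetical conventions an organization might choose, not GCP requirements:

```python
import re

# Hypothetical org conventions enforced by the project factory.
NAME_RE = re.compile(r"^(prod|dev|stg)-[a-z0-9-]{4,20}$")
REQUIRED_LABELS = {"team", "cost-center", "environment"}

def validate_project_request(name, labels):
    """Return a list of guardrail violations (empty means the request passes)."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("name violates convention")
    missing = REQUIRED_LABELS - labels.keys()
    if missing:
        errors.append(f"missing labels: {sorted(missing)}")
    return errors
```

Running checks like this in the Terraform pipeline (rather than after creation) is what makes project provisioning safely self-service.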
### Real-World Example
A regulated financial institution adopts GCP. They deploy the full Landing Zone using the GCP Landing Zone Accelerator (open-source Terraform). The hierarchy separates Production and Non-Production at the top level. Shared VPC host projects provide networking. VPC Service Controls protect BigQuery and Cloud Storage perimeters. Security Command Center Premium detects misconfigurations. All infrastructure is managed through Terraform in a GitOps pipeline via Cloud Build. Project provisioning is self-service through a "project factory" module that enforces naming, labeling, budget, and org policy defaults.