GCP Landing Zone -- Operations¶
Deployment recipes, Terraform configurations, CLI commands, and operational best practices for each architecture pattern.
Project and Hierarchy Management¶
Project Factory Pattern¶
Use Terraform with the Cloud Foundation Toolkit (CFT) project factory module to standardize project creation:
# Key elements of a project factory
module "project" {
  source  = "terraform-google-modules/project-factory/google"
  version = "~> 17.0"

  name            = "prod-payments-api"
  org_id          = var.org_id
  folder_id       = google_folder.production.id
  billing_account = var.billing_account_id

  # Network configuration (attach to the Shared VPC host project)
  svpc_host_project_id = module.shared_vpc.project_id
  shared_vpc_subnets   = ["projects/shared-net-prod/regions/us-central1/subnetworks/app-subnet"]

  # Labels for cost attribution
  labels = {
    team        = "payments"
    environment = "prod"
    cost-center = "cc-1234"
    managed-by  = "terraform"
  }

  # APIs to enable
  activate_apis = [
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "monitoring.googleapis.com",
  ]
}
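The module above references `google_folder.production`, which must exist in the same configuration. A minimal sketch, assuming the folder sits directly under the organization:

```hcl
# Folder referenced by the project factory module above
resource "google_folder" "production" {
  display_name = "Production"
  parent       = "organizations/${var.org_id}"
}
```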
Folder Creation¶
# Create top-level folders under the organization
gcloud resource-manager folders create \
--display-name="Production" \
--organization=ORGANIZATION_ID
gcloud resource-manager folders create \
--display-name="Non-Production" \
--organization=ORGANIZATION_ID
gcloud resource-manager folders create \
--display-name="Shared-Infrastructure" \
--organization=ORGANIZATION_ID
# Create sub-folders
gcloud resource-manager folders create \
--display-name="Team-A" \
--folder=PRODUCTION_FOLDER_ID
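The numeric folder IDs needed for the `--folder` flag (and for `folder_id` in Terraform) can be looked up after creation:

```shell
# List top-level folders; the ID column feeds --folder / folder_id
gcloud resource-manager folders list --organization=ORGANIZATION_ID
```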
Baseline Organization Policies¶
Apply these at the organization level using Terraform:
# Enforce OS Login
resource "google_organization_policy" "require_os_login" {
  org_id     = var.org_id
  constraint = "constraints/compute.requireOsLogin"

  boolean_policy {
    enforced = true
  }
}

# Restrict resource locations to US and EU
resource "google_organization_policy" "resource_locations" {
  org_id     = var.org_id
  constraint = "constraints/gcp.resourceLocations"

  list_policy {
    allow {
      values = [
        "in:us-locations",
        "in:eu-locations",
      ]
    }
  }
}

# Disable service account key creation
resource "google_organization_policy" "disable_sa_keys" {
  org_id     = var.org_id
  constraint = "constraints/iam.disableServiceAccountKeyCreation"

  boolean_policy {
    enforced = true
  }
}

# Restrict external IPs
resource "google_organization_policy" "vm_external_ip" {
  org_id     = var.org_id
  constraint = "constraints/compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

# Enforce Shielded VMs
resource "google_organization_policy" "require_shielded_vm" {
  org_id     = var.org_id
  constraint = "constraints/compute.requireShieldedVm"

  boolean_policy {
    enforced = true
  }
}
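Because policies merge down the hierarchy, it is worth checking what actually took effect after applying. For example:

```shell
# Show the effective (merged) policy for a constraint at the org level
gcloud resource-manager org-policies describe compute.requireOsLogin \
    --organization=ORGANIZATION_ID \
    --effective
```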
Networking Operations¶
Shared VPC Setup¶
# 1. Enable the host project
gcloud compute shared-vpc enable shared-net-prod
# 2. Attach service projects
gcloud compute shared-vpc associated-projects add payments-prod \
--host-project=shared-net-prod
# 3. Grant subnet-level access to service project admins
gcloud compute networks subnets add-iam-policy-binding app-subnet \
  --region=us-central1 \
  --member="group:[email protected]" \
  --role="roles/compute.networkUser"
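The host/service relationship can be confirmed from either side:

```shell
# List service projects attached to the Shared VPC host
gcloud compute shared-vpc list-associated-resources shared-net-prod

# Show the host project for a given service project
gcloud compute shared-vpc get-host-project payments-prod
```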
VPC Peering Setup¶
# Create peering from network-a to network-b
gcloud compute networks peerings create peer-a-to-b \
--network=network-a \
--peer-network=network-b \
--peer-project=project-b \
--export-custom-routes \
--import-custom-routes
# Create reciprocal peering from network-b to network-a
gcloud compute networks peerings create peer-b-to-a \
--network=network-b \
--peer-network=network-a \
--peer-project=project-a \
--export-custom-routes \
--import-custom-routes
# Verify peering status
gcloud compute networks peerings list --network=network-a
Cloud NAT Configuration¶
# Create Cloud Router (required for Cloud NAT)
gcloud compute routers create nat-router \
--network=shared-vpc \
--region=us-central1
# Create Cloud NAT on the router
gcloud compute routers nats create nat-gateway \
--router=nat-router \
--region=us-central1 \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-ips
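Enabling NAT logging up front is recommended; it is the primary signal for diagnosing the port-exhaustion issues covered under Troubleshooting:

```shell
# Log NAT errors (dropped connections) without the volume of full translation logs
gcloud compute routers nats update nat-gateway \
    --router=nat-router \
    --region=us-central1 \
    --enable-logging \
    --log-filter=ERRORS_ONLY
```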
Private Service Connect for Google APIs¶
# Reserve a global internal address for the endpoint
gcloud compute addresses create google-apis-psc \
  --global \
  --purpose=PRIVATE_SERVICE_CONNECT \
  --network=shared-vpc \
  --addresses=10.0.100.10

# Create a PSC endpoint for the Google APIs bundle
# (endpoint names allow only lowercase letters and numbers)
gcloud compute forwarding-rules create googleapis \
  --global \
  --network=shared-vpc \
  --address=google-apis-psc \
  --target-google-apis-bundle=all-apis
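The endpoint only receives traffic if DNS for `googleapis.com` resolves to its address inside the VPC; a private zone is the usual mechanism (the zone name and TTL here are illustrative):

```shell
# Private zone so googleapis.com resolves to the PSC endpoint IP
gcloud dns managed-zones create googleapis-private \
    --description="Route Google APIs through PSC" \
    --dns-name=googleapis.com. \
    --visibility=private \
    --networks=shared-vpc

# Point the apex at the endpoint address and wildcard the rest
gcloud dns record-sets create googleapis.com. \
    --zone=googleapis-private --type=A --ttl=300 \
    --rrdatas=10.0.100.10
gcloud dns record-sets create "*.googleapis.com." \
    --zone=googleapis-private --type=CNAME --ttl=300 \
    --rrdatas=googleapis.com.
```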
Compute and Orchestration¶
Regional Managed Instance Group¶
# Create instance template
gcloud compute instance-templates create app-template \
--machine-type=e2-standard-4 \
--image-family=debian-12 \
--image-project=debian-cloud \
--network=shared-vpc \
--subnet=app-subnet \
--region=us-central1 \
--no-address \
--tags=app-server
# Create regional MIG
gcloud compute instance-groups managed create app-mig \
--template=app-template \
--region=us-central1 \
--size=3 \
--health-check=http-health-check
# Set autoscaling
gcloud compute instance-groups managed set-autoscaling app-mig \
--region=us-central1 \
--min-num-replicas=3 \
--max-num-replicas=20 \
--target-cpu-utilization=0.7
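The `http-health-check` referenced by the MIG must exist before the group is created; a minimal example (port, path, and thresholds are assumptions):

```shell
# Health check used for MIG autohealing
gcloud compute health-checks create http http-health-check \
    --port=80 \
    --request-path=/healthz \
    --check-interval=10s \
    --unhealthy-threshold=3
```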
GKE Multi-Zone Cluster in Shared VPC¶
gcloud container clusters create prod-cluster \
--network=shared-vpc \
--subnetwork=gke-subnet \
--region=us-central1 \
--num-nodes=2 \
--node-locations=us-central1-a,us-central1-b,us-central1-c \
--enable-ip-alias \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10 \
--enable-shielded-nodes \
--workload-pool=prod-project.svc.id.goog \
--enable-binary-authorization
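Once the cluster is up, fetch kubectl credentials for the regional control plane:

```shell
# Configure kubectl for the cluster
gcloud container clusters get-credentials prod-cluster \
    --region=us-central1 \
    --project=prod-project
```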
Data Layer Operations¶
Cloud SQL HA with Cross-Region Replica¶
# Create HA instance (automatic regional failover)
gcloud sql instances create prod-db \
  --database-version=POSTGRES_15 \
  --tier=db-custom-4-16384 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --network=projects/shared-net-prod/global/networks/shared-vpc \
  --no-assign-ip
# Create cross-region read replica for DR
gcloud sql instances create prod-db-replica \
  --master-instance-name=prod-db \
  --region=us-east1 \
  --tier=db-custom-4-16384 \
  --network=projects/shared-net-prod/global/networks/shared-vpc \
  --no-assign-ip
Cloud Spanner Multi-Region¶
# Create multi-region Spanner instance
gcloud spanner instances create prod-spanner \
--config=nam6 \
--description="Production Spanner" \
--nodes=3
# nam6 = read-write replicas in us-central1 + us-east1 (multi-region, 99.999% SLA)
# eur3 = europe-west1 + europe-west4
Dual-Region Cloud Storage¶
# Create dual-region bucket for DR
gcloud storage buckets create gs://prod-assets-bucket \
  --location=US \
  --placement=us-central1,us-east1 \
  --default-storage-class=STANDARD
DR Operations¶
Failover Runbook (Warm Standby)¶
- Detect: Cloud Monitoring alert fires (primary region health check failures)
- Verify: On-call SRE confirms the primary region is genuinely degraded, not a monitoring false positive
- Scale up DR region: resize the standby instance group to production capacity and promote the database replica
- Shift traffic: Global LB automatically routes to the healthy DR region based on health checks
- Verify: Confirm the application is serving traffic from the DR region
- Communicate: Notify stakeholders of the failover
- Post-incident: After the primary recovers, re-establish replication and plan failback
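The scale-up and promotion steps map to commands like the following, assuming a standby MIG of the same name exists in the DR region (the target size is illustrative):

```shell
# Bring the DR instance group up to production capacity
gcloud compute instance-groups managed resize app-mig \
    --region=us-east1 \
    --size=10

# Promote the cross-region read replica to a standalone primary
gcloud sql instances promote-replica prod-db-replica
```

Note that promotion is one-way: after failback, the replication chain has to be rebuilt in the opposite direction.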
Automated Failover (Active-Active)¶
For active-active, failover is automatic. The global load balancer stops sending traffic to the unhealthy region based on health check results. No manual intervention required. The key operational task is testing:
# Run a game day / DR test
# 1. Disable health check on primary region backends
# 2. Verify traffic shifts to secondary region
# 3. Re-enable health check
# 4. Verify traffic returns to normal distribution
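Rather than disabling health checks outright, traffic can be drained by zeroing the primary backend's capacity (the backend service name here is illustrative):

```shell
# Step 1: drain the primary region's backend
gcloud compute backend-services update-backend app-backend-service \
    --global \
    --instance-group=app-mig \
    --instance-group-region=us-central1 \
    --capacity-scaler=0.0

# Step 3: restore it after verifying failover
gcloud compute backend-services update-backend app-backend-service \
    --global \
    --instance-group=app-mig \
    --instance-group-region=us-central1 \
    --capacity-scaler=1.0
```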
Monitoring and Observability¶
Centralized Logging Sink¶
# Create aggregated log sink at organization level
gcloud logging sinks create org-logs-sink \
bigquery.googleapis.com/projects/shared-logging/datasets/org_logs \
--organization=ORGANIZATION_ID \
--include-children \
--log-filter='severity>=WARNING'
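The sink writes as a dedicated service account, which must be granted access to the destination before logs flow:

```shell
# Look up the sink's writer identity
gcloud logging sinks describe org-logs-sink \
    --organization=ORGANIZATION_ID \
    --format="value(writerIdentity)"

# Grant it write access in the destination project
# (substitute the serviceAccount: member returned above;
# a narrower dataset-level grant is also possible)
gcloud projects add-iam-policy-binding shared-logging \
    --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
    --role="roles/bigquery.dataEditor"
```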
Recommended Alerting Policies¶
| Alert | Condition | Severity |
|---|---|---|
| Region health check failures | >50% backends unhealthy for 2 min | Critical |
| Cross-region replication lag | Lag > 60 seconds for 5 min | Warning |
| Cloud NAT port exhaustion | Allocated ports >80% for 5 min | Warning |
| IAM policy changes | Any change to org-level IAM | Info |
| Budget threshold | Spend >80% of monthly budget | Warning |
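As one example, the replication-lag alert can be expressed in Terraform roughly as follows; the metric filter and threshold are a sketch for Cloud SQL, not values verified against your environment:

```hcl
# Sketch of the cross-region replication lag alert from the table above
resource "google_monitoring_alert_policy" "replication_lag" {
  display_name = "Cross-region replication lag"
  combiner     = "OR"

  conditions {
    display_name = "Replica lag > 60s for 5 min"

    condition_threshold {
      filter          = "metric.type=\"cloudsql.googleapis.com/database/replication/replica_lag\" AND resource.type=\"cloudsql_database\""
      comparison      = "COMPARISON_GT"
      threshold_value = 60
      duration        = "300s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }
}
```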
Troubleshooting¶
Shared VPC: Service Project Cannot Reach Subnet¶
Check that the Service Project Admin has compute.networkUser on the specific subnet:
gcloud compute networks subnets get-iam-policy app-subnet \
  --region=us-central1 \
  --project=shared-net-prod
VPC Peering: Routes Not Propagating¶
Verify both sides have matching import/export settings:
gcloud compute networks peerings list --network=network-a
# Check: exportCustomRoutes=true, importCustomRoutes=true on both sides
Cloud NAT: Connection Failures¶
Check per-VM NAT port allocation; exhaustion shows up as dropped outbound connections:
gcloud compute routers get-nat-mapping-info nat-router \
  --region=us-central1
# If VMs are near their allocated port count, raise --min-ports-per-vm on the NAT
DR: Cloud SQL Replica Promote Fails¶
Ensure the replica is in a healthy state before promoting:
gcloud sql instances describe prod-db-replica --format="value(status)"
# Must be RUNNABLE before promote