
Alibaba Cloud -- Operations

Deployment patterns, CLI recipes, monitoring, and troubleshooting for day-2 operations on Alibaba Cloud (Aliyun).


Deployment Patterns

Resource Directory & Landing Zone Deployment

Cloud Governance Center provides a Landing Zone setup wizard that provisions the core account structure automatically. For IaC-driven deployments, use either ROS (Resource Orchestration Service) or the Terraform Alibaba Cloud provider.

# Create a Resource Directory folder (OU) via CLI
# ParentFolderId is the Root folder ID (format r-XXXXXX) or an existing folder ID (fd-XXXXXXXX)
aliyun resourcemanager CreateFolder \
  --ParentFolderId r-XXXXXX \
  --FolderName "Workloads"

# Create a member account under that folder
aliyun resourcemanager CreateResourceAccount \
  --DisplayName "bu-a-prod" \
  --ParentFolderId fd-XXXXXXXX \
  --AccountNamePrefix bu-a-prod

# Apply a control policy to an OU
aliyun resourcemanager AttachControlPolicy \
  --PolicyId cp-XXXXXXXX \
  --TargetId fd-XXXXXXXX

Terraform Provider

The alicloud Terraform provider covers most services. A typical multi-account setup uses the alicloud_resource_manager_* resources combined with alicloud_cen_* for networking.

provider "alicloud" {
  region     = "cn-hangzhou"
  access_key = var.access_key
  secret_key = var.secret_key
}

resource "alicloud_vpc" "main" {
  vpc_name   = "prod-vpc"
  cidr_block = "10.0.0.0/16"
}

resource "alicloud_vswitch" "app" {
  vpc_id     = alicloud_vpc.main.id
  cidr_block = "10.0.1.0/24"
  zone_id    = "cn-hangzhou-h"
}

CLI & SDK

Installation

# Install aliyun CLI (macOS)
brew install aliyun-cli

# Configure a profile
aliyun configure set \
  --profile prod \
  --mode AK \
  --region cn-hangzhou \
  --access-key-id LTAI5tXXXXXXXXXX \
  --access-key-secret XXXXXXXXXXXXXXXXXXXXXXXX

ECS (Elastic Compute Service)

# List all ECS instances in cn-hangzhou as a table (rows= selects the array that cols= is read from)
aliyun ecs DescribeInstances --RegionId cn-hangzhou --output cols=InstanceId,InstanceName,Status rows="Instances.Instance[]"

# Start a stopped instance
aliyun ecs StartInstance --InstanceId i-bp1xxxxxxxxxxxxxxxxx

# Create a snapshot of a disk
aliyun ecs CreateSnapshot --DiskId d-bp1xxxxxxxxxxxxxxxxx --SnapshotName "pre-deploy-2026-04-17"
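
The snapshot name above hard-codes a date. A small wrapper can stamp the current UTC date instead. This is only a sketch: snapshot_name and snapshot_disk are hypothetical helper names, not part of the aliyun CLI, and the "pre-deploy" prefix is taken from the example above.

```shell
# Build a name such as "pre-deploy-2026-04-17" from a prefix and today's UTC date.
snapshot_name() {
  local prefix="${1:-pre-deploy}"
  printf '%s-%s\n' "$prefix" "$(date -u +%Y-%m-%d)"
}

# Snapshot one disk, naming the snapshot with the generated date stamp.
# Arguments: DISK_ID [PREFIX]
snapshot_disk() {
  local disk_id="$1"
  aliyun ecs CreateSnapshot \
    --DiskId "$disk_id" \
    --SnapshotName "$(snapshot_name "${2:-pre-deploy}")"
}

# Usage: snapshot_disk d-bp1xxxxxxxxxxxxxxxxx
```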

VPC & Networking

# List VPCs in a region
aliyun vpc DescribeVpcs --RegionId cn-hangzhou

# List the route tables of a VPC
aliyun vpc DescribeRouteTableList --RegionId cn-hangzhou --VpcId vpc-bp1xxxxxxxxxxxxxxxxx

# Create a security group rule (allow HTTPS inbound)
aliyun ecs AuthorizeSecurityGroup \
  --SecurityGroupId sg-bp1xxxxxxxxxxxxxxxxx \
  --IpProtocol tcp \
  --PortRange 443/443 \
  --SourceCidrIp 0.0.0.0/0 \
  --Policy Accept

SLB (Server Load Balancer)

# List all SLB instances
aliyun slb DescribeLoadBalancers --RegionId cn-hangzhou

# Add a backend server to an SLB
aliyun slb AddBackendServers \
  --LoadBalancerId lb-bp1xxxxxxxxxxxxxxxxx \
  --BackendServers '[{"ServerId":"i-bp1xxxxxxxxxxxxxxxxx","Weight":"100"}]'
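
When several instances need to be registered at once, the --BackendServers JSON is easier to generate than to type. The helper below is a sketch in plain shell; backend_servers_json is a hypothetical name, and it assumes every server gets the same weight.

```shell
# Print a --BackendServers JSON array for the given weight and instance IDs.
# Arguments: WEIGHT INSTANCE_ID...
backend_servers_json() {
  local weight="$1"; shift
  local json="" id
  for id in "$@"; do
    # Append a comma before every element except the first.
    json="${json:+$json,}{\"ServerId\":\"$id\",\"Weight\":\"$weight\"}"
  done
  printf '[%s]\n' "$json"
}

# Usage:
#   aliyun slb AddBackendServers \
#     --LoadBalancerId lb-bp1xxxxxxxxxxxxxxxxx \
#     --BackendServers "$(backend_servers_json 100 i-aaa i-bbb)"
```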

RDS (ApsaraDB for RDS)

# List all RDS instances
aliyun rds DescribeDBInstances --RegionId cn-hangzhou

# Create a manual backup
aliyun rds CreateBackup --DBInstanceId rm-bp1xxxxxxxxxxxxxxxxx --BackupMethod Physical

# Switch workloads to the secondary node (planned switchover); NodeId is the secondary node's ID
aliyun rds SwitchDBInstanceHA --DBInstanceId rm-bp1xxxxxxxxxxxxxxxxx --NodeId xxxxx

CEN (Cloud Enterprise Network)

# List CEN instances
aliyun cbn DescribeCens

# List the route tables of a Transit Router
aliyun cbn ListTransitRouterRouteTables \
  --TransitRouterId tr-bp1xxxxxxxxxxxxxxxxx

# Attach a VPC to a Transit Router
aliyun cbn CreateTransitRouterVpcAttachment \
  --CenId cen-xxxxxxxxxxxxxxxxx \
  --TransitRouterId tr-bp1xxxxxxxxxxxxxxxxx \
  --VpcId vpc-bp1xxxxxxxxxxxxxxxxx \
  --ZoneMappings '[{"ZoneId":"cn-hangzhou-h","VSwitchId":"vsw-bp1xxxxxxxxxxxxxxxxx"}]'

Monitoring & Alerting

CloudMonitor

CloudMonitor collects host-level and service-level metrics automatically. Custom metrics can be pushed via the PutCustomMetric API.

# List available metric definitions for ECS
aliyun cms DescribeMetricMetaList --Namespace acs_ecs_dashboard

# Query the most recent CPU utilization data point for an ECS instance (300-second period)
aliyun cms DescribeMetricLast \
  --Namespace acs_ecs_dashboard \
  --MetricName CPUUtilization \
  --Dimensions '[{"instanceId":"i-bp1xxxxxxxxxxxxxxxxx"}]' \
  --Period 300

# Create an alarm rule: average CPU > 80% for 3 consecutive periods
aliyun cms PutResourceMetricRule \
  --RuleId cpu-high-prod \
  --RuleName "CPU > 80%" \
  --Namespace acs_ecs_dashboard \
  --MetricName CPUUtilization \
  --Resources '[{"instanceId":"i-bp1xxxxxxxxxxxxxxxxx"}]' \
  --Escalations.Critical.Statistics Average \
  --Escalations.Critical.ComparisonOperator GreaterThanThreshold \
  --Escalations.Critical.Threshold 80 \
  --Escalations.Critical.Times 3 \
  --Period 300 \
  --ContactGroups '["ops-team"]'
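
Pushing a custom metric through PutCustomMetric might look like the sketch below. The MetricList.1.* parameter names follow the CloudMonitor API reference as best understood here and should be checked against the current documentation. The function name push_custom_metric, the "host" dimension, and the example metric name are assumptions made for illustration.

```shell
# Send one raw data point to CloudMonitor custom monitoring.
# Arguments: METRIC_NAME VALUE [HOST]
push_custom_metric() {
  local metric="$1" value="$2" host="${3:-$(hostname)}"
  # GroupId 0: not tied to an application group. Type 0: raw (non-aggregated) data.
  aliyun cms PutCustomMetric \
    --MetricList.1.GroupId 0 \
    --MetricList.1.MetricName "$metric" \
    --MetricList.1.Dimensions "{\"host\":\"$host\"}" \
    --MetricList.1.Type 0 \
    --MetricList.1.Values "{\"value\":$value}"
}

# Usage: push_custom_metric queue_depth 42 app-1
```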

SLS (Simple Log Service)

SLS is Alibaba Cloud's centralized log management service. It provides real-time log collection, search, dashboards, and alerting.

# Create a log project
aliyun sls CreateProject --body '{"projectName":"prod-logs","description":"Production logging"}'

# Create a logstore within the project
aliyun sls CreateLogStore \
  --project prod-logs \
  --body '{"logstoreName":"app-logs","ttl":90,"shardCount":2}'

# Query logs (SLS query language)
aliyun sls GetLogs \
  --project prod-logs \
  --logstore app-logs \
  --from 1713340800 \
  --to 1713344400 \
  --query "status >= 500 | SELECT count(*) as error_count, host GROUP BY host"
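
The --from and --to values above are fixed Unix timestamps exactly one hour apart. For ad hoc checks it is usually handier to compute a rolling window. A minimal sketch, assuming the same project, logstore, and query as above; sls_errors_last is a hypothetical helper name.

```shell
# Run the 5xx-per-host query over the last WINDOW_SECONDS (default: 3600).
# Arguments: PROJECT LOGSTORE [WINDOW_SECONDS]
sls_errors_last() {
  local project="$1" logstore="$2" window="${3:-3600}"
  local to from
  to="$(date +%s)"          # now, as a Unix timestamp
  from=$((to - window))     # start of the window
  aliyun sls GetLogs \
    --project "$project" \
    --logstore "$logstore" \
    --from "$from" \
    --to "$to" \
    --query "status >= 500 | SELECT count(*) as error_count, host GROUP BY host"
}

# Usage: sls_errors_last prod-logs app-logs 900
```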

Centralized logging across accounts

Use SLS cross-account log delivery to stream ActionTrail and application logs from all member accounts to a central Log Archive account's SLS project. Configure log delivery rules in the Resource Directory management account.


Troubleshooting

CEN Connectivity Issues

Symptom | Likely Cause | Resolution
VPC-to-VPC ping fails across CEN | Missing route table association or propagation | Check TR route tables: aliyun cbn ListTransitRouterRouteTables --TransitRouterId tr-xxx. Verify the VPC attachment is associated with the correct route table and that route propagation is enabled.
Cross-region traffic drops | No bandwidth package or bandwidth exhausted | Verify bandwidth package allocation: aliyun cbn DescribeCenBandwidthPackages --CenId cen-xxx. Purchase or increase bandwidth.
Asymmetric routing through firewall VPC | Custom route tables not directing return traffic | Ensure both inbound and outbound custom route tables point return traffic through the firewall VPC's TR attachment.
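
For the first symptom in the table above, the association and propagation checks can be run together once the route table ID is known (from ListTransitRouterRouteTables). The following is a sketch: inspect_tr_route_table is a hypothetical helper name, and the route table ID in the usage line is a placeholder.

```shell
# Print the associations and propagations of one Transit Router route table.
# Arguments: ROUTE_TABLE_ID
inspect_tr_route_table() {
  local rtb="$1"
  echo "== Associations for $rtb =="
  aliyun cbn ListTransitRouterRouteTableAssociations \
    --TransitRouterRouteTableId "$rtb"
  echo "== Propagations for $rtb =="
  aliyun cbn ListTransitRouterRouteTablePropagations \
    --TransitRouterRouteTableId "$rtb"
}

# Usage: inspect_tr_route_table vtb-bp1xxxxxxxxxxxxxxxxx
```

A VPC attachment that is missing from either list is a likely cause of one-way or failed connectivity.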

Cross-Region Replication Failures

# Check DTS synchronization status
aliyun dts DescribeSynchronizationJobStatus \
  --SynchronizationJobId dtsi-bp1xxxxxxxxxxxxxxxxx

# Check OSS CRR status
aliyun oss GetBucketReplication --bucket source-bucket --region cn-hangzhou

OSS CRR requires matching versioning state

Cross-region replication requires the source and destination buckets to be in the same versioning state (both unversioned or both versioning-enabled). Always verify versioning status on both buckets before enabling CRR.

NAT Gateway Troubleshooting

Symptom | Likely Cause | Resolution
Private instances cannot reach Internet | No SNAT entry for the vSwitch | Create an SNAT entry: aliyun vpc CreateSnatEntry --SnatTableId stb-xxx --SnatIp 47.x.x.x --SourceVSwitchId vsw-xxx
SNAT connections exhausted | High concurrency exceeding NAT Gateway capacity | Upgrade NAT Gateway specification or add additional EIPs to the SNAT pool
DNAT port forwarding not working | Security Group blocking the forwarded port | Verify the target ECS instance's Security Group allows inbound traffic on the DNAT port
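
Before creating an SNAT entry, it helps to confirm that the vSwitch really has none. The sketch below filters DescribeSnatTableEntries by vSwitch and greps the JSON response rather than parsing it properly. has_snat_entry is a hypothetical helper name, and the default region is an assumption carried over from the other examples.

```shell
# Succeed (exit 0) if the SNAT table already has an entry for the vSwitch.
# Arguments: SNAT_TABLE_ID VSWITCH_ID [REGION]
has_snat_entry() {
  local table="$1" vsw="$2" region="${3:-cn-hangzhou}"
  aliyun vpc DescribeSnatTableEntries \
    --RegionId "$region" \
    --SnatTableId "$table" \
    --SourceVSwitchId "$vsw" \
    | grep -q "\"SourceVSwitchId\": *\"$vsw\""
}

# Usage:
#   has_snat_entry stb-xxx vsw-xxx || echo "no SNAT entry for vsw-xxx"
```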

General Diagnostic Commands

# Describe an ECS instance's network interfaces
aliyun ecs DescribeNetworkInterfaces \
  --InstanceId i-bp1xxxxxxxxxxxxxxxxx \
  --RegionId cn-hangzhou

# Check the rules of a security group attached to the instance
aliyun ecs DescribeSecurityGroupAttribute \
  --SecurityGroupId sg-bp1xxxxxxxxxxxxxxxxx \
  --RegionId cn-hangzhou

# Verify VPC flow logs are enabled
aliyun vpc DescribeFlowLogs --RegionId cn-hangzhou --ResourceId vpc-bp1xxxxxxxxxxxxxxxxx