Amazon Route 53 is a highly available DNS service with health checking and routing policies.
Routing Policies:
Health Checks: Monitor endpoint health and trigger failover. A health check can monitor an endpoint, the status of other health checks (a calculated health check), or a CloudWatch alarm.
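As a rough illustration of an endpoint health check, the sketch below wraps `aws route53 create-health-check` in a small function. The function name, the example domain, the `/health` path, and the interval and threshold values are assumptions chosen for illustration, not values from this guide.

```shell
# Sketch: create an HTTPS health check for a hypothetical endpoint.
# The domain and path are supplied by the caller; 30s interval and a
# failure threshold of 3 are illustrative choices.
create_https_health_check() {
  local domain="$1"   # e.g. app.example.com (hypothetical)
  local path="$2"     # e.g. /health (hypothetical)

  aws route53 create-health-check \
    --caller-reference "saa-study-$(date +%s)" \
    --health-check-config "Type=HTTPS,FullyQualifiedDomainName=${domain},Port=443,ResourcePath=${path},RequestInterval=30,FailureThreshold=3"
}

# Usage (requires AWS credentials):
#   create_https_health_check app.example.com /health
```

The caller reference only needs to be unique per request; a timestamp is one simple way to get that.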
| Strategy | RPO | RTO | Cost | Description |
|---|---|---|---|---|
| Backup and Restore | Hours | Hours | Lowest | Back up data, restore when needed |
| Pilot Light | Minutes | Tens of minutes | Low | Core services running at minimum, scale up when needed |
| Warm Standby | Seconds-Minutes | Minutes | Medium | Scaled-down version running, scale up when needed |
| Active-Active (Multi-Site) | Near zero | Near zero | Highest | Full production in multiple regions |
RPO (Recovery Point Objective): Maximum acceptable data loss measured in time. RTO (Recovery Time Objective): Maximum acceptable downtime.
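To make the cost versus RTO trade-off in the table concrete, here is a minimal sketch that picks the lowest-cost strategy whose typical RTO fits a target. The function name and the minute cutoffs are assumptions made for illustration; the table only gives rough ranges, and real recovery times depend on the workload.

```shell
# Sketch: pick the lowest-cost DR strategy whose typical RTO fits a target.
# The minute cutoffs below are illustrative assumptions, not AWS-defined limits.
pick_dr_strategy() {
  local rto_minutes="$1"   # target RTO in whole minutes

  if [ "$rto_minutes" -ge 120 ]; then
    echo "Backup and Restore"          # RTO measured in hours
  elif [ "$rto_minutes" -ge 30 ]; then
    echo "Pilot Light"                 # RTO in tens of minutes
  elif [ "$rto_minutes" -ge 5 ]; then
    echo "Warm Standby"                # RTO in minutes
  else
    echo "Active-Active (Multi-Site)"  # near-zero RTO
  fi
}

pick_dr_strategy 10    # prints: Warm Standby
pick_dr_strategy 240   # prints: Backup and Restore
```

The checks run from cheapest to most expensive, so the first strategy that satisfies the target is also the lowest-cost one.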
Create a Multi-AZ RDS MySQL instance and verify failover behavior.
Estimated time: 20 minutes
Step 1: Create a DB subnet group
aws rds create-db-subnet-group \
--db-subnet-group-name saa-db-subnet-group \
--db-subnet-group-description "SAA Study DB Subnet Group" \
--subnet-ids <subnet-1> <subnet-2>
Step 2: Create a Multi-AZ RDS instance
aws rds create-db-instance \
--db-instance-identifier saa-study-db \
--db-instance-class db.t3.micro \
--engine mysql \
--master-username admin \
--master-user-password StudyPass123 \
--allocated-storage 20 \
--multi-az \
--db-subnet-group-name saa-db-subnet-group \
--no-publicly-accessible
Step 3: Check the instance status and AZ
aws rds describe-db-instances \
--db-instance-identifier saa-study-db \
--query 'DBInstances[0].[DBInstanceStatus,AvailabilityZone,MultiAZ,SecondaryAvailabilityZone]'
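A new Multi-AZ instance usually takes several minutes to finish creating, so the status check above may report `creating` at first. One way to handle this, sketched below, is to use the `aws rds wait db-instance-available` waiter before reading the AZ placement. The function name is hypothetical.

```shell
# Sketch: block until the instance is available, then print its AZ placement.
wait_and_show_placement() {
  local db_id="$1"

  # Polls the instance until its status is "available" (or the waiter times out).
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1

  # Prints the primary AZ followed by the standby AZ.
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].[AvailabilityZone,SecondaryAvailabilityZone]' \
    --output text
}

# Usage (requires AWS credentials):
#   wait_and_show_placement saa-study-db
```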
Step 4: Simulate failover (optional)
aws rds reboot-db-instance \
--db-instance-identifier saa-study-db \
--force-failover
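To confirm that the forced failover actually moved the primary, one option is to compare the primary AZ before and after the reboot. The sketch below assumes the helper names `current_az` and `verify_failover` and a hypothetical `FAILOVER_SETTLE_SECONDS` variable; the short pause is an assumption to avoid the waiter returning before the reboot has registered.

```shell
# Sketch: trigger a forced failover and report whether the primary AZ changed.
current_az() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].AvailabilityZone' \
    --output text
}

verify_failover() {
  local db_id="$1"
  local before after

  before="$(current_az "$db_id")"
  aws rds reboot-db-instance --db-instance-identifier "$db_id" --force-failover >/dev/null || return 1

  # Assumption: pause briefly so the status has time to leave "available".
  sleep "${FAILOVER_SETTLE_SECONDS:-30}"
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  after="$(current_az "$db_id")"

  echo "primary AZ before: $before, after: $after"
  [ "$before" != "$after" ]   # succeeds only if the primary moved to another AZ
}

# Usage (requires AWS credentials):
#   verify_failover saa-study-db
```

After a successful failover the former standby AZ should appear as the primary, while the instance endpoint name stays the same.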
aws rds delete-db-instance \
--db-instance-identifier saa-study-db \
--skip-final-snapshot
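# Wait for the instance deletion to complete before removing the subnet group
aws rds wait db-instance-deleted \
--db-instance-identifier saa-study-db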
aws rds delete-db-subnet-group --db-subnet-group-name saa-db-subnet-group
| # | Question | Answer |
|---|---|---|
| 1 | What is the difference between RPO and RTO? | RPO: max acceptable data loss (time). RTO: max acceptable downtime. |
| 2 | Which DR strategy has the lowest cost? | Backup and Restore |
| 3 | Which DR strategy has near-zero RPO and RTO? | Active-Active (Multi-Site) |
| 4 | Can you read from an RDS Multi-AZ standby? | No. The standby is for failover only. Use read replicas for read scaling. |
| 5 | How many copies of data does Aurora maintain? | 6 copies across 3 AZs |
| 6 | What does RDS Proxy do? | Pools and shares database connections, reduces failover time by up to 66% (per AWS), and can enforce IAM authentication |
| 7 | What is Route 53 failover routing? | Active-passive failover using health checks to route traffic to healthy endpoints |
| 8 | What is immutable infrastructure? | Replace instances instead of updating in place. Use AMIs and Auto Scaling. |
| 9 | What does AWS X-Ray do? | Distributed tracing for analyzing requests across microservices |
| 10 | What is Aurora Global Database RPO? | Typically about 1 second (with an RTO typically under 1 minute) |
A company requires a disaster recovery solution with an RTO of less than 1 minute and an RPO of less than 5 seconds for their critical database. Which solution meets these requirements?
Correct: C
Aurora Global Database provides cross-region replication with a typical RPO of about 1 second and an RTO under 1 minute. RDS Multi-AZ provides high availability within a region, but not cross-region DR. Promoting a cross-region read replica takes longer and does not meet a sub-minute RTO. DynamoDB Global Tables apply to DynamoDB, not to relational database workloads.
Domain: 2 — Design Resilient Architectures Task: 2.2
A company has a web application deployed in us-east-1. They want to route users to the closest region for lowest latency, with automatic failover if a region becomes unhealthy. Which Route 53 routing policy should they use?
Correct: C
Latency-based routing directs users to the region with the lowest latency. Combined with health checks, it provides automatic failover to the next-best region if the lowest-latency region becomes unhealthy. Simple routing does not support health checks. Weighted routing distributes traffic by percentage, not latency. Geolocation routes by the user's location, not by measured latency.
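As a sketch of what such a record might look like, the function below prints a Route 53 change batch for one latency-based `A` record tied to a health check. The function name and all argument values (record name, IP address, health check ID) are hypothetical placeholders; one record per region would be needed in practice.

```shell
# Sketch: print a Route 53 change batch for one latency-based record.
# All argument values are placeholders supplied by the caller.
latency_record_batch() {
  local name="$1" region="$2" ip="$3" health_check_id="$4"

  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "${name}",
      "Type": "A",
      "SetIdentifier": "${region}",
      "Region": "${region}",
      "TTL": 60,
      "ResourceRecords": [{ "Value": "${ip}" }],
      "HealthCheckId": "${health_check_id}"
    }
  }]
}
EOF
}

# Usage (hosted zone ID and health check ID are placeholders):
#   latency_record_batch app.example.com us-east-1 203.0.113.10 <health-check-id> > batch.json
#   aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://batch.json
```

Route 53 stops returning a record whose health check is failing, which is what moves traffic to the next-lowest-latency region.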
Domain: 2 — Design Resilient Architectures Task: 2.2
A company has a Lambda-based application that connects to an RDS database. During traffic spikes, the database runs out of connections. Which service should the architect recommend?
Correct: B
RDS Proxy pools and shares database connections, preventing connection exhaustion during traffic spikes. It is specifically designed for Lambda-to-RDS scenarios where many short-lived connections are created. ElastiCache caches data but does not manage connections. Migrating to DynamoDB or Aurora Serverless is a larger change than needed.
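For reference, a proxy could be set up roughly as follows. This is a sketch under assumptions: the function name, the `-proxy` naming suffix, and the Secrets Manager secret and IAM role passed in are all hypothetical, and the proxy may need to finish creating before the target registration succeeds.

```shell
# Sketch: create an RDS Proxy in front of a MySQL instance and register the instance.
# Arguments: DB instance ID, Secrets Manager secret ARN, IAM role ARN, then subnet IDs.
create_study_proxy() {
  local db_id="$1" secret_arn="$2" role_arn="$3"
  shift 3   # remaining arguments are subnet IDs

  aws rds create-db-proxy \
    --db-proxy-name "${db_id}-proxy" \
    --engine-family MYSQL \
    --auth "AuthScheme=SECRETS,SecretArn=${secret_arn},IAMAuth=DISABLED" \
    --role-arn "$role_arn" \
    --vpc-subnet-ids "$@" || return 1

  # Point the proxy's default target group at the database instance.
  aws rds register-db-proxy-targets \
    --db-proxy-name "${db_id}-proxy" \
    --db-instance-identifiers "$db_id"
}

# Usage (all ARNs and subnet IDs are placeholders):
#   create_study_proxy saa-study-db <secret-arn> <role-arn> <subnet-1> <subnet-2>
```

The Lambda function would then connect to the proxy endpoint instead of the database endpoint.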
Domain: 2 — Design Resilient Architectures Task: 2.2
A company wants to deploy their application across multiple AZs with automatic recovery if an AZ fails. The application runs on EC2 instances. Which combination provides this capability?
Correct: B
An Auto Scaling group spanning multiple AZs automatically replaces unhealthy instances and maintains desired capacity across AZs. The ALB distributes traffic to healthy instances. A single-AZ deployment has no AZ-level resilience. Elastic IPs do not provide automatic recovery. CloudWatch alarms detect issues, and although an alarm can trigger an EC2 recover action, that action recovers the instance within the same AZ and does not handle an AZ failure.
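A minimal sketch of that combination, assuming a launch template and an ALB target group already exist: the function name, the capacity numbers, and the grace period are illustrative assumptions, and the subnets passed in should sit in different AZs.

```shell
# Sketch: create an Auto Scaling group across several subnets (one per AZ)
# and attach it to an existing ALB target group.
create_multi_az_asg() {
  local name="$1" launch_template="$2" subnets="$3" target_group_arn="$4"

  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$name" \
    --launch-template "LaunchTemplateName=${launch_template},Version=\$Latest" \
    --min-size 2 --max-size 4 --desired-capacity 2 \
    --vpc-zone-identifier "$subnets" \
    --target-group-arns "$target_group_arn" \
    --health-check-type ELB \
    --health-check-grace-period 120
}

# Usage (names, subnet IDs, and ARN are placeholders):
#   create_multi_az_asg saa-web-asg saa-web-template "<subnet-1>,<subnet-2>" <target-group-arn>
```

Setting the health check type to `ELB` lets the group replace instances that fail the load balancer's health checks, not only those that fail EC2 status checks.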
Domain: 2 — Design Resilient Architectures Task: 2.2
A company needs a DR strategy for their production environment. They want to minimize cost while maintaining the ability to recover within 10 minutes. Which DR strategy is most appropriate?
Correct: C
Warm Standby runs a scaled-down version of the production environment that can be quickly scaled up. It provides RTO in minutes, meeting the 10-minute requirement. Backup and Restore has RTO in hours. Pilot Light may take longer than 10 minutes to scale up. Active-Active is the most expensive option and exceeds the requirement.
Domain: 2 — Design Resilient Architectures Task: 2.2