4.1.2 Interpret Metrics, Logs, and Traces

Key Metrics per Service

Service	Key Metrics
Lambda	Invocations, Duration, Errors, Throttles, ConcurrentExecutions, IteratorAge
API Gateway	Count, Latency, IntegrationLatency, 4XXError, 5XXError, CacheMissCount, CacheHitCount
DynamoDB	ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests
SQS	ApproximateAgeOfOldestMessage, ApproximateNumberOfMessagesVisible
ECS	CPUUtilization, MemoryUtilization
ALB	TargetResponseTime, HTTPCode_Target_5XX_Count
RDS	CPUUtilization, DatabaseConnections, FreeableMemory
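
To make the table concrete, here is a minimal sketch of how a request for one of these metrics could be assembled for the CloudWatch `get_metric_statistics` API. The helper name `build_metric_query`, its defaults, and the resource name `orders-handler` are assumptions for illustration. ECS and ALB are left out because they need different dimensions (cluster/service and load balancer), and the `ApiName` dimension applies to REST APIs.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch namespace and identifying dimension for some services in the table.
NAMESPACES = {
    "Lambda": ("AWS/Lambda", "FunctionName"),
    "API Gateway": ("AWS/ApiGateway", "ApiName"),  # REST APIs
    "DynamoDB": ("AWS/DynamoDB", "TableName"),
    "SQS": ("AWS/SQS", "QueueName"),
    "RDS": ("AWS/RDS", "DBInstanceIdentifier"),
}


def build_metric_query(service, metric, resource, minutes=60, period=300,
                       stat="Sum", now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    namespace, dimension = NAMESPACES[service]
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dimension, "Value": resource}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": [stat],
    }


# "orders-handler" is a hypothetical Lambda function name.
query = build_metric_query("Lambda", "Throttles", "orders-handler")
print(query["Namespace"], query["MetricName"])

# With boto3 installed and credentials configured, the query could be sent as:
#   import boto3
#   resp = boto3.client("cloudwatch").get_metric_statistics(**query)
#   datapoints = resp["Datapoints"]
```

Keeping the request as a plain dictionary means the same helper can be checked offline and reused for any metric in the table.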

What Metrics Tell You

Metric Pattern	Meaning
Lambda Errors ↑	Code bugs or downstream failures
Lambda Throttles ↑	Concurrency limit reached
Lambda IteratorAge ↑	Consumer falling behind (Kinesis or DynamoDB Streams)
API GW 5XXError ↑	Backend failures
API GW Latency ↑	Slow backend or cold starts
API GW IntegrationLatency ↑	Lambda/backend processing slow
DynamoDB ThrottledRequests ↑	Hot partition or insufficient provisioned capacity
SQS ApproximateAgeOfOldestMessage ↑	Consumer too slow
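
The table can be read as a lookup from a rising metric to its likely cause. The sketch below encodes part of it, assuming a deliberately simple definition of "rising" (the latest datapoint is at least twice the average of the earlier ones). That threshold, the function names, and the sample values are illustrative assumptions, not an AWS rule.

```python
# Partial mapping from the table: (service, metric) -> likely cause.
LIKELY_CAUSE = {
    ("Lambda", "Errors"): "Code bugs or downstream failures",
    ("Lambda", "Throttles"): "Concurrency limit reached",
    ("Lambda", "IteratorAge"): "Consumer falling behind",
    ("API Gateway", "5XXError"): "Backend failures",
    ("API Gateway", "IntegrationLatency"): "Lambda/backend processing slow",
    ("DynamoDB", "ThrottledRequests"): "Hot partition or low capacity",
    ("SQS", "ApproximateAgeOfOldestMessage"): "Consumer too slow",
}


def rising(datapoints, factor=2.0):
    """True if the latest value is at least `factor` times the earlier average."""
    if len(datapoints) < 2:
        return False
    *earlier, latest = datapoints
    baseline = sum(earlier) / len(earlier)
    return latest > 0 and latest >= factor * baseline


def triage(series):
    """Return (key, likely cause) for every known metric that is rising."""
    return [
        (key, LIKELY_CAUSE[key])
        for key, points in series.items()
        if key in LIKELY_CAUSE and rising(points)
    ]


# Hypothetical datapoints, oldest first.
series = {
    ("Lambda", "Throttles"): [0, 0, 1, 40],
    ("SQS", "ApproximateAgeOfOldestMessage"): [30, 32, 29, 31],
}
for key, cause in triage(series):
    print(key, "->", cause)
```

Only the Lambda `Throttles` series is flagged here, because the SQS age stays flat around 30 seconds.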

Scenario: API Gateway + Lambda Performance

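Exam question: An API Gateway API with Lambda proxy integration is taking longer than expected to respond. Which metrics should you monitor?

Answer:

IntegrationLatency - Time from when API Gateway relays the request to the backend (Lambda) until it receives the response
Latency - Total time from when API Gateway receives the client request until it returns the response (IntegrationLatency plus API Gateway overhead)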
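
Because Latency includes IntegrationLatency, subtracting one from the other shows where the time goes. A minimal sketch, assuming both values are averages in milliseconds over the same period (the function name and sample numbers are hypothetical):

```python
def split_latency(latency_ms, integration_latency_ms):
    """Split total API Gateway latency into backend time and gateway overhead."""
    overhead_ms = latency_ms - integration_latency_ms
    if overhead_ms < 0:
        raise ValueError("IntegrationLatency cannot exceed Latency")
    share = integration_latency_ms / latency_ms if latency_ms else 0.0
    return {
        "backend_ms": integration_latency_ms,
        "gateway_overhead_ms": overhead_ms,
        "backend_share": round(share, 2),
    }


# Hypothetical averages: Latency = 820 ms, IntegrationLatency = 790 ms.
result = split_latency(820, 790)
print(result)
```

A backend share close to 1.0 points at the Lambda function or whatever it calls. A large overhead points at API Gateway itself, for example authorizers or request/response mapping.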

X-Ray Service Map

Client → API Gateway (5ms) → Lambda (250ms) → DynamoDB (10ms)
                                             → S3 (50ms)
                                             → External API (500ms) ← bottleneck!

Color coding: green (successful calls), yellow (4xx client errors), red (5xx server faults), purple (429 throttling)
Latency breakdown per service
Error/fault rates per node
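
The per-node latency breakdown is what makes the bottleneck easy to spot. As a rough sketch, the same reading can be done in code using the downstream latencies from the example map above. The dictionary layout and function name are assumptions for illustration, not an X-Ray data format.

```python
# Average latency (ms) of each downstream call made by the Lambda function,
# taken from the example service map.
downstream_ms = {"DynamoDB": 10, "S3": 50, "External API": 500}


def slowest_dependency(nodes):
    """Return (name, latency_ms) of the slowest downstream node."""
    name = max(nodes, key=nodes.get)
    return name, nodes[name]


name, latency = slowest_dependency(downstream_ms)
share = latency / sum(downstream_ms.values())
print(f"{name}: {latency} ms ({share:.0%} of downstream time)")
```

In this example the external API dominates downstream time, so tuning DynamoDB or S3 access would barely change the end-to-end response.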

X-Ray Filter Expressions

# Find traces related to specific paths
service("my-api") AND http.url CONTAINS "/orders"

# Find traces for specific users
annotation.user_id = "12345"

# Find slow requests (response time over 5 seconds)
responsetime > 5

# Find errors
error = true OR fault = true
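
Expressions like the ones above can also be assembled in code and passed to the X-Ray `GetTraceSummaries` API. The sketch below only builds the expression string. The helper `build_filter` and its parameters are hypothetical, and it covers just the service, URL, annotation, and response-time clauses shown above.

```python
def build_filter(service=None, url_contains=None, annotations=None,
                 min_response_s=None):
    """Join the requested clauses into one X-Ray filter expression with AND."""
    parts = []
    if service:
        parts.append(f'service("{service}")')
    if url_contains:
        parts.append(f'http.url CONTAINS "{url_contains}"')
    for key, value in (annotations or {}).items():
        parts.append(f'annotation.{key} = "{value}"')
    if min_response_s is not None:
        parts.append(f"responsetime > {min_response_s}")  # seconds
    return " AND ".join(parts)


expr = build_filter(service="my-api", url_contains="/orders")
print(expr)

# With boto3 installed and credentials configured, the expression could be used as:
#   import boto3
#   from datetime import datetime, timedelta, timezone
#   end = datetime.now(timezone.utc)
#   resp = boto3.client("xray").get_trace_summaries(
#       StartTime=end - timedelta(minutes=30), EndTime=end, FilterExpression=expr)
```

Annotations are indexed for search, which is why they can appear in a filter expression. Metadata cannot be filtered this way.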

RDS Enhanced Monitoring

Provides real-time metrics for the operating system (OS) that the DB instance runs on
View the metrics in the RDS console, or consume the Enhanced Monitoring JSON output from CloudWatch Logs
Metrics: CPU, memory, file system, disk I/O
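
Consuming the JSON output amounts to parsing each log event and picking out the OS-level fields. The sketch below works on a heavily trimmed, hypothetical event. The field names (`cpuUtilization.total`, `memory.free`, `fileSys[].usedPercent`) follow the documented Enhanced Monitoring layout as best understood here, so treat them as assumptions to verify against a real event.

```python
import json

# Trimmed, hypothetical Enhanced Monitoring event (real events carry many more fields).
sample_event = """
{
  "instanceID": "orders-db",
  "cpuUtilization": {"total": 87.5, "user": 70.1, "system": 12.4, "wait": 5.0},
  "memory": {"total": 16000000, "free": 800000},
  "fileSys": [{"name": "rdsfilesys", "usedPercent": 62.3}]
}
"""


def summarize(raw):
    """Reduce one Enhanced Monitoring JSON event to a few headline numbers."""
    event = json.loads(raw)
    memory = event["memory"]
    return {
        "instance": event["instanceID"],
        "cpu_total_pct": event["cpuUtilization"]["total"],
        "mem_free_pct": round(100 * memory["free"] / memory["total"], 1),
        "fs_used_pct": max(fs["usedPercent"] for fs in event["fileSys"]),
    }


summary = summarize(sample_event)
print(summary)
```

Unlike the standard `CPUUtilization` metric, which is measured at the hypervisor, these values come from an agent on the instance, so they can break CPU down into user, system, and wait time.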

Exam Tip:

IteratorAge = stream processing lag
ApproximateAgeOfOldestMessage = SQS consumer lag
IntegrationLatency = backend processing time
Latency = total API Gateway time
X-Ray Service Map shows bottlenecks visually
X-Ray filter expressions find traces by path or user