4.1.2 Interpret Metrics, Logs, and Traces

Interpret Application Metrics, Logs, and Traces

Key Metrics per Service

Service        Key Metrics
Lambda         Invocations, Duration, Errors, Throttles, ConcurrentExecutions, IteratorAge
API Gateway    Count, Latency, IntegrationLatency, 4XXError, 5XXError, CacheMissCount, CacheHitCount
DynamoDB       ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests
SQS            ApproximateAgeOfOldestMessage, ApproximateNumberOfMessagesVisible
ECS            CPUUtilization, MemoryUtilization
ALB            TargetResponseTime, HTTPCode_Target_5XX_Count
RDS            CPUUtilization, DatabaseConnections, FreeableMemory
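
Each service publishes these metrics under its own CloudWatch namespace. The following is a minimal sketch of how a query for one of them might be assembled, assuming boto3's `get_metric_statistics` is the call used to fetch the data. The helper `build_query` and the function name `my-function` are hypothetical; the code only builds the request parameters, so it runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch namespaces for the services listed above.
NAMESPACES = {
    "Lambda": "AWS/Lambda",
    "API Gateway": "AWS/ApiGateway",
    "DynamoDB": "AWS/DynamoDB",
    "SQS": "AWS/SQS",
    "ECS": "AWS/ECS",
    "ALB": "AWS/ApplicationELB",
    "RDS": "AWS/RDS",
}


def build_query(service, metric, dimensions, stat="Sum", minutes=60, period=300):
    """Build keyword arguments for get_metric_statistics (hypothetical helper)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": NAMESPACES[service],
        "MetricName": metric,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": [stat],
    }


# "my-function" is a placeholder Lambda function name.
query = build_query("Lambda", "Throttles", {"FunctionName": "my-function"})

# With credentials configured, the query could be sent as:
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```

`Sum` suits count-style metrics such as Throttles or Errors; for Duration, Latency, or IteratorAge a statistic such as `Average` or `Maximum` is usually more informative.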

What Metrics Tell You

Metric Pattern                        Meaning
Lambda Errors ↑                       Code bugs or downstream failures
Lambda Throttles ↑                    Concurrency limit reached
Lambda IteratorAge ↑                  Consumer falling behind (Kinesis / DynamoDB Streams)
API GW 5XXError ↑                     Backend failures
API GW Latency ↑                      Slow backend or cold starts
API GW IntegrationLatency ↑           Lambda/backend processing slow
DynamoDB ThrottledRequests ↑          Hot partition or insufficient capacity
SQS ApproximateAgeOfOldestMessage ↑   Consumer too slow
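
The patterns above can be read as a lookup: when a metric trends upward, it points to a likely cause. A rough sketch of that idea, assuming a metric counts as "rising" when its latest datapoint exceeds the first by more than 50%. The threshold, the helper names, and the sample values are illustrative assumptions, not AWS guidance.

```python
# A subset of the rising-metric patterns listed above, keyed by (service, metric).
LIKELY_CAUSE = {
    ("Lambda", "Throttles"): "Concurrency limit reached",
    ("Lambda", "IteratorAge"): "Consumer falling behind",
    ("SQS", "ApproximateAgeOfOldestMessage"): "Consumer too slow",
}


def is_rising(datapoints, min_increase=0.5):
    """True if the last datapoint exceeds the first by more than min_increase (a fraction)."""
    first, last = datapoints[0], datapoints[-1]
    if first == 0:
        return last > 0
    return (last - first) / first > min_increase


def diagnose(service, metric, datapoints):
    """Return the likely cause when the metric is rising, otherwise None."""
    if is_rising(datapoints):
        return LIKELY_CAUSE.get((service, metric))
    return None


# Hypothetical datapoints: age of the oldest SQS message in seconds.
print(diagnose("SQS", "ApproximateAgeOfOldestMessage", [30, 45, 120]))
```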

Scenario: API Gateway + Lambda Performance

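Exam question: An API Gateway API with Lambda proxy integration is taking longer than expected to respond. Which metrics should you monitor?

Answer:

  • IntegrationLatency - Time between API Gateway relaying the request to the backend (Lambda) and receiving its response
  • Latency - Total time from API Gateway receiving the client request to returning the response (IntegrationLatency + API Gateway overhead)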
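
Since Latency includes IntegrationLatency, subtracting one from the other isolates the time spent in API Gateway itself. A small sketch of that arithmetic; the function names and the sample values are hypothetical.

```python
def api_gateway_overhead_ms(latency_ms, integration_latency_ms):
    """Time spent inside API Gateway itself, outside the backend call."""
    if integration_latency_ms > latency_ms:
        raise ValueError("IntegrationLatency cannot exceed Latency")
    return latency_ms - integration_latency_ms


def slow_component(latency_ms, integration_latency_ms):
    """Name the side that accounts for most of the total latency."""
    overhead = api_gateway_overhead_ms(latency_ms, integration_latency_ms)
    return "backend" if integration_latency_ms >= overhead else "api_gateway"


# Hypothetical sample values, in milliseconds.
print(api_gateway_overhead_ms(900, 850))  # 50
print(slow_component(900, 850))           # backend
```

If nearly all of Latency is IntegrationLatency, the investigation moves to the Lambda function (Duration, cold starts, downstream calls) rather than API Gateway.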

X-Ray Service Map

Client → API Gateway (5ms) → Lambda (250ms) → DynamoDB (10ms)
                                             → S3 (50ms)
                                             → External API (500ms) ← bottleneck!
  • Color coding: green (OK), yellow (errors, 4xx), red (faults, 5xx), purple (throttling, 429)
  • Latency breakdown per service
  • Error/fault rates per node
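
Reading the map for a bottleneck amounts to comparing the latency of each downstream node. A minimal sketch using the example figures from the map above; the function name `find_bottleneck` is hypothetical.

```python
# Downstream call latencies (ms) taken from the example service map.
downstream_ms = {"DynamoDB": 10, "S3": 50, "External API": 500}


def find_bottleneck(latencies):
    """Return (node, latency, share of total downstream time) for the slowest node."""
    node = max(latencies, key=latencies.get)
    share = latencies[node] / sum(latencies.values())
    return node, latencies[node], share


node, ms, share = find_bottleneck(downstream_ms)
print(f"{node}: {ms} ms ({share:.0%} of downstream time)")
# External API: 500 ms (89% of downstream time)
```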

X-Ray Filter Expressions

# Find traces related to specific paths
service("my-api") AND http.url CONTAINS "/orders"

# Find traces for specific users
annotation.user_id = "12345"

# Find slow requests
responsetime > 5

# Find errors
error = true OR fault = true
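
These expressions can also be used outside the console. A minimal sketch, assuming boto3's X-Ray `get_trace_summaries` call with its `FilterExpression` parameter; the helper names and the 15-minute window are illustrative assumptions, and the code only builds the request so it runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Filter expressions from the examples above.
FILTERS = {
    "orders_path": 'service("my-api") AND http.url CONTAINS "/orders"',
    "slow": "responsetime > 5",          # response time in seconds
    "failed": "error = true OR fault = true",
}


def user_filter(user_id):
    """Filter on the user_id annotation (hypothetical annotation key from the example)."""
    return f'annotation.user_id = "{user_id}"'


def trace_summary_request(filter_expression, minutes=15):
    """Build keyword arguments for get_trace_summaries (hypothetical helper)."""
    end = datetime.now(timezone.utc)
    return {
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "FilterExpression": filter_expression,
    }


request = trace_summary_request(user_filter("12345"))

# With credentials configured, the request could be sent as:
#   boto3.client("xray").get_trace_summaries(**request)
```

Only values recorded as annotations are indexed for filtering; data stored as X-Ray metadata cannot be used in a filter expression.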

RDS Enhanced Monitoring

  • Provides real-time metrics for the operating system (OS) that the database instance runs on
  • View the metrics in the RDS console, or consume the JSON output from CloudWatch Logs
  • Metrics: CPU, memory, file system, disk I/O
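
Consuming the JSON output typically means parsing each log event and picking out the OS-level fields of interest. A rough sketch under stated assumptions: the payload below is a trimmed, hypothetical sample, and the field names (`instanceID`, `cpuUtilization.total`, `memory.free`, `memory.total`) are assumed to match the Enhanced Monitoring log format, so verify them against a real event before relying on them.

```python
import json

# Trimmed, hypothetical sample of one Enhanced Monitoring log event.
# Field names are assumptions about the JSON layout, not a verified schema.
sample_event = """
{"instanceID": "my-db",
 "cpuUtilization": {"total": 42.5, "wait": 3.1},
 "memory": {"total": 16000000, "free": 4000000}}
"""


def summarize(event_json):
    """Extract a few OS-level figures from one monitoring event."""
    event = json.loads(event_json)
    memory = event["memory"]
    return {
        "instance": event["instanceID"],
        "cpu_total_pct": event["cpuUtilization"]["total"],
        "memory_free_pct": 100 * memory["free"] / memory["total"],
    }


print(summarize(sample_event))
```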

Exam Tip:

  • IteratorAge = stream processing lag
  • ApproximateAgeOfOldestMessage = SQS consumer lag
  • IntegrationLatency = backend processing time
  • Latency = total API Gateway time
  • X-Ray Service Map shows bottlenecks visually
  • X-Ray filter expressions find traces by path or user