| Service | Key Metrics |
|---|---|
| Lambda | Invocations, Duration, Errors, Throttles, ConcurrentExecutions, IteratorAge |
| API Gateway | Count, Latency, IntegrationLatency, 4XXError, 5XXError, CacheMissCount, CacheHitCount |
| DynamoDB | ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests |
| SQS | ApproximateAgeOfOldestMessage, ApproximateNumberOfMessagesVisible |
| ECS | CPUUtilization, MemoryUtilization |
| ALB | TargetResponseTime, HTTPCode_Target_5XX_Count |
| RDS | CPUUtilization, DatabaseConnections, FreeableMemory |

| Metric Pattern | Meaning |
|---|---|
| Lambda Errors ↑ | Code bugs or downstream failures |
| Lambda Throttles ↑ | Concurrency limit reached |
| Lambda IteratorAge ↑ | Consumer falling behind (Kinesis / DynamoDB Streams) |
| API GW 5XXError ↑ | Backend failures |
| API GW Latency ↑ | Slow backend or cold starts |
| API GW IntegrationLatency ↑ | Lambda/backend processing slow |
| DynamoDB ThrottledRequests ↑ | Hot partition or low capacity |
| SQS ApproximateAgeOfOldestMessage ↑ | Consumer too slow |
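
The two API Gateway latency rows in the table above can be read together: `Latency` covers the whole request as seen by API Gateway, while `IntegrationLatency` covers only the time spent waiting on the backend, so their difference approximates API Gateway's own overhead. A minimal sketch, assuming both values have already been fetched in milliseconds; the function name, the sample numbers, and the 50% threshold are illustrative assumptions, not AWS guidance:

```python
def split_api_gateway_latency(latency_ms: float, integration_latency_ms: float) -> dict:
    """Split API Gateway Latency into backend time and gateway overhead."""
    if integration_latency_ms > latency_ms:
        raise ValueError("IntegrationLatency cannot exceed Latency")
    overhead_ms = latency_ms - integration_latency_ms
    # Hypothetical rule of thumb: point at the backend when it accounts
    # for more than half of the total request time.
    backend_share = integration_latency_ms / latency_ms if latency_ms else 0.0
    return {
        "backend_ms": integration_latency_ms,
        "gateway_overhead_ms": overhead_ms,
        "likely_cause": "backend" if backend_share > 0.5 else "api_gateway",
    }


# Sample values chosen for illustration only.
result = split_api_gateway_latency(latency_ms=265.0, integration_latency_ms=250.0)
print(result)
```

When `IntegrationLatency` is close to `Latency`, the next place to look is usually the Lambda `Duration` metric or the backend itself rather than API Gateway.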

Exam question: An API Gateway API with Lambda proxy integration is taking longer than expected. Which metrics should you monitor?

Answer:
Client → API Gateway (5ms) → Lambda (250ms) → DynamoDB (10ms)
→ S3 (50ms)
→ External API (500ms) ← bottleneck!
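
The trace above can also be checked programmatically. The sketch below takes the three downstream durations shown in the diagram and picks the slowest one; the dictionary layout and the `find_bottleneck` helper are assumptions made for illustration and are not part of any AWS SDK:

```python
# Downstream call durations in milliseconds, copied from the trace above.
downstream_ms = {"DynamoDB": 10, "S3": 50, "External API": 500}


def find_bottleneck(durations: dict[str, int]) -> tuple[str, float]:
    """Return the slowest call and its share of total downstream time."""
    name = max(durations, key=durations.get)
    share = durations[name] / sum(durations.values())
    return name, share


name, share = find_bottleneck(downstream_ms)
print(f"{name}: {share:.0%} of downstream time")
```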
# Find traces related to specific paths
service("my-api") AND http.url CONTAINS "/orders"
# Find traces for specific users
annotation.user_id = "12345"
# Find slow requests
responsetime > 5
# Find errors
error = true OR fault = true
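
Filter expressions like the ones above can be combined with `AND` and passed as the `FilterExpression` parameter of the X-Ray `GetTraceSummaries` API. A minimal sketch, assuming a 15-minute lookback window; `build_trace_query` is a hypothetical helper, and the actual request (shown only as a comment) would additionally need `boto3` and AWS credentials:

```python
from datetime import datetime, timedelta, timezone


def build_trace_query(clauses: list[str], minutes: int = 15) -> dict:
    """Build keyword arguments for an X-Ray GetTraceSummaries request."""
    end = datetime.now(timezone.utc)
    return {
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        # Every clause must match, so the clauses are joined with AND.
        "FilterExpression": " AND ".join(clauses),
    }


query = build_trace_query(
    ['service("my-api")', 'http.url CONTAINS "/orders"', "responsetime > 5"]
)
print(query["FilterExpression"])

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("xray").get_trace_summaries(**query)
```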
Exam Tip: