4.1.3 Query Logs to Find Relevant Data

CloudWatch Logs Insights

# Find errors
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# P99 latency
stats pct(@duration, 99) as p99 by bin(1h)

# Cold start count
filter @message like /Init Duration/
| stats count(*) as coldStarts by bin(5m)

# Top 10 slowest invocations
fields @duration, @requestId
| sort @duration desc
| limit 10

# Error rate
stats count(*) as total,
      sum(strcontains(@message, "ERROR")) as errors
| display errors/total * 100 as error_rate

# Find specific request
fields @timestamp, @message
| filter @requestId = "abc-123-def"
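
The queries above can also be run from code instead of the console. The following is a minimal sketch, assuming a client that behaves like `boto3.client("logs")` and its `start_query` / `get_query_results` calls; the helper name `run_insights_query`, the constant `ERRORS_QUERY`, and the polling parameters are illustrative and not part of any AWS API.

```python
import time


def run_insights_query(logs_client, log_group, query, start_ts, end_ts,
                       poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return the rows as plain dicts.

    logs_client is assumed to behave like boto3.client("logs").
    start_ts and end_ts are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start_ts),
        endTime=int(end_ts),
        queryString=query,
    )["queryId"]

    # Queries run asynchronously, so poll until a terminal status is reported.
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")


# Same "find errors" query as in the notes above.
ERRORS_QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
"""
```

With boto3 installed, a call such as `run_insights_query(boto3.client("logs"), "/aws/lambda/my-function", ERRORS_QUERY, now - 3600, now)` would search the last hour. Passing the client in as a parameter keeps the helper easy to exercise with a stub.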

Log Structure

Log Group: /aws/lambda/my-function
  └── Log Stream: 2025/01/15/[$LATEST]abc123
       └── Log Events
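
When scanning many streams it can help to split a Lambda log stream name into its parts. The sketch below assumes the `YYYY/MM/DD/[version]instance-id` shape shown in the tree above; the names `LambdaLogStream` and `parse_lambda_stream_name` are hypothetical helpers, not AWS APIs.

```python
import re
from datetime import date
from typing import NamedTuple


class LambdaLogStream(NamedTuple):
    day: date          # date prefix of the stream name
    version: str       # function version or alias, e.g. "$LATEST"
    instance_id: str   # opaque suffix identifying the execution environment


# Assumed shape: 2025/01/15/[$LATEST]abc123
_STREAM_RE = re.compile(
    r"^(\d{4})/(\d{2})/(\d{2})/\[(?P<version>[^\]]+)\](?P<instance>.+)$"
)


def parse_lambda_stream_name(name: str) -> LambdaLogStream:
    """Split a Lambda-style log stream name into date, version, and instance id."""
    match = _STREAM_RE.match(name)
    if match is None:
        raise ValueError(f"not a Lambda-style log stream name: {name!r}")
    year, month, day = (int(part) for part in match.group(1, 2, 3))
    return LambdaLogStream(date(year, month, day), match["version"], match["instance"])
```

For example, `parse_lambda_stream_name("2025/01/15/[$LATEST]abc123")` yields the date 2025-01-15, the version `$LATEST`, and the instance id `abc123`.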

Log Retention

| Setting | Description |
| --- | --- |
| 1 day → 10 years | Configurable |
| Never expire | Default (costs accumulate!) |
| Export to S3 | Long-term archival |
| Subscription filter | Stream to Lambda/Kinesis/OpenSearch |
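
Because the default is to never expire, retention usually has to be set explicitly. The sketch below assumes a client that behaves like `boto3.client("logs")` and its `put_retention_policy` call. The retention API accepts only specific day counts; the tuple below is the list as I understand it and should be checked against the current API reference. `pick_retention_days` and `set_retention` are illustrative names.

```python
import bisect

# Assumption: day counts accepted by PutRetentionPolicy (verify against the API reference).
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def pick_retention_days(min_days: int) -> int:
    """Return the smallest allowed retention that keeps logs at least min_days."""
    if min_days < 1:
        raise ValueError("min_days must be at least 1")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, min_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        raise ValueError(f"{min_days} days exceeds the longest allowed retention")
    return ALLOWED_RETENTION_DAYS[index]


def set_retention(logs_client, log_group: str, min_days: int) -> int:
    """Apply a retention policy to log_group and return the day count used."""
    days = pick_retention_days(min_days)
    logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return days
```

Rounding up rather than down means a request for 45 days is stored as 60, so logs are never deleted earlier than intended.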

CloudTrail Logs (API Audit)

fields @timestamp, eventName, errorCode, errorMessage, userIdentity.arn
| filter errorCode = "AccessDenied"
| sort @timestamp desc
| limit 20

Amazon Athena (Query Logs in S3)

Athena is a serverless interactive query service that uses SQL to query data directly in S3.

| Feature | Description |
| --- | --- |
| Engine | Presto/Trino-based SQL |
| Pricing | Pay per query ($5/TB scanned) |
| Data formats | CSV, JSON, Parquet, ORC, Avro |
| Schema | Glue Data Catalog |
| Use case | Ad-hoc queries on S3 data, log analysis |
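
Since pricing depends on bytes scanned, a rough cost estimate is simple arithmetic. The sketch below uses the $5/TB rate quoted above and makes two assumptions that should be checked against the current pricing page: a terabyte is counted as 1024^4 bytes, and each query is billed for at least 10 MB. The function name `estimate_query_cost_usd` is illustrative.

```python
PRICE_PER_TB_USD = 5.0                 # rate quoted in the notes above
BYTES_PER_TB = 1024 ** 4               # assumption: binary terabyte
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2   # assumption: 10 MB minimum billed per query


def estimate_query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the number of bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD
```

Under these assumptions, a query that scans 200 GiB of raw logs costs roughly $0.98, which is why cutting the scanned volume through columnar formats and partitioning directly lowers the bill.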

Athena vs CloudWatch Logs Insights

| Aspect | Athena | CloudWatch Logs Insights |
| --- | --- | --- |
| Data source | S3 | CloudWatch Logs |
| Query language | Standard SQL | Custom query syntax |
| Best for | Large-scale log analysis, archived logs | Real-time log troubleshooting |
| Cost | Per TB scanned | Per GB scanned |
| Performance | Better for large datasets | Good for recent logs |

Common Use Cases for Developers

-- Query CloudTrail logs exported to S3
SELECT eventname, sourceipaddress, errorcode
FROM cloudtrail_logs
WHERE errorcode = 'AccessDenied'
  AND eventtime > '2025-01-01'
ORDER BY eventtime DESC;

-- Query ALB access logs in S3
-- TRY_CAST keeps the filter valid if target_status_code is stored as a string
SELECT target_status_code, count(*) AS cnt
FROM alb_logs
WHERE TRY_CAST(target_status_code AS integer) >= 500
GROUP BY target_status_code;
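
Queries like these can be submitted programmatically as well. The following is a minimal sketch, assuming a client that behaves like `boto3.client("athena")` with its `start_query_execution` and `get_query_execution` calls; the helper name `run_athena_query`, the polling parameters, and the database and bucket names in the usage note are placeholders.

```python
import time


def run_athena_query(athena_client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=120):
    """Run a query and return (execution_id, bytes_scanned) once it succeeds.

    athena_client is assumed to behave like boto3.client("athena").
    output_location is the S3 URI where Athena writes the result files.
    """
    execution_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    for _ in range(max_polls):
        execution = athena_client.get_query_execution(
            QueryExecutionId=execution_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return execution_id, scanned
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"query {execution_id} {state}: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {execution_id} still running after {max_polls} polls")
```

With boto3 installed, this could be called as `run_athena_query(boto3.client("athena"), sql, "my_database", "s3://my-results-bucket/athena/")`. The returned byte count is the figure that per-TB pricing is based on, so logging it per query makes expensive scans easy to spot.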

Optimize Athena Queries

| Technique | Benefit |
| --- | --- |
| Parquet/ORC format | Columnar → scans less data |
| Partitioning | Scans only the relevant partitions |
| Compression | Reduces data scanned |
| LIMIT clause | Reduces output size |

Exam Tip: Logs Insights = ad-hoc queries on CloudWatch Logs. Athena = SQL queries on S3 data (archived logs, large datasets). CloudTrail = API call auditing. Parquet format reduces Athena cost. Set log retention explicitly: by default, logs never expire.