2.3.5 Data Masking and Sanitization
Masking Patterns
Masking in Application Code
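import re

def mask_email(email):
    # Keep only the first character of the local part; the domain stays visible.
    local, sep, domain = email.partition('@')
    if not sep or not local or not domain:
        return '***'  # not a well-formed address: mask everything
    return local[0] + '***@' + domain

def mask_credit_card(cc):
    # Show only the last four digits, ignoring spaces and dashes in the input.
    digits = re.sub(r'\D', '', cc)
    return '**** **** **** ' + digits[-4:]

def mask_log_message(message):
    # Mask emails
    message = re.sub(r'[\w.-]+@[\w.-]+', '***@***.***', message)
    # Mask credit cards (16 digits, optionally grouped by spaces or dashes)
    message = re.sub(
        r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b',
        '**** **** **** ****', message
    )
    return message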
Logging Best Practices
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # the root logger defaults to WARNING, which would drop info logs

def handler(event, context):
    # ✅ GOOD: log only non-sensitive identifiers
    logger.info("Processing order: %s", event.get('orderId'))
    logger.info("User action: %s", event.get('action'))
    # ❌ BAD: do NOT log the raw event or PII fields
    # logger.info(f"Event: {event}")
    # logger.info(f"User: {event.get('email')}")
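Masking can also be enforced inside the logging pipeline itself, so that a forgotten call site does not leak data. The following is a minimal sketch, assuming the same email and card patterns used in the masking functions; the `MaskingFilter` class and the `masked` logger name are hypothetical, not part of any library.

```python
import logging
import re

EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')
CARD_RE = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')


class MaskingFilter(logging.Filter):
    """Hypothetical filter that masks emails and card numbers in each record."""

    def filter(self, record):
        # Render the final message first so %-style arguments are masked too.
        message = record.getMessage()
        message = EMAIL_RE.sub('***@***.***', message)
        message = CARD_RE.sub('**** **** **** ****', message)
        record.msg = message
        record.args = None  # arguments are already merged into msg
        return True  # keep the record, now with masked content


logger = logging.getLogger("masked")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addFilter(MaskingFilter())
```

A filter attached to a logger only sees records logged directly through that logger. Attaching the same filter to a handler instead would cover every logger that writes to that handler. Regex masking is a safety net, not a replacement for logging only non-sensitive identifiers.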
What to NEVER Log
- Passwords, API keys, tokens
- Credit card numbers (PCI DSS violation)
- Social Security Numbers
- Full email addresses
- Medical records (HIPAA violation)
- Raw request/response bodies containing PII
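When part of a payload does need to be logged, one option is to strip known sensitive fields first. The sketch below assumes field names such as `password` and `token`; the `SENSITIVE_KEYS` set and the `redact` helper are hypothetical and would need to match the actual payload schema.

```python
# Hypothetical field names; adjust to the real payload schema.
SENSITIVE_KEYS = {
    "password", "api_key", "token", "authorization",
    "card_number", "ssn", "email",
}


def redact(value):
    """Return a copy of value with sensitive dict fields replaced."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # scalars are returned unchanged
```

For example, `redact({"orderId": "A-1", "user": {"email": "a@b.com"}})` keeps `orderId` and replaces the nested `email` value with `[REDACTED]`. The original object is not modified. A key-based filter cannot catch sensitive values hidden in free-text fields, so it complements pattern masking rather than replacing it.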
CloudWatch Logs Protection
| Feature | Description |
|---|---|
| Encryption | Encrypted at rest by default with a key AWS manages; optionally a customer managed KMS key (CMK) |
| Retention | Set a retention policy (1 day to 10 years, or never expire) |
| Access control | IAM policies on log groups |
| Data protection | CloudWatch Logs data protection policies |
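The retention and encryption rows can be applied programmatically. This is a rough sketch that assumes a boto3 CloudWatch Logs client is passed in; the `protect_log_group` helper and the log group name in the usage comment are hypothetical. `put_retention_policy` and `associate_kms_key` are the boto3 calls being illustrated, and `retentionInDays` only accepts specific values such as 1, 7, 14, 30, or 90.

```python
def protect_log_group(logs_client, log_group_name, retention_days=30, kms_key_arn=None):
    """Apply a retention policy and, optionally, a customer managed KMS key."""
    # Expire log events after the given number of days instead of keeping them forever.
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=retention_days,
    )
    if kms_key_arn:
        # Encrypt newly ingested log data with the customer managed key.
        logs_client.associate_kms_key(
            logGroupName=log_group_name,
            kmsKeyId=kms_key_arn,
        )


# Example usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   protect_log_group(boto3.client("logs"), "/aws/lambda/my-function")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in unit tests, without touching a real AWS account.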
CloudWatch Logs Data Protection
- Automatically detects and masks sensitive data in logs
- Supported: credit card numbers, SSNs, email addresses, physical addresses
- Uses managed data identifiers or custom patterns
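A data protection policy is a JSON document attached to a log group. The sketch below builds one in Python. The policy layout (an audit statement plus a de-identify statement over the same identifiers), the `2021-06-01` version string, and the identifier names are written from memory of the AWS documentation and should be treated as assumptions to verify. The helper name and policy name are hypothetical.

```python
import json

# Assumed ARN prefix for AWS managed data identifiers; verify against the AWS docs.
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_data_protection_policy(identifier_names):
    """Build a policy dict that audits and masks the given managed identifiers."""
    identifiers = [IDENTIFIER_PREFIX + name for name in identifier_names]
    return {
        "Name": "mask-sensitive-data",  # hypothetical policy name
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": identifiers,
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": identifiers,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


policy_json = json.dumps(
    build_data_protection_policy(["EmailAddress", "CreditCardNumber"])
)

# With boto3, the document would be applied roughly as follows (placeholder name):
#   boto3.client("logs").put_data_protection_policy(
#       logGroupIdentifier="/aws/lambda/my-function",
#       policyDocument=policy_json,
#   )
```

Masking applies when logs are read; users with the `logs:Unmask` permission can still view the original values, so IAM access control on the log group remains important.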
Exam Tip: Do NOT log PII or credentials. Mask data before logging. Use CloudWatch Logs data protection for automatic masking. Set a log retention policy; don't keep logs forever.