2.3.5 Data Masking and Sanitization

Masking Patterns

Data Type | Original | Masked
Email | john@example.com | j***@example.com
Credit Card | 4111 1111 1111 1234 | **** **** **** 1234
Phone | +1-555-123-4567 | +1-555-***-****
SSN | 123-45-6789 | ***-**-6789
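The email and credit card rows are handled by the application code in this section. The phone and SSN rows can be reproduced with two small helpers. This is a minimal sketch that assumes US-style, hyphen-separated input. The names `mask_phone` and `mask_ssn` are hypothetical, not part of any library.

```python
import re


def mask_phone(phone: str, keep_groups: int = 2) -> str:
    """Keep the first `keep_groups` hyphen-separated groups and mask the rest.

    Assumes a hyphen-separated format such as +1-555-123-4567.
    """
    groups = phone.split("-")
    masked = [g if i < keep_groups else "*" * len(g) for i, g in enumerate(groups)]
    return "-".join(masked)


def mask_ssn(ssn: str) -> str:
    """Mask everything except the last four digits of a US SSN."""
    digits = re.sub(r"\D", "", ssn)  # drop separators such as '-'
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + digits[-4:]


print(mask_phone("+1-555-123-4567"))  # +1-555-***-****
print(mask_ssn("123-45-6789"))        # ***-**-6789
```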

Masking in Application Code

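import re

def mask_email(email):
    # Keep the first character of the local part and the full domain
    local, sep, domain = email.partition('@')
    if not sep or not local:
        return '***'  # not a valid address: mask everything
    return local[0] + '***@' + domain

def mask_credit_card(cc):
    # Keep only the last 4 digits; spaces and dashes in the input are ignored
    digits = re.sub(r'\D', '', cc)
    return '**** **** **** ' + digits[-4:]

def mask_log_message(message):
    # Mask email addresses anywhere in free text
    message = re.sub(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', '***@***.***', message)
    # Mask 16-digit card numbers, optionally grouped by spaces or dashes
    message = re.sub(
        r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b',
        '**** **** **** ****', message
    )
    return message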
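The card regex in `mask_log_message` masks any 16-digit run, including order IDs or tracking numbers. One common refinement, shown here as a sketch rather than as part of the original approach, is to mask a match only when it passes the Luhn checksum that payment card numbers use. The names `luhn_valid` and `mask_cards` are hypothetical.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or dashes
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(message: str) -> str:
    """Mask card-like digit runs that pass Luhn, keeping the last 4 digits."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return "**** **** **** " + digits[-4:]
        return match.group()  # fails Luhn: probably not a card, leave as-is
    return CARD_RE.sub(_replace, message)
```

The trade-off: fewer false positives, at the cost of leaving through any mistyped card number that fails the checksum.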

Logging Best Practices

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Python's default level is WARNING, which drops INFO records

def handler(event, context):
    # ✅ GOOD: log only non-sensitive identifiers
    logger.info("Processing order: %s", event.get('orderId'))
    logger.info("User action: %s", event.get('action'))

    # ❌ BAD: do NOT log the raw event or PII fields
    # logger.info(f"Event: {event}")
    # logger.info(f"User: {event.get('email')}")
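Discipline at each call site can be backed up by a filter that masks every record before a handler emits it. This is a minimal sketch using the standard library's `logging.Filter`. The class name `RedactingFilter` and the email-only pattern are illustrative assumptions. The filter is attached to the handler instead of the logger, because logger-level filters do not run for records propagated from child loggers.

```python
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactingFilter(logging.Filter):
    """Mask email addresses in the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges record.msg with record.args
        record.msg = EMAIL_RE.sub("***@***.***", message)
        record.args = ()  # args are already merged into msg
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Password reset requested by %s", "john@example.com")
# emits: Password reset requested by ***@***.***
```

In Lambda, the runtime installs its own handler on the root logger, so the filter would be added to those existing handlers instead of a new `StreamHandler`.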

What NEVER to log

  • Passwords, API keys, tokens
  • Credit card numbers (PCI DSS violation)
  • Social Security Numbers
  • Full email addresses
  • Medical records (HIPAA violation)
  • Raw request/response bodies containing PII
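When a request body really does need to be logged for debugging, one option is to redact known sensitive fields first. This is a sketch that assumes a hypothetical `SENSITIVE_KEYS` denylist. A denylist misses any field it does not name, so logging only an allowlist of safe identifiers (as in the GOOD example above) remains the safer default.

```python
import json
from typing import Any

# Hypothetical denylist of field names; adjust for your own payloads.
SENSITIVE_KEYS = {"password", "token", "api_key", "email", "ssn", "card_number"}


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict keys replaced, recursively."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


event = {"orderId": "o-42", "email": "john@example.com",
         "payment": {"card_number": "4111111111111111", "amount": 10}}
print(json.dumps(redact(event)))
# {"orderId": "o-42", "email": "[REDACTED]", "payment": {"card_number": "[REDACTED]", "amount": 10}}
```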

CloudWatch Logs Protection

Feature | Description
Encryption | Log data is encrypted at rest by default; optionally associate a customer managed KMS key
Retention | Per-log-group policy from 1 day to 10 years, or never expire (the default)
Access control | IAM policies scoped to log groups
Data protection | CloudWatch Logs data protection policies
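Retention and KMS encryption are both set per log group. The following is a sketch that assumes a boto3 CloudWatch Logs client is passed in as `logs_client`. `put_retention_policy` and `associate_kms_key` are real client methods, but `harden_log_group` is a hypothetical helper, and the allowed retention values are copied from the PutRetentionPolicy API reference, so check them against the current documentation.

```python
from typing import Optional

# Retention values (days) accepted by the PutRetentionPolicy API.
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def harden_log_group(logs_client, log_group: str, retention_days: int,
                     kms_key_arn: Optional[str] = None) -> None:
    """Apply a retention policy and, optionally, a customer managed KMS key."""
    if retention_days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"unsupported retention: {retention_days} days")
    logs_client.put_retention_policy(
        logGroupName=log_group, retentionInDays=retention_days
    )
    if kms_key_arn:
        # The key policy must allow the CloudWatch Logs service principal to use the key.
        logs_client.associate_kms_key(logGroupName=log_group, kmsKeyArn=kms_key_arn)
```

In practice `logs_client` would be `boto3.client("logs")`. The log group name and key ARN are placeholders you supply.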

CloudWatch Logs Data Protection

  • Automatically detects and masks sensitive data in logs
  • Supported data types include credit card numbers, SSNs, email addresses, and postal addresses
  • Uses managed data identifiers or custom data identifiers (regex patterns)
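A data protection policy is a JSON document attached to a log group. It pairs an Audit statement with a Deidentify statement over the same data identifiers. This is a sketch that builds such a document in Python. The policy name, the `Sid` values, and the two identifier ARNs are assumptions to verify against the CloudWatch Logs data protection documentation.

```python
import json

# Managed data identifier ARNs (assumed names; verify in the AWS docs).
IDENTIFIERS = [
    "arn:aws:dataprotection::aws:data-identifier/EmailAddress",
    "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber",
]


def build_data_protection_policy(identifiers: list) -> str:
    """Build a policy document that audits and masks the given identifiers."""
    policy = {
        "Name": "mask-pii",
        "Version": "2021-06-01",
        "Statement": [
            {   # Audit: report findings (no findings destination configured here)
                "Sid": "audit",
                "DataIdentifier": identifiers,
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {   # Deidentify: mask matching data in log events
                "Sid": "redact",
                "DataIdentifier": identifiers,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }
    return json.dumps(policy)


# With boto3 (not executed here):
# boto3.client("logs").put_data_protection_policy(
#     logGroupIdentifier="/aws/lambda/orders",
#     policyDocument=build_data_protection_policy(IDENTIFIERS),
# )
```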

Exam Tip: Do NOT log PII or credentials. Mask data before logging it. Use CloudWatch Logs data protection for automatic masking. Set a log retention policy; don't keep logs forever.