Compliance¶

Overview¶

Ensuring ML solutions meet compliance and governance requirements.

Amazon Macie¶

Discover and protect sensitive data in S3.

Key Features¶

Automatic sensitive data discovery
PII detection
Custom data identifiers
Findings and alerts

import boto3

macie = boto3.client("macie2")

# Enable Macie
macie.enable_macie()

# Create classification job
macie.create_classification_job(
    name="ml-data-scan",
    s3JobDefinition={
        "bucketDefinitions": [{
            "accountId": "123456789012",
            "buckets": ["my-ml-bucket"]
        }]
    },
    jobType="ONE_TIME"
)

AWS Config¶

Track resource compliance.

SageMaker Config Rules¶

Rule	Description
sagemaker-endpoint-configuration-kms-key-configured	Endpoint uses KMS
sagemaker-notebook-instance-inside-vpc	Notebook in VPC
sagemaker-notebook-no-direct-internet-access	No direct internet

Custom Config Rule¶

# Lambda function for custom rule
def evaluate_compliance(configuration_item):
    if configuration_item["resourceType"] != "AWS::SageMaker::Endpoint":
        return "NOT_APPLICABLE"

    # Check endpoint configuration
    config = configuration_item["configuration"]
    if config.get("kmsKeyId"):
        return "COMPLIANT"
    return "NON_COMPLIANT"

Data Governance¶

AWS Lake Formation¶

Fine-grained access control for data lakes.

Row-level security
Column-level security
Data sharing across accounts

SageMaker Governance¶

Feature	Purpose
Model Cards	Document model details
Model Registry	Version and approve models
Lineage Tracking	Track data to model

Model Cards¶

Document model information for governance.

from sagemaker.model_card import ModelCard, ModelOverview

model_card = ModelCard(
    name="customer-churn-model",
    status="Draft",
    model_overview=ModelOverview(
        model_description="Predicts customer churn probability",
        model_creator="Data Science Team",
        problem_type="Binary Classification"
    ),
    intended_uses=IntendedUses(
        purpose_of_model="Identify at-risk customers",
        intended_uses="Customer retention campaigns",
        factors_affecting_model_efficiency="Data recency",
        risk_rating="Medium"
    )
)

model_card.create()

Audit Logging¶

CloudTrail for Compliance¶

-- Find all model deployments
SELECT eventTime, userIdentity.userName, requestParameters.endpointName
FROM cloudtrail_logs
WHERE eventName = 'CreateEndpoint'
  AND eventTime > '2024-01-01'
ORDER BY eventTime DESC

Exam Tips¶

!!! warning "Key Points" - Macie for sensitive data discovery - Config rules for compliance checking - Lake Formation for data governance - Model Cards for model documentation - CloudTrail for audit logging