Lab 01: SageMaker Data Wrangler
Domain: 1 - Data Preparation
Difficulty: Easy
Time: 30 minutes
Objective
Learn to use SageMaker Data Wrangler for visual data preparation and feature engineering.
Prerequisites
- SageMaker Studio access
- Sample dataset (CSV)
Steps
Step 1: Access Data Wrangler
- Open SageMaker Studio
- From the launcher, select "New data flow"
- Name your flow: customer-churn-prep
Step 2: Import Data
- Click "Import data"
- Select "Amazon S3"
- Navigate to your dataset or use a sample:
- Click "Import"
Step 3: Explore Data
- Click on the dataset node
- Select "Add analysis"
- Choose "Table summary" to view statistics
- Create a "Histogram" for numerical columns
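To make it clearer what the "Table summary" and "Histogram" analyses report, here is a minimal standard-library sketch that computes comparable statistics by hand. It is not Data Wrangler's implementation. The sample rows, the `DayMins` and `Churn` columns, and the helper names `load_rows`, `summarize`, and `histogram` are all assumptions made for illustration; only `State` and `Phone` are column names taken from this lab.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample rows; only State and Phone are column names from this lab.
SAMPLE_CSV = """State,Phone,DayMins,Churn
WA,555-0100,120.5,False
OR,555-0101,,True
WA,555-0102,200.0,False
CA,555-0103,80.25,False
"""


def load_rows(text):
    """Parse CSV text into a list of dicts; empty strings mark missing values."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, column):
    """Return count, missing count, and basic statistics for a numeric column."""
    values = [float(r[column]) for r in rows if r[column] != ""]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def histogram(rows, column, bin_width):
    """Count non-missing values per bin; each key is the lower edge of a bin."""
    values = [float(r[column]) for r in rows if r[column] != ""]
    return Counter(int(v // bin_width) * bin_width for v in values)


rows = load_rows(SAMPLE_CSV)
print(summarize(rows, "DayMins"))
print(sorted(histogram(rows, "DayMins", 50).items()))
```

The missing count in the summary is the number to watch: it tells you which columns need a "Handle missing" transform in the next step.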
Step 4: Add Transformations
- Click "+" after the data node
- Select "Add transform"
- Apply these transformations:
- Handle missing: Fill missing values
- Encode categorical: One-hot encode State
- Drop columns: Remove Phone
- Custom transform: Create new feature
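The four transformations above can be sketched in plain Python to show what each one does to the data. This is an illustrative sketch under stated assumptions, not the code Data Wrangler generates. The rows, the `DayMins` and `DayCalls` columns, the derived `MinsPerCall` feature, the choice of median imputation, and the `State_<value>` naming of encoded columns are all hypothetical.

```python
import statistics

# Hypothetical rows; None marks a missing value. Only State and Phone are
# column names from this lab, the rest are made up for illustration.
rows = [
    {"State": "WA", "Phone": "555-0100", "DayMins": 120.0, "DayCalls": 100},
    {"State": "OR", "Phone": "555-0101", "DayMins": None, "DayCalls": 80},
    {"State": "WA", "Phone": "555-0102", "DayMins": 200.0, "DayCalls": 50},
]


def fill_missing(rows, column):
    """Handle missing: replace None with the median of the observed values."""
    median = statistics.median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Encode categorical: one 0/1 indicator column per distinct category.

    This sketch drops the original column after encoding it.
    """
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded


def drop_column(rows, column):
    """Drop columns: remove one column from every row."""
    return [{k: v for k, v in r.items() if k != column} for r in rows]


def add_mins_per_call(rows):
    """Custom transform: derive a new feature from two existing columns."""
    return [{**r, "MinsPerCall": r["DayMins"] / r["DayCalls"]} for r in rows]


# Apply the steps in the same order as the list above.
filled = fill_missing(rows, "DayMins")
encoded = one_hot(filled, "State")
trimmed = drop_column(encoded, "Phone")
prepared = add_mins_per_call(trimmed)

for row in prepared:
    print(row)
```

Order matters here: the custom feature divides by and reads from columns that must already be free of missing values, so imputation runs first.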
Step 5: Export Flow
- Click "Export"
- Choose export destination:
- "Export to S3" for data
- "Export to Pipeline" for automation
- "Export to Python code" for reuse
Verification
- Transformed dataset exported to S3
- Data quality improved (no missing values)
- Features properly encoded
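One way to check these points outside the UI is to download the exported CSV and run a few assertions against it. The sketch below assumes the export is a CSV with a header row and that one-hot columns share a `State_` prefix; the actual column names in your export may differ, so adjust the hypothetical `dropped` and `encoded_prefix` parameters to match. `check_export` is an illustrative helper, not part of any SageMaker API.

```python
import csv
import io


def check_export(csv_text, dropped=("Phone",), encoded_prefix="State_"):
    """Return a list of problems found in exported CSV text; empty means all checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = []

    # Columns removed in the flow should not appear in the output.
    for name in dropped:
        if name in columns:
            problems.append(f"column {name!r} should have been dropped")

    # At least one indicator column should exist for the encoded feature.
    encoded = [c for c in columns if c.startswith(encoded_prefix)]
    if not encoded:
        problems.append(f"no one-hot columns with prefix {encoded_prefix!r}")

    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, value in row.items():
            # Short rows yield None; blank cells yield an empty string.
            if not isinstance(value, str) or value.strip() == "":
                problems.append(f"line {line_no}: missing value in {name!r}")
        for name in encoded:
            if row[name] not in ("0", "1", "0.0", "1.0"):
                problems.append(f"line {line_no}: {name!r} is not a 0/1 indicator")
    return problems


# Hypothetical exports used only to exercise the checks.
good = "DayMins,State_OR,State_WA\n120.0,0,1\n160.0,1,0\n"
bad = "Phone,DayMins,State_OR\n555-0100,,1\n"

print(check_export(good))
print(check_export(bad))
```

An empty list for your real export corresponds to all three verification items passing.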
Cleanup
- Close Data Wrangler flow
- Delete exported files if not needed
- Stop Studio instance
Key Takeaways
!!! note "Exam Points"
    - Data Wrangler is visual, no-code data preparation
    - Supports 300+ built-in transformations
    - Can export to Pipelines for automation
    - Part of SageMaker Studio