1.3.5 Serialize & Deserialize Data

DynamoDB Data Types

Category | Type | Notation | Example
Scalar | String | S | {"S": "hello"}
Scalar | Number | N | {"N": "42"}
Scalar | Binary | B | {"B": "base64data"}
Scalar | Boolean | BOOL | {"BOOL": true}
Scalar | Null | NULL | {"NULL": true}
Document | List | L | {"L": [{"S": "a"}, {"N": "1"}]}
Document | Map | M | {"M": {"key": {"S": "value"}}}
Set | String Set | SS | {"SS": ["a", "b", "c"]}
Set | Number Set | NS | {"NS": ["1", "2", "3"]}
Set | Binary Set | BS | {"BS": ["base64a", "base64b"]}
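The mapping in the table can be made concrete with a small round trip. This is a simplified stand-in, not boto3's own `boto3.dynamodb.types.TypeSerializer`/`TypeDeserializer`: the helper names `to_dynamodb` and `from_dynamodb` are hypothetical, and binary values are base64-encoded to match the DynamoDB JSON form shown in the table. Like boto3, it reads numbers back as `Decimal` and refuses Python `float`.

```python
import base64
from decimal import Decimal


def to_dynamodb(value):
    """Hypothetical helper: wrap a Python value in a DynamoDB type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, float):
        raise TypeError("use Decimal instead of float")  # boto3 rejects floats too
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (list, tuple)):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:  # DynamoDB sets cannot be empty
        # sorted() only makes the output deterministic; sets are unordered
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(base64.b64encode(v).decode("ascii") for v in value)}
    raise TypeError(f"unsupported value: {value!r}")


def from_dynamodb(attr):
    """Hypothetical helper: unwrap a single-key type descriptor back into Python."""
    (tag, raw), = attr.items()
    if tag == "NULL":
        return None
    if tag in ("S", "BOOL"):
        return raw
    if tag == "N":
        return Decimal(raw)  # exact decimal, never float
    if tag == "B":
        return base64.b64decode(raw)
    if tag == "L":
        return [from_dynamodb(v) for v in raw]
    if tag == "M":
        return {k: from_dynamodb(v) for k, v in raw.items()}
    if tag == "SS":
        return set(raw)
    if tag == "NS":
        return {Decimal(n) for n in raw}
    if tag == "BS":
        return {base64.b64decode(b) for b in raw}
    raise ValueError(f"unknown type descriptor: {tag}")


item = {"id": "user-123", "age": 25, "tags": {"dev", "aws"}, "active": True, "manager": None}
wire = {k: to_dynamodb(v) for k, v in item.items()}
print(wire["age"], wire["tags"])  # {'N': '25'} {'SS': ['aws', 'dev']}
back = {k: from_dynamodb(v) for k, v in wire.items()}
print(back["age"], back["tags"] == item["tags"])  # 25 True
```

Numbers come back as `Decimal` rather than `int`, which is the same behavior to expect from the high-level resource API.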

Low-level vs High-level API

import boto3

# Low-level client: every value needs an explicit type descriptor
client = boto3.client('dynamodb')
client.put_item(
    TableName='Users',
    Item={
        'id': {'S': 'user-123'},
        'age': {'N': '25'},
        'tags': {'SS': ['dev', 'aws']}
    }
)

# High-level resource: serializes/deserializes native Python types automatically
table = boto3.resource('dynamodb').Table('Users')
table.put_item(
    Item={
        'id': 'user-123',
        'age': 25,  # int or Decimal; the resource API rejects float
        'tags': {'dev', 'aws'}
    }
)
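Items read through the high-level resource come back with numbers as `decimal.Decimal` and SS/NS attributes as Python `set`, and `json.dumps` refuses both. A minimal sketch of a `default=` hook for that case, assuming a hypothetical item shaped like `Table.get_item(...)["Item"]`; the helper name `dynamodb_json_default` is illustrative, and converting non-integral `Decimal` to `float` can lose precision.

```python
import json
from decimal import Decimal


def dynamodb_json_default(obj):
    """Fallback for json.dumps: convert types the resource API returns."""
    if isinstance(obj, Decimal):
        # integral values become int; others become float (may lose precision)
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # JSON has no set type; sort for stable output
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical item, shaped like what the resource API returns
item = {"id": "user-123", "age": Decimal("25"), "score": Decimal("9.5"), "tags": {"dev", "aws"}}

print(json.dumps(item, default=dynamodb_json_default, sort_keys=True))
```

Without `default=`, the same `json.dumps(item)` call raises `TypeError` on the `Decimal` value.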

S3 Data Formats

Format | Type | Use Case
JSON | Text, semi-structured | APIs, config, flexible schema
CSV | Text, tabular | Simple data exchange
Parquet | Binary, columnar | Analytics, Athena queries
Avro | Binary, row-based | Streaming, schema evolution
ORC | Binary, columnar | Hive/EMR workloads
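The difference between the two text formats shows up as soon as the same records are written both ways. Parquet, Avro and ORC need third-party libraries (for example pyarrow for Parquet), so this stdlib-only sketch covers just JSON (as JSON Lines, one object per line) and CSV; the records and helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample records
records = [
    {"id": "user-1", "age": 31, "city": "Hanoi"},
    {"id": "user-2", "age": 22, "city": "Da Nang"},
]


def to_json_lines(rows):
    """JSON Lines: one self-describing object per line; numbers keep their type."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_csv(rows):
    """Tabular text: a header row, then one line per record; every value is plain text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_json_lines(records), end="")
print(to_csv(records), end="")
```

JSON repeats the field names in every record but keeps `31` a number; CSV is more compact yet carries no types, so a reader sees `"31"` as a string.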

S3 Select

  • Queries a single S3 object directly with SQL, on the server side
  • Returns only the matching data → less data transferred
  • Input formats: CSV, JSON, Parquet (output: CSV or JSON)

import boto3

s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data.csv',
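    # CSV fields are strings, so cast before comparing numerically
    Expression="SELECT * FROM s3object s WHERE CAST(s.age AS INT) > 25",
    ExpressionType='SQL',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},  # first row = column names
    OutputSerialization={'JSON': {}}
)

# The result is an event stream; 'Records' events carry chunks of the output
chunks = []
for event in response['Payload']:
    if 'Records' in event:
        chunks.append(event['Records']['Payload'])
print(b''.join(chunks).decode('utf-8'))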
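S3 Select treats every CSV field as a string, so a numeric filter on `age` needs `CAST(s.age AS INT)`. To check the filter logic locally without calling AWS, here is a stdlib sketch that emulates the query over in-memory CSV. It is not S3 Select itself: `CSV_DATA` and `select_age_over` are hypothetical, and the exact formatting of S3 Select's JSON output (whitespace, delimiters) is not reproduced.

```python
import csv
import io
import json

CSV_DATA = "id,age\nuser-1,31\nuser-2,22\nuser-3,40\n"  # hypothetical data.csv contents


def select_age_over(csv_text, threshold):
    """Local stand-in for: SELECT * FROM s3object s WHERE CAST(s.age AS INT) > threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))  # header row names the columns, like FileHeaderInfo='USE'
    lines = []
    for row in reader:
        # every CSV field arrives as a string, hence int() here and CAST in the real query
        if int(row["age"]) > threshold:
            lines.append(json.dumps(row))  # one JSON record per line
    return "\n".join(lines) + ("\n" if lines else "")


print(select_age_over(CSV_DATA, 25), end="")
```

The matching rows keep `age` as a string (`"31"`, `"40"`), which mirrors how CSV input carries no type information.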

Exam Tip: The high-level resource API serializes and deserializes automatically. S3 Select reduces the amount of data transferred, and therefore the transfer cost. Use Parquet for analytics workloads (Athena, Redshift Spectrum).