1.3.5 Serialize & Deserialize Data
DynamoDB Data Types
| Category | Type | Descriptor | Example |
|---|---|---|---|
| Scalar | String | S | {"S": "hello"} |
| Scalar | Number | N | {"N": "42"} |
| Scalar | Binary | B | {"B": "base64data"} |
| Scalar | Boolean | BOOL | {"BOOL": true} |
| Scalar | Null | NULL | {"NULL": true} |
| Document | List | L | {"L": [{"S": "a"}, {"N": "1"}]} |
| Document | Map | M | {"M": {"key": {"S": "value"}}} |
| Set | String Set | SS | {"SS": ["a", "b", "c"]} |
| Set | Number Set | NS | {"NS": ["1", "2", "3"]} |
| Set | Binary Set | BS | {"BS": ["base64a", "base64b"]} |
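To make the descriptor table concrete, the sketch below maps plain Python values to the wire-format descriptors listed above. `to_attr` is a hypothetical helper written for illustration and is not part of any SDK; for real use, boto3 ships its own `TypeSerializer` in `boto3.dynamodb.types`. The sketch assumes sets are non-empty and homogeneous, rejects floats (boto3 does the same and expects `Decimal`), and sorts set members only to keep the output deterministic.

```python
import base64
from decimal import Decimal


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, Decimal)) and not isinstance(value, bool)


def _b64(raw):
    # Binary values travel as base64 text in the JSON wire format.
    return base64.b64encode(raw).decode("ascii")


def to_attr(value):
    """Hypothetical helper: wrap a Python value in a DynamoDB type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": _b64(value)}
    if isinstance(value, list):
        return {"L": [to_attr(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attr(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(_is_number(v) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": [_b64(v) for v in sorted(value)]}
    raise TypeError(f"unsupported value: {value!r}")


# Illustrative item, not taken from a real table.
item = {"id": "user-123", "age": 25, "tags": {"dev", "aws"}, "active": True}
wire_item = {name: to_attr(value) for name, value in item.items()}
print(wire_item)
```

The order of the checks matters: testing `bool` before numbers keeps `True` from being written as `{"N": "True"}`.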
Low-level vs High-level API
```python
import boto3

# Low-level client: every value must carry an explicit type descriptor
client = boto3.client('dynamodb')
client.put_item(
    TableName='Users',
    Item={
        'id': {'S': 'user-123'},
        'age': {'N': '25'},  # numbers are sent as strings
        'tags': {'SS': ['dev', 'aws']},
    },
)

# High-level resource: Python values are serialized and deserialized automatically
table = boto3.resource('dynamodb').Table('Users')
table.put_item(
    Item={
        'id': 'user-123',
        'age': 25,
        'tags': {'dev', 'aws'},  # a Python set of strings is stored as a String Set (SS)
    },
)
```
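Reading goes the other way: the low-level client returns items with descriptors still attached, while the resource API hands back plain Python values (numbers arrive as `Decimal`). The sketch below unwraps a response-shaped dictionary by hand so the conversion is visible. `from_attr` and the sample `low_level_item` are hypothetical, and binary types are left out for brevity; boto3's `TypeDeserializer` covers the full set.

```python
from decimal import Decimal


def from_attr(attr):
    """Hypothetical helper: unwrap one {descriptor: value} pair into a Python value."""
    # Each attribute value is a dict with exactly one descriptor key.
    (descriptor, value), = attr.items()
    if descriptor in ("S", "BOOL"):
        return value
    if descriptor == "N":
        return Decimal(value)  # Decimal avoids float rounding
    if descriptor == "NULL":
        return None
    if descriptor == "L":
        return [from_attr(v) for v in value]
    if descriptor == "M":
        return {k: from_attr(v) for k, v in value.items()}
    if descriptor == "SS":
        return set(value)
    if descriptor == "NS":
        return {Decimal(v) for v in value}
    raise TypeError(f"unsupported descriptor: {descriptor!r}")


# Shaped like the 'Item' field of a low-level get_item response (made-up data).
low_level_item = {
    "id": {"S": "user-123"},
    "age": {"N": "25"},
    "tags": {"SS": ["dev", "aws"]},
}
plain_item = {name: from_attr(attr) for name, attr in low_level_item.items()}
print(plain_item)
```

Because numbers come back as `Decimal`, code that passes them to `json.dumps` or does float arithmetic usually needs an explicit conversion such as `int(plain_item["age"])`.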
| Format | Type | Use Case |
|---|---|---|
| JSON | Text, semi-structured | APIs, config, flexible schema |
| CSV | Text, tabular | Simple data exchange |
| Parquet | Binary, columnar | Analytics, Athena queries |
| Avro | Binary, row-based | Streaming, schema evolution |
| ORC | Binary, columnar | Hive/EMR workloads |
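The difference between the two text formats shows up in a simple round trip. The sketch below uses only the Python standard library and made-up sample records: JSON keeps numbers as numbers, while CSV returns every field as a string, so the reader has to convert types itself. The binary formats in the table need third-party libraries and are not shown here.

```python
import csv
import io
import json

# Made-up sample records for illustration.
records = [{"name": "An", "age": 31}, {"name": "Binh", "age": 24}]

# JSON round trip: field types survive.
json_text = json.dumps(records)
from_json = json.loads(json_text)

# CSV round trip: the header row carries the field names, values become strings.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
from_csv = list(csv.DictReader(io.StringIO(csv_text)))

print(type(from_json[0]["age"]).__name__)  # int
print(type(from_csv[0]["age"]).__name__)   # str

# Restoring the numeric type after reading CSV is the caller's job.
ages = [int(row["age"]) for row in from_csv]
```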
S3 Select
- Queries S3 objects directly with SQL
- Returns only the data you need, which reduces data transfer
- Supported input formats: CSV, JSON, Parquet
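```python
import boto3

s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data.csv',
    # CSV fields are read as strings, so cast before a numeric comparison
    Expression="SELECT * FROM s3object s WHERE CAST(s.age AS INT) > 25",
    ExpressionType='SQL',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},  # first row holds column names
    OutputSerialization={'JSON': {}},
)
# response['Payload'] is an event stream; 'Records' events carry the result bytes
```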
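The call returns an event stream rather than a finished body, so the results still have to be deserialized. The sketch below shows one way to do that, assuming JSON output with the default newline record delimiter. `iter_select_records` is a hypothetical helper, and `fake_events` is a hand-written stand-in for `response['Payload']` so the example runs without AWS access. A record may be split across two `Records` events, which is why the bytes are buffered before parsing.

```python
import json


def iter_select_records(events):
    """Hypothetical helper: yield one parsed JSON record per output line."""
    buffer = b""
    for event in events:
        # Other event types (Stats, Progress, Cont, End) carry no result data.
        if "Records" in event:
            buffer += event["Records"]["Payload"]
            # Keep the trailing partial line in the buffer for the next event.
            *lines, buffer = buffer.split(b"\n")
            for line in lines:
                if line:
                    yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)


# Stand-in for response['Payload']; the second record is split on purpose.
fake_events = [
    {"Records": {"Payload": b'{"name":"An","age":"31"}\n{"name":"Bi'}},
    {"Records": {"Payload": b'nh","age":"40"}\n'}},
    {"Stats": {"Details": {"BytesScanned": 64, "BytesProcessed": 64, "BytesReturned": 50}}},
    {"End": {}},
]
rows = list(iter_select_records(fake_events))
print(rows)
```

With a real response, the same helper would be called as `iter_select_records(response['Payload'])`.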
Exam Tip: The high-level resource API serializes and deserializes automatically. S3 Select reduces data transfer cost. Use Parquet for analytics workloads (Athena, Redshift Spectrum).