1.3.5 Serialize & Deserialize Data

DynamoDB Data Types

Category	Type	Descriptor	Example
Scalar	String	S	`{"S": "hello"}`
Scalar	Number	N	`{"N": "42"}`
Scalar	Binary	B	`{"B": "base64data"}`
Scalar	Boolean	BOOL	`{"BOOL": true}`
Scalar	Null	NULL	`{"NULL": true}`
Document	List	L	`{"L": [{"S": "a"}, {"N": "1"}]}`
Document	Map	M	`{"M": {"key": {"S": "value"}}}`
Set	String Set	SS	`{"SS": ["a", "b", "c"]}`
Set	Number Set	NS	`{"NS": ["1", "2", "3"]}`
Set	Binary Set	BS	`{"BS": ["base64a", "base64b"]}`
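
The mapping in the table can be sketched in plain Python. This is a simplified, hand-written illustration, not boto3's implementation: the function name `to_dynamodb_json` is hypothetical, set members are sorted only to make the output deterministic, and binary values are shown base64-encoded as they appear in the JSON wire format.

```python
import base64
from decimal import Decimal


def to_dynamodb_json(value):
    """Wrap a Python value in a DynamoDB type descriptor (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must come before numbers: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:  # DynamoDB sets cannot be empty
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": [base64.b64encode(v).decode("ascii") for v in sorted(value)]}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


item = {"id": "user-123", "age": 25, "active": True, "tags": {"dev", "aws"}}
print({k: to_dynamodb_json(v) for k, v in item.items()})
```

In real code, boto3 ships `TypeSerializer` in `boto3.dynamodb.types` for this job; the sketch only makes the descriptor rules visible.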

Low-level vs High-level API

import boto3

# Low-level client: every attribute needs an explicit type descriptor
client = boto3.client('dynamodb')
client.put_item(
    TableName='Users',
    Item={
        'id': {'S': 'user-123'},
        'age': {'N': '25'},
        'tags': {'SS': ['dev', 'aws']}
    }
)

# High-level resource — auto serialize/deserialize
table = boto3.resource('dynamodb').Table('Users')
table.put_item(
    Item={
        'id': 'user-123',
        'age': 25,
        'tags': {'dev', 'aws'}
    }
)
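
The reverse direction matters when reading items. The high-level resource returns DynamoDB numbers as `Decimal`, which `json.dumps` cannot serialize by default. The sketch below unwraps a low-level item by hand and then dumps it to JSON; `from_dynamodb_json` and `decimal_default` are hypothetical helper names, and the item is sample data rather than a real table response.

```python
import json
from decimal import Decimal


def from_dynamodb_json(attr):
    """Unwrap one type descriptor into a plain Python value (simplified sketch)."""
    [(type_code, value)] = attr.items()  # each attribute holds exactly one descriptor
    if type_code == "NULL":
        return None
    if type_code in ("S", "BOOL", "B"):
        return value
    if type_code == "N":
        return Decimal(value)  # Decimal keeps the exact numeric value
    if type_code == "L":
        return [from_dynamodb_json(v) for v in value]
    if type_code == "M":
        return {k: from_dynamodb_json(v) for k, v in value.items()}
    if type_code in ("SS", "BS"):
        return set(value)
    if type_code == "NS":
        return {Decimal(v) for v in value}
    raise TypeError(f"Unknown type descriptor: {type_code}")


def decimal_default(obj):
    """json.dumps hook: whole Decimals become int, others float, sets become sorted lists."""
    if isinstance(obj, Decimal):
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Not JSON serializable: {type(obj).__name__}")


raw = {"id": {"S": "user-123"}, "age": {"N": "25"}, "tags": {"SS": ["dev", "aws"]}}
item = {k: from_dynamodb_json(v) for k, v in raw.items()}
as_json = json.dumps(item, default=decimal_default, sort_keys=True)
print(as_json)
```

Converting `Decimal` to `float` can lose precision, so the hook is a convenience for API responses rather than a safe default for financial data.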

S3 Data Formats

Format	Type	Use Case
JSON	Text, semi-structured	APIs, config, flexible schema
CSV	Text, tabular	Simple data exchange
Parquet	Binary, columnar	Analytics, Athena queries
Avro	Binary, row-based	Streaming, schema evolution
ORC	Binary, columnar	Hive/EMR workloads
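
A small round trip shows one practical difference between the two text formats. The sketch below serializes the same sample records as JSON Lines and as CSV using only the standard library; the records are made up for illustration. JSON preserves the number type, while CSV returns every field as a string.

```python
import csv
import io
import json

records = [{"name": "Ana", "age": 31}, {"name": "Binh", "age": 24}]

# JSON Lines: one JSON document per line, types survive the round trip
json_body = "\n".join(json.dumps(r) for r in records)
json_back = [json.loads(line) for line in json_body.splitlines()]

# CSV: header row plus one row per record, every value is read back as text
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)
csv_body = buf.getvalue()
csv_back = list(csv.DictReader(io.StringIO(csv_body)))

print(type(json_back[0]["age"]).__name__)  # int
print(type(csv_back[0]["age"]).__name__)   # str
```

Either body could be uploaded as the content of an S3 object; the binary formats in the table need third-party libraries and are left out here.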

S3 Select

Queries S3 objects in place using SQL
Returns only the data that matches, which reduces data transfer
Supported input formats: CSV, JSON, Parquet

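import boto3

s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='data.csv',
    # CSV fields are read as strings, so cast before comparing numerically
    Expression="SELECT * FROM s3object s WHERE CAST(s.age AS INT) > 25",
    ExpressionType='SQL',
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'JSON': {}}
)

# The result arrives as an event stream; matching rows are in 'Records' events
for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'), end='')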
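
The `Payload` of a `select_object_content` response is an event stream, and a single `Records` event may carry a partial record. One way to deserialize the JSON output is to join all chunks first and parse afterwards. The sketch below assumes newline-delimited JSON output; `collect_records` is a hypothetical helper, and `fake_payload` is a stand-in with made-up values shaped like the boto3 events so the logic can run without AWS access.

```python
import json


def collect_records(payload):
    """Join Records chunks from the event stream, then parse one JSON document per line."""
    chunks = []
    for event in payload:
        if "Records" in event:  # Stats and End events carry no row data
            chunks.append(event["Records"]["Payload"])
    # Decode after joining so a record split across events is reassembled first
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Stand-in events; the second record is deliberately split across two chunks
fake_payload = [
    {"Records": {"Payload": b'{"name": "Ana", "age": "31"}\n{"name": "Ch'}},
    {"Records": {"Payload": b'i", "age": "40"}\n'}},
    {"Stats": {"Details": {"BytesScanned": 64, "BytesProcessed": 64, "BytesReturned": 58}}},
    {"End": {}},
]
rows = collect_records(fake_payload)
print(rows)
```

With a real response, the call would be `collect_records(response['Payload'])`.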

Exam Tip: The high-level resource API serializes and deserializes DynamoDB types automatically. S3 Select reduces data transfer cost by returning only the matching data. Use Parquet for analytics workloads (Athena, Redshift Spectrum).