Key concepts:
Detailed explanation:
Bedrock API Invocation Patterns:
| Pattern | API | Use Case |
|---|---|---|
| Synchronous | InvokeModel | Real-time responses, chatbots |
| Streaming | InvokeModelWithResponseStream | Progressive text display |
| Asynchronous | StartAsyncInvoke | Long-running tasks (e.g., video generation); bulk offline jobs use batch inference (CreateModelInvocationJob) |
| Converse | Converse / ConverseStream | Multi-turn conversations |
Synchronous invocation with boto3:
```python
import boto3
import json

client = boto3.client('bedrock-runtime')

response = client.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    contentType='application/json',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # required for Claude models on Bedrock
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Explain RAG in 3 sentences"}
        ]
    })
)

# response['body'] is a StreamingBody: read it once, then parse the JSON
result = json.loads(response['body'].read())
print(result['content'][0]['text'])
```
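The request body follows Anthropic's Messages format on Bedrock, and the response's `content` field is a list of blocks, so `result['content'][0]` only reads the first one. A minimal sketch, with hypothetical helper names `build_claude_body` and `extract_text`, that builds the body and joins every text block. It runs offline on a sample dict shaped like a decoded Claude response; the sample text is made up.

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 1024) -> str:
    """Serialize an Anthropic Messages request for bedrock-runtime invoke_model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # required by Claude on Bedrock
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def extract_text(result: dict) -> str:
    """Join all text blocks; non-text blocks (e.g. tool_use) are skipped."""
    return "".join(
        block["text"]
        for block in result.get("content", [])
        if block.get("type") == "text"
    )

# Sample decoded response body (shape only; the text is invented).
sample = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "RAG retrieves documents. "},
        {"type": "text", "text": "It grounds the answer."},
    ],
    "stop_reason": "end_turn",
}

body = json.loads(build_claude_body("Explain RAG in 3 sentences"))
print(body["max_tokens"])    # 1024
print(extract_text(sample))  # RAG retrieves documents. It grounds the answer.
```

With a real client the helpers slot in as `extract_text(json.loads(client.invoke_model(modelId=..., body=build_claude_body(prompt))['body'].read()))`.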
Asynchronous pattern with SQS:
```
Client → API Gateway → SQS → Lambda → Bedrock
                               ↓
                    DynamoDB (store result)
                               ↓
                Client polls / SNS notification
```
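In this pattern the Lambda function is the SQS consumer: it reads each message, calls Bedrock, and writes the result under a request ID so the client can poll for it. A minimal sketch of such a handler, assuming a hypothetical message body `{"request_id": ..., "prompt": ...}`; `invoke` and `store` are injected stand-ins for the Bedrock call and the DynamoDB write so the logic runs offline. Failed records are returned in `batchItemFailures`, which Lambda uses to retry only those messages when the SQS event source mapping has `ReportBatchItemFailures` enabled.

```python
import json
from typing import Callable, Dict

def make_handler(invoke: Callable[[str], str],
                 store: Callable[[str, Dict], None]):
    """Build an SQS-triggered Lambda handler (hypothetical wiring)."""
    def handler(event, context=None):
        failures = []
        for record in event.get("Records", []):
            try:
                msg = json.loads(record["body"])
                text = invoke(msg["prompt"])  # e.g. wraps bedrock-runtime invoke_model
                store(msg["request_id"], {"status": "DONE", "result": text})
            except Exception:
                # Report only this message as failed; SQS redelivers it later.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}
    return handler

# Offline usage: a dict stands in for the DynamoDB table.
table = {}
handler = make_handler(
    invoke=lambda prompt: prompt.upper(),
    store=lambda key, item: table.__setitem__(key, item),
)
event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"request_id": "r1", "prompt": "hi"})},
    {"messageId": "m2", "body": "not json"},
]}
print(handler(event))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
print(table["r1"])     # {'status': 'DONE', 'result': 'HI'}
```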
Key concepts:
Exam Tip: Streaming responses are a best practice for user experience, because users see text as it is generated instead of waiting for the full response. Bedrock supports streaming through the InvokeModelWithResponseStream API. This topic appears frequently on the exam.
Detailed explanation:
Streaming with Bedrock:
```python
import boto3
import json

client = boto3.client('bedrock-runtime')

response = client.invoke_model_with_response_stream(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    contentType='application/json',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Write a poem about AWS"}
        ]
    })
)

# response['body'] is an event stream; each event carries a JSON chunk
for event in response['body']:
    chunk = json.loads(event['chunk']['bytes'])
    # Only content_block_delta events contain generated text
    if chunk['type'] == 'content_block_delta':
        print(chunk['delta']['text'], end='', flush=True)
```
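Each stream event carries a JSON payload under `chunk.bytes`, and only `content_block_delta` events with a `text_delta` contain generated text. A small sketch that factors this parsing into a generator (`iter_text` is a hypothetical name) so the same logic can feed a console, a WebSocket, or a test. It is exercised on fake events shaped like Bedrock's stream events for Claude; events without a `chunk` key (such as stream error events) are skipped here, which is a simplifying assumption.

```python
import json
from typing import Iterable, Iterator

def iter_text(stream: Iterable[dict]) -> Iterator[str]:
    """Yield text deltas from an invoke_model_with_response_stream body."""
    for event in stream:
        chunk = event.get("chunk")
        if chunk is None:  # skip events without a payload
            continue
        payload = json.loads(chunk["bytes"])
        if payload.get("type") == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]

def fake_event(payload: dict) -> dict:
    """Wrap a payload the way a stream event wraps it (bytes under chunk)."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}

events = [
    fake_event({"type": "message_start", "message": {}}),
    fake_event({"type": "content_block_delta", "index": 0,
                "delta": {"type": "text_delta", "text": "Clouds "}}),
    fake_event({"type": "content_block_delta", "index": 0,
                "delta": {"type": "text_delta", "text": "scale."}}),
    fake_event({"type": "message_stop"}),
]
print("".join(iter_text(events)))  # Clouds scale.
```

With a real response the loop becomes `for text in iter_text(response['body']): print(text, end='', flush=True)`.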
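Converse API (recommended for multi-turn):
```python
response = client.converse_stream(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Hello!"}]}
    ]
)

# The reply arrives as events under response['stream']
for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta'].get('text', ''), end='', flush=True)
```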
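With the Converse API, multi-turn context is simply the `messages` list: each turn appends the user message, then the assistant message returned in `output.message`, using the same `{"role": ..., "content": [{"text": ...}]}` shape for every supported model. A minimal sketch of that bookkeeping; `ChatSession` is a hypothetical class, and `converse` is any callable accepting the `bedrock-runtime` `converse(modelId=..., messages=...)` keyword arguments (in practice `client.converse`), replaced here by a fake for offline use.

```python
from typing import Callable, Dict, List

class ChatSession:
    """Keeps Converse-format history and sends all of it on every turn."""

    def __init__(self, converse: Callable[..., Dict], model_id: str):
        self.converse = converse
        self.model_id = model_id
        self.messages: List[Dict] = []

    def send(self, text: str) -> str:
        self.messages.append({"role": "user", "content": [{"text": text}]})
        response = self.converse(modelId=self.model_id, messages=self.messages)
        reply = response["output"]["message"]  # role == "assistant"
        self.messages.append(reply)            # keep it for the next turn
        return "".join(block.get("text", "") for block in reply["content"])

def fake_converse(modelId: str, messages: List[Dict]) -> Dict:
    # Report how many messages the "model" received, to show history growing.
    n = len(messages)
    return {"output": {"message": {"role": "assistant",
                                   "content": [{"text": f"seen {n}"}]}}}

chat = ChatSession(fake_converse, "anthropic.claude-3-sonnet-20240229-v1:0")
print(chat.send("Hello!"))      # seen 1
print(chat.send("And again?"))  # seen 3
print(len(chat.messages))       # 4
```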
Advantages of the Converse API:
Key concepts:
Detailed explanation:
Resilience Patterns:
| Pattern | Implementation | When to use |
|---|---|---|
| Retry with backoff | AWS SDK built-in | Transient errors (429, 503) |
| Circuit breaker | Step Functions | Repeated failures |
| Fallback model | Application logic | Primary model unavailable |
| Rate limiting | API Gateway | Protect against burst traffic |
| Cross-region | Bedrock Cross Region Inference | Regional outages |
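The Resilience Patterns table assigns the circuit breaker to Step Functions; the underlying state machine is simple enough to show in-process. A minimal Python sketch (the class, thresholds, and `CircuitOpenError` are hypothetical, not an AWS API): after `failure_threshold` consecutive failures the breaker opens and rejects calls immediately; after `reset_timeout` seconds it lets a trial call through (half-open), and a success closes it again. The injectable `clock` makes the timing testable. This is single-threaded; concurrent callers would need locking.

```python
import time
from typing import Callable, Optional

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures -> HALF_OPEN after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"  # cooldown elapsed: allow a trial call
        return "OPEN"

    def call(self, fn: Callable, *args, **kwargs):
        if self.state == "OPEN":
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result

# Usage with a fake clock and a model call that always times out.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])

def flaky():
    raise TimeoutError("model timed out")

for _ in range(2):
    try:
        breaker.call(flaky)
    except TimeoutError:
        pass
print(breaker.state)                              # OPEN
now[0] = 10.0
print(breaker.state)                              # HALF_OPEN
print(breaker.call(lambda: "ok"), breaker.state)  # ok CLOSED
```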
Exponential Backoff:
```python
import boto3
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'  # standard exponential backoff plus client-side rate limiting
    }
)
client = boto3.client('bedrock-runtime', config=config)
```
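To see what retry-with-backoff does under the hood, here is a hand-rolled sketch using "full jitter": before each retry, sleep a random time between 0 and `base * 2**attempt`, capped at `cap`. It illustrates the technique only and is not botocore's exact algorithm; `TransientError` and `call_with_backoff` are hypothetical names, and in this sketch `max_attempts` counts every call including the first. `sleep` and `rng` are injectable so the behavior can be checked without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical stand-in for a throttling (429) or unavailable (503) error."""

def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base: float = 0.5, cap: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error
            ceiling = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, ... capped at `cap`
            sleep(rng.uniform(0, ceiling))           # full jitter: random delay in [0, ceiling]
    raise RuntimeError("unreachable")

# Usage: a fake model call that is throttled twice, then succeeds.
calls = {"n": 0}

def flaky_model() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("ThrottlingException")
    return "response"

delays = []
print(call_with_backoff(flaky_model, sleep=delays.append, rng=random.Random(0)))  # response
print(len(delays))  # 2
```

Randomizing the whole delay, rather than adding a small jitter to a fixed delay, spreads out clients that were throttled at the same moment so they do not retry in lockstep.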
Fallback Pattern:
```
Primary Model (Claude Sonnet) → Timeout/Error
              ↓
Fallback Model (Claude Haiku) → Response
              ↓ (if also fails)
Cached Response / Error Message
```
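The fallback chain can be expressed as an ordered list of callables. A minimal sketch, assuming each model is wrapped in a function that takes a prompt and returns text (for example, one that calls `invoke_model` with a given `modelId`); `invoke_with_fallback`, the in-memory `cache`, and `BUSY_MESSAGE` are hypothetical, and the fake `sonnet`/`haiku` functions only simulate failures. The function returns which path answered so the caller can log it.

```python
from typing import Callable, Dict, List, Tuple

BUSY_MESSAGE = "Sorry, the service is busy. Please try again later."

def invoke_with_fallback(prompt: str,
                         models: List[Tuple[str, Callable[[str], str]]],
                         cache: Dict[str, str]) -> Tuple[str, str]:
    """Try each (name, invoke) in order, then the cache, then a fixed message.

    Returns (source, text) so the caller can log which path answered.
    """
    for name, invoke in models:
        try:
            text = invoke(prompt)
        except Exception:
            # Timeout, throttling, or model error: move on to the next model.
            # Production code should catch narrower exception types.
            continue
        cache[prompt] = text  # remember the last good answer
        return name, text
    if prompt in cache:
        return "cache", cache[prompt]
    return "error", BUSY_MESSAGE

def sonnet(prompt: str) -> str:
    raise TimeoutError("primary model timed out")

def haiku(prompt: str) -> str:
    return f"haiku: {prompt}"

def haiku_down(prompt: str) -> str:
    raise RuntimeError("fallback model unavailable")

cache: Dict[str, str] = {}
print(invoke_with_fallback("hi", [("sonnet", sonnet), ("haiku", haiku)], cache))
# ('haiku', 'haiku: hi')
print(invoke_with_fallback("hi", [("sonnet", sonnet), ("haiku", haiku_down)], cache))
# ('cache', 'haiku: hi')
```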
Key concepts:
Detailed explanation:
Routing Strategies:
Static Routing: map request types to specific models
```
/api/classify → Haiku
/api/generate → Sonnet
/api/analyze  → Opus
```
Dynamic Content-Based Routing:
```
Request → Classifier (small model)
                ↓
   Simple  → Small Model (fast, cheap)
   Medium  → Medium Model (balanced)
   Complex → Large Model (powerful, expensive)
```
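Both routing styles reduce to choosing a model ID before the call. A minimal sketch: `STATIC_ROUTES` mirrors the path-to-model mapping for static routing, and `classify_complexity` is a hypothetical word-count heuristic standing in for the small classifier model (in practice it could be a Haiku call that returns a label). The thresholds are arbitrary, and the Claude 3 model IDs are examples.

```python
from typing import Dict

HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"
SONNET = "anthropic.claude-3-sonnet-20240229-v1:0"
OPUS = "anthropic.claude-3-opus-20240229-v1:0"

# Static routing: endpoint path -> model
STATIC_ROUTES: Dict[str, str] = {
    "/api/classify": HAIKU,
    "/api/generate": SONNET,
    "/api/analyze": OPUS,
}

# Dynamic routing: complexity tier -> model
TIER_MODELS: Dict[str, str] = {"simple": HAIKU, "medium": SONNET, "complex": OPUS}

def classify_complexity(prompt: str) -> str:
    """Hypothetical stand-in for the small classifier model (arbitrary thresholds)."""
    words = len(prompt.split())
    if words <= 20:
        return "simple"
    if words <= 200:
        return "medium"
    return "complex"

def route(path: str, prompt: str) -> str:
    """Use the static map when the path is known; otherwise classify the content."""
    if path in STATIC_ROUTES:
        return STATIC_ROUTES[path]
    return TIER_MODELS[classify_complexity(prompt)]

print(route("/api/analyze", "x"))         # anthropic.claude-3-opus-20240229-v1:0
print(route("/api/chat", "What is S3?"))  # anthropic.claude-3-haiku-20240307-v1:0
```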
Metric-Based Routing: