1.1.13 Resilient Third-Party Integrations

Implement Resilient Application Code for Third-Party Service Integrations

Retry Logic

Exponential Backoff with Jitter

Attempt 1: wait random(0, 1s)
Attempt 2: wait random(0, 2s)
Attempt 3: wait random(0, 4s)
Attempt 4: wait random(0, 8s)
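Attempt 5: wait random(0, 16s)

import random
import time

import requests

def call_with_retry(url, max_retries=5):
    """GET url, retrying failures with capped exponential backoff and full jitter."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException:
            # Note: this also catches 4xx errors from raise_for_status();
            # production code would usually re-raise those without retrying.
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error to the caller
            wait = min(2 ** attempt, 30)  # Exponential backoff, capped at 30s
            time.sleep(random.uniform(0, wait))  # Full jitter: random delay in [0, wait]

  • Jitter prevents the thundering herd problem (many clients retrying at the same moment)
  • The AWS SDKs have exponential backoff built in for AWS API calls
  • For third-party API calls, you have to implement it yourself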

Circuit Breaker Pattern

States

CLOSED → (failures > threshold) → OPEN
OPEN → (timeout expires) → HALF-OPEN
HALF-OPEN → (success) → CLOSED
HALF-OPEN → (failure) → OPEN

State      Behavior
Closed     Requests pass through normally; failures are counted
Open       Fail fast: the service is not called and a fallback response is returned
Half-Open  A few trial requests are allowed; if they succeed → Closed, otherwise → Open

  • Prevents cascading failures when a downstream service fails
  • Reduces load on a service that is already struggling
  • Gives the service time to recover
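The state machine above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration, not a production implementation: the class name `CircuitBreaker`, the `CircuitOpenError` exception, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for this example. It counts consecutive failures and, in the half-open state, lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_timeout = reset_timeout          # seconds to stay OPEN (hypothetical default)
        self.clock = clock                          # injectable so tests do not need to sleep
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not touch the downstream service at all
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = "HALF_OPEN"  # Timeout expired: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.state = "CLOSED"

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or failures > threshold, (re)opens the circuit
        if self.state == "HALF_OPEN" or self.failures > self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would wrap each outbound call, for example `breaker.call(call_with_retry, url)`, and catch `CircuitOpenError` to return a fallback response. Sharing one breaker across threads would additionally need a lock around the state updates.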

Timeout Pattern

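import requests

def fallback_response():
    # Placeholder fallback; return cached or default data here
    return {}

def fetch_data():
    # Always set a timeout for external calls
    try:
        response = requests.get(
            'https://api.third-party.com/data',
            timeout=(3, 10)  # (connect_timeout, read_timeout)
        )
        return response.json()
    except requests.exceptions.Timeout:
        return fallback_response()

  • Never wait indefinitely: requests applies no timeout unless you pass one
  • Connect timeout: how long to wait to establish the connection
  • Read timeout: how long to wait for the server to send a response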

Bulkhead Pattern

  • Isolate resources for each dependency
  • If one service fails, it does not consume the resources reserved for other services
  • Example: separate thread pools and separate connection pools for each external service
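One lightweight way to illustrate the isolation described above is a per-dependency concurrency limit. The sketch below swaps the separate thread pools mentioned in the list for a simpler technique, a semaphore per service, which caps how many calls to one dependency can be in flight at once. The `Bulkhead` class, the `BulkheadFullError` exception, and the service names and limits are hypothetical.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency's concurrency budget is exhausted."""


class Bulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject immediately instead of queueing
        # behind a slow dependency and tying up the caller's thread
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: no free slots")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# One bulkhead per external service (names and limits are hypothetical)
payments = Bulkhead("payments", max_concurrent=2)
shipping = Bulkhead("shipping", max_concurrent=2)
```

If the payments provider hangs, at most two callers are stuck waiting on it; further payments calls are rejected at once, and calls through `shipping` are unaffected because they draw from a separate budget.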

Fallback Responses

Strategy              Description
Cached data           Return data from a cache when the service is unavailable
Default value         Return a default value
Graceful degradation  Reduce functionality instead of failing completely
Queue for later       Put the request on a queue to be processed later
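The first two strategies in the table can be combined: serve the last good response if one is cached, and fall back to a default value otherwise. The following is a minimal sketch under stated assumptions; `get_with_fallback`, the in-memory `_cache` dict, `DEFAULT_VALUE`, and the `max_stale_seconds` limit are all hypothetical names chosen for illustration, and a real system would more likely use a shared cache.

```python
import time

_cache = {}  # key -> (timestamp, value); in-memory stand-in for a real cache

DEFAULT_VALUE = {"items": [], "stale": True}  # hypothetical default response


def get_with_fallback(key, fetch, max_stale_seconds=300, clock=time.monotonic):
    """Call fetch(); on failure return recent cached data, else a default value."""
    try:
        value = fetch()
    except Exception:  # In real code, catch only the errors that mean "unavailable"
        cached = _cache.get(key)
        if cached is not None and clock() - cached[0] <= max_stale_seconds:
            return cached[1]      # Strategy 1: cached data
        return DEFAULT_VALUE      # Strategy 2: default value
    _cache[key] = (clock(), value)  # Remember the last good response
    return value
```

The caller passes the real service call as `fetch`, for example `get_with_fallback("catalog", lambda: call_with_retry(url))`, so the fallback logic stays separate from the retry logic.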

Error Handling Patterns

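import requests

API_URL = 'https://api.third-party.com/data'  # Placeholder endpoint; replace with the real one

class ThirdPartyServiceError(Exception):
    """Base class for errors raised by the third-party integration."""

class ServiceUnavailableError(ThirdPartyServiceError):
    """The service is temporarily unavailable; the call may succeed later."""

def call_external_service(data):
    try:
        response = requests.post(API_URL, json=data, timeout=5)
        if response.status_code == 429:  # Rate limited
            raise ServiceUnavailableError("Rate limited")
        elif response.status_code >= 500:
            raise ServiceUnavailableError(f"Server error: {response.status_code}")
        response.raise_for_status()  # Remaining 4xx errors propagate as requests HTTPError
        return response.json()
    except requests.exceptions.ConnectionError as e:
        # Chain the original exception so the root cause stays in the traceback
        raise ServiceUnavailableError("Connection failed") from e
    except requests.exceptions.Timeout as e:
        raise ServiceUnavailableError("Request timed out") from e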

Exam Tip: Retry + exponential backoff = handles transient errors. Circuit breaker = prevents cascading failures. Timeout = never wait indefinitely. If a question asks about "resilient integration" → combine all 3 patterns. The AWS SDKs already have retries built in; you only need to implement them yourself for third-party calls.