FastAPI & Docker Deployment

Overview

In this session, we turn the agent from our Jupyter notebook into a production-ready REST API using FastAPI and Docker.

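Why FastAPI?

Feature	Benefit
Async-native	Handles many concurrent requests
Automatic documentation	Swagger UI included out of the box
Type safety	Request validation with Pydantic
Fast	One of the fastest Python frameworks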

Project Structure

main.py
Dockerfile
requirements.txt
.env

Building the API

Step 1: Define the Data Models

from pydantic import BaseModel
 
class AgentRequest(BaseModel):
    query: str
    model: str = "gpt-4o-mini"
 
class AgentResponse(BaseModel):
    response: str
    tool_calls: int = 0

Step 2: Create the Agent Function

from fastapi import HTTPException
from openai import OpenAI

# Reads OPENAI_API_KEY from the environment by default.
client = OpenAI()

def run_simple_agent(query: str, model: str) -> str:
    """
    A simple agent that answers a single query.
    In production, import your full ReActAgent here.
    """
    try:
        completion = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful API agent."},
                {"role": "user", "content": query}
            ]
        )
        # message.content can be None, so fall back to an empty string.
        return completion.choices[0].message.content or ""
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Step 3: Define the Endpoints

from fastapi import FastAPI, HTTPException

app = FastAPI(
    title="LLM Agent API",
    description="Production-ready Agent API",
    version="1.0.0"
)

@app.get("/health")
def health_check():
    """Health check endpoint for load balancers"""
    return {"status": "ok", "service": "llm-agent-api"}

@app.post("/v1/agent/chat", response_model=AgentResponse)
def chat_endpoint(request: AgentRequest):
    """Main chat endpoint"""
    answer = run_simple_agent(request.query, request.model)
    return AgentResponse(response=answer)

Step 4: Run Locally

# Install dependencies
pip install fastapi uvicorn openai python-dotenv

# Start the server
uvicorn main:app --reload --port 8000

Open http://localhost:8000/docs to view the interactive API documentation.

Dockerizing the API

Dockerfile

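# Use the official slim Python image
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
# (.env is not copied into the image: secrets are passed at runtime with --env-file)
COPY main.py .

# Expose the port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]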

requirements.txt

fastapi>=0.104.0
uvicorn>=0.24.0
openai>=1.0.0
python-dotenv>=1.0.0
pydantic>=2.0.0

Build and Run

# Build the image
docker build -t llm-agent-api .

# Run the container
docker run -p 8000:8000 --env-file .env llm-agent-api

Production Considerations

1. Environment Variables

Never commit API keys. Load them from environment variables instead:

import os
from dotenv import load_dotenv
from openai import OpenAI

# Load variables from a local .env file into the process environment
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
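
If the key is missing, os.getenv returns None and the problem may only surface on the first request. The sketch below shows one way to fail fast at startup instead. It uses only the standard library, and the helper name require_env is hypothetical (it is not part of python-dotenv or the OpenAI SDK).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; call it once at startup so a
    missing key stops the service before it accepts traffic.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

With this in place, the client could be created as OpenAI(api_key=require_env("OPENAI_API_KEY")), so a misconfigured container exits immediately rather than returning 500 errors later.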

2. Error Handling

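from fastapi import HTTPException
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry up to 3 times with exponential backoff. reraise=True surfaces the
# original exception instead of tenacity's RetryError once attempts run out.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), reraise=True)
def _call_llm(messages):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

def call_llm_with_retry(messages):
    # Convert the final failure to a 503 only after all retries are exhausted
    try:
        return _call_llm(messages)
    except Exception as e:
        raise HTTPException(status_code=503, detail="LLM service unavailable") from e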
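
To make the retry idea concrete, here is a minimal standard-library sketch of retrying with exponential backoff. It is an illustration of the general technique, not tenacity's implementation, and the name call_with_backoff and its parameters are assumptions made for this example. The sleep function is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing waits.

    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            # Waits grow as base_delay * 1, 2, 4, ... capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise ValueError("attempts must be at least 1")
```

A call such as call_with_backoff(lambda: run_simple_agent(query, model)) would retry transient failures. In practice you would likely narrow the caught exception types so that permanent errors (for example, an invalid API key) are not retried.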

3. Rate Limiting

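from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Rate-limit per client IP address
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
# Return HTTP 429 when a client exceeds its limit
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/v1/agent/chat")
@limiter.limit("10/minute")
def chat_endpoint(request: Request, agent_request: AgentRequest):
    ...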
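
To show what a limit like "10/minute" per client means, here is a small sliding-window limiter written with the standard library only. This is a sketch of one common strategy, not slowapi's implementation (slowapi may count requests differently), and the class name SlidingWindowLimiter is hypothetical. The clock is injectable so the logic can be exercised deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(
        self,
        limit: int,
        window: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = limit
        self.window = window
        self.clock = clock
        # Per-key timestamps of the requests that were allowed.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Here the key would be the client address, mirroring get_remote_address above: SlidingWindowLimiter(limit=10, window=60.0).allow("203.0.113.7") returns True for the first ten calls within a minute and False after that. An in-memory limiter like this only works for a single process; several replicas would need shared storage.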

4. Logging & Monitoring

import logging
import time

from fastapi import Request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    # perf_counter is monotonic, so durations are unaffected by clock changes
    start_time = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start_time

    logger.info(f"{request.method} {request.url.path} - {response.status_code} - {duration:.3f}s")
    return response

Production Architecture

Deployment Options

Docker Compose
Kubernetes
AWS ECS
Google Cloud Run

Docker Compose Example

version: '3.8'
services:
  agent-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    restart: unless-stopped
    healthcheck:
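      # python:3.11-slim does not ship curl, so probe with the Python standard library
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]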
      interval: 30s
      timeout: 10s
      retries: 3
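
The same probe can be useful outside Docker, for example in a deploy script that waits for the service before sending traffic. The sketch below loosely mirrors the interval and retries settings above using only the standard library. The function names http_ok and wait_until_healthy are hypothetical, and this is not how Docker evaluates health checks internally.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    retries: int = 3,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A typical call would be wait_until_healthy(lambda: http_ok("http://localhost:8000/health"), retries=10, interval=2.0), with the script exiting non-zero if it returns False.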

Testing the API

Using curl

# Health check
curl http://localhost:8000/health

# Chat request
curl -X POST http://localhost:8000/v1/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is 2+2?", "model": "gpt-4o-mini"}'

Using Python

import httpx
 
response = httpx.post(
    "http://localhost:8000/v1/agent/chat",
    json={"query": "What is the capital of France?"},
    timeout=30.0,  # LLM calls can exceed httpx's 5-second default timeout
)
print(response.json())

Best Practices

Do:

Use health checks for container orchestration
Implement graceful shutdown
Cache frequently used queries
Use connection pooling
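
As one illustration of the caching item above, here is a minimal in-memory cache with a time-to-live, keyed by model and query. It is a sketch under simple assumptions (a single process, exact-match queries), and the names TTLCache and cached_answer are hypothetical. The compute argument takes (query, model), matching the signature of run_simple_agent.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, model: str, query: str) -> Optional[str]:
        entry = self._store.get((model, query))
        if entry is None:
            return None
        stored_at, answer = entry
        if self.clock() - stored_at >= self.ttl:
            # Expired: drop the entry and report a miss.
            del self._store[(model, query)]
            return None
        return answer

    def put(self, model: str, query: str, answer: str) -> None:
        self._store[(model, query)] = (self.clock(), answer)


def cached_answer(
    cache: TTLCache,
    model: str,
    query: str,
    compute: Callable[[str, str], str],
) -> str:
    """Return a cached answer when available, otherwise compute and store it."""
    hit = cache.get(model, query)
    if hit is not None:
        return hit
    answer = compute(query, model)
    cache.put(model, query, answer)
    return answer
```

Inside the chat endpoint, cached_answer(cache, request.model, request.query, run_simple_agent) would skip the LLM call for repeated questions. With multiple replicas, a shared store such as Redis would be the more usual choice.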

Don't:

Hardcode API keys
Skip error handling
Ignore your LLM provider's rate limits
Deploy without logging

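References & Further Reading

FastAPI documentation
Docker best practices
12-Factor App

Next Steps

You have learned how to deploy an agent as an API! Now move on to the capstone project and build a complete production system that combines everything from this course.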

Running the Code

cd week4_production/serving_api
docker build -t llm-agent-api .
docker run -p 8000:8000 --env-file .env llm-agent-api

Then test the API at http://localhost:8000/docs.
