
Capstone Project: Enterprise Knowledge Agent

Congratulations on reaching the final project! This capstone combines everything you've learned across all 4 weeks.

Project Overview

Build an Enterprise Knowledge Base Agent that can:

  1. Answer questions from company documents (RAG)
  2. Use multiple specialized sub-agents
  3. Apply guardrails for safe responses
  4. Serve via REST API
  5. Evaluate its own performance

Architecture Components

1. Query Router (Week 3)

Route queries to specialized agents:

ROUTING_PROMPT = """
Classify this query into one of: FAQ, TECHNICAL, POLICY
 
Query: {query}
 
Classification:
"""
 
def route_query(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": ROUTING_PROMPT.format(query=query)}
        ]
    )
    return response.choices[0].message.content.strip()

2. Specialized Agents (Week 3)

Each agent has domain-specific knowledge:

AGENTS = {
    "FAQ": {
        "system_prompt": "You answer frequently asked questions about our company.",
        "knowledge_base": "faq_docs/"
    },
    "TECHNICAL": {
        "system_prompt": "You provide technical support for our products.",
        "knowledge_base": "tech_docs/"
    },
    "POLICY": {
        "system_prompt": "You explain company policies. Never make promises.",
        "knowledge_base": "policy_docs/"
    }
}

3. RAG Pipeline (Week 4)

Retrieve relevant documents:

import chromadb

chroma_client = chromadb.Client()  # use PersistentClient(path=...) for durable storage

def retrieve_context(query: str, collection_name: str, k: int = 3) -> str:
    collection = chroma_client.get_collection(collection_name)
    results = collection.query(query_texts=[query], n_results=k)
    return "\n".join(results["documents"][0])

4. Guardrails (Week 4)

Validate responses:

def apply_guardrails(response: str, agent_type: str) -> tuple[bool, str]:
    # Policy agent should never promise specific outcomes
    if agent_type == "POLICY":
        forbidden = ["guarantee", "promise", "will definitely", "100%"]
        for word in forbidden:
            if word.lower() in response.lower():
                return False, f"Policy violation: '{word}'"
 
    # Check for PII (contains_pii is a project helper, e.g. regex- or library-based)
    if contains_pii(response):
        return False, "Response contains PII"
 
    return True, "OK"

5. LLM-as-Judge Evaluation (Week 4)

Score responses:

from pydantic import BaseModel, Field

# Example rubric prompt; adjust the criteria to your own quality bar
EVAL_PROMPT = """You are a strict evaluator. Given a query, the retrieved context,
and the agent's response, rate relevance, accuracy, and safety from 1 to 5,
and explain your reasoning."""

class EvalResult(BaseModel):
    relevance: int = Field(ge=1, le=5)
    accuracy: int = Field(ge=1, le=5)
    safety: int = Field(ge=1, le=5)
    reasoning: str

def evaluate_response(query: str, response: str, context: str) -> EvalResult:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": EVAL_PROMPT},
            {"role": "user", "content": f"Query: {query}\nContext: {context}\nResponse: {response}"}
        ],
        response_format=EvalResult
    )
    return completion.choices[0].message.parsed

6. FastAPI Endpoint (Week 4)

Serve the complete system:

@app.post("/v1/knowledge/query")
async def knowledge_query(request: QueryRequest):
    # 1. Route (guard against an unexpected label from the router)
    agent_type = route_query(request.query)
    if agent_type not in AGENTS:
        agent_type = "FAQ"

    # 2. Retrieve
    context = retrieve_context(request.query, AGENTS[agent_type]["knowledge_base"])

    # 3. Generate
    response = generate_response(request.query, context, agent_type)

    # 4. Guardrail (log `reason` in production for auditing)
    is_safe, reason = apply_guardrails(response, agent_type)
    if not is_safe:
        return {"response": FALLBACK_RESPONSES[agent_type], "blocked": True}

    # 5. Evaluate (fire-and-forget, for logging only)
    asyncio.create_task(log_evaluation(request.query, response, context))

    return {"response": response, "agent": agent_type, "blocked": False}

Implementation Checklist

Set Up the Project Structure

capstone/
├── agents/
│   ├── router.py
│   ├── faq_agent.py
│   ├── tech_agent.py
│   └── policy_agent.py
├── rag/
│   ├── indexer.py
│   └── retriever.py
├── safety/
│   ├── guardrails.py
│   └── evaluator.py
├── api/
│   ├── main.py
│   └── models.py
├── data/
│   ├── faq_docs/
│   ├── tech_docs/
│   └── policy_docs/
├── tests/
│   └── test_agents.py
├── Dockerfile
└── docker-compose.yml

Index Your Knowledge Base

Use ChromaDB to embed and store documents
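The indexing step above can be sketched as follows. The fixed-size chunking parameters, `.txt` glob, and ID scheme are assumptions, and chromadb is imported lazily so the chunker stays usable on its own:

```python
from pathlib import Path

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap between neighbors."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

def index_directory(doc_dir: str, collection_name: str) -> None:
    import chromadb  # lazy import so chunk_text works without chromadb installed
    client = chromadb.Client()
    collection = client.get_or_create_collection(collection_name)
    for path in Path(doc_dir).glob("*.txt"):
        for i, chunk in enumerate(chunk_text(path.read_text())):
            # ChromaDB embeds documents with its default embedding function
            collection.add(ids=[f"{path.stem}-{i}"], documents=[chunk])
```

You would run this once per knowledge base, e.g. `index_directory("data/faq_docs", "faq_docs")`.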

Implement the Router

Create a classification system for query routing

Build Specialized Agents

Each agent with its own system prompt and knowledge access

Add Guardrails

Input validation, output filtering, PII detection

Implement Evaluation

LLM-as-Judge for quality monitoring

Create the API

FastAPI endpoints with proper error handling

Dockerize Everything

Docker Compose for easy deployment

Write Tests

Unit tests for each component
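A self-contained pytest sketch for the router. In the real project you would import route_query from agents/router.py; it is inlined here in simplified form, with the client injected as a parameter, so the tests run without the project layout or an API key:

```python
from unittest.mock import MagicMock

ROUTING_PROMPT = "Classify this query into one of: FAQ, TECHNICAL, POLICY\n\nQuery: {query}\n\nClassification:"

def route_query(query: str, client) -> str:
    # Simplified, injectable copy of the router for testing purposes
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": ROUTING_PROMPT.format(query=query)}],
    )
    return response.choices[0].message.content.strip()

def test_route_query_returns_clean_label():
    client = MagicMock()
    choice = MagicMock()
    choice.message.content = "  TECHNICAL\n"
    client.chat.completions.create.return_value.choices = [choice]
    assert route_query("My app crashes on startup", client) == "TECHNICAL"

def test_router_receives_the_query():
    client = MagicMock()
    choice = MagicMock()
    choice.message.content = "FAQ"
    client.chat.completions.create.return_value.choices = [choice]
    route_query("What are your hours?", client)
    sent = client.chat.completions.create.call_args.kwargs["messages"][0]["content"]
    assert "What are your hours?" in sent
```

Mocking the client keeps the tests fast and deterministic; reserve live-API tests for a separate, explicitly marked suite.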

Evaluation Criteria

| Component | Weight | Criteria |
|---|---|---|
| Routing | 15% | Correct classification accuracy |
| RAG Quality | 20% | Relevant context retrieval |
| Agent Responses | 25% | Accurate, helpful answers |
| Guardrails | 15% | Safe, policy-compliant output |
| API Design | 15% | Clean, documented endpoints |
| Code Quality | 10% | Well-organized, tested code |

Bonus Challenges

Dynamic Model Routing

Route simple queries to gpt-4o-mini and complex ones to gpt-4o:

def select_model(query: str, complexity_score: float) -> str:
    if complexity_score > 0.7:
        return "gpt-4o"
    return "gpt-4o-mini"

Skills Applied

This project demonstrates mastery of:

| Week | Skills |
|---|---|
| Week 1 | ReAct pattern, Tool calling, Structured output |
| Week 2 | RAG, Memory systems, Advanced reasoning |
| Week 3 | Multi-agent systems, MCP, CrewAI concepts |
| Week 4 | Evaluation, Guardrails, Production deployment |

What You've Learned

Congratulations! You've completed the LLM Agent Cookbook!

You now have the skills to:

  • Build agents from scratch using the ReAct pattern
  • Implement tool calling with structured outputs
  • Create RAG pipelines for knowledge retrieval
  • Design multi-agent systems with proper orchestration
  • Apply guardrails for safe AI applications
  • Evaluate agent performance systematically
  • Deploy production-ready AI services


Share Your Project

Built something amazing? Share it!

  • GitHub: Tag with #llm-agent-cookbook
  • Twitter/X: Mention @SOTAAZ
  • LinkedIn: Connect with the community

Thank you for learning with the LLM Agent Cookbook! 🎓