02. Reflection Agents
Why Reflection Matters
When humans make a mistake, we ask ourselves "Why was I wrong?" and try not to repeat the error. A single LLM call, by contrast, is stateless: the model does not remember previous attempts and will often produce the same wrong answer again.
Reflection Agents critique their own output and use that critique to improve it, much like a writer who produces a draft and then revises it repeatedly.
The Generator-Reflector Pattern
The simplest reflection pattern splits the work between two roles:
| Role | Responsibility |
|---|---|
| Generator | Creates initial output or revises based on feedback |
| Reflector | Critiques the output and provides improvement suggestions |
Implementing the Generator
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generator(topic: str, previous_draft: str = None, critique: str = None) -> str:
    """Generates a draft, or revises a previous draft based on critique."""
    if previous_draft and critique:
        prompt = f"""Topic: {topic}
Previous Draft:
{previous_draft}
Critique/Feedback:
{critique}
Please rewrite the draft, improving it based on the critique above.
"""
    else:
        prompt = f"Topic: {topic}\nWrite a short essay draft on the topic."

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```
Implementing the Reflector
```python
def reflector(draft: str) -> str:
    """Reads the draft and provides constructive critique."""
    prompt = f"""Read the following draft and provide constructive criticism.
Evaluate logic, clarity, and style. Point out up to 3 areas for improvement.
Focus on what needs to be fixed rather than praise.
If the draft needs no further changes, reply with "No improvements needed."
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```
The Reflexion Loop
Combine Generator and Reflector in an iterative loop:
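Each round feeds the Reflector's critique back into the Generator. A minimal sketch of that cycle, assuming `topic` holds the essay topic string (the automated version with a stopping condition appears below as `ReflexionAgent`):
```python
# Minimal sketch of the reflexion cycle: draft -> critique -> revised draft.
draft = generator(topic)
for _ in range(2):      # a fixed number of refinement rounds
    critique = reflector(draft)
    draft = generator(topic, previous_draft=draft, critique=critique)
```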
Running the Loop
```python
topic = "Can AI replace human creativity?"

# Step 1: First Draft
print("✍️ Writing Draft V1...")
draft_v1 = generator(topic)

# Step 2: Reflection
print("🤔 Reflecting...")
critique_v1 = reflector(draft_v1)

# Step 3: Refinement
print("✍️ Writing Draft V2 (Refined)...")
draft_v2 = generator(topic, previous_draft=draft_v1, critique=critique_v1)
```
Example: Essay Improvement
Topic: "Can AI replace human creativity?"
Draft V1:
"The question of whether AI can replace human creativity is complex..."
Critique V1:
- Clarify the argument structure
- Add specific examples
- Improve stylistic flow
Draft V2 (After Reflection):
"AI, while capable of generating creative outputs, cannot fully replicate the emotional and contextual underpinnings of human creativity. Technologies like machine learning analyze patterns, but this differs from human creative experience rooted in emotions and culture..."
Key Insight: Each reflection cycle tends to improve the output because the Generator addresses the specific weaknesses the Reflector identified, rather than rewriting blindly.
Automated Reflexion Agent
Automate the loop until a stopping condition is met:
```python
class ReflexionAgent:
    def __init__(self, max_iterations: int = 3):
        self.max_iterations = max_iterations

    def run(self, topic: str):
        current_draft = generator(topic)
        print(f"📝 Initial Draft:\n{current_draft}\n")

        for i in range(self.max_iterations):
            print(f"--- Iteration {i+1} ---")

            # 1. Critique the current draft
            critique = reflector(current_draft)
            print(f"🔍 Critique:\n{critique}\n")

            # 2. Check stopping condition
            # (the reflector prompt asks for "No improvements needed" when it is satisfied)
            if "no improvements needed" in critique.lower():
                print("✨ Reflector is satisfied. Stopping.")
                break

            # 3. Refine the draft using the critique
            current_draft = generator(
                topic,
                previous_draft=current_draft,
                critique=critique
            )
            print(f"✍️ Refined Draft:\n{current_draft}\n")

        return current_draft
```
Self-Debugging Code Agent
Reflection is especially powerful for coding tasks:
Implementation
```python
def code_generator(prompt: str, error_msg: str = None, previous_code: str = None) -> str:
    if error_msg:
        full_prompt = f"""The following code resulted in an error.
Code:
{previous_code}
Error:
{error_msg}
Please fix the code. Return only the code.
"""
    else:
        full_prompt = f"Write Python code to: {prompt}. Return only the code."

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": full_prompt}]
    )
    # clean_code is assumed to strip markdown fences from the model's reply (defined elsewhere)
    return clean_code(response.choices[0].message.content)

def execute_code(code: str):
    """Execute the code and return (success, output/error message)."""
    try:
        exec_globals = {}
        exec(code, exec_globals)  # caution: runs model-generated code in-process; sandbox in real use
        return True, "Success"
    except Exception as e:
        return False, str(e)

# Self-correcting loop
problem = "Parse a list of date strings and sort them chronologically"  # illustrative task
code = code_generator(problem)
success, output = execute_code(code)

if not success:
    print("⚠️ Error detected! Self-correcting...")
    fixed_code = code_generator(problem, error_msg=output, previous_code=code)
```
Advanced Stopping Criteria
Instead of fixed iterations, use scored feedback:
```python
import re

def scored_reflector(draft: str) -> tuple[str, int]:
    """Returns (critique, score from 1-5)."""
    prompt = f"""Rate the draft 1-5 and provide critique.
Format: Score: X\nCritique: ...
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    # Parse score and critique from the structured response
    match = re.search(r"Score:\s*([1-5])", text)
    score = int(match.group(1)) if match else 1  # fall back to lowest score if parsing fails
    critique = text.split("Critique:", 1)[-1].strip()
    return critique, score

# In the loop:
critique, score = scored_reflector(current_draft)
if score >= 4:
    print("✅ Quality threshold reached!")
    break
```
Reflection vs Other Patterns
| Pattern | Approach | Best For |
|---|---|---|
| ReAct | Interleaved reasoning and acting | Interactive exploration |
| Plan-Execute | Plan first, then act | Multi-step workflows |
| Reflection | Generate, critique, refine | Quality improvement |
| Debate | Multiple agents argue | Handling controversial topics |
Hands-on Practice
In the notebook, you will:
- Build Generator & Reflector: Implement the two-role pattern for essay writing
- Run the Reflexion Loop: Watch an essay improve through multiple iterations
- Create a Self-Debugging Agent: Build a code agent that fixes its own errors
- Experiment with Stopping Criteria: Implement score-based loop termination
Key Takeaways
- Self-critique improves quality - LLMs can effectively criticize their own outputs
- Separation of concerns - Generator and Reflector roles should be distinct
- Iteration beats one-shot - Multiple refinement cycles typically improve results
- Stopping conditions matter - Use scores or quality thresholds, not just iteration counts
References & Further Reading
Academic Papers
- "Reflexion: Language Agents with Verbal Reinforcement Learning" (Shinn et al., 2023), arXiv:2303.11366. Foundation for verbal self-reflection in agents.
- "Self-Refine: Iterative Refinement with Self-Feedback" (Madaan et al., 2023), arXiv:2303.17651. Shows that iterative self-feedback improves generation across tasks.
- "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, 2022), arXiv:2212.08073. Using AI feedback to critique and improve AI responses.
- "Language Models can Solve Computer Tasks" (Kim et al., 2023), arXiv:2303.17491. Self-debugging approaches for code generation.
Related Concepts
- Critique Models: LLM Critics
- Self-Consistency: Sampling-based self-verification
Next Steps
Now that you understand reflection, explore Structured Reasoning to learn about advanced decomposition and Tree of Thoughts!