Week 2: Reasoning
02. LangGraph

02. Reflection Agents

Why Reflection Matters

When humans make mistakes, we think: "Why was I wrong?" and try not to repeat the same error. However, basic LLMs are stateless—they don't remember previous attempts and often produce the same wrong answer.

Reflection Agents critique their own work and use that critique to improve. It's like a writer drafting and then continuously revising their work.

The Generator-Reflector Pattern

The simplest reflection pattern splits roles into two:

| Role | Responsibility |
|---|---|
| Generator | Creates the initial output or revises it based on feedback |
| Reflector | Critiques the output and provides improvement suggestions |

Implementing the Generator

from openai import OpenAI

client = OpenAI()  # shared client; expects OPENAI_API_KEY in the environment

def generator(topic: str, previous_draft: str | None = None, critique: str | None = None) -> str:
    """
    Generates a draft or revises it based on critique.
    """
    if previous_draft and critique:
        prompt = f"""Topic: {topic}
Previous Draft:
{previous_draft}
 
Critique/Feedback:
{critique}
 
Please rewrite the draft improving it based on the critique above.
"""
    else:
        prompt = f"Topic: {topic}\nWrite a short essay draft on the topic."
 
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

Implementing the Reflector

def reflector(draft: str) -> str:
    """
    Reads the draft and provides constructive critique.
    """
    prompt = f"""Read the following draft and provide constructive criticism.
Evaluate logic, clarity, and style. Point out up to 3 areas for improvement.
Focus on what needs to be fixed rather than praise.
If the draft needs no changes, reply exactly: "No improvements needed."
 
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

The Reflexion Loop

Combine Generator and Reflector in an iterative loop: generate, critique, refine, and repeat until the critique is satisfied or an iteration limit is reached.

Running the Loop

# Step 0: Choose a topic
topic = "Can AI replace human creativity?"

# Step 1: First Draft
print("✍️ Writing Draft V1...")
draft_v1 = generator(topic)
 
# Step 2: Reflection
print("🤔 Reflecting...")
critique_v1 = reflector(draft_v1)
 
# Step 3: Refinement
print("✍️ Writing Draft V2 (Refined)...")
draft_v2 = generator(topic, previous_draft=draft_v1, critique=critique_v1)

Example: Essay Improvement

Topic: "Can AI replace human creativity?"

Draft V1:

"The question of whether AI can replace human creativity is complex..."

Critique V1:

  1. Clarify the argument structure
  2. Add specific examples
  3. Improve stylistic flow

Draft V2 (After Reflection):

"AI, while capable of generating creative outputs, cannot fully replicate the emotional and contextual underpinnings of human creativity. Technologies like machine learning analyze patterns, but this differs from human creative experience rooted in emotions and culture..."

Key Insight: Each reflection cycle can improve the output because the Generator addresses the specific weaknesses the Reflector identifies, rather than guessing what to change.

Automated Reflexion Agent

Automate the loop until a stopping condition is met:

class ReflexionAgent:
    def __init__(self, max_iterations: int = 3):
        self.max_iterations = max_iterations
 
    def run(self, topic: str):
        current_draft = generator(topic)
        print(f"📝 Initial Draft:\n{current_draft}\n")
 
        for i in range(self.max_iterations):
            print(f"--- Iteration {i+1} ---")
 
            # 1. Critique
            critique = reflector(current_draft)
            print(f"🔍 Critique:\n{critique}\n")
 
            # 2. Check stopping condition
            if "no improvements needed" in critique.lower():
                print("✨ Reflector is satisfied. Stopping.")
                break
 
            # 3. Refine
            current_draft = generator(
                topic,
                previous_draft=current_draft,
                critique=critique
            )
            print(f"✍️ Refined Draft:\n{current_draft}\n")
 
        return current_draft
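The control flow of this loop can be traced without any API calls by stubbing the two roles. The stub functions below are placeholders invented for illustration, not real LLM calls; the loop body mirrors the one in `ReflexionAgent.run`.

```python
# Dry run of the reflexion loop with stubbed roles (no API calls).
# stub_generator and stub_reflector stand in for the LLM-backed
# generator and reflector defined earlier.
def stub_generator(topic, previous_draft=None, critique=None):
    if previous_draft and critique:
        return previous_draft + " [revised]"
    return f"Draft about {topic}."

state = {"critiques": 0}

def stub_reflector(draft):
    state["critiques"] += 1
    # Pretend the Reflector is satisfied after one round of revision
    if state["critiques"] == 1:
        return "Tighten the introduction."
    return "No improvements needed."

draft = stub_generator("AI and creativity")
for i in range(3):
    critique = stub_reflector(draft)
    if "no improvements needed" in critique.lower():
        break
    draft = stub_generator("AI and creativity", previous_draft=draft, critique=critique)

print(draft)  # Draft about AI and creativity. [revised]
```

Note how the loop exits after the second critique: one revision pass happened, then the stopping phrase ended the loop early, exactly the behavior the `max_iterations` cap is there to bound.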

Self-Debugging Code Agent

Reflection is especially powerful for coding tasks:

Implementation

def clean_code(text: str) -> str:
    """Strip markdown code fences the model may wrap around its answer."""
    fence = "`" * 3
    text = text.strip()
    if text.startswith(fence):
        text = text.split("\n", 1)[-1]   # drop the opening fence line
        text = text.rsplit(fence, 1)[0]  # drop the closing fence
    return text.strip()

def code_generator(prompt: str, error_msg: str | None = None, previous_code: str | None = None) -> str:
    if error_msg:
        full_prompt = f"""The following code resulted in an error.
Code:
{previous_code}
 
Error:
{error_msg}
 
Please fix the code. Return only the code.
"""
    else:
        full_prompt = f"Write Python code to: {prompt}. Return only the code."
 
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": full_prompt}]
    )
    return clean_code(response.choices[0].message.content)
 
def execute_code(code: str):
    """Execute and return (success, output/error)"""
    try:
        exec_globals = {}
        exec(code, exec_globals)
        return True, "Success"
    except Exception as e:
        return False, str(e)
 
# Self-correcting loop ("problem" is the natural-language task description)
problem = "sort a list of numbers without using sort()"  # example task
code = code_generator(problem)
success, output = execute_code(code)
 
if not success:
    print("⚠️ Error detected! Self-correcting...")
    fixed_code = code_generator(problem, error_msg=output, previous_code=code)
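The `execute_code` helper can be checked on its own; it is repeated below so the snippet runs self-contained. Keep in mind that `exec` is fine for a demo, but untrusted model-generated code should be sandboxed in practice.

```python
# Standalone check of the execute_code helper (repeated here so this
# snippet runs on its own, without the LLM-backed pieces).
def execute_code(code: str):
    """Execute and return (success, output/error)."""
    try:
        exec_globals = {}
        exec(code, exec_globals)
        return True, "Success"
    except Exception as e:
        return False, str(e)

ok, msg = execute_code("total = sum(range(5))")
bad, err = execute_code("x = 1 / 0")
print(ok, bad, err)  # True False division by zero
```

The error string returned on failure ("division by zero" here) is exactly what gets fed back to `code_generator` as `error_msg` in the self-correcting loop.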

Advanced Stopping Criteria

Instead of fixed iterations, use scored feedback:

def scored_reflector(draft: str) -> tuple[str, int]:
    """Returns (critique, score from 1-5)"""
    prompt = f"""Rate the draft 1-5 and provide critique.
Format: Score: X\nCritique: ...
 
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    # Parse the score and critique out of the requested format
    score_token = text.split("Score:", 1)[1].split()[0]
    score = int("".join(ch for ch in score_token if ch.isdigit()))
    critique = text.split("Critique:", 1)[-1].strip()
    return critique, score
 
# In the loop:
critique, score = scored_reflector(current_draft)
if score >= 4:
    print("✅ Quality threshold reached!")
    break
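The effect of a score threshold can be traced with a stubbed reflector whose scores rise each round. The stub and its score sequence are invented for illustration, not real LLM calls.

```python
# Score-based stopping, traced with a stubbed reflector (no API calls).
# The rising score sequence [2, 3, 4] is made up for the demo.
scores = iter([2, 3, 4])

def stub_scored_reflector(draft):
    score = next(scores)
    return f"Critique for a round scored {score}.", score

current_draft = "Initial draft."
rounds = 0
for _ in range(5):
    critique, score = stub_scored_reflector(current_draft)
    rounds += 1
    if score >= 4:
        print("✅ Quality threshold reached!")
        break
    current_draft += " [revised]"

print(rounds)  # 3
```

The loop ends on the third round, when the score first meets the threshold, rather than exhausting all five allowed iterations.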

Reflection vs Other Patterns

| Pattern | Approach | Best For |
|---|---|---|
| ReAct | Interleaves reasoning with actions | Interactive exploration |
| Plan-Execute | Plan first, then act | Multi-step workflows |
| Reflection | Generate, critique, refine | Quality improvement |
| Debate | Multiple agents argue | Handling controversial topics |

Hands-on Practice

In the notebook, you will:

  1. Build Generator & Reflector - Implement the two-role pattern for essay writing
  2. Run the Reflexion Loop - Watch an essay improve through multiple iterations
  3. Create a Self-Debugging Agent - Build a code agent that fixes its own errors
  4. Experiment with Stopping Criteria - Implement score-based loop termination

Key Takeaways

  1. Self-critique improves quality - LLMs can effectively criticize their own outputs
  2. Separation of concerns - Generator and Reflector roles should be distinct
  3. Iteration beats one-shot - Multiple refinement cycles consistently improve results
  4. Stopping conditions matter - Use scores or quality thresholds, not just iteration counts


Next Steps

Now that you understand reflection, explore Structured Reasoning to learn about advanced decomposition and Tree of Thoughts!