02. Reflection Agents
Why Reflection Matters
When humans make a mistake, we ask ourselves "Why was I wrong?" and try not to repeat the error. A single LLM call, by contrast, is stateless: the model does not remember previous attempts and will often produce the same wrong answer again.
Reflection Agents critique their own output and use that critique to improve it, much like a writer who produces a draft and then revises it repeatedly.
The Generator-Reflector Pattern
The simplest reflection pattern splits the work between two roles:
| Role | Responsibility |
|---|---|
| Generator | Creates initial output or revises based on feedback |
| Reflector | Critiques the output and provides improvement suggestions |
Implementing the Generator
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generator(topic: str, previous_draft: str = None, critique: str = None) -> str:
    """Generates a draft, or revises a previous draft based on critique."""
    if previous_draft and critique:
        prompt = f"""Topic: {topic}
Previous Draft:
{previous_draft}
Critique/Feedback:
{critique}
Please rewrite the draft, improving it based on the critique above.
"""
    else:
        prompt = f"Topic: {topic}\nWrite a short essay draft on the topic."

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```
Implementing the Reflector
```python
def reflector(draft: str) -> str:
    """Reads the draft and provides constructive critique."""
    prompt = f"""Read the following draft and provide constructive criticism.
Evaluate logic, clarity, and style. Point out up to 3 areas for improvement.
Focus on what needs to be fixed rather than praise.
If the draft needs no further changes, reply with "No improvements needed."
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```
The Reflexion Loop
Combine Generator and Reflector in an iterative loop:
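Each round feeds the Reflector's critique back into the Generator. A minimal sketch of that cycle, assuming `topic` holds the essay topic string (the automated version with a stopping condition appears below as `ReflexionAgent`):
```python
# Minimal sketch of the reflexion cycle: draft -> critique -> revised draft.
draft = generator(topic)
for _ in range(2):      # a fixed number of refinement rounds
    critique = reflector(draft)
    draft = generator(topic, previous_draft=draft, critique=critique)
```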
Running the Loop
```python
topic = "Can AI replace human creativity?"

# Step 1: First Draft
print("✍️ Writing Draft V1...")
draft_v1 = generator(topic)

# Step 2: Reflection
print("🤔 Reflecting...")
critique_v1 = reflector(draft_v1)

# Step 3: Refinement
print("✍️ Writing Draft V2 (Refined)...")
draft_v2 = generator(topic, previous_draft=draft_v1, critique=critique_v1)
```
Example: Essay Improvement
Topic: "Can AI replace human creativity?"
Draft V1:
"The question of whether AI can replace human creativity is complex..."
Critique V1:
- Clarify the argument structure
- Add specific examples
- Improve stylistic flow
Draft V2 (After Reflection):
"AI, while capable of generating creative outputs, cannot fully replicate the emotional and contextual underpinnings of human creativity. Technologies like machine learning analyze patterns, but this differs from human creative experience rooted in emotions and culture..."
Key Insight: Each reflection cycle tends to improve the output because the Generator addresses the specific weaknesses the Reflector identified, rather than rewriting blindly.
Automated Reflexion Agent
Automate the loop until a stopping condition is met:
```python
class ReflexionAgent:
    def __init__(self, max_iterations: int = 3):
        self.max_iterations = max_iterations

    def run(self, topic: str):
        current_draft = generator(topic)
        print(f"📝 Initial Draft:\n{current_draft}\n")

        for i in range(self.max_iterations):
            print(f"--- Iteration {i+1} ---")

            # 1. Critique the current draft
            critique = reflector(current_draft)
            print(f"🔍 Critique:\n{critique}\n")

            # 2. Check stopping condition
            # (the reflector prompt asks for "No improvements needed" when it is satisfied)
            if "no improvements needed" in critique.lower():
                print("✨ Reflector is satisfied. Stopping.")
                break

            # 3. Refine the draft using the critique
            current_draft = generator(
                topic,
                previous_draft=current_draft,
                critique=critique
            )
            print(f"✍️ Refined Draft:\n{current_draft}\n")

        return current_draft
```
Self-Debugging Code Agent
Reflection is especially powerful for coding tasks:
Implementation
```python
def code_generator(prompt: str, error_msg: str = None, previous_code: str = None) -> str:
    if error_msg:
        full_prompt = f"""The following code resulted in an error.
Code:
{previous_code}
Error:
{error_msg}
Please fix the code. Return only the code.
"""
    else:
        full_prompt = f"Write Python code to: {prompt}. Return only the code."

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": full_prompt}]
    )
    # clean_code is assumed to strip markdown fences from the model's reply (defined elsewhere)
    return clean_code(response.choices[0].message.content)

def execute_code(code: str):
    """Execute the code and return (success, output/error message)."""
    try:
        exec_globals = {}
        exec(code, exec_globals)  # caution: runs model-generated code in-process; sandbox in real use
        return True, "Success"
    except Exception as e:
        return False, str(e)

# Self-correcting loop
problem = "Parse a list of date strings and sort them chronologically"  # illustrative task
code = code_generator(problem)
success, output = execute_code(code)

if not success:
    print("⚠️ Error detected! Self-correcting...")
    fixed_code = code_generator(problem, error_msg=output, previous_code=code)
```
Advanced Stopping Criteria
Instead of fixed iterations, use scored feedback:
```python
import re

def scored_reflector(draft: str) -> tuple[str, int]:
    """Returns (critique, score from 1-5)."""
    prompt = f"""Rate the draft 1-5 and provide critique.
Format: Score: X\nCritique: ...
Draft:
{draft}
"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    # Parse score and critique from the structured response
    match = re.search(r"Score:\s*([1-5])", text)
    score = int(match.group(1)) if match else 1  # fall back to lowest score if parsing fails
    critique = text.split("Critique:", 1)[-1].strip()
    return critique, score

# In the loop:
critique, score = scored_reflector(current_draft)
if score >= 4:
    print("✅ Quality threshold reached!")
    break
```
Reflection vs Other Patterns
| Pattern | Approach | Best For |
|---|---|---|
| ReAct | Interleaved reasoning and acting | Interactive exploration |
| Plan-Execute | Plan first, then act | Multi-step workflows |
| Reflection | Generate, critique, refine | Quality improvement |
| Debate | Multiple agents argue | Handling controversial topics |
Hands-on Practice
In the notebook, you will:
- Build Generator & Reflector: Implement the two-role pattern for essay writing
- Run the Reflexion Loop: Watch an essay improve through multiple iterations
- Create a Self-Debugging Agent: Build a code agent that fixes its own errors
- Experiment with Stopping Criteria: Implement score-based loop termination
Key Takeaways
- Self-critique improves quality - LLMs can effectively criticize their own outputs
- Separation of concerns - Generator and Reflector roles should be distinct
- Iteration beats one-shot - Multiple refinement cycles typically improve results
- Stopping conditions matter - Use scores or quality thresholds, not just iteration counts
References & Further Reading
Academic Papers
- "Reflexion: Language Agents with Verbal Reinforcement Learning" (Shinn et al., 2023), arXiv:2303.11366. Foundation for verbal self-reflection in agents.
- "Self-Refine: Iterative Refinement with Self-Feedback" (Madaan et al., 2023), arXiv:2303.17651. Shows that iterative self-feedback improves generation across tasks.
- "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, 2022), arXiv:2212.08073. Using AI feedback to critique and improve AI responses.
- "Language Models can Solve Computer Tasks" (Kim et al., 2023), arXiv:2303.17491. Self-debugging approaches for code generation.
Related Concepts
- Critique Models: LLM Critics
- Self-Consistency: Sampling-based self-verification
Next Steps
Now that you understand reflection, explore Structured Reasoning to learn about advanced decomposition and Tree of Thoughts!