Week 2: Reasoning

01. Planning Agents & Chain of Thought

Why Planning Matters

The ReAct pattern from Week 1 works well for simple tasks, but complex problems require a different approach. When agents improvise step-by-step, they can:

  • Get stuck in infinite loops
  • Lose track of the overall goal
  • Make myopic decisions that hurt long-term outcomes

Planning agents mitigate these failures by drafting a complete plan before taking any action, much as people think before they act.

Chain of Thought (CoT) Prompting

The foundation of planning is Chain of Thought: prompting the LLM to "think step by step" before giving its answer.

The Magic Phrase

Simply appending "Let's think step by step" to a prompt (zero-shot CoT, introduced by Kojima et al., 2022) often substantially improves reasoning on multi-step problems:

# Standard Prompting
prompt_standard = f"Question: {problem}\nAnswer:"
 
# CoT Prompting
prompt_cot = f"Question: {problem}\nLet's think step by step."
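Because the CoT reply is free-form text, zero-shot CoT is often run as two calls: the first elicits the reasoning, and the second feeds that reasoning back with an answer trigger so the model emits only the final answer. This follows the two-stage procedure described by Kojima et al. (2022); the helper names and the exact trigger wording below are illustrative assumptions, and the model calls themselves are omitted.

```python
def reasoning_prompt(problem: str) -> str:
    """Stage 1: ask for step-by-step reasoning (zero-shot CoT trigger)."""
    return f"Question: {problem}\nLet's think step by step."


def extraction_prompt(problem: str, reasoning: str) -> str:
    """Stage 2: feed the reasoning back and ask only for the final answer."""
    return (
        f"{reasoning_prompt(problem)}\n"
        f"{reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )


# Usage: send stage 1 to the model, then send stage 2 with the text it returned.
question = "What is 3 + 4 * 2?"
stage1 = reasoning_prompt(question)
stage2 = extraction_prompt(question, "4 * 2 = 8, and 3 + 8 = 11.")
print(stage2)
```

The second call is cheap and makes the final answer easy to compare against a reference.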

Why CoT Works

Aspect | Standard Prompt | Chain of Thought
Process | Direct answer | Explicit reasoning steps
Accuracy | Prone to errors on multi-step problems | Often substantially higher on multi-step problems
Transparency | Black box | Intermediate reasoning is visible
Error Detection | Hard to locate mistakes | Mistakes can be traced to a specific step
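The "Error Detection" row is where written-out reasoning pays off in practice: simple steps can be checked mechanically. Below is a minimal sketch, assuming each calculation appears as integer arithmetic of the form `a op b = c` (real model output is less regular), that flags the lines whose stated result is wrong. The function names are hypothetical.

```python
import operator
import re

# Integer arithmetic of the form "a op b = c", e.g. "5 - 2 = 3"
STEP = re.compile(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def step_is_wrong(a: str, op: str, b: str, c: str) -> bool:
    """True if the written result c does not equal a op b."""
    if op == "/" and int(b) == 0:
        return True  # a division by zero can never be a valid step
    return OPS[op](int(a), int(b)) != int(c)


def find_arithmetic_errors(trace: str) -> list[str]:
    """Return each line of a reasoning trace that contains a wrong calculation."""
    return [
        line.strip()
        for line in trace.splitlines()
        if any(step_is_wrong(*m) for m in STEP.findall(line))
    ]


trace = """1. Gives 2 to Mary: 5 - 2 = 3 apples
2. Eats 1: 3 - 1 = 1 apples
3. Gets 3 from Mike: 1 + 3 = 4 apples"""
print(find_arithmetic_errors(trace))
```

Here the checker pinpoints step 2, even though step 3 is internally consistent and simply carries the earlier mistake forward.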

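Research Insight: Wei et al. (2022) showed that few-shot CoT prompting raised PaLM 540B's accuracy on the GSM8K math word-problem benchmark from 17.9% to 56.9%. Kojima et al. (2022) found that the zero-shot phrase "Let's think step by step" alone lifted InstructGPT (text-davinci-002) from 17.7% to 78.7% on MultiArith and from 10.4% to 40.7% on GSM8K.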
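Wei et al. (2022) elicited chain of thought by placing a few worked examples in the prompt (few-shot CoT) rather than relying on a trigger phrase alone. A minimal prompt builder might look like this; the `Q:`/`A:` layout, the `Exemplar` type, and the sample exemplar are illustrative assumptions, not the paper's exact prompts.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # worked steps the model should imitate
    answer: str


def few_shot_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Prepend worked examples so the model imitates step-by-step reasoning."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(blocks)


# Hypothetical exemplar for illustration
demo = Exemplar(
    question="Tom has 3 pens and buys 2 more. How many pens does he have?",
    reasoning="Tom starts with 3 pens. Buying 2 more gives 3 + 2 = 5.",
    answer="5",
)
prompt = few_shot_cot_prompt([demo], "Sara has 10 stickers and gives away 4. How many are left?")
print(prompt)
```

Ending every exemplar with the same "The answer is ..." sentence also makes the final answer easy to parse.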

CoT in Practice

from openai import OpenAI
 
client = OpenAI()  # reads OPENAI_API_KEY from the environment
 
problem = """
John had 5 apples. He gave 2 to Mary and ate 1.
Then he got 3 more from Mike, but dropped 1 on the way home.
How many apples does John have now?
"""
 
def get_completion(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0  # reduce sampling variation between runs
    )
    return response.choices[0].message.content
 
# With CoT
response = get_completion(f"{problem}\nLet's think step by step.")
print(response)
# Example output (exact wording varies by model and run):
# 1. John starts with 5 apples
# 2. Gives 2 to Mary: 5 - 2 = 3 apples
# 3. Eats 1: 3 - 1 = 2 apples
# 4. Gets 3 from Mike: 2 + 3 = 5 apples
# 5. Drops 1: 5 - 1 = 4 apples
# Answer: 4 apples
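Because the CoT reply is free text, grading it (for example against GSM8K-style numeric answers) needs a step that pulls out the final answer. One simple heuristic, assumed here rather than taken from any library: prefer an explicit "Answer:" line, otherwise take the last number in the response. It does not handle thousands separators such as "1,000", so treat it as a starting point.

```python
import re
from typing import Optional

# Integers or decimals, optionally negative, e.g. "4", "-3", "2.5"
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_answer(response: str) -> Optional[str]:
    """Pull the final numeric answer out of a chain-of-thought response."""
    # Prefer the last explicit "Answer: ..." line if the model wrote one
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("answer:"):
            numbers = NUMBER.findall(line)
            if numbers:
                return numbers[-1]
    # Otherwise fall back to the last number anywhere in the text
    numbers = NUMBER.findall(response)
    return numbers[-1] if numbers else None


print(extract_final_answer("5. Drops 1: 5 - 1 = 4 apples\nAnswer: 4 apples"))
```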

Plan-and-Execute Pattern

Building on CoT, the Plan-and-Execute pattern separates planning from execution:

Architecture Components

Planner

Analyzes the query and creates a structured plan with ordered steps

Executor

Executes each step using available tools

Synthesizer

Combines results from all steps into a final answer
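The three components can be wired together as plain functions passed into a single driver, which keeps each one swappable (an LLM-backed planner in production, a stub in tests). This is a minimal sketch; the type aliases, the `plan_and_execute` driver, and the stub functions are hypothetical, not part of the notebook.

```python
from typing import Callable

Step = dict  # e.g. {"id": 1, "tool": "calculate", "args": "2 + 2"}
Planner = Callable[[str], list[Step]]
Executor = Callable[[list[Step]], dict[int, str]]
Synthesizer = Callable[[str, dict[int, str]], str]


def plan_and_execute(query: str, planner: Planner, executor: Executor,
                     synthesizer: Synthesizer) -> str:
    """Planner -> Executor -> Synthesizer, each injected as a function."""
    steps = planner(query)              # 1. decide all steps up front
    results = executor(steps)           # 2. run the steps in order
    return synthesizer(query, results)  # 3. combine step results into an answer


# Stub components for a quick offline check
def stub_planner(query: str) -> list[Step]:
    return [{"id": 1, "tool": "echo", "args": query}]


def stub_executor(steps: list[Step]) -> dict[int, str]:
    return {s["id"]: f"ran {s['tool']}({s['args']})" for s in steps}


def stub_synthesizer(query: str, results: dict[int, str]) -> str:
    return f"{query} -> " + "; ".join(results[k] for k in sorted(results))


print(plan_and_execute("hello", stub_planner, stub_executor, stub_synthesizer))
```

Keeping the planner out of the execution loop is what distinguishes this from ReAct: the executor never asks the model what to do next.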

Implementing the Planner

Use Pydantic for structured output:

from typing import List
 
from openai import OpenAI
from pydantic import BaseModel, Field
 
client = OpenAI()
 
class PlanStep(BaseModel):
    id: int = Field(description="Step number (starts from 1)")
    description: str = Field(description="What to do in this step")
    tool: str = Field(description="Tool to use (search or calculate)")
    args: str = Field(description="Arguments for the tool")
 
class Plan(BaseModel):
    steps: List[PlanStep] = Field(description="Ordered list of steps")
 
def create_plan(query: str) -> Plan:
    """Ask the model for a plan that conforms to the Plan schema."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Create a step-by-step plan to answer the user's question. "
                "Available tools: search (look up facts) and calculate (evaluate arithmetic)."
            )},
            {"role": "user", "content": query}
        ],
        response_format=Plan
    )
    message = completion.choices[0].message
    if message.parsed is None:  # e.g. the model refused the request
        raise ValueError(f"No plan returned: {message.refusal}")
    return message.parsed
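Structured output constrains the response to the `Plan` schema, but it cannot guarantee the plan is usable: the model may still name a tool you don't have or number steps out of order. A cheap check before execution catches both. To stay dependency-free, this sketch uses a hypothetical `StubStep` dataclass for the demo; `validate_plan` reads only the `id` and `tool` attributes, so it should work on `plan.steps` from the Pydantic models as well.

```python
from dataclasses import dataclass


@dataclass
class StubStep:  # hypothetical stand-in with the same fields as PlanStep
    id: int
    description: str
    tool: str
    args: str


def validate_plan(steps: list, allowed_tools: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks runnable."""
    problems = []
    if not steps:
        problems.append("plan has no steps")
    for expected_id, step in enumerate(steps, start=1):
        if step.id != expected_id:
            problems.append(f"step {step.id}: expected id {expected_id}")
        if step.tool not in allowed_tools:
            problems.append(f"step {step.id}: unknown tool '{step.tool}'")
    return problems


steps = [StubStep(1, "look up", "search", "x"), StubStep(2, "compute", "browse", "y")]
print(validate_plan(steps, {"search", "calculate"}))
```

If problems come back, one option is to send them to the planner as feedback and ask for a corrected plan before executing anything.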

Implementing the Executor

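def execute_plan(plan: Plan, tools: dict) -> str:
    """Run each step with its named tool, then synthesize a final answer."""
    results = {}
 
    for step in plan.steps:
        if step.tool not in tools:
            # Record the problem instead of silently skipping the step
            results[step.id] = f"Error: unknown tool '{step.tool}'"
            continue
        result = tools[step.tool](step.args)
        results[step.id] = result
        print(f"Step {step.id}: {result}")
 
    # Synthesizer: combine all step results into the final answer
    # (synthesize_results is assumed to exist; it is not defined in this section)
    return synthesize_results(results)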
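The executor passes `step.args` through unchanged, so a later step cannot use an earlier step's output (say, doubling a number that `search` found). One common workaround is to let the planner write placeholders that the executor fills in just before calling the tool. The `{step_N}` syntax and the `resolve_args` helper below are assumptions; the planner's system prompt would need to describe the syntax.

```python
import re

# Assumed placeholder syntax: "{step_N}" refers to the result of step N
PLACEHOLDER = re.compile(r"\{step_(\d+)\}")


def resolve_args(args: str, results: dict[int, str]) -> str:
    """Replace {step_N} placeholders with earlier results before calling a tool."""
    def substitute(match: re.Match) -> str:
        step_id = int(match.group(1))
        if step_id not in results:
            raise KeyError(f"step {step_id} has no result yet")
        return str(results[step_id])

    return PLACEHOLDER.sub(substitute, args)


print(resolve_args("{step_1} * 2", {1: "8100000"}))
```

Inside the loop, the tool call would then become `tools[step.tool](resolve_args(step.args, results))`. Raising on a missing result, rather than passing the raw placeholder to the tool, surfaces ordering mistakes in the plan.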

Plan-and-Execute vs ReAct

Feature | ReAct | Plan-and-Execute
Approach | Interleaved thinking and acting | Plan first, then execute
Flexibility | Highly adaptive | Follows a predetermined plan
Best For | Exploration, interactive tasks | Multi-step analysis, recipes
Weakness | Can get lost (myopic) | Rigid if the plan is wrong
Recovery | Natural adaptation | Requires explicit replanning

Advanced: Replanning

When execution fails, a Replanner can adjust the strategy:

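def execute_with_replanning(plan: Plan, tools: dict, max_replans: int = 2) -> dict:
    # run_steps and replan are assumed helpers not defined in this section:
    # run_steps is a variant of execute_plan that returns (results, failed_step),
    # with failed_step set to None when every step succeeds.
    for attempt in range(max_replans + 1):  # initial plan + up to max_replans revisions
        results, failed_step = run_steps(plan, tools)
 
        if failed_step is None:
            return results  # Success!
 
        if attempt < max_replans:
            # Replan from the failed step, using the results gathered so far
            plan = replan(plan, failed_step, results)
 
    return results  # Best effort: partial results from the last attempt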
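`replan` is left undefined above. One reasonable design keeps the steps that already succeeded, asks the planner only for replacements from the failed step onward, and renumbers so ids stay sequential. This sketch covers just the splicing; `Step` and `splice_plan` are hypothetical, and producing `new_steps` would be another LLM call (for example, a variant of `create_plan` that also receives the failure and the partial results).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:  # hypothetical stand-in for PlanStep
    id: int
    tool: str
    args: str


def splice_plan(old_steps: list[Step], failed_id: int, new_steps: list[Step]) -> list[Step]:
    """Keep steps before the failure, append the revised tail, renumber from 1."""
    kept = [s for s in old_steps if s.id < failed_id]
    merged = kept + list(new_steps)
    return [replace(s, id=i) for i, s in enumerate(merged, start=1)]


old = [Step(1, "search", "a"), Step(2, "calculate", "bad"), Step(3, "search", "c")]
new = [Step(10, "search", "b2"), Step(11, "calculate", "fixed")]
print(splice_plan(old, 2, new))
```

When the original ids were already sequential, the kept steps keep their ids, so an executor could skip any step whose id already has a result instead of re-running it.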

Hands-on Practice

In the notebook, you will:

Experience CoT

Compare standard prompting vs. CoT prompting on logic puzzles

Build a Planner

Create a Pydantic-based planner that outputs structured plans

Implement Execution

Execute plans using mock search and calculator tools

Run the Full Pipeline

Combine planning and execution for complex queries
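To try the executor outside the notebook, you need a `tools` dict. A possible stand-in pair: `search` looks queries up in a small hard-coded table, and `calculate` evaluates arithmetic by walking Python's `ast` instead of calling `eval`, so model-written arguments cannot run arbitrary code. The table contents and function bodies are made-up placeholders, not the notebook's implementation.

```python
import ast
import operator

# Made-up lookup table standing in for a real search API
FAKE_INDEX = {
    "population of exampleville": "120000",
}

BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}


def search(query: str) -> str:
    """Mock search: exact, case-insensitive lookup in FAKE_INDEX."""
    return FAKE_INDEX.get(query.strip().lower(), "No results found.")


def _eval(node: ast.AST) -> float:
    """Evaluate only numeric literals, + - * /, and unary minus."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def calculate(expression: str) -> str:
    """Mock calculator: safely evaluate simple arithmetic, returning errors as text."""
    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


tools = {"search": search, "calculate": calculate}
print(calculate("2 + 3 * 4"))
```

Returning errors as strings rather than raising keeps a single bad step from crashing the whole plan, which gives a replanner something to react to.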

Key Takeaways

  1. CoT is the foundation - Prompting the model to "think step by step" often substantially improves multi-step reasoning
  2. Separate concerns - Planning and execution are different cognitive tasks
  3. Structured output - Use Pydantic/JSON Schema for reliable plan formats
  4. Plan recovery - Implement replanning for robust agents

References & Further Reading

Academic Papers

Next Steps

Now that you understand planning, head to Reflection Agents to learn how agents can critique and improve their own outputs!