Chapter 9: Human-in-the-Loop
💻 Code: start from the 09-hitl branch of the companion repo. The branch’s notes/09-HITL.md has the code you’ll write in this chapter. The finished app is on the done branch.
The Safety Layer
We’ve built an agent with seven tools. Four of them can modify your system: write_file, delete_file, run_command, and execute_code. Right now, the agent auto-approves everything — if the LLM says “delete this file,” it happens immediately.
Human-in-the-Loop (HITL) means the agent pauses before dangerous operations and asks the user: “I want to do this. Should I proceed?”
This is the final piece. After this chapter, you’ll have a complete, safe CLI agent.
The Architecture
HITL fits into the agent loop we built in Chapter 4. The flow becomes:
1. LLM requests tool call
2. Is this tool dangerous?
   - No (read_file, list_files, web_search) → Execute immediately
   - Yes (write_file, delete_file, run_command, execute_code) → Ask for approval
3. If approval was requested:
   - User approves → Execute
   - User rejects → Stop the loop, return what we have
4. Continue
The approval mechanism uses the on_tool_approval callback we defined in our AgentCallbacks dataclass back in Chapter 1.
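The decision in step 2 can be sketched as a small gate function. This is a minimal sketch rather than the agent loop from Chapter 4: DANGEROUS_TOOLS, gate_tool_call, and the execute parameter are hypothetical names, and it assumes the approval callback is an async function that takes the tool name and arguments and returns a bool.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Tool names come from this chapter; the set itself is an illustrative assumption.
DANGEROUS_TOOLS = {"write_file", "delete_file", "run_command", "execute_code"}

ApprovalCallback = Callable[[str, dict[str, Any]], Awaitable[bool]]


async def gate_tool_call(
    name: str,
    args: dict[str, Any],
    on_tool_approval: ApprovalCallback,
    execute: Callable[[str, dict[str, Any]], str],
) -> tuple[bool, Optional[str]]:
    """Return (approved, result). Safe tools run without asking."""
    if name in DANGEROUS_TOOLS:
        approved = await on_tool_approval(name, args)
        if not approved:
            return False, None  # The caller should stop the loop here.
    return True, execute(name, args)


async def demo_gate() -> list[tuple[bool, Optional[str]]]:
    async def deny_all(name: str, args: dict[str, Any]) -> bool:
        return False  # Stand-in for a user who rejects every request.

    def fake_execute(name: str, args: dict[str, Any]) -> str:
        return f"ran {name}"  # Stand-in for the real tool executor.

    return [
        await gate_tool_call("read_file", {"path": "a.txt"}, deny_all, fake_execute),
        await gate_tool_call("delete_file", {"path": "a.txt"}, deny_all, fake_execute),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo_gate()))
```

Even with an approver that always says no, read_file still runs because it never reaches the callback, while delete_file is blocked.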
Building the Terminal UI
Now we need a terminal interface where users can:
- Type messages
- See streaming responses
- See tool calls happening
- Approve or reject dangerous tools
- See token usage
We’ll use Rich for output formatting and Prompt Toolkit for interactive input. Together, they give us a polished terminal experience.
Quick Primer: Rich + Prompt Toolkit
If you haven’t used these libraries:
Rich handles output — colors, panels, tables, spinners, markdown rendering:
from rich.console import Console
from rich.panel import Panel
console = Console()
console.print("[bold green]Hello[/bold green] from Rich!")
console.print(Panel("This is a panel", title="Info"))
Prompt Toolkit handles input — interactive prompts with history, key bindings, and async support:
from prompt_toolkit import prompt
user_input = prompt(">>> ")
Think of Rich as print() on steroids and Prompt Toolkit as input() on steroids.
The Spinner
Create src/ui/spinner.py:
from rich.console import Console
from rich.spinner import Spinner as RichSpinner
from rich.live import Live


class Spinner:
    """A terminal spinner for showing loading state."""

    def __init__(self, label: str = "Thinking..."):
        self.console = Console()
        self.label = label
        self.live = None

    def start(self):
        self.live = Live(
            RichSpinner("dots", text=f" {self.label}"),
            console=self.console,
            refresh_per_second=10,
        )
        self.live.start()

    def stop(self):
        # Safe to call more than once: does nothing if the spinner is not running.
        if self.live:
            self.live.stop()
            self.live = None
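With a start()/stop() pair, an exception raised between the two calls can leave the spinner running. One possible guard is a context manager. The sketch below is an optional pattern, not part of the chapter's code: spinning is a hypothetical helper, and FakeSpinner is a stand-in so the example runs without Rich. In the app you would pass the Spinner class above instead.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class StartStop(Protocol):
    """Anything with start() and stop(), such as the Spinner above."""

    def start(self) -> None: ...
    def stop(self) -> None: ...


@contextmanager
def spinning(spinner: StartStop) -> Iterator[StartStop]:
    """Start the spinner and guarantee it is stopped, even on errors."""
    spinner.start()
    try:
        yield spinner
    finally:
        spinner.stop()


class FakeSpinner:
    """Records calls instead of drawing to the terminal (illustration only)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


def demo_spinner_cleanup() -> list[str]:
    fake = FakeSpinner()
    try:
        with spinning(fake):
            raise RuntimeError("model call failed")
    except RuntimeError:
        pass
    return fake.events


if __name__ == "__main__":
    print(demo_spinner_cleanup())
```

Because Spinner.stop() checks whether the spinner is running first, stopping it early from a callback and again on exit from the with block is harmless.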
The Message List
Create src/ui/message_list.py:
from rich.console import Console
from rich.text import Text

console = Console()


def print_message(role: str, content: str) -> None:
    """Print a chat message with color coding."""
    if role == "user":
        label = Text("› You", style="bold blue")
    else:
        label = Text("› Assistant", style="bold green")
    console.print(label)
    # markup=False prints square brackets in the message literally
    # instead of treating them as Rich style tags.
    console.print(f" {content}", markup=False)
    console.print()
Tool Call Display
Create src/ui/tool_call.py:
from rich.console import Console
from rich.markup import escape

console = Console()


def print_tool_start(name: str, args: dict | None = None) -> None:
    """Show a tool call starting."""
    summary = ""
    if args:
        # Use the first meaningful string argument as a short summary.
        for key in ("path", "command", "query", "code", "content"):
            if key in args and isinstance(args[key], str):
                value = args[key]
                if len(value) > 50:
                    value = value[:50] + "..."
                # Escape so brackets in the value are not parsed as Rich markup.
                summary = f"({escape(value)})"
                break
    console.print(f" ⚡ [bold yellow]{name}[/bold yellow]{summary} ...", end="")


def print_tool_end(name: str, result: str) -> None:
    """Show a tool call completed."""
    console.print(" [green]✓[/green]")
    truncated = (result[:100] + "...") if len(result) > 100 else result
    console.print(f" [dim]→ {escape(truncated)}[/dim]")
Token Usage Display
Create src/ui/token_usage.py:
from rich.console import Console
from rich.panel import Panel
from src.types import TokenUsageInfo
console = Console()
def print_token_usage(usage: TokenUsageInfo) -> None:
    """Display token usage with color-coded percentage."""
    threshold_percent = round(usage.threshold * 100)
    usage_percent = f"{usage.percentage:.1f}"

    # Color based on usage
    if usage.percentage >= usage.threshold * 100:
        color = "red"
    elif usage.percentage >= usage.threshold * 100 * 0.75:
        color = "yellow"
    else:
        color = "green"

    text = f"Tokens: [{color} bold]{usage_percent}%[/{color} bold] [dim](threshold: {threshold_percent}%)[/dim]"
    console.print(Panel(text, border_style="dim"))
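To make the color bands concrete, here is the same comparison pulled out into a pure function. This is a sketch for illustration: usage_color is a hypothetical helper, and it assumes, as the code above implies, that percentage is on a 0 to 100 scale while threshold is a fraction between 0 and 1.

```python
def usage_color(percentage: float, threshold: float) -> str:
    """Pick a display color from usage (0-100) and a threshold (0-1)."""
    limit = threshold * 100  # Convert the fraction to the same scale as percentage.
    if percentage >= limit:
        return "red"  # At or past the threshold.
    if percentage >= limit * 0.75:
        return "yellow"  # Within the last quarter before the threshold.
    return "green"


if __name__ == "__main__":
    # With a threshold of 0.8: yellow starts at 60%, red at 80%.
    for pct in (45.0, 65.0, 85.0):
        print(pct, usage_color(pct, 0.8))
```

With a threshold of 0.8, usage of 45% is green, 65% is yellow, and 85% is red.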
The Tool Approval Component
This is the HITL component — the heart of this chapter. Create src/ui/tool_approval.py:
import json

from rich.console import Console
from rich.markup import escape
from prompt_toolkit import prompt

console = Console()

MAX_PREVIEW_LINES = 5


def format_args_preview(args: dict) -> tuple[str, int]:
    """Format args as a JSON preview; return (preview, number of hidden lines)."""
    formatted = json.dumps(args, indent=2)
    lines = formatted.split("\n")
    if len(lines) <= MAX_PREVIEW_LINES:
        return formatted, 0
    preview = "\n".join(lines[:MAX_PREVIEW_LINES])
    extra = len(lines) - MAX_PREVIEW_LINES
    return preview, extra


def get_args_summary(args) -> str:
    """Get a one-line summary of the most meaningful arg."""
    if not isinstance(args, dict):
        return str(args)
    # Prefer well-known argument names.
    for key in ("path", "filePath", "command", "query", "code", "content"):
        if key in args and isinstance(args[key], str):
            value = args[key]
            if len(value) > 50:
                return value[:50] + "..."
            return value
    # Otherwise fall back to the first argument, if it is a string.
    keys = list(args.keys())
    if keys and isinstance(args[keys[0]], str):
        value = args[keys[0]]
        if len(value) > 50:
            return value[:50] + "..."
        return value
    return ""


def request_approval(tool_name: str, args: dict) -> bool:
    """Show tool approval prompt and return True if approved."""
    console.print()
    console.print("[bold yellow]Tool Approval Required[/bold yellow]")

    # escape() keeps brackets in tool arguments from being parsed as Rich markup.
    summary = get_args_summary(args)
    summary_text = f" [dim]({escape(summary)})[/dim]" if summary else ""
    console.print(f" [bold cyan]{tool_name}[/bold cyan]{summary_text}")

    preview, extra = format_args_preview(args)
    console.print(f" [dim]{escape(preview)}[/dim]")
    if extra > 0:
        console.print(f" [dim]... +{extra} more lines[/dim]")
    console.print()

    while True:
        try:
            answer = prompt(" Approve? [Y/n] ").strip().lower()
            if answer in ("", "y", "yes"):
                return True
            if answer in ("n", "no"):
                return False
            console.print(" [dim]Please enter Y or N[/dim]")
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C or Ctrl+D counts as a rejection.
            return False
The approval component:
- Shows the tool name in cyan
- Shows a one-line summary: for run_command, the command; for write_file, the path
- Shows the full args as formatted JSON (truncated to 5 lines)
- Prompts Y/n: Enter defaults to Yes, Ctrl+C defaults to No
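The Y/n rules are easier to unit-test when they are separated from terminal I/O. The sketch below is one way to do that. parse_approval is a hypothetical helper that is not part of the chapter's code, and it mirrors the accepted answers in request_approval.

```python
from typing import Optional


def parse_approval(answer: str) -> Optional[bool]:
    """Map raw input to True (approve), False (reject), or None (ask again)."""
    normalized = answer.strip().lower()
    if normalized in ("", "y", "yes"):
        return True  # Pressing Enter on its own counts as approval.
    if normalized in ("n", "no"):
        return False
    return None  # Unrecognized input: the caller should prompt again.


if __name__ == "__main__":
    for raw in ("", " YES ", "No", "maybe"):
        print(repr(raw), parse_approval(raw))
```

The prompt loop would then only need to read input, call this function, and repeat while the result is None.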
The Main App
Create src/ui/app.py — the component that wires everything together:
import asyncio
from typing import Any

from rich.console import Console
from rich.markup import escape
from prompt_toolkit import prompt as pt_prompt

from src.agent.run import run_agent
from src.types import AgentCallbacks, TokenUsageInfo
from src.ui.message_list import print_message
from src.ui.tool_call import print_tool_start, print_tool_end
from src.ui.tool_approval import request_approval
from src.ui.token_usage import print_token_usage
from src.ui.spinner import Spinner

console = Console()


def run_app():
    """Main application loop."""
    console.print("[bold magenta]🤖 AI Agent[/bold magenta] [dim](type 'exit' to quit)[/dim]")
    console.print()

    conversation_history: list[dict[str, Any]] = []
    token_usage_info: TokenUsageInfo | None = None

    while True:
        # Get user input
        try:
            user_input = pt_prompt("> ").strip()
        except (KeyboardInterrupt, EOFError):
            console.print("\nGoodbye!")
            break

        if not user_input:
            continue
        if user_input.lower() in ("exit", "quit"):
            console.print("Goodbye!")
            break

        print_message("user", user_input)

        # Track streaming state
        streaming_text = ""
        spinner = Spinner()
        spinner_active = False

        def on_token(token: str):
            nonlocal streaming_text, spinner_active
            # The first token replaces the spinner with the assistant label.
            if spinner_active:
                spinner.stop()
                spinner_active = False
                console.print("[bold green]› Assistant[/bold green]")
                console.print(" ", end="")
            streaming_text += token
            # markup=False prints model output literally, brackets included.
            console.print(token, end="", markup=False, highlight=False)

        def on_tool_call_start(name: str, args: Any):
            nonlocal spinner_active
            if spinner_active:
                spinner.stop()
                spinner_active = False
            print_tool_start(name, args if isinstance(args, dict) else {})

        def on_tool_call_end(name: str, result: str):
            print_tool_end(name, result)

        def on_complete(response: str):
            nonlocal spinner_active
            if spinner_active:
                spinner.stop()
                spinner_active = False
            if streaming_text:
                console.print()  # Newline after streamed text
            console.print()

        async def on_tool_approval(name: str, args: Any) -> bool:
            return request_approval(name, args if isinstance(args, dict) else {})

        def on_token_usage(usage: TokenUsageInfo):
            nonlocal token_usage_info
            token_usage_info = usage

        # Start spinner
        spinner.start()
        spinner_active = True

        try:
            # If run_agent is declared with `async def` in your codebase,
            # wrap this call in asyncio.run(...).
            new_history = run_agent(
                user_input,
                conversation_history,
                AgentCallbacks(
                    on_token=on_token,
                    on_tool_call_start=on_tool_call_start,
                    on_tool_call_end=on_tool_call_end,
                    on_complete=on_complete,
                    on_tool_approval=on_tool_approval,
                    on_token_usage=on_token_usage,
                ),
            )
            conversation_history = new_history
        except Exception as e:
            if spinner_active:
                spinner.stop()
            console.print(f"\n [red]Error: {escape(str(e))}[/red]")
            console.print()

        # Show token usage
        if token_usage_info:
            print_token_usage(token_usage_info)

        streaming_text = ""
Entry Point
Update src/main.py:
from dotenv import load_dotenv
load_dotenv()
from src.ui.app import run_app
def main():
    run_app()


if __name__ == "__main__":
    main()
UI Barrel
Create src/ui/__init__.py:
from src.ui.app import run_app
from src.ui.message_list import print_message
from src.ui.tool_call import print_tool_start, print_tool_end
from src.ui.spinner import Spinner
How the HITL Flow Works
Let’s trace through a concrete scenario:
User types: “Create a file called hello.txt with ‘Hello World’”
- run_agent starts, streams tokens, and the LLM decides to call write_file
- The agent loop hits callbacks.on_tool_approval("write_file", {...})
- The callback calls request_approval(), which prints the approval prompt
- The user sees:

    Tool Approval Required
     write_file (hello.txt)
     {
       "path": "hello.txt",
       "content": "Hello World"
     }
     Approve? [Y/n]

- User presses Enter (Y is default) → returns True
- The agent loop continues → execute_tool("write_file", ...) runs → file is created
- The LLM generates its final response

If the user had typed “n”:
- request_approval returns False
- rejected = True is set in the agent loop
- The loop breaks immediately
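Both paths can be exercised without a terminal by swapping in a scripted approver. The sketch below is a simplified stand-in for the real agent loop, not the code from Chapter 4: process_tool_calls and scripted_approver are hypothetical names, every call in the list is assumed to be a dangerous tool, and tool execution is reduced to recording the tool name.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolCall = tuple[str, dict[str, Any]]
ApprovalCallback = Callable[[str, dict[str, Any]], Awaitable[bool]]


def scripted_approver(answers: list[bool]) -> ApprovalCallback:
    """Build an approval callback that replays canned answers in order."""
    remaining = iter(answers)

    async def approve(name: str, args: dict[str, Any]) -> bool:
        # Once the script runs out, reject: the safe default.
        return next(remaining, False)

    return approve


async def process_tool_calls(
    calls: list[ToolCall], on_tool_approval: ApprovalCallback
) -> tuple[list[str], bool]:
    """Return (names of executed tools, whether the user rejected one)."""
    executed: list[str] = []
    rejected = False
    for name, args in calls:
        if not await on_tool_approval(name, args):
            rejected = True
            break  # Stop immediately; later calls never run.
        executed.append(name)  # Stand-in for actually executing the tool.
    return executed, rejected


if __name__ == "__main__":
    calls: list[ToolCall] = [
        ("write_file", {"path": "hello.txt", "content": "Hello World"}),
        ("run_command", {"command": "ls -la"}),
    ]
    print(asyncio.run(process_tool_calls(calls, scripted_approver([True, True]))))
    print(asyncio.run(process_tool_calls(calls, scripted_approver([True, False]))))
```

The first run approves both calls. The second rejects run_command, so only write_file executes and rejected comes back as True.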
Running the Complete Agent
python -m src.main
You now have a fully functional CLI AI agent with:
- Multi-turn conversations
- Streaming responses
- 7 tools (read, write, list, delete, shell, code execution, web search)
- Human approval for dangerous operations
- Token usage tracking
- Automatic conversation compaction
Try some prompts:
> What files are in this project?
> Read the pyproject.toml and tell me about it
> Create a file called test.txt with "Hello from the agent"
> Run ls -la to see all files
> Search the web for the latest Python version
For the write_file and run_command calls, you’ll be prompted to approve before they execute.
Summary
In this chapter you:
- Built a complete terminal UI with Rich and Prompt Toolkit
- Implemented human-in-the-loop approval for dangerous tools
- Created components for message display, tool calls, input, and token usage
- Assembled the complete application
Congratulations — you’ve built a CLI AI agent from scratch. Every line of code, from the first pip install to the final approval prompt, is something you wrote and understand.
What’s Next?
Here are some ideas for extending the agent:
- Persistent memory — Save conversation summaries to disk
- Custom tools — Add tools for your specific workflow
- Better approval UX — Allow editing tool args before approving
- Multi-model support — Switch between OpenAI, Anthropic, and others
- Plugin system — Let users add tools without modifying core code
The architecture supports all of these.
Happy building.