Building CLI AI Agents from Scratch

A hands-on guide to building a fully functional AI agent with tool calling, evaluations, context management, and human-in-the-loop safety — all from scratch using TypeScript.

Inspired by and adapted from Hendrixer/agents-v2 and the AI Agents v2 course on Frontend Masters by Scott Moss.

💻 Companion code repo: Hendrixer/agents-v2. The repo has one branch per chapter — check out lesson-01 to start, and each lesson-XX branch is the starter for chapter XX (i.e. the completed state of the previous chapter). The done branch has the finished app.

What You’ll Build

By the end of this book, you’ll have a working CLI AI agent that can:

Read, write, and manage files on your filesystem
Execute shell commands
Search the web
Execute code in multiple languages
Manage long conversations with automatic context compaction
Ask for human approval before performing dangerous operations
Be tested with single-turn and multi-turn evaluations

Tech Stack

TypeScript — Type-safe development
Vercel AI SDK — Universal LLM interface with streaming and tool calling
OpenAI — LLM provider (gpt-5-mini)
React + Ink — Terminal UI framework
Zod — Schema validation for tool parameters
ShellJS — Cross-platform shell commands
Laminar — Observability and evaluation framework

Prerequisites

Required:

Node.js 20+
An OpenAI API key (platform.openai.com)
Basic TypeScript/JavaScript knowledge (variables, functions, async/await, imports)
Comfort running commands in a terminal (npm install, npm run)

Not required:

Prior experience building CLI tools
React knowledge (a primer is included in Chapter 9)
AI/ML background — we explain everything from first principles
A Laminar API key (optional, for tracking eval results over time)

Chapter 1: Introduction to AI Agents

What are AI agents? How do they differ from simple chatbots? Set up the project from scratch and make your first LLM call.

Chapter 2: Tool Calling

Define tools with Zod schemas and teach your agent to use them. Understand structured function calling and how LLMs decide which tools to invoke.

Chapter 3: Single-Turn Evaluations

Build an evaluation framework to test whether your agent selects the right tools. Write golden, secondary, and negative test cases.

Chapter 4: The Agent Loop

Implement the core agent loop — stream responses, detect tool calls, execute them, feed results back, and repeat until the task is done.

Chapter 5: Multi-Turn Evaluations

Test full agent conversations with mocked tools. Use LLM-as-judge to score output quality. Evaluate tool ordering and forbidden tool avoidance.

Chapter 6: File System Tools

Add real filesystem tools — read, write, list, and delete files. Handle errors gracefully and give your agent the ability to work with your codebase.

Chapter 7: Web Search & Context Management

Add web search capabilities. Implement token estimation, context window tracking, and automatic conversation compaction to handle long conversations.

Chapter 8: Shell Tool

Give your agent the power to run shell commands. Add a code execution tool that writes to temp files and runs them. Understand the security implications.

Chapter 9: Human-in-the-Loop

Build an approval system for dangerous operations. Create a terminal UI with React and Ink that lets users approve or reject tool calls before execution.

Chapter 10: Going to Production

What’s missing between your learning agent and a production agent. Error recovery, sandboxing, rate limiting, prompt injection defense, agent planning, multi-agent orchestration, a production readiness checklist, and recommended reading for going deeper.

How to Read This Book

Each chapter builds on the previous one. You’ll write every line of code yourself, starting from npm init and ending with a fully functional CLI agent.

Code blocks show exactly what to type. When we modify an existing file, we’ll show the full updated file so you always have a clear picture of the current state.

By the end, your project will look like this:

agents-v2/
├── src/
│   ├── agent/
│   │   ├── run.ts              # Core agent loop
│   │   ├── executeTool.ts      # Tool dispatcher
│   │   ├── tools/
│   │   │   ├── index.ts        # Tool registry
│   │   │   ├── file.ts         # File operations
│   │   │   ├── shell.ts        # Shell commands
│   │   │   ├── webSearch.ts    # Web search
│   │   │   └── codeExecution.ts # Code runner
│   │   ├── context/
│   │   │   ├── index.ts        # Context exports
│   │   │   ├── tokenEstimator.ts
│   │   │   ├── compaction.ts
│   │   │   └── modelLimits.ts
│   │   └── system/
│   │       ├── prompt.ts       # System prompt
│   │       └── filterMessages.ts
│   ├── ui/
│   │   ├── App.tsx             # Main terminal app
│   │   ├── index.tsx           # UI exports
│   │   └── components/
│   │       ├── MessageList.tsx
│   │       ├── ToolCall.tsx
│   │       ├── ToolApproval.tsx
│   │       ├── Input.tsx
│   │       ├── TokenUsage.tsx
│   │       └── Spinner.tsx
│   ├── types.ts
│   ├── index.ts
│   └── cli.ts
├── evals/
│   ├── types.ts
│   ├── evaluators.ts
│   ├── executors.ts
│   ├── utils.ts
│   ├── mocks/tools.ts
│   ├── file-tools.eval.ts
│   ├── shell-tools.eval.ts
│   ├── agent-multiturn.eval.ts
│   └── data/
│       ├── file-tools.json
│       ├── shell-tools.json
│       └── agent-multiturn.json
├── package.json
└── tsconfig.json

Let’s get started.

Building AI Agents — TypeScript Edition