Chapter 8: Shell Tool & Code Execution
The Most Dangerous Tool
A shell tool turns the agent from “a thing that reads and writes files” into “a thing that can do anything you can do at a terminal.” That’s an enormous capability boost — and the source of every horror story you’ve heard about agents wiping their authors’ machines.
This chapter is short on lines of code and long on guardrails. We’ll add two tools:
Shell— Run an arbitrary shell command. Requires approval. Has a timeout.RunCode— Write a snippet to a temp file and execute it with a chosen interpreter. Requires approval.
Both lean heavily on os/exec and context.WithTimeout.
The Shell Tool
Create tools/shell.go:
package tools
import (
"context"
"encoding/json"
"errors"
"fmt"
"os/exec"
"strings"
"time"
"github.com/yourname/agents-go/api"
)
const (
defaultShellTimeout = 30 * time.Second
maxOutputBytes = 16 * 1024
)
type Shell struct{}
func (Shell) Name() string { return "shell" }
func (Shell) RequiresApproval() bool { return true }
func (Shell) Definition() api.ToolDefinition {
return api.ToolDefinition{
Type: "function",
Name: "shell",
Description: "Execute a shell command and return its combined stdout and stderr. Use for running build tools, tests, git, and other CLI utilities. The command runs with a 30 second timeout.",
Parameters: json.RawMessage(`{
"type": "object",
"properties": {
"command": {"type": "string", "description": "The shell command to execute"}
},
"required": ["command"]
}`),
}
}
func (Shell) Execute(args json.RawMessage) (string, error) {
var params struct {
Command string `json:"command"`
}
if err := json.Unmarshal(args, ¶ms); err != nil {
return "", fmt.Errorf("invalid arguments: %w", err)
}
if strings.TrimSpace(params.Command) == "" {
return "", errors.New("missing 'command' argument")
}
ctx, cancel := context.WithTimeout(context.Background(), defaultShellTimeout)
defer cancel()
cmd := exec.CommandContext(ctx, "sh", "-c", params.Command)
output, err := cmd.CombinedOutput()
if errors.Is(ctx.Err(), context.DeadlineExceeded) {
return fmt.Sprintf("Error: command timed out after %s", defaultShellTimeout), nil
}
truncated := truncate(string(output), maxOutputBytes)
if err != nil {
var exitErr *exec.ExitError
if errors.As(err, &exitErr) {
return fmt.Sprintf("Exit code %d\n\n%s", exitErr.ExitCode(), truncated), nil
}
return fmt.Sprintf("Error running command: %v\n\n%s", err, truncated), nil
}
if truncated == "" {
return "(no output)", nil
}
return truncated, nil
}
func truncate(s string, max int) string {
if len(s) <= max {
return s
}
return s[:max] + fmt.Sprintf("\n\n[output truncated — %d bytes total]", len(s))
}
A handful of patterns are doing real work:
exec.CommandContext— Binds the command to acontext.Context. When the context’s deadline expires, Go sendsSIGKILLto the process andcmd.Waitreturns. No goroutine plumbing required.sh -c— Runs the command through a shell so the model can use pipes, redirects, and environment variables naturally. The downside is that everything happens in one process tree the model controls — there’s no sandboxing here. We’ll talk about that in Chapter 10.CombinedOutput— Captures stdout and stderr together. Tools likego testprint results to stdout but errors to stderr; the model needs to see both interleaved to make sense of failures.exec.ExitError— A non-zero exit isn’t a Go error in the bug sense. We surface the exit code and the output as a normal tool result so the model can react.- Output truncation — A
find /left running could fill the context window with garbage. We cap at 16KB and tell the model when we did.
The Code Execution Tool
Shell can already run scripts via python -c "...", but escaping multi-line code through JSON arguments is painful. RunCode makes the common case clean: write the code to a temp file and run it.
// continued in tools/shell.go
type RunCode struct{}
func (RunCode) Name() string { return "run_code" }
func (RunCode) RequiresApproval() bool { return true }
func (RunCode) Definition() api.ToolDefinition {
return api.ToolDefinition{
Type: "function",
Name: "run_code",
Description: "Write a code snippet to a temp file and execute it with the given interpreter. Useful for quick computations, experiments, or one-off scripts. 30 second timeout.",
Parameters: json.RawMessage(`{
"type": "object",
"properties": {
"language": {
"type": "string",
"description": "Language to run. Supported: python, node, bash, go.",
"enum": ["python", "node", "bash", "go"]
},
"code": {"type": "string", "description": "The source code to execute"}
},
"required": ["language", "code"]
}`),
}
}
func (RunCode) Execute(args json.RawMessage) (string, error) {
var params struct {
Language string `json:"language"`
Code string `json:"code"`
}
if err := json.Unmarshal(args, ¶ms); err != nil {
return "", fmt.Errorf("invalid arguments: %w", err)
}
if params.Code == "" {
return "", errors.New("missing 'code' argument")
}
cfg, ok := languageRunners[params.Language]
if !ok {
return fmt.Sprintf("Error: unsupported language %q", params.Language), nil
}
tmpFile, err := writeTemp(cfg.extension, params.Code)
if err != nil {
return fmt.Sprintf("Error writing temp file: %v", err), nil
}
defer removeTemp(tmpFile)
ctx, cancel := context.WithTimeout(context.Background(), defaultShellTimeout)
defer cancel()
cmdArgs := append(append([]string(nil), cfg.args...), tmpFile)
cmd := exec.CommandContext(ctx, cfg.binary, cmdArgs...)
output, err := cmd.CombinedOutput()
if errors.Is(ctx.Err(), context.DeadlineExceeded) {
return fmt.Sprintf("Error: code execution timed out after %s", defaultShellTimeout), nil
}
truncated := truncate(string(output), maxOutputBytes)
if err != nil {
var exitErr *exec.ExitError
if errors.As(err, &exitErr) {
return fmt.Sprintf("Exit code %d\n\n%s", exitErr.ExitCode(), truncated), nil
}
return fmt.Sprintf("Error running code: %v\n\n%s", err, truncated), nil
}
if truncated == "" {
return "(no output)", nil
}
return truncated, nil
}
type runner struct {
binary string
args []string
extension string
}
var languageRunners = map[string]runner{
"python": {binary: "python3", extension: ".py"},
"node": {binary: "node", extension: ".js"},
"bash": {binary: "bash", extension: ".sh"},
"go": {binary: "go", args: []string{"run"}, extension: ".go"},
}
Add the temp file helpers:
import (
"os"
)
func writeTemp(extension, content string) (string, error) {
f, err := os.CreateTemp("", "agent-run-*"+extension)
if err != nil {
return "", err
}
if _, err := f.WriteString(content); err != nil {
f.Close()
os.Remove(f.Name())
return "", err
}
if err := f.Close(); err != nil {
os.Remove(f.Name())
return "", err
}
return f.Name(), nil
}
func removeTemp(path string) {
_ = os.Remove(path)
}
Notes:
os.CreateTempwith a*in the pattern — The*is replaced by random characters, guaranteeing a unique name. We pass the extension after the*so the file ends with.py,.go, etc.- Cleanup on every error path — If we fail mid-write, we remove the partial file. If
Executereturns normally, the deferredremoveTemphandles it. - Append-with-copy for
cmdArgs—append(append([]string(nil), cfg.args...), tmpFile)builds a fresh slice instead of mutatingcfg.argsin the map. A subtle Go gotcha:appendmay or may not reuse the underlying array, so mutating shared slices is a bug waiting to happen.
Registering the Tools
Update main.go:
registry.Register(tools.Shell{})
registry.Register(tools.RunCode{})
A prompt that exercises both:
api.NewUserMessage("Write a Python script that prints the first ten Fibonacci numbers, run it, and tell me the output."),
Expected output (abbreviated):
[tool call] run_code({"language":"python","code":"a, b = 0, 1\nfor _ in range(10):\n print(a)\n a, b = b, a + b\n"})
[tool result] 0
1
1
2
3
5
8
13
21
34
The first ten Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.
Why You Should Be Nervous
Right now there is no sandboxing. A misbehaving model can:
- Delete your home directory with
rm -rf ~ - Exfiltrate secrets via
curl ... < ~/.aws/credentials - Mine cryptocurrency in the background
- Install software, modify your shell config, …
The mitigations we already have are real but limited:
RequiresApproval() == true— In Chapter 9 the user will approve every shell call before it runs.context.WithTimeout— Caps wall-clock damage of any single call.- Output truncation — Caps token-budget damage.
The mitigations we don’t have are:
- A chroot, container, or VM around the agent process
- A read-only filesystem layer
- Network egress blocking
- A user with reduced privileges
We’ll talk about each of those in Chapter 10. For now: only run this agent in a directory you wouldn’t mind losing, on a machine you wouldn’t mind reinstalling, and approve every tool call by hand.
A Brief Word on os/exec Pitfalls
A few things that bite people writing shell tools:
- Don’t call
cmd.Output()andcmd.CombinedOutput()aftercmd.Start()— They internally callRun. Pick one entry point. - Don’t reuse a
Cmd—exec.Cmdis one-shot. Build a new one per execution. - Watch out for
PATH—exec.LookPath(whichexec.Commandcalls) uses the parent process’sPATH. If the agent is launched from an environment that doesn’t seepython3ornode,RunCodewill fail. SIGKILLon timeout means no graceful shutdown — The killed process won’t flush buffers, run defers, or clean up its own temp files. For anything more complicated than these tools, prefercontext.WithCancelplus an explicitSIGTERMfirst.
Summary
In this chapter you:
- Wrote a
shelltool that runs commands throughsh -cwith a timeout - Wrote a
run_codetool that writes snippets to temp files for several languages - Used
exec.CommandContextto bind subprocesses to deadlines - Truncated output to keep runaway commands from blowing up the context window
- Marked both tools as requiring approval — and faced up to how dangerous they still are without sandboxing
Next we’ll build the terminal UI and finally wire that approval flow into something a human can actually click through.