← Paulo de Vries Open source · MIT · zero dependencies

agent-guardrails

Make AI agents fail safe. Wrap an agent's actions — tool calls, shell commands, model outputs — with invariants, cost caps, and a bounded tool scope. A blocking violation halts the action before it runs.

npm i @paulodevries/agent-guardrails

~3 KB · Node 18+ · shipping to npm + GitHub now

Why

AI agents demo beautifully and fail in production — rarely on model quality, almost always on the operating layer around the model: unbounded scope, runaway cost, destructive tool calls, leaked secrets, no halt when the agent goes off the rails. The agents that actually ship share three properties: bounded scope, guardrails, and a hard stop on violation. This is that layer, in one tiny file.

Quickstart

import { createGuard, allowTools, denyDestructive, noSecrets, maxCost }
  from '@paulodevries/agent-guardrails';

const guard = createGuard([
  allowTools(['search', 'read_file', 'bash']), // bounded scope
  denyDestructive(),                            // no rm -rf /, DROP DATABASE…
  noSecrets(),                                  // never let a credential through
  maxCost(2.00),                                // cap cumulative spend
]);

// execute() only runs if every guardrail passes:
const { result } = await guard.run(
  { tool: 'bash', input: 'ls -la', cost: 0.001 },
  (action) => runTool(action),
);

// A blocking violation throws before runTool is ever called:
await guard.run({ tool: 'bash', input: 'rm -rf /' }, runTool); // ✗

Built-in guardrails

allowTools(list)tool calls outside an allowlist (bounded scope)
denyDestructive()rm -rf /, mkfs, fork bombs, DROP DATABASE, force-push main, sudo…
noSecrets()actions containing API keys / tokens / private keys
maxCost(budget)cumulative spend over a budget (the cost-spiral problem)
maxCalls(limit)runaway loops — caps total executed actions
validate(fn)actions/outputs that fail a schema or predicate
warnOn(fn)non-blocking — flags an action without stopping it

Writing your own is one function: { name, check(action) } returning true, a reason string, or { reason, severity:'warn' }.

Built from patterns proven running an autonomous multi-agent system unattended across 40+ projects. Pairs with agent-eval → (the test-time half) as a two-piece reliability toolkit. The thinking behind both: Agent reliability is a guardrails problem, not a model problem →. Feedback + PRs welcome.