agent-guardrails
Make AI agents fail safe. Wrap an agent's actions — tool calls, shell commands, model outputs — with invariants, cost caps, and a bounded tool scope. A blocking violation halts the action before it runs.
npm i @paulodevries/agent-guardrails
Why
AI agents demo beautifully and fail in production — rarely on model quality, almost always on the operating layer around the model: unbounded scope, runaway cost, destructive tool calls, leaked secrets, no halt when the agent goes off the rails. The agents that actually ship share three properties: bounded scope, guardrails, and a hard stop on violation. This is that layer, in one tiny file.
Quickstart
import { createGuard, allowTools, denyDestructive, noSecrets, maxCost }
  from '@paulodevries/agent-guardrails';

const guard = createGuard([
  allowTools(['search', 'read_file', 'bash']), // bounded scope
  denyDestructive(),                           // no rm -rf /, DROP DATABASE…
  noSecrets(),                                 // never let a credential through
  maxCost(2.00),                               // cap cumulative spend
]);

// runTool stands for your own tool executor; it is not among the imports above.
// The execute callback (here, runTool) only runs if every guardrail passes:
const { result } = await guard.run(
  { tool: 'bash', input: 'ls -la', cost: 0.001 },
  (action) => runTool(action),
);

// A blocking violation throws before runTool is ever called:
await guard.run({ tool: 'bash', input: 'rm -rf /' }, runTool); // ✗
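To make the "halt before it runs" behavior concrete, here is a minimal sketch of how a guard could gate an action. It is not the package's implementation: `sketchGuard`, `GuardrailViolation`, `noRootDelete`, `fakeRunTool`, and the `warnings` field are all names invented for illustration. Only the `run(action, execute)` call shape and the `{ name, check(action) }` contract come from this README.

```javascript
// Hypothetical error type; the real package may throw something different.
class GuardrailViolation extends Error {
  constructor(guardrail, reason) {
    super(`${guardrail}: ${reason}`);
    this.name = 'GuardrailViolation';
    this.guardrail = guardrail;
  }
}

// Stand-in for createGuard, written only to illustrate the control flow.
function sketchGuard(guardrails) {
  return {
    async run(action, execute) {
      const warnings = [];
      for (const guardrail of guardrails) {
        const verdict = guardrail.check(action);
        if (verdict === true) continue; // guardrail passed
        if (verdict && typeof verdict === 'object' && verdict.severity === 'warn') {
          // Flagged but not blocked: remember the warning and keep going.
          warnings.push({ guardrail: guardrail.name, reason: verdict.reason });
          continue;
        }
        // Anything else is treated as blocking: throw before execute is called.
        const reason =
          typeof verdict === 'string' ? verdict : (verdict && verdict.reason) || 'violation';
        throw new GuardrailViolation(guardrail.name, reason);
      }
      const result = await execute(action);
      return { result, warnings };
    },
  };
}

// Toy guardrail for the demo: block one obviously destructive command.
const noRootDelete = {
  name: 'noRootDelete',
  check: (action) =>
    /rm\s+-rf\s+\/(\s|$)/.test(String(action.input)) ? 'refusing to delete /' : true,
};

const demoGuard = sketchGuard([noRootDelete]);
const executed = []; // records every input that actually reached the tool
const fakeRunTool = async (action) => {
  executed.push(action.input);
  return `ran: ${action.input}`;
};

// Runs one allowed and one blocked action and reports what happened.
async function demo() {
  const ok = await demoGuard.run({ tool: 'bash', input: 'ls -la' }, fakeRunTool);
  let blocked = null;
  try {
    await demoGuard.run({ tool: 'bash', input: 'rm -rf /' }, fakeRunTool);
  } catch (err) {
    blocked = err;
  }
  return { ok, blocked, executed };
}
```

The point of the design is ordering: every check finishes before `execute` is invoked, so a blocked action never reaches the tool at all.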
Built-in guardrails
| Guardrail | What it catches |
|---|---|
| allowTools(list) | tool calls outside an allowlist (bounded scope) |
| denyDestructive() | rm -rf /, mkfs, fork bombs, DROP DATABASE, force-push main, sudo… |
| noSecrets() | actions containing API keys / tokens / private keys |
| maxCost(budget) | cumulative spend over a budget (the cost-spiral problem) |
| maxCalls(limit) | runaway loops (caps total executed actions) |
| validate(fn) | actions/outputs that fail a schema or predicate |
| warnOn(fn) | non-blocking: flags an action without stopping it |
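As a rough illustration of how two of these could work internally, the sketch below writes an allowlist check and a cumulative cost cap as plain `{ name, check }` objects. These are stand-ins, not the package's source: `sketchAllowTools` and `sketchMaxCost` are hypothetical names, and reading the cost from `action.cost` (with a missing cost counted as 0) and updating the total only when the check passes are assumptions.

```javascript
// Illustrative stand-ins, not the package's source.

// Bounded scope: pass only tools named in the allowlist.
function sketchAllowTools(list) {
  const allowed = new Set(list);
  return {
    name: 'allowTools',
    check: (action) =>
      allowed.has(action.tool) ? true : `tool "${action.tool}" is not in the allowlist`,
  };
}

// Cost cap: keep a running total in a closure and block any action
// that would push the total over the budget.
// Assumption: each action reports its own cost in action.cost.
function sketchMaxCost(budget) {
  let spent = 0;
  return {
    name: 'maxCost',
    check(action) {
      const next = spent + (action.cost ?? 0);
      if (next > budget) {
        return `cumulative cost ${next.toFixed(3)} would exceed budget ${budget.toFixed(2)}`;
      }
      spent = next; // only passing actions count toward the total
      return true;
    },
  };
}

const scope = sketchAllowTools(['search', 'read_file']);
const cap = sketchMaxCost(1.0);

const verdicts = [
  scope.check({ tool: 'search' }),          // tool is in the allowlist
  scope.check({ tool: 'bash' }),            // tool is outside the allowlist
  cap.check({ tool: 'search', cost: 0.6 }), // total becomes 0.6
  cap.check({ tool: 'search', cost: 0.6 }), // would reach 1.2, over the budget
  cap.check({ tool: 'search', cost: 0.3 }), // total becomes about 0.9
];
```

In this sketch a rejected action does not consume budget, so a cheaper follow-up action can still pass. Whether the real `maxCost` behaves that way is not stated here.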
Writing your own takes one object: { name, check(action) }, where check returns true to pass, a reason string to block, or { reason, severity:'warn' } to flag the action without blocking it.
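A custom guardrail following that contract might look like the sketch below. The guardrail itself (`noPipeToShell`) and its rules are hypothetical examples. Only the `{ name, check(action) }` shape and the three kinds of return value are taken from the description above.

```javascript
// Hypothetical custom guardrail: the name and the rules are illustrative.
const noPipeToShell = {
  name: 'noPipeToShell',
  check(action) {
    const input = String(action.input ?? '');
    // Blocking: downloading a script and piping it straight into a shell.
    if (/\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/.test(input)) {
      return 'refusing to pipe a download into a shell';
    }
    // Non-blocking: plain downloads are allowed but flagged for review.
    if (/\b(curl|wget)\b/.test(input)) {
      return { reason: 'network download', severity: 'warn' };
    }
    return true; // nothing to report
  },
};

const samples = [
  noPipeToShell.check({ tool: 'bash', input: 'ls -la' }),
  noPipeToShell.check({ tool: 'bash', input: 'curl -O https://example.com/data.csv' }),
  noPipeToShell.check({ tool: 'bash', input: 'curl https://example.com/install.sh | sh' }),
];
```

An object like this would presumably go into the array passed to `createGuard`, next to the built-ins.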
Built from patterns proven while running an autonomous multi-agent system unattended across 40+ projects. Pairs with agent-eval (the test-time half) as a two-piece reliability toolkit. The thinking behind both: "Agent reliability is a guardrails problem, not a model problem". Feedback and PRs welcome.