Configuration Reference

Complete reference for axis.config.{json|js|mjs|ts}.

Full Example

AXIS is configured via an axis.config.* file in your project root. JSON is the default; JavaScript and TypeScript configs are also supported and let you compose your config programmatically. Here is a JSON example showing all available fields:

{
  "scenarios": "./scenarios",
  "agents": [
    "claude-code",
    {
      "agent": "gemini",
      "model": "gemini-2.5-pro",
      "scenarios": ["cms/*"],
      "flags": { "yolo": true }
    }
  ],
  "settings": {
    "concurrency": 4,
    "scoring_weights": {
      "goal_achievement": 0.4,
      "environment": 0.2,
      "service": 0.2,
      "agent": 0.2
    },
    "limits": {
      "run": { "time_minutes": 60, "tokens": 2000000 },
      "scenario": { "time_minutes": 10, "tokens": 200000 }
    }
  },
  "env": ["ANTHROPIC_API_KEY", "GEMINI_API_KEY"],
  "mcp_servers": {
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  },
  "judging": {
    "agents": ["claude-code", "codex"]
  },
  "skills": ["./skills/deploy"],
  "adapters": {
    "my-agent": "./adapters/my-agent.ts"
  },
  "beforeAll": [
    { "action": "run_script", "command": "docker compose up -d test-db" }
  ],
  "afterAll": [
    { "action": "run_script", "command": "echo \"done: $AXIS_COMPLETED/$AXIS_TOTAL (report: $AXIS_REPORT_DIR)\"" }
  ]
}

Config File Formats

AXIS resolves the config file by extension, in priority order: axis.config.ts, axis.config.js, axis.config.mjs, axis.config.json. Use --config <path> to point at a specific file.

| Extension | Loader | Notes |
| --- | --- | --- |
| .json | Native JSON parse | Static config, no executable logic. |
| .js / .mjs / .cjs | Native dynamic import | ESM module. Default export is the config object or a function returning one. |
| .ts / .mts / .cts | Loaded via jiti | No build step needed. Type-only imports are stripped at runtime. |

JavaScript / TypeScript configs

JS and TS configs let you build the config programmatically. This is useful for sharing logic across scenarios, deriving values from environment variables, or generating large numbers of scenarios from a fixture set. The module's default export must be either the config object itself or a function (sync or async) that returns one:

// axis.config.ts
import type { AxisConfig, InlineScenario } from "@netlify/axis";
import applyLimits from "./scenarios/apply-limits.js";
import authorScenario from "./scenarios/author-scenario.js";

const dynamicScenarios: InlineScenario[] = ["alpha", "beta", "gamma"].map((id) => ({
  key: "smoke-" + id,
  name: "Smoke test " + id,
  prompt: "Do thing for " + id,
  rubric: [{ check: "Did the thing" }],
}));

export default {
  scenarios: [
    "./scenarios",
    applyLimits,
    authorScenario,
    ...dynamicScenarios,
  ],
  agents: ["claude-code"],
  settings: {
    limits: { run: { time_minutes: 60 } },
  },
} satisfies AxisConfig;

Or as a function (sync or async):

// axis.config.ts
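import type { AxisConfig } from "@netlify/axis";
// loadFixtures and buildScenario are your own helpers, not AXIS APIs; the path is illustrative.
import { buildScenario, loadFixtures } from "./scenarios/fixtures.js";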

export default async () => {
  const fixtures = await loadFixtures();
  const config: AxisConfig = {
    scenarios: fixtures.map(buildScenario),
    agents: ["claude-code"],
  };
  return config;
};

Generating an axis.config file

Running axis init --format ts (or --format js) scaffolds a typed config file alongside a sample JSON scenario. Without --format, AXIS produces a .json config for backward compatibility.

Top-Level Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| scenarios | string \| (string \| InlineScenario)[] | No | A path to the scenarios directory, or an array of paths and/or inline scenario objects. Inline entries must include a key; entries loaded from files take their key from the file path. Defaults to "./scenarios" when omitted. Array entries may also be git repo URLs; see Remote scenarios. See Authoring scenarios for the full schema. |
| agents | (string \| AgentConfig)[] | Yes | Agent names or full agent configurations. |
| settings | object | No | Concurrency, scoring weight, and limit overrides. |
| env | string[] | No | Additional environment variables to pass through to agent processes. |
| mcp_servers | object | No | MCP servers available to all agents. |
| judging | object | No | Precedence-ordered list of judge agents for scoring. See Judging Agents. When omitted, each run is judged by its own agent. |
| skills | string[] | No | Skills available to all agents. |
| adapters | object | No | Custom agent module paths, keyed by agent name. |
| artifacts | string[] | No | Glob patterns of files to capture from each scenario's workspace after teardown. Merged with per-scenario artifacts. |
| beforeAll | LifecycleAction[] | No | Lifecycle actions that run once before any scenarios start. See Run-Level Lifecycle. |
| afterAll | LifecycleAction[] | No | Lifecycle actions that run once after every scenario has been scored and the report is finalized. See Run-Level Lifecycle. |

Run-Level Lifecycle

beforeAll and afterAll are run-level counterparts to a scenario's setup and teardown: they fire once per run rather than once per scenario. Use them to spin up shared infrastructure before any agents start, or to upload the final report and send a completion notification after everything is scored.

Both fields accept the same lifecycle action types as scenario hooks (run_script and copy), and scripts run with the config directory as their working directory.

{
  "beforeAll": [
    { "action": "run_script", "command": "docker compose up -d test-postgres" }
  ],
  "afterAll": [
    { "action": "run_script", "command": "./scripts/notify-slack.sh" },
    { "action": "run_script", "command": "docker compose down" }
  ]
}

A typical afterAll script can use the AXIS_* environment variables below to assemble a summary message:

#!/usr/bin/env bash
# scripts/notify-slack.sh
curl -X POST "$SLACK_WEBHOOK" \
  -H 'Content-Type: application/json' \
  -d "{\"text\": \"AXIS run: $AXIS_COMPLETED/$AXIS_TOTAL passed in $AXIS_DURATION_MS ms (report: $AXIS_REPORT_DIR)\"}"

Run-Level Lifecycle Details
  • Hooks fire from the axis CLI only; the programmatic run() API does not invoke them. Library users own their own orchestration.
  • beforeAll runs before the report directory is created. A failure (non-zero exit) aborts the entire run with no report on disk.
  • afterAll runs after every scenario has been scored and the report is finalized, so $AXIS_REPORT_DIR/report.json is readable. A failure causes a non-zero CLI exit but does not erase the report.
  • Both hooks honour the per-action 3-minute timeout. Each action runs sequentially; the first non-zero exit aborts the phase.

Run-level lifecycle environment variables

Both phases get the shared $AXIS_OUTPUT markdown sink and an AXIS_PHASE discriminator (beforeAll or afterAll). afterAll additionally receives summary stats and the path to the finalized report:

| Variable | Phase | Value |
| --- | --- | --- |
| AXIS_PHASE | Both | Either beforeAll or afterAll. |
| AXIS_OUTPUT | Both | Path to a per-phase markdown file. Anything written here surfaces in the CLI log. |
| AXIS_REPORT_DIR | afterAll | Absolute path to the just-written .axis/reports/{reportId}/ directory. report.json, report.html, and the per-scenario JSON files are all on disk by the time this script runs. |
| AXIS_TOTAL | afterAll | Number of jobs executed (agent × scenario combinations). |
| AXIS_COMPLETED | afterAll | Number of jobs that finished successfully. |
| AXIS_FAILED | afterAll | Number of jobs that failed. |
| AXIS_DURATION_MS | afterAll | Total run duration in milliseconds. |
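
An afterAll hook does not have to be a shell script. The following TypeScript sketch shows one way to turn the variables above into a markdown summary line. formatRunSummary is a hypothetical helper, not part of AXIS, and it assumes the numeric variables arrive as decimal strings.

```typescript
// Hypothetical helper, not part of AXIS: formats the afterAll variables
// documented above into a one-line markdown summary.
type Env = Record<string, string | undefined>;

function formatRunSummary(env: Env): string {
  if (env["AXIS_PHASE"] !== "afterAll") {
    throw new Error("summary stats are only available in the afterAll phase");
  }
  // Assumption: the numeric variables are plain decimal strings.
  const total = Number(env["AXIS_TOTAL"] ?? "0");
  const completed = Number(env["AXIS_COMPLETED"] ?? "0");
  const failed = Number(env["AXIS_FAILED"] ?? "0");
  const seconds = (Number(env["AXIS_DURATION_MS"] ?? "0") / 1000).toFixed(1);
  const reportDir = env["AXIS_REPORT_DIR"] ?? "unknown";
  return `**AXIS run**: ${completed}/${total} completed, ${failed} failed in ${seconds}s (report: ${reportDir})`;
}
```

In a real hook script you would pass process.env to the helper and append the returned line to the file named by AXIS_OUTPUT so that it surfaces in the CLI log.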

Agent Configuration

Each entry in the agents array can be a simple string (agent name with defaults) or a full configuration object.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| agent | string | Yes | Agent name: claude-code, codex, gemini, goose, etc. |
| model | string | No | Model override passed to the agent CLI. |
| scenarios | string[] | No | Subset of scenarios to run. Supports glob patterns like cms/*. |
| skills | string[] | No | Agent-specific skills (merged with top-level skills). |
| flags | object | No | CLI flags passed to the agent, e.g. {"full-auto": true}. |
| command | string | No | Custom CLI command (for custom agents). |

Scoring Weights

Override the default dimension weights under settings.scoring_weights. Values must sum to 1.0. See Scoring Framework for what each dimension measures.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| goal_achievement | number | No | Goal Achievement weight. Default: 0.4. |
| environment | number | No | Environment weight. Default: 0.2. |
| service | number | No | Service weight. Default: 0.2. |
| agent | number | No | Agent weight. Default: 0.2. |

{
  "settings": {
    "scoring_weights": {
      "goal_achievement": 0.5,
      "environment": 0.2,
      "service": 0.2,
      "agent": 0.1
    }
  }
}
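
The sum-to-1.0 rule can be checked before a run starts. The sketch below is a minimal illustration, not AXIS's own validation: resolveWeights is a hypothetical helper, and it assumes that omitted fields fall back to the documented defaults before the sum is checked.

```typescript
type ScoringWeights = {
  goal_achievement: number;
  environment: number;
  service: number;
  agent: number;
};

// Defaults as documented in the table above.
const DEFAULT_WEIGHTS: ScoringWeights = {
  goal_achievement: 0.4,
  environment: 0.2,
  service: 0.2,
  agent: 0.2,
};

// Hypothetical helper: layers overrides on the defaults, then checks the sum.
function resolveWeights(overrides: Partial<ScoringWeights> = {}): ScoringWeights {
  const weights: ScoringWeights = { ...DEFAULT_WEIGHTS, ...overrides };
  const sum = Object.values(weights).reduce((acc, w) => acc + w, 0);
  // Compare with a tolerance: sums of decimal fractions (e.g. 0.1 + 0.2)
  // are not always exact in floating point.
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`scoring_weights must sum to 1.0, got ${sum}`);
  }
  return weights;
}
```

Under that assumption, overriding a single weight without adjusting another is rejected, because the remaining defaults no longer bring the total to 1.0.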

Limits

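Limits control how much time and how many tokens a run or an individual scenario can consume, which prevents runaway agents from consuming unbounded resources. Limits can be configured at three levels: for the overall run (settings.limits.run), as a default for every scenario (settings.limits.scenario), and on an individual scenario (a limits field in the scenario file).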

Default behavior

Even without any limits configured, each scenario has a default time limit of 15 minutes. You can override this by setting settings.limits.scenario.time_minutes or by adding limits.time_minutes to individual scenarios.

Limit fields

| Field | Type | Description |
| --- | --- | --- |
| time_minutes | number | Maximum wall-clock time in minutes. Accepts fractional values (e.g. 0.5 for 30 seconds). Default: 15 per scenario. |
| tokens | number | Maximum total tokens (input + output + cache). Must be a positive integer. No default. |

Overall run limits

Set settings.limits.run to cap the total time or tokens across the entire run. When an overall limit is reached, all remaining and currently running jobs are immediately terminated and marked as failed.

{
  "settings": {
    "limits": {
      "run": { "time_minutes": 60, "tokens": 2000000 }
    }
  }
}

Per-scenario limits

Set settings.limits.scenario to define default per-job budgets. These can be overridden by adding a limits field directly in a scenario file.

// axis.config.json: default for all scenarios
{
  "settings": {
    "limits": {
      "scenario": { "time_minutes": 10, "tokens": 200000 }
    }
  }
}

// scenarios/expensive-task.json: override for one scenario
{
  "name": "Expensive task",
  "prompt": "...",
  "rubric": "...",
  "limits": { "time_minutes": 30, "tokens": 500000 }
}

Token limit accuracy

Token limits are enforced using a conservative estimate during execution (based on streamed assistant text). The actual token count may slightly exceed the limit before the job is terminated. The authoritative token count from the agent's API is used for overall run limit tracking.
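
The precedence among the built-in default, settings.limits.scenario, and a scenario's own limits field can be sketched as follows. effectiveScenarioLimits is a hypothetical helper, and resolving each field independently (rather than replacing the whole object) is an assumption; the source only states that the scenario-level value overrides the config-level one.

```typescript
type Limits = { time_minutes?: number; tokens?: number };
type ResolvedLimits = { time_minutes: number; tokens: number | undefined };

// Documented default: 15 minutes per scenario; tokens have no default.
const DEFAULT_TIME_MINUTES = 15;

// Hypothetical helper: the most specific level wins.
// Assumption: each field is resolved independently of the other.
function effectiveScenarioLimits(
  configDefault: Limits = {},
  scenarioOverride: Limits = {},
): ResolvedLimits {
  return {
    time_minutes:
      scenarioOverride.time_minutes ?? configDefault.time_minutes ?? DEFAULT_TIME_MINUTES,
    tokens: scenarioOverride.tokens ?? configDefault.tokens,
  };
}
```

With the two JSON files shown earlier in this section, the "Expensive task" scenario would resolve to 30 minutes and 500000 tokens, while every other scenario would get 10 minutes and 200000 tokens.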

MCP Servers

Configure Model Context Protocol servers that are automatically wired into each agent environment. AXIS supports both stdio (local process) and HTTP (remote endpoint) servers.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "stdio" \| "http" | Yes | Server transport type. |
| command | string | Yes | Command to start the server process (stdio only). |
| args | string[] | No | Arguments passed to the command. |
| env | object | No | Environment variables for the server process. |
| url | string | Yes | Remote server endpoint URL (http only). |
| headers | object | No | HTTP headers (supports ${VAR} env interpolation). |

{
  "mcp_servers": {
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": { "LOG_LEVEL": "info" }
    },
    "remote-api": {
      "type": "http",
      "url": "https://mcp.example.com/tools",
      "headers": { "Authorization": "Bearer ${TOKEN}" }
    }
  }
}
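
The ${VAR} interpolation in headers can be pictured with the sketch below. interpolateHeaders is a hypothetical helper, not the AXIS implementation, and treating an unset variable as an error is an assumption; AXIS may substitute an empty string or behave differently.

```typescript
// Hypothetical helper: expands ${VAR} placeholders in header values from an env map.
// Assumption: referencing an unset variable is an error.
function interpolateHeaders(
  headers: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = value.replace(
      /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
      (_match: string, varName: string) => {
        const resolved = env[varName];
        if (resolved === undefined) {
          throw new Error(`header "${name}" references unset variable ${varName}`);
        }
        return resolved;
      },
    );
  }
  return out;
}
```

Failing fast on an unset variable keeps a missing token from being sent as a literal "Bearer ${TOKEN}" string, which is why the sketch throws rather than leaving the placeholder in place.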

Each NDJSON-style agent writes MCP configuration in its native format before spawning. ACP-based adapters (claude-sdk, codex-sdk, gemini, and every other ACP agent) pass MCP servers through the ACP session/new call instead.

| Agent | Config File | Location |
| --- | --- | --- |
| claude-code | .mcp.json | Workspace root |
| codex | config.toml | CODEX_HOME |

Judging Agents

By default, every run is judged by the same agent that produced it, so the agent under test scores its own work. Set judging.agents to a precedence-ordered list of judge candidates. For each run, AXIS picks the first entry whose adapter name differs from the run's own agent so a fresh perspective evaluates the work. If every entry matches the run's own agent, the first entry is used.

Each entry accepts the same shorthand as agents: a string (agent name) or a full AgentConfig object with model and adapter flags. Every candidate judge must be installed and have any required environment variables set; AXIS validates this during pre-flight before any jobs run.

{
  "agents": ["claude-code", "codex", "gemini"],
  "judging": {
    "agents": ["claude-code", "codex"]
  }
}

With the config above, runs by claude-code are judged by codex (first entry whose adapter differs), and runs by codex or gemini are judged by claude-code. Pin a specific model on any candidate by passing the full AgentConfig form:

{
  "judging": {
    "agents": [
      { "agent": "claude-code", "model": "opus" },
      "codex"
    ]
  }
}
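
The judge selection rule described in this section can be sketched in a few lines. pickJudge is a hypothetical helper, not an AXIS export, and it assumes the adapter name is the same as the agent name in each entry.

```typescript
type AgentEntry = string | { agent: string; model?: string };

// Assumption: the adapter name equals the entry's agent name.
const agentName = (entry: AgentEntry): string =>
  typeof entry === "string" ? entry : entry.agent;

// Hypothetical helper mirroring the selection rule described above.
function pickJudge(runAgent: string, judges: AgentEntry[]): AgentEntry {
  const first = judges[0];
  if (first === undefined) {
    // No judging.agents configured: the run is judged by its own agent.
    return runAgent;
  }
  // First candidate that differs from the run's agent, else the first entry.
  return judges.find((judge) => agentName(judge) !== runAgent) ?? first;
}
```

For the first config in this section, pickJudge("claude-code", ["claude-code", "codex"]) selects codex, and the same list selects claude-code for runs by codex or gemini.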

Skills

Skills extend agent capabilities with reusable instruction sets. Specify them at the top level (shared across all agents), per agent, or per scenario.

| Format | Example | Description |
| --- | --- | --- |
| Local path | ./skills/deploy | Relative to the config file. |
| GitHub shorthand | netlify/axis-skill-deploy | owner/repo format, cloned automatically. |
| Full URL | https://github.com/owner/repo | GitHub repository URL, cloned automatically. |

{
  "skills": [
    "./skills/deploy",
    "netlify/axis-skill-deploy",
    "https://github.com/owner/repo"
  ]
}

Remote skills are cached in .axis/skills-cache/. Use --refresh-skills to force re-clone.

Environment Variables

The env field lists additional environment variables to pass through to agent processes. The following are always passed through by default:

| Category | Variables |
| --- | --- |
| API keys | ANTHROPIC_API_KEY, CODEX_API_KEY, GEMINI_API_KEY |
| System | PATH, USER, SHELL, LANG, TERM, TMPDIR |

{
  "env": ["MY_CUSTOM_TOKEN", "DATABASE_URL"]
}

Scenarios

Scenarios live in the configured scenarios directory as .json, .js, or .ts files, or are listed inline in axis.config.{js,ts}. The filename (without extension) becomes the scenario key, and nested directories create namespaced keys: scenarios/cms/create-post.ts maps to cms/create-post.

Scenarios can define variants to run the same task under different configurations (skills, MCP servers, prompts, etc.) without duplicating files. Each variant produces a separate job with a key like create-post@variant-name.
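
The mapping from file path to scenario key, including the variant suffix, can be sketched as below. scenarioKey is a hypothetical helper. It assumes the path is already relative to the scenarios directory and that a variant of a namespaced scenario keeps its namespace prefix.

```typescript
// Hypothetical helper: derives a scenario key from a path relative to the
// scenarios directory, optionally with a variant suffix.
function scenarioKey(relativePath: string, variant?: string): string {
  // Tolerate Windows separators so keys always use forward slashes.
  const normalized = relativePath.replace(/\\/g, "/");
  // Strip the documented scenario extensions.
  const withoutExt = normalized.replace(/\.(json|js|ts)$/, "");
  return variant === undefined ? withoutExt : `${withoutExt}@${variant}`;
}
```

Under these assumptions, "cms/create-post.ts" maps to cms/create-post, and "create-post.json" with a variant named variant-name maps to create-post@variant-name.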

See Writing Scenarios for the complete scenario schema, the authoring formats, rubric design guidance, and examples.

Remote Scenarios

Remote scenarios let one AXIS project pull a scenario library straight out of another git repository, instead of vendoring or copy-pasting scenario files. A team can publish a canonical set of scenarios (with their setup scripts, fixtures, MCP servers, and skills) once, and every downstream project consumes it by listing the repo URL in its scenarios array. When the upstream library changes, the next AXIS run picks it up automatically; there's no version to bump and no files to re-sync.

This is especially useful for:

Using a remote scenarios repo

Add a git repository URL to the scenarios array. Local paths and remote URLs can be mixed freely.

{
  "scenarios": [
    "./scenarios",
    "https://github.com/netlify/all-scenarios"
  ],
  "agents": ["claude-code"]
}

On each run AXIS clones the repo into .axis/remotes/<reversed-host>/<owner>/<repo>/, reads that repo's axis.config.*, and inlines its scenarios entries into the parent, resolved to absolute paths inside the clone. Inline scenario objects from a remote repo are passed through unchanged. If the cloned repo has no axis.config.* at its root, the whole repo is walked as a scenarios directory (equivalent to listing the clone path directly).

From here on, the runner behaves as if every scenario had been local from the start: discovery, filtering, lifecycle, scoring, and reporting are all unchanged.

Supporting config that comes with the scenarios

Remote scenarios usually depend on more than just their own files. Their setup scripts need certain env vars exported, they expect specific MCP servers to be configured, they rely on shared skills, and so on. To avoid forcing every parent project to re-declare all of that, AXIS folds a few supporting fields from the remote repo's axis.config.* into the parent config. The parent always wins on collisions.

| Field | Merge semantics |
| --- | --- |
| env | Set union of var names; parent first then remote. |
| mcp_servers | Keyed merge; the parent's value wins when both declare the same server name. |
| skills | Ordered union with dedup, parent first. Local-path entries from the remote (./...) are rewritten to absolute paths inside the clone directory; URL and owner/repo shorthand entries pass through unchanged. |
| artifacts | Glob patterns concatenated and deduped, parent first. |
| adapters | Keyed merge with parent precedence; remote module paths are rewritten to absolute paths inside the clone directory so the remote adapter loads correctly. |

Other top-level fields from the remote repo are ignored: agents, settings, judging, beforeAll, afterAll, and name. These belong to the parent project: it decides which agents to test, how to score them, and what run-level lifecycle to fire.
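
The merge semantics above can be summarized in code. The sketch below is an illustration under stated assumptions, not the AXIS implementation: mergeSupportingConfig and SupportingConfig are hypothetical names, and the remote's local paths are assumed to have been rewritten to absolute paths before the merge.

```typescript
type SupportingConfig = {
  env?: string[];
  mcp_servers?: Record<string, unknown>;
  skills?: string[];
  artifacts?: string[];
  adapters?: Record<string, string>;
};

// Ordered union with dedup; a Set keeps first-seen insertion order,
// so entries from `first` keep their position.
const orderedUnion = (first: string[] = [], second: string[] = []): string[] =>
  Array.from(new Set([...first, ...second]));

// Hypothetical helper. Assumes remote-relative paths are already absolute.
function mergeSupportingConfig(
  parent: SupportingConfig,
  remote: SupportingConfig,
): SupportingConfig {
  return {
    env: orderedUnion(parent.env, remote.env),
    // Spreading the parent last makes its value win on key collisions.
    mcp_servers: { ...remote.mcp_servers, ...parent.mcp_servers },
    skills: orderedUnion(parent.skills, remote.skills),
    artifacts: orderedUnion(parent.artifacts, remote.artifacts),
    adapters: { ...remote.adapters, ...parent.adapters },
  };
}
```

Fields such as agents, settings, and judging are absent from the type on purpose, since the remote's values for them are ignored.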

Freshness and caching

AXIS always runs git pull --ff-only on each invocation when the clone already exists, and does a shallow git clone the first time. There is no opt-in caching flag. The trade-off favours always-fresh scenario libraries over offline runs.

Dependencies

Remote scenarios authored as .ts/.js modules often import workspace helpers and external packages. If the cloned repo has a package.json and no node_modules/, AXIS runs npm install (or pnpm install / yarn install based on the lockfile) automatically before walking. Install failures are logged but do not abort the run; modules whose imports fail are reported and skipped.
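
The install decision can be sketched as a pure function over the file names found at the root of the clone. pickInstallCommand is a hypothetical helper. The lockfile names are the standard ones for each package manager, but the order in which they are checked when several are present is an assumption.

```typescript
type InstallCommand = "npm install" | "pnpm install" | "yarn install";

// Hypothetical helper: picks an install command from the entries at the
// root of the clone. Returns null when no install is needed.
function pickInstallCommand(rootEntries: string[]): InstallCommand | null {
  const has = (name: string): boolean => rootEntries.includes(name);
  // Install only when there is a package.json and no node_modules/ yet.
  if (!has("package.json") || has("node_modules")) return null;
  // Assumption: pnpm is checked before yarn when both lockfiles exist.
  if (has("pnpm-lock.yaml")) return "pnpm install";
  if (has("yarn.lock")) return "yarn install";
  return "npm install"; // package-lock.json, or no lockfile at all
}
```

The rootEntries argument could come from a directory listing of the clone, for example the result of fs.readdirSync on the clone path.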

Nested remote references

By default, a remote repo's scenarios array may not itself list further remote URLs; if it does, AXIS exits with an error that names the offending URL. Increase settings.remotes.maxDepth to allow nesting. Cycles (A → B → A) are always rejected regardless of depth.

{
  "scenarios": ["./scenarios", "https://github.com/netlify/all-scenarios"],
  "agents": ["claude-code"],
  "settings": {
    "remotes": { "maxDepth": 2 }
  }
}