
Testing Experts

Experts are probabilistic — the same query can produce different results. This guide covers strategies for testing effectively.

Run your Expert locally before publishing:

npx perstack start my-expert "test query"

Cover these scenarios (a scripted example follows the list):

  • Happy path — expected inputs and workflows
  • Edge cases — unusual inputs, empty data, large files
  • Error handling — missing files, invalid formats, network failures
  • Delegation — if your Expert delegates, test the full chain
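
A minimal sketch of such a scripted sweep, assuming a Node environment; the expert name and the queries below are placeholders, so substitute your own scenarios:

// smoke-test.ts: run a handful of representative queries against the Expert.
import { execSync } from "node:child_process"

const queries = [
  "summarize the project README",   // happy path (placeholder query)
  "",                               // edge case: empty input
  "load ./does-not-exist.csv",      // error handling: missing file
]

for (const query of queries) {
  console.log(`--- ${query || "(empty query)"} ---`)
  try {
    // Same CLI entry point as above; JSON.stringify handles shell quoting.
    execSync(`npx perstack start my-expert ${JSON.stringify(query)}`, {
      stdio: "inherit",
    })
  } catch {
    console.log("(run exited with a non-zero code)")
  }
}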

Use JSON output to see exactly what happened:

npx perstack run my-expert "query"

Each event shows:

  • Tool calls and results
  • Checkpoint state
  • Timing information
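
A small sketch for filtering those events programmatically. It assumes the run command emits one JSON event per line on stdout (verify against your version's actual output) and that you pipe that output into this script:

// inspect-events.ts: pipe the CLI output into this script, e.g.
//   npx perstack run my-expert "query" | node inspect-events.js
// Assumes one JSON event per line; adjust parsing if your output differs.
import * as readline from "node:readline"

const rl = readline.createInterface({ input: process.stdin })

rl.on("line", (line) => {
  let event: any
  try {
    event = JSON.parse(line)
  } catch {
    return // skip any non-JSON lines
  }
  if (event.type === "callTools") {
    // Event shape used in the runtime example below: toolCalls[].toolName
    for (const call of event.toolCalls ?? []) {
      console.log("tool call:", call.toolName)
    }
  } else {
    console.log("event:", event.type)
  }
})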

Checkpoints enable deterministic replay of the runtime portion of a run. They are stored in perstack/jobs/{jobId}/runs/{runId}/ — see Runtime for the full directory structure.
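
For example, you can list what a run persisted. A quick sketch; jobId and runId are placeholders for identifiers from one of your own runs:

// list-run-artifacts.ts: show the files the runtime wrote for one run.
import { readdirSync } from "node:fs"
import { join } from "node:path"

const jobId = "<jobId>" // placeholder
const runId = "<runId>" // placeholder
const runDir = join("perstack", "jobs", jobId, "runs", runId)

for (const entry of readdirSync(runDir)) {
  console.log(entry)
}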

Continue a paused run:

npx perstack run my-expert --continue

This resumes from the last checkpoint — useful for:

  • Debugging a specific step
  • Testing recovery behavior
  • Iterating on long-running tasks

Examine checkpoints to understand what the Expert “saw” at each step:

  • Message history
  • Tool call decisions
  • Intermediate state
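
A sketch for dumping one checkpoint, assuming checkpoints are stored as JSON; the file name is a placeholder, and since the schema isn't documented here, it simply prints whatever is present:

// show-checkpoint.ts: inspect a single checkpoint file from a run directory.
import { readFileSync } from "node:fs"

// Placeholder path: pick a real file from perstack/jobs/{jobId}/runs/{runId}/
const path = "perstack/jobs/<jobId>/runs/<runId>/<checkpoint file>"

// Assumption: checkpoint files are JSON.
const checkpoint = JSON.parse(readFileSync(path, "utf8"))
console.log("top-level keys:", Object.keys(checkpoint))
console.log(JSON.stringify(checkpoint, null, 2))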

For automated testing, mock the LLM to get deterministic behavior:

import { run } from "@perstack/runtime"

// `expect` comes from your test runner (e.g. Vitest or Jest);
// `params` is whatever you normally pass to `run` for this Expert.
const result = await run(params, {
  // Mock eventListener for assertions
  eventListener: (event) => {
    if (event.type === "callTools") {
      expect(event.toolCalls[0].toolName).toBe("expectedTool")
    }
  },
})

The runtime is deterministic — only LLM responses are probabilistic. Mock the LLM layer for unit tests; use real LLMs for integration tests.
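
One way to structure such a unit test is sketched below. It wraps the event listener pattern above in a test, assuming Vitest as the runner; params is left as a placeholder, and how you substitute a mocked LLM depends on how you construct those run parameters:

import { describe, expect, it } from "vitest"
import { run } from "@perstack/runtime"

// Placeholder: build run parameters however you normally do for this Expert,
// pointing them at a mocked LLM for deterministic unit tests.
declare const params: Parameters<typeof run>[0]

describe("my-expert", () => {
  it("calls the expected tool for a typical query", async () => {
    const toolNames: string[] = []

    await run(params, {
      // Collect tool calls and assert afterwards, so one failing event
      // doesn't hide the rest of the run.
      eventListener: (event) => {
        if (event.type === "callTools") {
          for (const call of event.toolCalls) {
            toolNames.push(call.toolName)
          }
        }
      },
    })

    expect(toolNames).toContain("expectedTool")
  })
})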

Before publishing:

  • Works with typical queries
  • Handles edge cases gracefully
  • Delegates correctly (if applicable)
  • Skills work as expected
  • Error messages are helpful
  • Description accurately reflects behavior