For complex agents requiring detailed specifications, multi-skill workflows, and nuanced human collaboration.
An Agent Story extends the User Story paradigm to capture autonomous and semi-autonomous AI behavior. Where User Stories focus on human intent ("As a user, I want..."), Agent Stories must capture emergent behavior, conditional autonomy, and collaborative intelligence.
The format follows a principle of progressive disclosure: the core story remains simple and readable, while structured annotations capture complexity only where it exists.
For simpler agents or early-stage design, see Agent Story Format: Light.
This core remains human-readable and captures the essential narrative. Everything else is annotation.
AGENT STORY: [ID]
As [Agent Role],
triggered by [Event],
I [Action/Goal],
so that [Outcome/Value].
Autonomy: [Full | Supervised | Collaborative | Directed]
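The core format is regular enough to be generated from structured fields. The following is a minimal Python sketch of that idea, not part of the format itself: the `CoreStory` class, its field names, and the `render` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    """The four autonomy levels named in the core format."""
    FULL = "Full"
    SUPERVISED = "Supervised"
    COLLABORATIVE = "Collaborative"
    DIRECTED = "Directed"


@dataclass(frozen=True)
class CoreStory:
    """Hypothetical container for the required parts of an Agent Story."""
    story_id: str
    role: str
    trigger: str
    action: str
    outcome: str
    autonomy: Autonomy

    def render(self) -> str:
        # Reproduce the core sentence: As ..., triggered by ..., I ..., so that ...
        return (
            f"AGENT STORY: {self.story_id}\n"
            f"As {self.role}, triggered by {self.trigger}, "
            f"I {self.action}, so that {self.outcome}.\n"
            f"Autonomy: {self.autonomy.value}"
        )


story = CoreStory(
    story_id="CLAIM-001",
    role="a Claims Processing Agent",
    trigger="new insurance claim submission",
    action="assess the claim and route it to resolution",
    outcome="claims are processed accurately",
    autonomy=Autonomy.SUPERVISED,
)
print(story.render())
```

Keeping autonomy as an enumeration means a misspelled level fails at construction time rather than surfacing later in review.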
Add only the annotations relevant to your agent. Each section is optional.
trigger:
  type: [message | resource_change | schedule | cascade | manual]
  source: [Description of event source]
  conditions: [Optional guard conditions]
  examples:
    - [Concrete example of triggering event]
For agents with defined stages or workflows:
behavior:
  type: [workflow | adaptive | hybrid]

  # For workflow/hybrid types:
  stages:
    - name: [Stage Name]
      purpose: [What this stage accomplishes]
      transitions:
        - to: [Next Stage]
          when: [Condition]

  # For adaptive/hybrid types:
  capabilities:
    - [High-level capability the agent can invoke]
  planning: [none | local | delegated | emergent]
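Because stages and transitions form a directed graph, a story's workflow can be checked mechanically for two common mistakes: a transition that names a stage nobody declared, and a stage that can never be reached. The sketch below is one possible check, assuming the `stages` annotation has already been loaded into a plain mapping of stage name to transition targets. The stage names are taken from the claims example in this document; the function names are hypothetical.

```python
from typing import Dict, List, Set

# Stage name -> transition targets, mirroring `stages[].transitions[].to`.
STAGES: Dict[str, List[str]] = {
    "Initial Assessment": ["Documentation Gathering", "Auto-Approval Check", "Fraud Review"],
    "Documentation Gathering": ["Auto-Approval Check", "Human Review"],
    "Auto-Approval Check": ["Resolution", "Human Review"],
    "Fraud Review": ["Human Review"],
    "Human Review": ["Resolution"],
    "Resolution": [],  # terminal stage: no outgoing transitions
}


def undefined_targets(stages: Dict[str, List[str]]) -> Set[str]:
    """Return transition targets that are not declared as stages."""
    return {
        target
        for targets in stages.values()
        for target in targets
        if target not in stages
    }


def unreachable_stages(stages: Dict[str, List[str]], start: str) -> Set[str]:
    """Return declared stages that cannot be reached from `start`."""
    seen: Set[str] = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow only transitions that point at declared stages.
        frontier.extend(t for t in stages.get(current, []) if t in stages)
    return set(stages) - seen


print(undefined_targets(STAGES))
print(unreachable_stages(STAGES, "Initial Assessment"))
```

Both calls return an empty set for the claims workflow; a typo in a `to:` value would show up in the first set, and an orphaned stage in the second.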
reasoning:
  strategy: [rule_based | llm_guided | hybrid]
  decision_points:
    - name: [Decision Name]
      inputs: [What information informs this decision]
      approach: [How the decision is made]
      fallback: [What happens if decision fails]
  iteration:
    enabled: [true | false]
    max_attempts: [number]
    retry_conditions: [When to retry]
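The `iteration` and `fallback` fields together describe a bounded retry loop. One way to read them is sketched below in Python, under two assumptions that the format does not state: failures matching `retry_conditions` are signalled with a hypothetical `RetryableError`, and the fallback runs only after `max_attempts` is exhausted.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures that match `retry_conditions`."""


def run_decision(decide: Callable[[], T], fallback: Callable[[], T], max_attempts: int) -> T:
    """Try `decide` up to `max_attempts` times, then use `fallback`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        try:
            return decide()
        except RetryableError:
            # Retry condition met: try again if attempts remain.
            continue
    # All attempts failed with retryable errors: take the declared fallback.
    return fallback()


attempts = {"count": 0}


def flaky_rules_engine() -> str:
    # Simulated decision that times out twice before succeeding.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("API timeout")
    return "eligible"


result = run_decision(flaky_rules_engine, lambda: "route_to_human_review", max_attempts=3)
print(result, attempts["count"])
```

Errors that are not `RetryableError` propagate to the caller in this sketch; whether they should also trigger the fallback is a design decision the story's `fallback` text should make explicit.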
memory:
  working:
    - [Ephemeral context maintained during execution]
  persistent:
    - name: [Memory Store Name]
      type: [kb | vector | relational | kv]
      purpose: [Why this memory exists]
      updates: [read_only | append | full_crud]
  learning:
    - type: [feedback_loop | reinforcement | fine_tuning]
      signal: [What triggers learning]
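The `updates` field constrains what an agent may do to a persistent store. The format lists the three modes without defining them operation by operation, so the mapping in this Python sketch is an assumed reading of the mode names, and `UpdateMode`, `ALLOWED_OPERATIONS`, and `is_allowed` are hypothetical names.

```python
from enum import Enum


class UpdateMode(Enum):
    READ_ONLY = "read_only"
    APPEND = "append"
    FULL_CRUD = "full_crud"


# Assumed interpretation: each mode permits a superset of the previous one.
ALLOWED_OPERATIONS = {
    UpdateMode.READ_ONLY: {"read"},
    UpdateMode.APPEND: {"read", "create"},
    UpdateMode.FULL_CRUD: {"read", "create", "update", "delete"},
}


def is_allowed(mode: UpdateMode, operation: str) -> bool:
    """Return True if `operation` is permitted under `mode`."""
    return operation in ALLOWED_OPERATIONS[mode]


# The claims example declares its knowledge base as `append`
# and its customer history as `read_only`.
print(is_allowed(UpdateMode("append"), "create"))
print(is_allowed(UpdateMode("read_only"), "create"))
```

A check like this belongs at the boundary where the agent calls the store, so that a story's declared mode is enforced rather than merely documented.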
tools:
  - name: [Tool/MCP Server Name]
    purpose: [Why the agent uses this]
    permissions: [read | write | execute | admin]
    conditions: [Optional: when tool is available/used]
skills:
  - name: [Skill Name]
    domain: [Knowledge domain this skill operates in]
    proficiencies:
      - [Specific competency within the skill]
    tools_used: [Tools this skill leverages, if any]
    quality_bar: [What competent execution looks like]
    acquired: [built_in | learned | delegated]
Skills are composable units of competency that can be reused across agents. They bundle domain knowledge, behavioral patterns, tool proficiency, and quality standards into a coherent capability.
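Skill acquisition types:
- built_in: Core competency the agent is designed with
- learned: Acquired through training or feedback
- delegated: Performed by another agent or service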
human_interaction:
  mode: [in_the_loop | on_the_loop | out_of_loop]
  checkpoints:
    - name: [Checkpoint Name]
      trigger: [When human involvement is required]
      type: [approval | input | review | escalation]
      timeout: [What happens if human doesn't respond]
  escalation:
    conditions: [When to escalate to human]
    channel: [How escalation occurs]
collaboration:
  role: [supervisor | worker | peer]

  # For supervisors:
  coordinates:
    - agent: [Worker Agent ID/Type]
      via: [Communication protocol]
      for: [What tasks are delegated]

  # For workers:
  reports_to: [Supervisor Agent ID/Type]

  # For all:
  peers:
    - agent: [Peer Agent ID/Type]
      interaction: [Request/Response | Pub/Sub | Shared State]
acceptance:
  functional:
    - [Observable behavior that indicates success]
  quality:
    - [Non-functional requirements: latency, accuracy, etc.]
  guardrails:
    - [Constraints the agent must never violate]
Understanding how elements relate to each other is critical for proper modeling.
+---------------------------------------------------------------------+
|                                AGENT                                |
|                                                                     |
|  Owns directly:                                                     |
|  |-- trigger (1..*)            - What activates this agent          |
|  |-- behavior (1)              - How the agent is structured        |
|  |-- memory (0..1)             - Agent-level state and learning     |
|  |-- human_interaction (0..1)  - How humans collaborate             |
|  |-- collaboration (0..1)      - How other agents collaborate       |
|  +-- acceptance (1)            - Success criteria for the agent     |
|                                                                     |
|  Composes:                                                          |
|  +-- skills (1..*)             - Competencies the agent has         |
|      |                                                              |
|      Each skill owns:                                               |
|      |-- proficiencies (1..*)  - What the skill enables             |
|      |-- tools_used (0..*)     - Tools this skill leverages         |
|      |-- quality_bar (1)       - Standard for this skill            |
|      +-- acquired (1)          - How skill was obtained             |
|                                                                     |
|  References (shared resources):                                     |
|  +-- tools (1..*)              - Available to agent, used by skills |
|                                                                     |
|  Contains:                                                          |
|  +-- reasoning (0..1)          - Can exist at agent OR skill level  |
|                                                                     |
+---------------------------------------------------------------------+
| Element | Owned By | Cardinality | Notes |
|---|---|---|---|
| trigger | Agent | 1..* | An agent must have at least one trigger |
| behavior | Agent | 1 | One behavior model per agent |
| tools | Agent | 1..* | Declared at agent level, referenced by skills |
| skills | Agent | 1..* | An agent must have at least one skill |
| proficiencies | Skill | 1..* | Each skill must specify what it can do |
| tools_used | Skill | 0..* | Skills reference agent-level tools |
| quality_bar | Skill | 1 | Every skill needs a measurable standard |
| reasoning | Agent or Skill | 0..1 | Can be defined at agent level or per-skill |
| memory | Agent | 0..1 | Shared across all skills |
| human_interaction | Agent | 0..1 | Defined at agent level |
| collaboration | Agent | 0..1 | Agent-to-agent relationships |
| acceptance | Agent | 1 | Agent-level success criteria |
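The cardinality rules in the table lend themselves to an automated check. The Python sketch below assumes a story's annotations have been loaded into a plain dictionary (for example from YAML); the `validate_agent` function and its error message wording are hypothetical. It covers the agent-level and skill-level bounds from the table and the rule that `tools_used` may only reference tools declared at agent level.

```python
from typing import Any, Dict, List, Optional, Tuple

# (element, minimum, maximum or None for unbounded), taken from the table.
AGENT_CARDINALITY: List[Tuple[str, int, Optional[int]]] = [
    ("trigger", 1, None),
    ("behavior", 1, 1),
    ("tools", 1, None),
    ("skills", 1, None),
    ("reasoning", 0, 1),
    ("memory", 0, 1),
    ("human_interaction", 0, 1),
    ("collaboration", 0, 1),
    ("acceptance", 1, 1),
]

# Skill-level elements with a minimum of one.
REQUIRED_SKILL_FIELDS = ("proficiencies", "quality_bar", "acquired")


def _count(value: Any) -> int:
    """Count occurrences: missing/empty is 0, a list is its length, else 1."""
    if not value:
        return 0
    return len(value) if isinstance(value, list) else 1


def validate_agent(agent: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable violations (empty if none)."""
    errors: List[str] = []
    for element, low, high in AGENT_CARDINALITY:
        found = _count(agent.get(element))
        if found < low or (high is not None and found > high):
            bound = f"{low}..{'*' if high is None else high}"
            errors.append(f"{element}: expected {bound}, found {found}")

    declared = {tool["name"] for tool in agent.get("tools") or []}
    for skill in agent.get("skills") or []:
        name = skill.get("name", "<unnamed>")
        for field in REQUIRED_SKILL_FIELDS:
            if not skill.get(field):
                errors.append(f"skill {name!r}: missing {field}")
        for tool in skill.get("tools_used") or []:
            if tool not in declared:
                errors.append(f"skill {name!r}: tools_used references undeclared tool {tool!r}")
    return errors


agent = {
    "trigger": {"type": "message"},
    "behavior": {"type": "workflow"},
    "tools": [{"name": "Policy System"}],
    "skills": [{
        "name": "Policy Interpretation",
        "proficiencies": ["Parse coverage limits, deductibles, and exclusions"],
        "tools_used": ["Policy System"],
        "quality_bar": "Coverage determination matches adjuster interpretation 95%+",
        "acquired": "built_in",
    }],
    "acceptance": {"guardrails": ["Never auto-approve claims exceeding policy limits"]},
}
print(validate_agent(agent))
```

Returning a list of violations rather than raising on the first one lets an author fix a story in a single pass.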
+----------------------------------------------------------------+
|                    Claims Processing Agent                     |
|                                                                |
|  +-------------+  +-------------+  +-----------------------+   |
|  |    Tools    |  |   Memory    |  |   Human Interaction   |   |
|  |  (shared)   |  |  (shared)   |  |       (shared)        |   |
|  +------+------+  +------+------+  +-----------+-----------+   |
|         |                |                     |               |
|         v                v                     v               |
|  +----------------------------------------------------------+  |
|  |                          Skills                          |  |
|  | +--------------+ +--------------+ +--------------------+ |  |
|  | | Damage       | | Fraud        | | Customer           | |  |
|  | | Assessment   | | Detection    | | Communication      | |  |
|  | |              | |              | |                    | |  |
|  | | uses: Doc    | | uses: Doc    | | uses: Comms Tool   | |  |
|  | | Analysis,    | | Analysis     | |                    | |  |
|  | | Policy Sys   | |              | | triggers: human    | |  |
|  | |              | | writes:      | | checkpoint on      | |  |
|  | |              | | fraud scores | | escalation         | |  |
|  | |              | | to memory    | |                    | |  |
|  | +--------------+ +--------------+ +--------------------+ |  |
|  +----------------------------------------------------------+  |
|                                                                |
|  Behavior: Orchestrates skills through workflow stages         |
|  Acceptance: Evaluated at agent level using skill quality bars |
+----------------------------------------------------------------+
AGENT STORY: CLAIM-001
As a Claims Processing Agent,
triggered by new insurance claim submission,
I assess the claim, gather required documentation, and route to appropriate resolution,
so that claims are processed accurately with minimal customer wait time.
Autonomy: Supervised
trigger:
  type: message
  source: Claims intake system (A2A from portal agent)
  conditions: Claim type in [auto, property, health]
  examples:
    - "New auto claim #12345 submitted with photos and police report"

behavior:
  type: hybrid
  stages:
    - name: Initial Assessment
      purpose: Categorize claim and determine processing path
      transitions:
        - to: Documentation Gathering
          when: Additional docs needed
        - to: Auto-Approval Check
          when: Claim is straightforward
        - to: Fraud Review
          when: Risk indicators detected
    - name: Documentation Gathering
      purpose: Request and validate required documents
      transitions:
        - to: Auto-Approval Check
          when: All docs received and valid
        - to: Human Review
          when: Doc gathering timeout (72h)
    - name: Auto-Approval Check
      purpose: Determine if claim qualifies for automatic approval
      transitions:
        - to: Resolution
          when: Within auto-approval thresholds
        - to: Human Review
          when: Exceeds thresholds or edge case
    - name: Fraud Review
      purpose: Deep analysis for potential fraud
      transitions:
        - to: Human Review
          when: Analysis complete
    - name: Human Review
      purpose: Adjuster makes final determination
      transitions:
        - to: Resolution
          when: Decision made
    - name: Resolution
      purpose: Execute approval/denial and notify customer
  planning: local

reasoning:
  strategy: hybrid
  decision_points:
    - name: Fraud Risk Assessment
      inputs: Claim history, submission patterns, document metadata
      approach: ML model + rule-based flags
      fallback: Route to human review
    - name: Auto-Approval Eligibility
      inputs: Claim amount, policy limits, documentation completeness
      approach: Policy rules engine
      fallback: Route to human review
  iteration:
    enabled: true
    max_attempts: 3
    retry_conditions: Document validation failures, API timeouts

memory:
  working:
    - Current claim context and gathered documents
    - Conversation history with customer
  persistent:
    - name: Claims Knowledge Base
      type: vector
      purpose: Similar claim retrieval for consistency
      updates: append
    - name: Customer History
      type: relational
      purpose: Policy and claims history lookup
      updates: read_only
  learning:
    - type: feedback_loop
      signal: Adjuster corrections to agent assessments

tools:
  - name: Document Analysis MCP
    purpose: Extract and validate information from uploaded documents
    permissions: read
  - name: Policy System
    purpose: Retrieve policy details and coverage limits
    permissions: read
  - name: Customer Communication
    purpose: Send status updates and document requests
    permissions: execute
    conditions: Outbound communications require template match

skills:
  - name: Damage Assessment
    domain: Insurance claim evaluation
    proficiencies:
      - Interpret photos and repair estimates for auto/property damage
      - Cross-reference damage claims against policy coverage
      - Identify inconsistencies between reported and documented damage
    tools_used: [Document Analysis MCP, Policy System]
    quality_bar: Assessments align with adjuster decisions 90%+ of the time
    acquired: built_in
  - name: Fraud Detection
    domain: Insurance fraud patterns
    proficiencies:
      - Recognize common fraud indicators (staged accidents, inflated claims)
      - Analyze claim patterns across customer history
      - Flag document anomalies (metadata inconsistencies, edited images)
    tools_used: [Document Analysis MCP]
    quality_bar: ">85% precision on fraud flags (minimize false positives)"
    acquired: learned
  - name: Customer Communication
    domain: Claims customer experience
    proficiencies:
      - Explain claim status and next steps in plain language
      - Request specific documentation with clear instructions
      - De-escalate frustrated customers while maintaining accuracy
    tools_used: [Customer Communication]
    quality_bar: Customer satisfaction score >4.2/5 on agent interactions
    acquired: built_in
  - name: Policy Interpretation
    domain: Insurance policy analysis
    proficiencies:
      - Parse coverage limits, deductibles, and exclusions
      - Apply policy terms to specific claim scenarios
      - Identify coverage gaps or ambiguities requiring human review
    tools_used: [Policy System]
    quality_bar: Coverage determination matches adjuster interpretation 95%+
    acquired: built_in

human_interaction:
  mode: on_the_loop
  checkpoints:
    - name: High-Value Approval
      trigger: Claim amount > $10,000
      type: approval
      timeout: Route to senior adjuster after 24h
    - name: Fraud Escalation
      trigger: Fraud score > 0.7
      type: review
      timeout: Hold claim, alert supervisor
  escalation:
    conditions: Agent confidence < 60%, customer complaint, system error
    channel: Adjuster queue with full context package

collaboration:
  role: worker
  reports_to: Claims Supervisor Agent
  peers:
    - agent: Customer Service Agent
      interaction: Request/Response (customer context handoff)

acceptance:
  functional:
    - Correctly categorizes claim type with 95% accuracy
    - Identifies missing documentation within 5 minutes of submission
    - Routes fraud-risk claims to review 100% of the time
  quality:
    - Initial assessment completes in < 2 minutes
    - Customer receives first contact within 1 hour
  guardrails:
    - Never auto-approve claims exceeding policy limits
    - Never communicate denial without human review
    - All customer PII handled per data retention policy
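The two checkpoints in this example have triggers precise enough to evaluate in code. The Python sketch below shows one way to do that, using the thresholds stated in the story (claim amount above $10,000, fraud score above 0.7). The `Claim` and `Checkpoint` classes and the `required_checkpoints` function are hypothetical, and timeout handling is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Claim:
    amount: float       # claim amount in dollars
    fraud_score: float  # assumed to be scaled from 0.0 to 1.0


@dataclass(frozen=True)
class Checkpoint:
    name: str
    kind: str  # approval | input | review | escalation
    applies: Callable[[Claim], bool]


# Triggers copied from the `human_interaction.checkpoints` annotation.
CHECKPOINTS: List[Checkpoint] = [
    Checkpoint("High-Value Approval", "approval", lambda c: c.amount > 10_000),
    Checkpoint("Fraud Escalation", "review", lambda c: c.fraud_score > 0.7),
]


def required_checkpoints(claim: Claim) -> List[str]:
    """Return the names of checkpoints whose trigger matches this claim."""
    return [cp.name for cp in CHECKPOINTS if cp.applies(claim)]


print(required_checkpoints(Claim(amount=12_500, fraud_score=0.2)))
print(required_checkpoints(Claim(amount=800, fraud_score=0.9)))
```

Both comparisons are strict, matching the `>` in the story, so a claim of exactly $10,000 or a fraud score of exactly 0.7 does not trigger a checkpoint. If the intent is inclusive, the story text should say so.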
Begin with just the core story. Add annotations only when:
| Level | Human Role | Agent Authority |
|---|---|---|
| Full | None during execution | Complete decision authority |
| Supervised | Monitors, intervenes on exception | Executes within guardrails |
| Collaborative | Active participant in decisions | Proposes, human confirms |
| Directed | Initiates and guides each step | Executes specific instructions |
Every criterion should be something you can actually verify. "Processes claims correctly" is not testable. "Correctly categorizes claim type with 95% accuracy" is.
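A testable criterion translates directly into a check over labeled results. As a minimal Python sketch, assuming an evaluation set of (predicted, expected) claim-type pairs is available, the 95% criterion could be verified as follows. The function names and the sample data are illustrative only.

```python
from typing import List, Tuple


def categorization_accuracy(results: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, expected) pairs that match."""
    if not results:
        raise ValueError("at least one result is required")
    correct = sum(1 for predicted, expected in results if predicted == expected)
    return correct / len(results)


def meets_criterion(results: List[Tuple[str, str]], threshold: float = 0.95) -> bool:
    """True if accuracy reaches the threshold from the acceptance criterion."""
    return categorization_accuracy(results) >= threshold


# Illustrative data: 19 correct categorizations and 1 miss out of 20.
sample = [("auto", "auto")] * 19 + [("property", "health")]
print(categorization_accuracy(sample), meets_criterion(sample))
```

"Processes claims correctly" offers nothing to compute, whereas the 95% criterion fixes both the measurement and the pass mark.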
+-------------------------------------------------------------+
|                       Product Vision                        |
+-------------------------------------------------------------+
                               |
                               v
+-------------------------------------------------------------+
|                        Agent Stories                        |
|          (What agents do and why - this document)           |
+-------------------------------------------------------------+
                               |
           +-------------------+-------------------+
           v                   v                   v
+---------------------+ +-------------+ +---------------------+
| Behavior Specs      | | Tool Specs  | | Integration Specs   |
| (Detailed flows,    | | (MCP server | | (A2A protocols,     |
|  prompts, logic)    | |  contracts) | |  event schemas)     |
+---------------------+ +-------------+ +---------------------+
Agent Stories sit between high-level vision and implementation details. They should be:
+------------------------------------------------------------+
|                AGENT STORY QUICK REFERENCE                 |
+------------------------------------------------------------+
|                                                            |
|  CORE (Required)                                           |
|  ---------------                                           |
|  - Role: What the agent is                                 |
|  - Trigger: What activates it                              |
|  - Action: What it does                                    |
|  - Outcome: Why it matters                                 |
|  - Autonomy: Full|Supervised|Collaborative|Directed        |
|                                                            |
|  ANNOTATIONS (As Needed)                                   |
|  -----------------------                                   |
|  - trigger: Event details, conditions, examples            |
|  - behavior: Stages, capabilities, planning approach       |
|  - reasoning: Decision points, iteration, fallbacks        |
|  - memory: Working, persistent, learning                   |
|  - tools: MCP servers, permissions, conditions             |
|  - skills: Competencies, proficiencies, quality bars       |
|  - human_interaction: Mode, checkpoints, escalation        |
|  - collaboration: Role, coordinates/reports_to, peers      |
|  - acceptance: Functional, quality, guardrails             |
|                                                            |
|  SKILL ACQUISITION TYPES                                   |
|  -----------------------                                   |
|  - built_in: Core competency the agent is designed with    |
|  - learned: Acquired through training or feedback          |
|  - delegated: Performed by another agent or service        |
|                                                            |
|  TIPS                                                      |
|  ----                                                      |
|  * Start with core only, add complexity as needed          |
|  * Make acceptance criteria observable and testable        |
|  * Guardrails are things that must NEVER happen            |
|  * One story = one coherent agent responsibility           |
|  * Skills can be reused across multiple agents             |
|                                                            |
+------------------------------------------------------------+