# Cross-Tool Compatibility and the Handoff Protocol (POS Part 4)
After exploring Context Management in Part 3, it’s time to look at how this context moves between different AI tools.
## The Vendor Lock-in Trap
Most AI workflow systems are built for one tool. Cursor rules work in Cursor. Claude Code’s CLAUDE.md works in Claude Code. Copilot instructions work in Copilot. If you switch tools, you rebuild your setup from scratch.
POS avoids this by building on a universal interface: files that any tool can read.
## The AGENTS.md Standard
AGENTS.md is an emerging standard for describing projects to AI tools. It’s a Markdown file at the repository root that tells any AI what the system is, how it’s organized, what rules to follow, and how to get started.
POS generates AGENTS.md from pos.yaml. Any tool that reads Markdown files can read it.
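As a rough illustration, the generation step might look like the sketch below. The field names (`name`, `description`, `rules`, `getting_started`) are hypothetical stand-ins for the parsed `pos.yaml` contents, not the actual POS schema:

```python
# Hypothetical sketch: render AGENTS.md from fields assumed to exist in
# pos.yaml. Field names are illustrative, not the real POS schema.

def render_agents_md(config: dict) -> str:
    lines = [f"# {config['name']}", "", config["description"], "", "## Rules"]
    lines += [f"- {rule}" for rule in config.get("rules", [])]
    lines += ["", "## Getting Started", config.get("getting_started", "")]
    return "\n".join(lines) + "\n"

config = {
    "name": "ticketapp",
    "description": "Event ticketing platform.",
    "rules": ["Run tests before committing", "Prefer small PRs"],
    "getting_started": "See docs/setup.md.",
}
print(render_agents_md(config))
```

Because the output is plain Markdown, regenerating it after every `pos.yaml` change keeps every tool's view of the project in sync from a single source of truth.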
| Tool | How It Reads Context | Session Management |
|---|---|---|
| Claude Code | AGENTS.md + CLAUDE.md auto-loaded | Automatic via SessionStart hook |
| Cursor | AGENTS.md via .cursorrules include | Manual script or YAML write |
| Windsurf | AGENTS.md via workspace rules | Manual YAML write |
| GitHub Copilot | .github/copilot-instructions.md | Manual YAML write |
| Goose | AGENTS.md natively | Manual script |
| Any LLM | Paste AGENTS.md as system prompt | Manual YAML write |
Every tool gets the same information. Tool-specific config files are thin wrappers that point to AGENTS.md.
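For instance, a wrapper for Cursor could be as small as a pointer (illustrative content only; Cursor's exact rules syntax may differ):

```text
# .cursorrules (illustrative wrapper; not verified Cursor syntax)
Read AGENTS.md at the repository root and treat its contents as the
project rules. Do not duplicate guidance here; AGENTS.md is the
single source of truth.
```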
## Portable Skills
Skills are the trickiest cross-tool challenge. Claude Code reads skills from .claude/skills/ with YAML frontmatter. Other tools don’t understand this format.
POS bridges this with a generation step:
```text
.claude/skills/code-review/SKILL.md → generate-portable-skills.sh → .skills/code-review.md
```
The portable version strips Claude-specific metadata and outputs plain Markdown. Any tool can read .skills/code-review.md as an instruction document. A registry.yaml catalogs all available skills with their triggers.
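The core of that generation step is frontmatter stripping. A minimal sketch in Python, assuming the conventional `---`-delimited frontmatter block (the shell script's actual logic may differ):

```python
# Minimal sketch of the portable-skill step: strip YAML frontmatter from a
# SKILL.md file, leaving plain Markdown any tool can read.

def strip_frontmatter(skill_md: str) -> str:
    lines = skill_md.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                # Drop everything up to and including the closing delimiter.
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return skill_md  # No (or unterminated) frontmatter: return unchanged.

source = """---
name: code-review
triggers: [review, pr]
---
# Code Review

Check tests, naming, and error handling."""
print(strip_frontmatter(source))  # Plain Markdown, starting at "# Code Review"
```

The `triggers` metadata lost in stripping is what `registry.yaml` preserves, so trigger information survives even though individual portable skills are plain documents.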
## Multi-Model Coordination
Different AI models have different strengths. POS matches tasks to models using capability levels:
| Level | Models | Best For |
|---|---|---|
| basic | Haiku, Flash, 4o-mini | Status checks, docs, formatting |
| standard | Sonnet, GPT-4o, Gemini Pro | Features, bugs, code review |
| advanced | Sonnet+, advanced models | Architecture, refactoring |
| reasoning | Opus, o3, deep reasoning | Planning, root cause analysis |
The task queue labels each task with a required capability level. A basic model sees only documentation tasks. A reasoning model sees architecture decisions. But capability matching alone isn’t enough. The models need a way to share what they learned.
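The labeling described above can be sketched as a filter over the task queue. The ordering of levels and the "at or below its own level" rule are assumptions for illustration, as is the task dictionary shape:

```python
# Sketch of capability-based task filtering. Level ordering and task fields
# are illustrative assumptions, not the actual POS queue format.

LEVELS = ["basic", "standard", "advanced", "reasoning"]

def visible_tasks(queue: list[dict], model_level: str) -> list[dict]:
    # Assumption: a model sees tasks at or below its capability level.
    rank = LEVELS.index(model_level)
    return [t for t in queue if LEVELS.index(t["capability"]) <= rank]

queue = [
    {"task": "update README", "capability": "basic"},
    {"task": "fix webhook bug", "capability": "standard"},
    {"task": "design retry architecture", "capability": "reasoning"},
]
print([t["task"] for t in visible_tasks(queue, "basic")])      # → ['update README']
print([t["task"] for t in visible_tasks(queue, "reasoning")])  # → all three tasks
```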
## The Continuity Problem
AI sessions are stateless. When you close Claude Code and reopen it, the model has no memory of the previous conversation. Everything it had accumulated disappears: what you were working on, what decisions were made, what was tried and failed.
Handoffs solve this by creating a persistent record at the end of each session that the next session reads at startup.
## The Session Lifecycle
Every AI tool that enters POS follows three steps:
### 1. Register
When a session starts, it announces itself:
```yaml
# .handoff/sessions/claude-code.yaml
agent: claude-code
context: ticketapp
capability: reasoning
started: "2026-03-18T09:00:00Z"
current_task: null
files_touched: []
```
Other tools can see who’s active. The system knows which context is in use.
### 2. Work
During the session, the tool updates its session file:
```yaml
current_task: "implementing Stripe webhook handler"
files_touched:
  - app/Http/Controllers/WebhookController.php
  - tests/Feature/WebhookTest.php
  - routes/api.php
```
If the session crashes, there’s still a record of what was being worked on.
### 3. Close
When the session ends, the tool creates a handoff record:
```yaml
# .handoff/handoffs/2026-03-18-claude-code.yaml
agent: claude-code
context: ticketapp
started: "2026-03-18T09:00:00Z"
ended: "2026-03-18T11:30:00Z"
summary: |
  Implemented Stripe webhook handler for payment events.
  Added signature verification and event routing.
  Tests pass for payment_intent.succeeded and charge.refunded events.
completed:
  - Webhook controller with signature verification
  - Event routing for 4 payment event types
  - Feature tests for success and failure paths
pending:
  - Subscription lifecycle events (not started)
  - Webhook retry handling (deferred, needs architecture decision)
blockers:
  - Need Stripe webhook secret for staging environment
resume_point: |
  Open app/Http/Controllers/WebhookController.php.
  The handleSubscription() method is stubbed but not implemented.
  Start with the customer.subscription.created event type.
```
The handoff record is a complete briefing for the next session. It says what was done, what’s left, what’s blocking, and exactly where to resume.
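Picking up that briefing at the start of the next session can be as simple as finding the newest record. A sketch, relying on the date-prefixed filename convention shown above (the naming and directory layout are taken from the example, not a documented POS API):

```python
# Sketch: at session start, locate the most recent handoff record.
# Assumes the YYYY-MM-DD-agent.yaml naming convention from the example,
# so lexicographic sort order matches chronological order.
import tempfile
from pathlib import Path

def latest_handoff(handoff_dir: Path):
    files = sorted(handoff_dir.glob("*.yaml"))
    return files[-1] if files else None

# Demo with a throwaway directory standing in for .handoff/handoffs/
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "2026-03-17-cursor.yaml").write_text("summary: earlier session\n")
    (root / "2026-03-18-claude-code.yaml").write_text("summary: webhook work\n")
    print(latest_handoff(root).name)  # → 2026-03-18-claude-code.yaml
```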
## Cross-Model Handoffs
The handoff system enables workflows that span multiple AI models:
**Planning phase (Opus):** Reads the project requirements, designs the architecture, creates a sprint plan with phased tasks, and writes a handoff describing the plan and key decisions.

**Implementation phase (Sonnet):** Reads Opus's handoff, picks up the first implementation task, writes code, runs tests, and creates its own handoff describing what was built and what remains.

**Documentation phase (Haiku):** Reads Sonnet's handoff, writes API documentation, formats commit messages, and updates the project README.
Each model reads the previous model’s handoff. Context carries forward without any model needing to re-discover what the others did.
## Conflict Prevention
Multiple AI tools can work simultaneously. Session registration provides visibility:
```yaml
# .state/snapshot.yaml
registered_agents:
  - agent: "claude-code"
    context: "ticketapp"
    capability: "reasoning"
  - agent: "cursor"
    context: "acmecorp"
    capability: "standard"
```
When an agent registers, it sees other active sessions and avoids working on the same context or files. This is visibility-based coordination, not locking. POS trusts tools to be cooperative.
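The check an agent performs at registration time can be sketched as a scan over the snapshot. The structure mirrors the example above; the function name and return shape are illustrative, not the POS API:

```python
# Sketch of visibility-based conflict checking: before starting work, a tool
# scans registered sessions and reports who else holds its target context.
# Snapshot structure follows the example above; this is not the POS API.

def context_conflicts(snapshot: dict, my_agent: str, my_context: str) -> list:
    return [
        a["agent"]
        for a in snapshot.get("registered_agents", [])
        if a["context"] == my_context and a["agent"] != my_agent
    ]

snapshot = {
    "registered_agents": [
        {"agent": "claude-code", "context": "ticketapp", "capability": "reasoning"},
        {"agent": "cursor", "context": "acmecorp", "capability": "standard"},
    ]
}
print(context_conflicts(snapshot, "windsurf", "ticketapp"))  # → ['claude-code']
print(context_conflicts(snapshot, "windsurf", "blog"))       # → []
```

Because this is advisory rather than a lock, a tool that finds a conflict can still proceed; the design only guarantees that it proceeds knowingly.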
## What Cross-Tool Compatibility Costs
There are trade-offs worth acknowledging:
**Lowest common denominator:** The system must work with tools that can only read files. This means no interactive UI, no real-time collaboration, no rich integrations.

**Manual overhead for some tools:** Claude Code gets automatic session management via hooks. Every other tool requires manual registration. This is friction.

**Skill parity gaps:** Claude Code gets slash commands and tool restrictions. Other tools get the portable Markdown version. Same instructions, no automated trigger matching.
These trade-offs are acceptable because the alternative of building separate integrations for each tool is worse. One system that works everywhere at 80% is better than six perfect integrations that each require separate maintenance.
In the final post, Part 5, we look at what still needs work, honest assessments of current gaps, and the roadmap ahead.
This is part 4 of a 5-part series on Building a Personal Operating System for AI-Assisted Development.