
The Skill System (POS Part 2)

9 min read

In Part 1: The Problem and the Architecture, we established the file-based foundation of the Personal Operating System. Now, let’s look at how that architecture translates into executable capabilities.

The first time you ask an AI coding tool to “do a code review,” you get something generic. It will scan for syntax issues, maybe flag some style inconsistencies, and call it done. It won’t check whether the implementation matches your approved plan, or verify that your security conventions are followed, or know that your team requires self-rating feedback on every review.

The second time, you remember to add those instructions. The third time, you forget some of them. By the tenth time, you have a patchwork of ad-hoc prompts scattered across your clipboard history, each slightly different, none complete.

This is the problem that skills solve.

What a Skill Looks Like

A skill is a markdown file with YAML frontmatter that defines a reusable, executable instruction set. Here is the skeleton:

---
name: code-review
description: Perform standardized code reviews for projects.
allowed-tools: Read, Glob, Grep, Bash
---

Below the frontmatter is the full instruction set: what to check, in what order, what output to produce, and what to do when things go wrong. The allowed-tools field is critical. It tells the AI exactly which capabilities it can use for this skill, preventing a code review from accidentally modifying files or a planning session from running deployment scripts.

The instructions are written as if briefing a skilled colleague. They assume competence but not context. They specify the “what” and “why” explicitly, and leave the “how” to the AI’s judgment within the defined tool constraints.

The 30 Skills

POS currently includes 30 skills organized into six categories:

Development: code-review, debugging, create-pr, api-review, frontend-design, architecture, feature-spec, excalidraw

Operations: production-deploy, security-audit, repo-management, project-creation, project-index

Planning: plan-generation, verification, team-sprint, team-management, schedule-management, time-tracking, work-logging

Content: content-review, communication-draft, product-marketing, curriculum-design, learn

System: context-switch, session-management, cross-handoff, skill-creator

Meta: meeting-analyzer

Each skill encodes a specific workflow that would otherwise live in someone’s head (or get reinvented every session). The security-audit skill, for example, includes an OWASP Top 10 reference document and checks for common vulnerability patterns specific to the tech stacks in use. The plan-generation skill enforces the Plan-Approve-Execute workflow, ensuring no implementation begins without explicit approval.

Tool Restrictions

Not every skill should have access to every tool. POS defines four restriction tiers:

Read-only: Skills like code-review and architecture can read files, search code, and run analysis commands, but can’t modify anything. Their allowed-tools include Read, Glob, Grep, and Bash (for read-only commands like git diff).

Read-write: Skills like debugging and frontend-design can read and also modify files. They add Write and Edit to their allowed tools.

Full access: Skills like production-deploy and repo-management can run any command, including potentially destructive operations. These skills include explicit safety checks and confirmation steps in their instructions.

Restricted: Skills like plan-generation and verification are deliberately limited to prevent scope creep. A planning skill should produce a plan document, not start implementing it.

These tiers exist because AI tools will use whatever capabilities you give them. If a code review skill can write files, it will sometimes “helpfully” fix the issues it finds, which defeats the purpose of a review. Restricting tools enforces discipline that prompting alone can’t guarantee.
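Because the tiers live in the allowed-tools frontmatter field, they can be enforced mechanically. As a hypothetical illustration (this helper is a sketch, not part of POS), a check could flag a read-only skill that declares write-capable tools:

```shell
# Hypothetical sketch: flag a skill that claims the read-only tier but
# declares Write or Edit in its allowed-tools frontmatter field.
check_readonly_tools() {
  skill_file="$1"
  # Pull the tool list from the frontmatter line shown earlier.
  tools=$(sed -n 's/^allowed-tools:[[:space:]]*//p' "$skill_file")
  case "$tools" in
    *Write* | *Edit*)
      echo "FAIL: $skill_file declares write tools: $tools"
      return 1 ;;
    *)
      echo "OK: $skill_file is read-only ($tools)"
      return 0 ;;
  esac
}
```

A check like this is cheap enough to run alongside the other static validations.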

Cross-Tool Portability

Skills are defined in .claude/skills/ using Claude Code’s native format. But the system isn’t Claude-specific. A script called generate-portable-skills.sh strips Claude-specific frontmatter and produces portable versions in .skills/:

.claude/skills/code-review/SKILL.md    # Claude Code format
.skills/code-review.md                  # Portable format (any tool)

The portable versions contain the same instructions without tool-specific metadata. You can point Cursor, Windsurf, or any other AI tool at the .skills/ directory and it will understand the instructions. The skill content is just markdown, which is the most universal format for AI instruction.
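The stripping step itself can be sketched in a few lines (a simplified illustration; the real generate-portable-skills.sh may handle more than this):

```shell
# Simplified sketch of frontmatter stripping: print only the lines that
# come after the second '---' delimiter, i.e. the skill body.
strip_frontmatter() {
  awk 'seen >= 2 { print } /^---$/ { seen++ }' "$1"
}

# Hypothetical usage, following the layout above:
# strip_frontmatter .claude/skills/code-review/SKILL.md > .skills/code-review.md
```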

Creating New Skills

New skills follow a consistent creation process. The skill-creator skill itself guides the AI through it:

  1. Define the skill name, description, and tool restrictions in YAML frontmatter
  2. Write the instruction set in markdown, following the established patterns
  3. If the skill should share common sections (session context, rules, self-rating), create a .tmpl template that references partials
  4. Run ./scripts/generate-skills.sh skill-name to generate the final file
  5. Run ./scripts/validate-skills.sh skill-name to verify it passes all checks
  6. Run ./scripts/generate-portable-skills.sh to create the portable version

The patterns directory at .claude/skills/skill-creator/references/patterns.md documents the conventions, so new skills stay consistent with existing ones.


Building skills is only half the challenge. Keeping 30 of them consistent and improving over time is the other half.

Systems Rot Without Feedback

Any system with 30 components will drift if left unattended. Instructions become outdated. Conventions diverge. One skill gets updated with a better pattern, but the other 29 still use the old one. Without active maintenance mechanisms, a skill system decays into the same inconsistency it was designed to prevent.

POS addresses this with four interlocking mechanisms: templates, validation, self-rating, and cross-skill artifacts.

The Template System

The template system, introduced in Part 1, becomes essential at scale. Common sections that appear in most skills (the session context preamble, the standard rules, the output format, the self-rating block) live as partials in templates/skills/partials/:

templates/skills/partials/
  _preamble.md          # Session context + grounding instructions
  _common-rules.md      # Plan-Approve-Execute, commit rules
  _output-format.md     # Standard output structure
  _self-rating.md       # Self-assessment block
  _verification.md      # Verify-before-claiming checks

Skills that use these partials have a .tmpl file alongside their SKILL.md. Running ./scripts/generate-skills.sh processes every template, injects the current partial content, and produces the final skill file. The --dry-run flag compares the generated output against the existing file and exits with code 1 if anything has drifted:

./scripts/generate-skills.sh --dry-run
# Exit 0: all skills match their templates
# Exit 1: drift detected; some skills are stale

This makes drift detection a one-command check that can run in CI or as part of a daily sync.
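The injection step is conceptually simple. As a hypothetical sketch (the marker syntax and the marker-to-partial filename mapping here are illustrative assumptions, not the actual generate-skills.sh):

```shell
# Hypothetical sketch: expand a .tmpl file by replacing a marker line such
# as {{self-rating}} with the contents of the partial _self-rating.md.
render_template() {
  tmpl="$1"; partials_dir="$2"
  while IFS= read -r line; do
    case "$line" in
      "{{"*"}}")
        # Map a {{name}} marker to the partial file _name.md (an assumption).
        name=$(printf '%s\n' "$line" | sed 's/^{{//; s/}}$//')
        cat "$partials_dir/_$name.md" ;;
      *) printf '%s\n' "$line" ;;
    esac
  done < "$tmpl"
}
```

A --dry-run mode would render to a temp file and diff it against the committed skill instead of overwriting.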

Static Validation

The validate-skills.sh script performs six static checks on every skill file:

  1. SKILL.md exists: every skill directory must contain its definition file
  2. YAML frontmatter present: the file must start with --- and contain valid frontmatter
  3. Required fields defined: name, description, and allowed-tools must be present
  4. Valid tool names: every tool in allowed-tools must be from the known tool list (Read, Write, Edit, Glob, Grep, Bash, Agent, WebFetch, WebSearch, and others)
  5. File length check: skills exceeding 500 lines are flagged as likely too complex
  6. No unresolved placeholders: generated output must not contain leftover {{PLACEHOLDER}} variables

The script runs in under five seconds across all 30 skills and exits with a non-zero code on any error. It’s the kind of check that catches problems before they reach a production session.
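Check 3, for instance, could be sketched like this (a hypothetical reimplementation for illustration, not the actual validate-skills.sh):

```shell
# Hypothetical sketch of the required-fields check: every skill must
# define name, description, and allowed-tools in its frontmatter.
check_required_fields() {
  skill_file="$1"
  for field in name description allowed-tools; do
    if ! grep -q "^$field:" "$skill_file"; then
      echo "FAIL: $skill_file missing '$field'"
      return 1
    fi
  done
  echo "OK: $skill_file"
}
```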

Self-Rating Feedback

Every skill in POS includes a self-assessment block (injected via the _self-rating.md partial). At the end of each skill execution, the AI rates the experience on a 0-10 scale:

  • 10: Perfect. The skill had everything needed, output was ideal
  • 7-9: Good. Minor friction, small improvements possible
  • 4-6: Mediocre. Significant gaps in instructions or tooling
  • 0-3: Poor. The skill was inadequate for the task

If the rating falls below 8, the AI writes a feedback file to .handoff/feedback/:

skill: code-review
rating: 6
friction: "No guidance on reviewing database migration files"
suggestion: "Add a migration review checklist section"
context: "Reviewing payment module refactor"

The aggregate-feedback.sh script collects these entries and produces a summary:

=== Skill Feedback Summary ===

Total entries: 47
Average rating: 7/10

SKILL                     AVG      ENTRIES
-----                     ---      -------
code-review               8        12
plan-generation           7        9
security-audit            6        5
frontend-design           8        8
...

This creates a quantitative signal for where to invest improvement effort. A skill averaging 6 out of 10 across five uses has a clear, specific set of friction points documented in its feedback files. You don’t have to guess what’s wrong. The system tells you.

The self-assessment is invisible to the user. It’s internal quality tracking, not conversation output. The AI rates the skill, writes the feedback if warranted, and moves on.
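The aggregation itself is a small fold over those feedback files. A hypothetical sketch, assuming the skill:/rating: line format shown above (the real aggregate-feedback.sh may differ):

```shell
# Hypothetical sketch: average the 'rating:' values per 'skill:' across
# all feedback files in a directory (format as in the example above).
aggregate_ratings() {
  awk '
    /^skill:/  { skill = $2 }
    /^rating:/ { sum[skill] += $2; count[skill]++ }
    END { for (s in sum) printf "%s %d\n", s, sum[s] / count[s] }
  ' "$1"/*
}
```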

Cross-Skill Artifacts

Skills don’t operate in isolation. A plan-generation session produces a plan that a code-review session later needs to reference. A security-audit produces a report that production-deploy should check before deploying.

POS handles this through the artifact system. The lib-artifacts.sh library provides four functions:

  • artifact_write: persists a skill output with standardized metadata (skill name, context, type, timestamp, branch)
  • artifact_find: retrieves the latest active artifact of a given type for a context
  • artifact_list: lists all artifacts for a context
  • artifact_archive: marks artifacts from a completed branch as archived

Artifacts are stored in .handoff/artifacts/{context}/{skill}/ with a date-stamped filename and YAML frontmatter:

---
skill: plan-generation
context: ticketapp
branch: feature/payment-refactor
created: "2026-03-15T14:30:00Z"
type: plan
status: active
---

[plan content here]

When a code-review skill starts, its instructions tell it to check for prior artifacts: artifact_find ticketapp plan. If a plan exists, the review checks implementation against that plan. If a security report exists, the review flags anything the audit identified. This creates data flow between skills without requiring them to run in the same session or even use the same AI tool.
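A lookup along those lines can be sketched as follows (a hypothetical stand-in for artifact_find; the real lib-artifacts.sh may differ, but the storage layout follows the path described above):

```shell
# Hypothetical sketch of artifact_find: return the newest artifact file
# of a given type with status 'active' for a context, using the
# .handoff/artifacts/{context}/{skill}/ layout described above.
artifact_find_sketch() {
  context="$1"; type="$2"
  # ls -t orders by modification time, newest first.
  for f in $(ls -t .handoff/artifacts/"$context"/*/* 2>/dev/null); do
    if grep -q "^type: $type\$" "$f" && grep -q '^status: active$' "$f"; then
      echo "$f"
      return 0
    fi
  done
  return 1
}
```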

The Improvement Loop

These four mechanisms form a closed loop:

Templates prevent drift. When you improve a common pattern (for example, better session context instructions), you edit one partial. Regenerating updates every skill that uses it. The --dry-run check catches any skill that falls behind.

Validation prevents breakage. Static checks catch structural problems (missing frontmatter, invalid tool names, oversized files) before they cause runtime failures. You can’t accidentally deploy a skill that references a nonexistent tool.

Self-rating prevents stagnation. Quantitative feedback from actual usage identifies which skills need attention. Instead of guessing which of 30 skills needs improvement, you look at the aggregate scores and read the friction reports.

Artifacts prevent isolation. Skills produce durable outputs that other skills can consume. A plan informs a review. A review informs a deployment. A security audit informs everything. Data flows through the system even when no single session spans the full workflow.

The result is a system that gets better through use. Every skill execution is both a productive task and a feedback signal. Every template update propagates automatically. Every validation run catches regressions. The 30 skills aren’t 30 independent documents. They’re 30 nodes in a system that maintains and improves itself.

Part 3 covers Context Management, detailing how tiered loading and instant project switching keep AI tools focused without wasting tokens.

This is part 2 of a 5-part series on Building a Personal Operating System for AI-Assisted Development.