
Context Management (POS Part 3)


Building on Part 2’s deep dive into the Skill System, this post tackles the challenge of managing what those skills can “see.”

The Context Window Problem

AI tools have finite context windows. Load too much information and the model loses focus. Load too little and it makes wrong assumptions. The challenge is finding the right amount of context for each task.

When you manage multiple projects, this problem intensifies. The AI needs to know about the current project, but also needs awareness that other projects exist. It needs to know the current sprint plan, but not every sprint plan from the last six months.

Tiered Context Loading

POS uses a three-tier loading strategy. Each tier adds more context, but only when the task requires it.

Tier 1: Quick Check (~75 lines)

When: Status checks, simple questions, formatting tasks

What loads: The context’s QUICK-START.md — a 40-line summary of the project, its current state, and its most important files.

# TicketApp: Quick Start

## What
Event management SaaS platform (Laravel + Inertia + Vue)

## Current State
- Sprint: Payments Improvements
- Branch: develop
- Active task: Stripe webhook integration
- Blocked: None

## Key Files
- Backend: app/Http/Controllers/EventController.php
- Frontend: resources/js/Pages/Events/
- Config: .env.example, docker-compose.yml

## Commands
- Test: php artisan test
- Serve: php artisan serve
- Deploy: /deploy ticketapp

An AI reading 75 lines has enough to answer “what’s the current sprint?” or “where do the event controllers live?” without loading thousands of lines of architecture documentation.

Tier 2: Standard Work (~300 lines)

When: Implementing features, fixing bugs, writing code

What loads: Tier 1 plus the relevant docs for the current task. If the sprint plan says “add Stripe webhooks,” the AI reads the payment architecture doc and the Stripe integration guide.

The key insight is selective loading. The AI reads the QUICK-START to understand context, then loads only the docs relevant to the specific task. This avoids the common pattern of dumping every project file into context.

Tier 3: Full Context (~800+ lines)

When: Architecture decisions, cross-cutting refactors, system design reviews

What loads: The full AGENTS.md for the project, including architecture diagrams, team structure, technology decisions, and dependency maps.

This tier is reserved for decisions that affect the whole system. Most daily work stays at Tier 1 or 2.

Context Switching

Switching between contexts is instant. POS tracks which context is active and loads only that context’s state.

The /context-switch skill (or @shortcut syntax) handles switching:

@ticketapp    → Switch to TicketApp context
@acmecorp     → Switch to AcmeCorp context
@personal     → Switch to personal context

What happens during a switch:

  1. The current context’s status gets saved
  2. The new context’s QUICK-START loads
  3. The session file updates to reflect the new context
  4. The snapshot refreshes

The AI now has the new context’s state without carrying the old context’s details. Token budget stays clean.

Multi-Context Awareness

Even when focused on one context, the AI has lightweight awareness of all contexts through .state/snapshot.yaml:

contexts:
  ticketapp:
    active: true
    task: "Stripe webhook integration"
  acmecorp:
    active: false
    task: "CDI tutorial draft"
  jobportal:
    active: false
    task: "Brand identity sprint"

This costs about 20 lines of context but provides cross-project awareness. If you ask “what else is on my plate?” the AI can answer without loading every project.
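Once parsed (a YAML library would produce a plain dictionary from snapshot.yaml), the snapshot is small enough to scan directly. A sketch of answering "what else is on my plate?", using the snapshot data shown above:

```python
# Parsed form of .state/snapshot.yaml (what a YAML loader would return).
snapshot = {
    "contexts": {
        "ticketapp": {"active": True, "task": "Stripe webhook integration"},
        "acmecorp": {"active": False, "task": "CDI tutorial draft"},
        "jobportal": {"active": False, "task": "Brand identity sprint"},
    }
}

def other_work(snapshot: dict) -> list[str]:
    """List the task in every context except the active one."""
    return [
        f"{name}: {info['task']}"
        for name, info in snapshot["contexts"].items()
        if not info["active"]
    ]
```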

The QUICK-START Pattern

Every context has a QUICK-START.md that follows a strict format:

  1. What: One sentence describing the project
  2. Current State: Sprint, branch, active task, blockers
  3. Key Files: The five to ten most important file paths
  4. Commands: How to test, run, and deploy

The constraint is 40 lines. This forces prioritization. You can’t dump everything. You must choose what matters most right now.

QUICK-START files are living documents. When the sprint changes, the current state section updates. When the architecture evolves, the key files section updates. The file always reflects what an AI needs to know today, not what was relevant last month.
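A strict format is also a checkable format. A hypothetical linter (not part of POS) could enforce the 40-line budget and the four required sections:

```python
REQUIRED_SECTIONS = ("## What", "## Current State", "## Key Files", "## Commands")

def check_quick_start(text: str, max_lines: int = 40) -> list[str]:
    """Return a list of problems with a QUICK-START file; empty means it passes."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"too long: {len(lines)} lines (limit {max_lines})")
    for section in REQUIRED_SECTIONS:
        if not any(line.startswith(section) for line in lines):
            problems.append(f"missing section: {section}")
    return problems
```

Running a check like this on every commit would keep the "living document" promise honest.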

How It Compares to Other Approaches

Dump everything into context: Simple but wasteful. A 10,000-line context window filled with project docs leaves little room for the actual conversation. The model’s attention diffuses.

Let the AI explore: The model reads files on demand. Works for small projects but is slow for large ones. The AI makes multiple tool calls just to orient itself.

Indexed summaries: Tools like PROJECT_INDEX.yaml map file paths to descriptions. Good for navigation but lacks runtime state (what’s the current sprint? what’s blocked?).

Tiered loading (POS approach): Combine quick summaries (Tier 1) with selective deep-loading (Tier 2) and full context as a last resort (Tier 3). Balances token efficiency with completeness.

The Cost Equation

With a typical context window (200K-1M tokens) and token caching:

Approach                     Tokens Used       % of Window
Full dump                    40,000-80,000     20-40%
Tier 1 only                  1,500-3,000       <2%
Tier 1 + selective Tier 2    5,000-12,000      3-6%
Tier 3 (rare)                15,000-30,000     8-15%

The savings compound over a session. Every tool call, every response, every follow-up question operates within the remaining budget. Starting lean means more room for the actual work.
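The percentages in the table are straightforward division against the window size; they assume the low end (200K tokens) of the stated range:

```python
WINDOW = 200_000  # tokens: the low end of the 200K-1M range the table assumes

def pct_of_window(tokens: int) -> float:
    """Share of the context window a given token load consumes, in percent."""
    return 100 * tokens / WINDOW
```

With a 1M-token window the same loads shrink to a fifth of these percentages, which makes lean loading less urgent but no less useful.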

In Part 4, we will explore cross-tool compatibility and how sessions remember each other through handoffs.

This is part 3 of a 5-part series on Building a Personal Operating System for AI-Assisted Development.