# Skills-MCP: Architectural Reference (Condensed)

## What It Does

Skills-MCP manages a persistent knowledge library with RPG-style mastery scoring (0-100, 5 dimensions), SM-2 spaced repetition, and progressive disclosure (list → peek → load). Each skill lives in ~/.skills/<name>/SKILL.md with YAML frontmatter + markdown body. Supports optional git auto-commit and AI review via gh copilot.

## Key Concepts

### Mastery Scoring: 5 Dimensions × 20 Points = 0-100 Total

- *Completeness*: Full scope coverage?
- *Specificity*: Concrete steps or vague?
- *Examples*: Worked examples provided?
- *Edge Cases*: Failures/exceptions documented?
- *Actionability*: Followable without clarification?

Maps to levels: Novice (0-20), Apprentice (21-40), Journeyman (41-60), Expert (61-80), Master (81-100).

### Effective Score with Decay

When accessing a skill, if today > sm2_due_date (overdue):

    days_overdue = today - sm2_due_date
    decay_factor = 1 - (days_overdue / sm2_interval) × 0.5   (linear, capped at 50% loss)
    effective_score = mastery_score × decay_factor

Unreviewed skills gradually lose confidence, forcing periodic validation.

### SM-2 Spaced Repetition (Quality 0-5 Scale)

User provides quality after review:

- *0*: Complete blackout
- *1-2*: Wrong, obvious error
- *3*: Right with effort
- *4*: Right with hesitation
- *5*: Perfect

SM-2 state: (repetitions, ef, interval, due_date). Update:

- If quality < 3: reset repetitions = 0, interval = 1 day
- Else: ef' = ef + (0.1 − (5 − quality) × (0.08 + (5 − quality) × 0.02)), clamped to ef' ≥ 1.3; if repetitions == 1: interval = 1; else if repetitions == 2: interval = 3; else interval *= ef'
- Increment repetitions; due_date = today + interval

Perfect recalls extend intervals exponentially; poor recalls reset short.

### Surprise Estimation

Unigram language model scoring: given skill content, how novel is a candidate fact?
1. Tokenize + stem both the skill body and the new fact
2. Compute P(word | skill content) with Laplace smoothing
3. Return mean surprisal in bits: avg log2(1/P)

Interpretation: >10 bits = novel, 6-10 = somewhat novel, <6 = redundant.

## Tools (16 total)

*Tier 1 (Discovery, ~20 tokens)*

- list_skills() → {name, description, mastery_level, score, effective_score, use_count, tags} array
- peek_skill(name) → first section + metadata only
- load_skill(name) → full content; increments use_count, updates last_used

*Tier 2 (Refinement)*

- skill_slice(name, query, max_sections=2) → top-N sections by TF-IDF relevance to query
- create_skill(name, description, content, tags, compatibility, source_type) → new skill, returns assessment prompt
- update_skill(name, content) → replaces body, keeps metadata, returns assessment prompt

*Scoring*

- record_assessment(name, completeness, specificity, examples, edge_cases, actionability, notes, source_type) → initializes SM-2 (rep=0, ef=2.5, interval=1, due=tomorrow), computes total score + level
- report_outcome(name, quality, notes) → runs SM-2 update, schedules next review

*Review*

- review_stale_skills(limit=10) → sorted by urgency (days_overdue / interval), descending
- validate_skill(name) → extracts sources from ## Sources + body references
- ai_review_skill(name) → shells out to gh copilot, returns review_text + suggested_quality

*Analysis & Ops*

- estimate_surprise(name, new_fact) → novelty score in bits
- skills_report() → library stats, rebuilds index.json
- reflect_on_session(context_summary) → returns template for end-of-session skill updates
- list_skill_files(name) / read_skill_file(name, path) → auxiliary files in references/
- initialize_skills_git() → git init + .gitignore + pre-commit hook (auto-unstages empty .md files)

## SKILL.md Format

*Frontmatter (YAML)*: name, description, compatibility, plus a metadata dict containing:

- Assessment: mastery_score (0-100), mastery_level, effective_score, effective_level
- Usage: use_count, last_used, created_at, updated_at, last_validated
  (timestamps ISO 8601)
- Scoring: score_completeness / score_specificity / score_examples / score_edge_cases / score_actionability (0-20 each)
- SM-2: sm2_repetitions, sm2_ef, sm2_interval, sm2_due_date, sm2_last_review
- Metadata: tags, source_type (external_url | codebase | derived), assessment_notes

*Body*: Markdown with an optional ## Sources section (extracted by validate_skill).

## Data Flow

- *Create*: create_skill() → writes frontmatter (score=0, rep=0, ef=2.5, interval=1, due=tomorrow) + body → rebuilds index → returns assessment_prompt
- *Assess*: record_assessment() → sums the 5 dimensions, initializes SM-2 state → rebuilds index
- *Use*: load_skill() → increments use_count, updates last_used → rebuilds index
- *Review*: report_outcome(quality) → runs the SM-2 algorithm, schedules next review → rebuilds index
- *Decay*: on every load, effective_score = mastery_score × decay_factor if overdue

## Config

~/.config/skills-mcp/config.json:

- skills_dir: root path (default ~/.skills)
- ai_review.enabled, .command (gh copilot -p), .timeout (120s)
- vault_git_enabled: enable git auto-commit (default true)

## Non-Obvious Design Patterns

- *Assessment Prompt on Create/Update*: returns the prompt immediately so the user scores before moving on (closes the feedback loop)
- *Index Rebuild on Every Write*: full rebuild, not incremental; no caching.
  Ensures consistency; optimize later for 100+ skills
- *Manual YAML Parsing*: no external deps, but validate on load for robustness
- *Decay Linear, Not Exponential*: 50% max loss over one interval; adjust if too aggressive
- *TF-IDF-lite in skill_slice*: simple word overlap, not true TF-IDF; sufficient for typical skill sizes
- *Git Pre-Commit Hook (Bash)*: prevents empty .md commits by unstaging them; portable, no Python required
- *MCP Prompt Onboarding*: a single "skills_workflow" prompt explains the list → peek → load flow and the reflect → create → score loop

## Known Gaps & Next Steps

- *No caching*: full index rebuild on every write (performance issue for 100+ skills)
- *AI review fragile*: command hardcoded, no fallback if gh-copilot is missing
- *No source verification*: validate_skill extracts ## Sources but doesn't verify the links are live
- *Git untested*: auto-commit exists but needs integration tests
- *No advanced queries*: can't filter by mastery level or search across all skills by content
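## Appendix: Example SKILL.md

For concreteness, a SKILL.md stub matching the frontmatter fields listed above might look like the following. All values are invented for illustration; the five dimension scores sum to the mastery_score of 72 (Expert).

```markdown
---
name: example-skill
description: Illustrative placeholder skill
compatibility: any
metadata:
  mastery_score: 72
  mastery_level: Expert
  effective_score: 72
  effective_level: Expert
  use_count: 4
  last_used: "2025-01-15T10:00:00Z"
  created_at: "2025-01-01T09:00:00Z"
  updated_at: "2025-01-10T12:00:00Z"
  last_validated: "2025-01-10T12:00:00Z"
  score_completeness: 15
  score_specificity: 14
  score_examples: 15
  score_edge_cases: 13
  score_actionability: 15
  sm2_repetitions: 2
  sm2_ef: 2.6
  sm2_interval: 3
  sm2_due_date: "2025-01-18"
  sm2_last_review: "2025-01-15"
  tags: [example]
  source_type: derived
  assessment_notes: ""
---

## Overview

Skill body goes here.

## Sources

- https://example.com
```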
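## Appendix: Scoring Math Sketch

As a worked example, the three scoring formulas under Key Concepts (effective-score decay, the SM-2 update, and unigram surprisal) can be sketched in Python. This is a minimal illustration of the math as described, not the server's actual code: function names are invented, stemming is omitted, and incrementing repetitions before the interval check is one reasonable reading of the update order.

```python
import math
import re
from collections import Counter

def effective_score(mastery_score: float, days_overdue: int, interval: int) -> float:
    """Linear decay for overdue skills, capped at 50% loss."""
    if days_overdue <= 0:
        return mastery_score
    decay = min((days_overdue / interval) * 0.5, 0.5)
    return mastery_score * (1 - decay)

def sm2_update(quality: int, repetitions: int, ef: float, interval: int):
    """One SM-2 review step (quality 0-5). Returns (repetitions, ef, interval)."""
    if quality < 3:
        return 0, ef, 1          # failed recall: reset, review again tomorrow
    ef = max(1.3, ef + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    repetitions += 1             # count this successful repetition first
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 3             # this server uses 3 days, not classic SM-2's 6
    else:
        interval = round(interval * ef)
    return repetitions, ef, interval

def mean_surprisal_bits(skill_text: str, fact: str) -> float:
    """Mean per-token surprisal of `fact` under a Laplace-smoothed
    unigram model of `skill_text` (stemming omitted for brevity)."""
    tokenize = lambda s: re.findall(r"[a-z0-9]+", s.lower())
    counts = Counter(tokenize(skill_text))
    total = sum(counts.values())
    vocab = len(counts) + 1      # +1 pseudo-slot for unseen words
    bits = [math.log2((total + vocab) / (counts[w] + 1)) for w in tokenize(fact)]
    return sum(bits) / len(bits) if bits else 0.0
```

For instance, a skill scored 80 that is 5 days overdue on a 10-day interval decays by 25%: effective_score(80, 5, 10) == 60.0.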