{
  "skill": {
    "id": "prompt-regression-testing",
    "name": "prompt-regression-testing",
    "description": "Use when you need to evaluate whether prompts, instructions, or model routing changes altered outputs in undesirable ways.",
    "path": "skills/dev/prompt-regression-testing/SKILL.md",
    "tags": [
      "prompting",
      "testing",
      "regression",
      "eval"
    ]
  },
  "document": {
    "path": "skills/dev/prompt-regression-testing/SKILL.md",
    "frontmatter": {
      "name": "prompt-regression-testing",
      "description": "Use when you need to evaluate whether prompts, instructions, or model routing changes altered outputs in undesirable ways.",
      "version": "0.1.0",
      "author": "Hermes Agent",
      "license": "private",
      "metadata": {
        "hermes": {
          "tags": [
            "prompting",
            "testing",
            "regression",
            "eval"
          ],
          "related_skills": [
            "api-debugging"
          ]
        }
      }
    },
    "content": "---\nname: prompt-regression-testing\ndescription: Use when you need to evaluate whether prompts, instructions, or model routing changes altered outputs in undesirable ways.\nversion: 0.1.0\nauthor: Hermes Agent\nlicense: private\nmetadata:\n  hermes:\n    tags: [prompting, testing, regression, eval]\n    related_skills: [api-debugging]\n---\n\n# Prompt Regression Testing\n\n## Overview\n\nUse this skill to compare output behavior before and after a prompt, config, or model change. Focus on repeatability and on differences that are actionable.\n\n## Workflow\n\n1. Define the baseline and the changed version.\n2. Use the same input set across both versions.\n3. Compare outputs for correctness, style, safety, and instruction following.\n4. Classify each difference as an improvement, a regression, or neutral.\n5. Summarize the patterns across differences, not just individual examples.\n\n## What to Track\n\n- instruction adherence\n- factual accuracy\n- conciseness\n- formatting stability\n- tool-use behavior\n\n## Verification\n\n- The same inputs were used for both versions\n- Each difference was categorized as an improvement, a regression, or neutral\n- Regression risk was stated clearly\n"
  }
}