This resonates with my experience: we have dozens of internal “playbooks” and prompt snippets floating around, and nobody knows which ones still work after model changes. If you can make “skill quality” visible over time (regressions, drift), that’s valuable. Do you have a CI integration where you can pin a skill version and fail builds if eval scores drop?
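To be concrete about what I mean by "fail builds if eval scores drop" — something like the minimal gate script below is what I'm picturing. The file paths, threshold, and score format are all hypothetical on my end, not a claim about your actual API:

```python
# Rough sketch of a CI eval gate (names/paths are made up, not a real API):
# compare this run's eval scores for each skill against a committed baseline
# and exit nonzero so the build fails on a regression.
import json
import sys

THRESHOLD = 0.02  # max allowed score drop vs. baseline (hypothetical value)

with open("evals/baseline.json") as f:   # scores committed with the pinned skill version
    baseline = json.load(f)
with open("evals/latest.json") as f:     # scores produced by this CI run
    latest = json.load(f)

failures = []
for skill, base_score in baseline.items():
    new_score = latest.get(skill, 0.0)
    if base_score - new_score > THRESHOLD:
        failures.append(f"{skill}: {base_score:.3f} -> {new_score:.3f}")

if failures:
    print("Eval regressions detected:\n" + "\n".join(failures))
    sys.exit(1)  # non-zero exit fails the CI job
print("All skills within threshold.")
```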
Until recently I thought skills were just markdown files.