| evals_registry |
EVAL |
| id |
date |
output |
method |
anomalies |
action |
| EVAL-XXX |
YYYY-MM-DD |
string (what was produced) |
string (how it was evaluated - manual read, test, benchmark, user feedback) |
list of strings (what was wrong, missing, surprising) |
| keep | correct | deprecate |
|
|
| Log an eval whenever you validate the quality of something Claude produced (report, audit, plan, generated code). |
| Action keep - the output is fit for purpose as-is. |
| Action correct - needs revision; capture what. |
| Action deprecate - the approach itself is flawed; link to the decision that replaces it. |
|