Commit Graph

2 Commits

Author SHA1 Message Date
Sure Admin (bot)
43460664c4 feat(ci): improve LLM eval visibility in GitHub Actions (#1546)
* feat(ci): improve LLM eval visibility in GitHub Actions

- Add step summary output for each eval run (shows in GH UI)
- Add new 'summarize_evals' job that aggregates results from all matrix runs
- Generate markdown table with accuracy, cost, and duration for all evals
- Add threshold checking (fails workflow if accuracy < 70%)
- Include pass/fail status icons for quick visual assessment
- Show overall pass/fail status at the end of summary

* Fix LLM eval workflow summary

---------

Co-authored-by: SureBot <sure-bot@we-promise.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
2026-04-24 11:18:45 +02:00
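
The aggregation described in commit 43460664c4 (a markdown table with accuracy, cost, and duration, plus a 70% accuracy threshold) could look roughly like the sketch below. It is illustrative only and is not taken from the repository: the `EvalResult` fields, the function names, the column layout, and the choice of icons are all assumptions. The only external detail relied on is that GitHub Actions renders markdown appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable.

```python
import os
from dataclasses import dataclass

# From the commit message: the workflow fails if accuracy < 70%.
ACCURACY_THRESHOLD = 0.70


@dataclass
class EvalResult:
    """Hypothetical shape of one eval run's result (field names are assumed)."""
    model: str
    accuracy: float    # fraction in [0, 1]
    cost_usd: float
    duration_s: float


def render_summary(results, threshold=ACCURACY_THRESHOLD):
    """Build a markdown summary table; return (markdown, all_passed)."""
    lines = [
        "| Status | Model | Accuracy | Cost (USD) | Duration (s) |",
        "| --- | --- | --- | --- | --- |",
    ]
    all_passed = True
    for r in results:
        passed = r.accuracy >= threshold
        all_passed = all_passed and passed
        icon = "✅" if passed else "❌"  # assumed icons, not confirmed by the source
        lines.append(
            f"| {icon} | {r.model} | {r.accuracy:.1%} "
            f"| {r.cost_usd:.4f} | {r.duration_s:.1f} |"
        )
    lines.append("")
    lines.append(f"Overall: {'PASS' if all_passed else 'FAIL'}")
    return "\n".join(lines) + "\n", all_passed


def main(results):
    """Write the summary and return a process exit code (0 = pass, 1 = fail)."""
    markdown, all_passed = render_summary(results)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        # Inside GitHub Actions: append so earlier summary content is preserved.
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(markdown)
    else:
        # Outside GitHub Actions: fall back to stdout.
        print(markdown)
    return 0 if all_passed else 1
```

A summarizing job would collect one `EvalResult` per matrix run and pass the return value of `main(...)` to `sys.exit`, so that any run below the threshold fails the workflow.
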
Juan José Mata
8ae77ca379 Add GitHub Actions workflow to discover and run LLM evaluations (#1439)
* Run release eval workflow across model list

* Gracefully skip evals when OpenAI token is unusable

* Add defensive nil check for eval run export
2026-04-11 21:09:15 +02:00