* feat(ci): improve LLM eval visibility in GitHub Actions
- Add step summary output for each eval run (visible in the GitHub Actions UI)
- Add new 'summarize_evals' job that aggregates results from all matrix runs
- Generate markdown table with accuracy, cost, and duration for all evals
- Add threshold checking (fails the workflow if accuracy < 70%)
- Include status icons (✅/❌) for quick visual assessment
- Show overall pass/fail status at the end of summary
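The per-run summary and threshold check described above can be sketched roughly as follows. This is a hypothetical illustration, not the workflow's actual script: the eval name and the `ACCURACY`, `COST_USD`, and `DURATION_S` values are placeholders assumed to come from the eval step. `$GITHUB_STEP_SUMMARY` is the file GitHub Actions renders as the step summary; outside Actions the sketch falls back to a local file so it still runs.

```shell
#!/bin/sh
# Hypothetical values; in the real workflow these would come from the eval run.
EVAL_NAME="example-eval"
ACCURACY=82
COST_USD=0.14
DURATION_S=31

# Outside GitHub Actions, fall back to a local file.
: "${GITHUB_STEP_SUMMARY:=/tmp/step_summary.md}"

# Status icon based on the 70% accuracy threshold.
if [ "$ACCURACY" -ge 70 ]; then STATUS="✅"; else STATUS="❌"; fi

# Append a markdown table row; GitHub renders this in the run's summary page.
{
  echo "| Eval | Accuracy | Cost | Duration | Status |"
  echo "| --- | --- | --- | --- | --- |"
  echo "| ${EVAL_NAME} | ${ACCURACY}% | \$${COST_USD} | ${DURATION_S}s | ${STATUS} |"
} >> "$GITHUB_STEP_SUMMARY"

# Exiting non-zero fails the step, and with it the workflow.
[ "$ACCURACY" -ge 70 ] || exit 1
```

The aggregating `summarize_evals` job would do the same kind of append, but over the collected results of every matrix run.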
* Fix LLM eval workflow summary
---------
Co-authored-by: SureBot <sure-bot@we-promise.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>