mirror of
https://github.com/we-promise/sure.git
synced 2026-04-20 12:34:12 +00:00
Small llms improvements (#400)
* Initial implementation
* FIX keys
* Add langfuse evals support
* FIX trace upload
* Delete .claude/settings.local.json

  Signed-off-by: soky srm <sokysrm@gmail.com>
* Update client.rb
* Small LLMs improvements
* Keep batch size normal
* Update categorizer
* FIX json mode
* Add reasonable alternative to matching
* FIX thinking blocks for llms
* Implement json mode support with AUTO mode
* Make auto default for everyone
* FIX linter
* Address review
* Allow export manual categories
* FIX user export
* FIX oneshot example pollution
* Update categorization_golden_v1.yml
* Update categorization_golden_v1.yml
* Trim to 100 items
* Update auto_categorizer.rb
* FIX for auto retry in auto mode
* Separate the Eval Logic from the Auto-Categorizer

  The expected_null_count parameter conflates eval-specific logic with production categorization logic.
* Force json mode on evals
* Introduce a more mixed dataset

  150 items, performance from a local model:

  By Difficulty:
    easy: 93.22% accuracy (55/59)
    medium: 93.33% accuracy (42/45)
    hard: 92.86% accuracy (26/28)
    edge_case: 100.0% accuracy (18/18)
* Improve datasets

  Remove Data leakage from prompts
* Create eval runs as "pending"

---------

Signed-off-by: soky srm <sokysrm@gmail.com>
Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
@@ -51,7 +51,7 @@ class Provider::Openai < Provider
     @uri_base.present?
   end
 
-  def auto_categorize(transactions: [], user_categories: [], model: "", family: nil)
+  def auto_categorize(transactions: [], user_categories: [], model: "", family: nil, json_mode: nil)
     with_provider_response do
       raise Error, "Too many transactions to auto-categorize. Max is 25 per request." if transactions.size > 25
       if user_categories.blank?
@@ -74,7 +74,8 @@ class Provider::Openai < Provider
           user_categories: user_categories,
           custom_provider: custom_provider?,
           langfuse_trace: trace,
-          family: family
+          family: family,
+          json_mode: json_mode
         ).auto_categorize
 
         trace&.update(output: result.map(&:to_h))
@@ -83,7 +84,7 @@ class Provider::Openai < Provider
     end
   end
 
-  def auto_detect_merchants(transactions: [], user_merchants: [], model: "", family: nil)
+  def auto_detect_merchants(transactions: [], user_merchants: [], model: "", family: nil, json_mode: nil)
     with_provider_response do
       raise Error, "Too many transactions to auto-detect merchants. Max is 25 per request." if transactions.size > 25
 
@@ -101,7 +102,8 @@ class Provider::Openai < Provider
           user_merchants: user_merchants,
           custom_provider: custom_provider?,
           langfuse_trace: trace,
-          family: family
+          family: family,
+          json_mode: json_mode
         ).auto_detect_merchants
 
         trace&.update(output: result.map(&:to_h))
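
The hunks above only thread a new `json_mode:` keyword (default `nil`) from the provider methods into the objects that do the work. The diff does not show how a `nil` value is interpreted downstream. The following is a minimal, self-contained sketch of one way such a pass-through could behave, assuming that `nil` falls back to an "auto" mode as the commit message suggests ("Make auto default for everyone"). The module `JsonModeSketch`, the `Categorizer` struct, the `effective_json_mode` helper, and the `:strict` mode name are hypothetical illustrations, not code from this commit.

```ruby
# Hypothetical sketch; none of these names come from the commit itself.
module JsonModeSketch
  # Assumed fallback when the caller does not pass json_mode.
  DEFAULT_MODE = "auto"

  # Stand-in for the object the provider builds and delegates to.
  Categorizer = Struct.new(:transactions, :json_mode, keyword_init: true) do
    # Resolve the mode: nil means "use the default", anything else is kept.
    def effective_json_mode
      json_mode.nil? ? DEFAULT_MODE : json_mode.to_s
    end
  end

  # Mirrors the shape of the provider method: validate the batch size,
  # then forward json_mode unchanged to the worker object.
  def self.auto_categorize(transactions: [], json_mode: nil)
    if transactions.size > 25
      raise ArgumentError, "Too many transactions to auto-categorize. Max is 25 per request."
    end

    Categorizer.new(transactions: transactions, json_mode: json_mode).effective_json_mode
  end
end
```

Keeping the default at `nil` in the public signature, and resolving it in one place further down, means existing callers that never pass `json_mode:` keep working unchanged.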