QT/sure - sure

mirror of https://github.com/we-promise/sure.git synced 2026-06-04 18:29:02 +00:00

Commit Graph

Author

SHA1

Message

Date

Guillem Arias Fauste

991cb959c1

feat(ai): Anthropic batch ops + LLM cost ledger (2/5) (#1984)

* feat(ai): add Anthropic provider with chat parity (1/5)

Introduces Provider::Anthropic alongside Provider::Openai, implementing
the LlmConcept chat_response contract over the official anthropic Ruby
SDK. Batch ops, PDF, and RAG land in follow-up PRs.

- Provider::Anthropic uses Messages API for sync and streaming responses
- ChatConfig builds requests with ephemeral prompt-cache markers on the
  system prompt and the last tool definition
- MessageFormatter reconstructs multi-turn history (text + tool_use +
  tool_result blocks) from raw Message records, including the paired
  user-role tool_result turn Anthropic requires after every tool_use
- ChatParser maps Anthropic Message into the shared ChatResponse Data
- Registry, Setting, User, Chat default model wired for ANTHROPIC_*
  envs and Setting.anthropic_*; LLM_PROVIDER selects between providers
- Responder forwards raw conversation_history (Array<Message>) so
  providers without hosted conversation state can rebuild context
- OpenAI provider accepts and ignores the new kwarg (no behavior change)

Tests cover provider init, model gating, MessageFormatter for all turn
shapes, ChatConfig request building (max_tokens, system cache, tool
conversion), ChatParser for text / tool_use / mixed blocks, Registry
discovery, and mocked chat_response success / error / function_request
paths. Live VCR cassettes recorded in a follow-up with a real key.

Stacked PRs: 2/5 batch ops + cost ledger, 3/5 PDF, 4/5 pgvector RAG,
5/5 settings UI + disclosure.

* fix(ai): address PR review on Anthropic provider foundation

Surface fixes raised by Codex + CodeRabbit on PR 1/5:

- Provider::Anthropic#chat_response now accepts (and ignores) a
  `messages:` kwarg. Assistant::Responder passes both `messages:`
  (OpenAI-shape) and `conversation_history:` (raw Message records) for
  cross-provider parity, so the previous signature raised
  ArgumentError on the first chat turn through the Anthropic provider.
- Provider::Anthropic#supports_model? bypasses the `claude` prefix
  gate when a custom base_url is configured, mirroring the OpenAI
  provider. Bedrock-shaped IDs like
  `anthropic.claude-sonnet-4-5-20250929-v1:0` and
  `claude-opus-4@20250514` are otherwise rejected by
  Assistant::Provided#get_model_provider and the chat dies.
- Setting.anthropic_access_token is now in
  EncryptedSettingFields::ENCRYPTED_FIELDS so the Anthropic API key
  is encrypted at rest like every other provider secret. Previously
  plaintext while siblings (openai_access_token, twelve_data_api_key,
  external_assistant_token) were ciphertext.
- Chat.default_model falls back to whichever provider is actually
  configured. Previously, with LLM_PROVIDER=anthropic but no
  Anthropic credentials, the default model resolved to a Claude ID
  that no registered provider supported, so chats failed even when
  OpenAI was fully configured. Adds Provider::{Anthropic,Openai}.configured?
  class methods so the callsite reads clearly.
- Provider::Anthropic.effective_model uses
  `ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model` so the
  Setting lookup is only performed when the env var is absent — the
  previous `ENV.fetch(KEY, default)` evaluated the default arg
  eagerly on every call.
- Provider::Anthropic::ChatConfig#anthropic_input_schema strips both
  `:strict` and `"strict"` keys so JSON-decoded schemas with string
  keys cannot leak the OpenAI-only flag through to Anthropic.
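
The eager-versus-lazy default described in the effective_model bullet can be illustrated with a minimal plain-Ruby sketch. `FakeSetting`, `presence`, `eager_model`, and `lazy_model` are hypothetical names for illustration only, a Hash stands in for ENV, and the `presence` helper only approximates ActiveSupport's method of the same name:

```ruby
# Hypothetical stand-in for a settings lookup; counts how often it is called.
class FakeSetting
  @lookups = 0

  class << self
    attr_reader :lookups

    def anthropic_model
      @lookups += 1
      "model-from-setting"
    end
  end
end

# Plain-Ruby approximation of ActiveSupport's String#presence:
# returns nil for nil or blank strings, otherwise the value itself.
def presence(value)
  value.nil? || value.strip.empty? ? nil : value
end

# Eager: the default argument is evaluated before fetch runs,
# so the settings lookup happens even when the key is present.
def eager_model(env)
  env.fetch("ANTHROPIC_MODEL", FakeSetting.anthropic_model)
end

# Lazy: || short-circuits, so the lookup only runs when the env value is blank.
def lazy_model(env)
  presence(env["ANTHROPIC_MODEL"]) || FakeSetting.anthropic_model
end
```

Calling `eager_model` with the key present still increments the lookup counter, while `lazy_model` leaves it untouched until the env value is missing or blank.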

Test coverage added: supports_model? bypass on custom endpoints,
chat_response messages: kwarg compatibility, default_model fallback
in the three credential combinations, configured? against ENV +
Setting, strict-flag stripping for both key types, and a
`Setting.expects(:anthropic_model).never` assertion proving the
ENV-precedence test now exercises the lazy path.

All 4365 tests pass (1 pre-existing libvips env error unrelated).

* test(chat): make default_model tests resilient to ENV model overrides

CodeRabbit flagged on PR review: the new default_model tests asserted
against Provider::*::DEFAULT_MODEL, but Chat.default_model actually
returns Provider::*.effective_model.presence (which reads
OPENAI_MODEL / ANTHROPIC_MODEL from the environment). With either env
var set, the tests would fail even though routing was correct.

- New default_model tests now assert against the provider's
  effective_model directly, so they verify the routing decision
  (which provider's value wins) without coupling to the constant.
- Pre-existing "creates with default model" assertions had the same
  brittleness; switch them to compare against Chat.default_model so
  the chosen model is whatever the env / Setting cascade resolves to.

Verified by running `ANTHROPIC_MODEL=claude-haiku-4-5 OPENAI_MODEL=gpt-4o
bin/rails test test/models/chat_test.rb` — 16 runs, 0 failures
(previously 2 pre-existing failures + 0 from the new tests).

* fix(ai): address local review on Anthropic foundation

- Provider::Anthropic#supports_pdf_processing? bypasses prefix gate for
  custom endpoints, mirroring supports_model?
- Provider::Anthropic#initialize raises Error when custom_endpoint? AND
  model.blank?, parity with Provider::Openai
- stream_chat_response captures partial usage on mid-stream errors and
  records it via the new on_partial callback so chat_response can skip
  the duplicate error row in the outer rescue
- safe_accumulated_message swallows the secondary failure when the SDK
  cannot reconstruct a snapshot
- langfuse_client memoizes properly (||= instead of =) so repeated calls
  don't churn Langfuse instances
- MessageFormatter sorts tool_calls by created_at then id so the
  message array is deterministic across replays; skips tool_calls
  missing both provider_call_id and provider_id rather than sending
  `id: nil` and getting rejected by Anthropic
- Setting.anthropic_access_token default falls back through
  ENV["ANTHROPIC_API_KEY"].presence (was missing .presence, so an
  empty-string env value bled through)
- User#openai_configured? / #anthropic_configured? delegate to the
  Provider::* class methods — single source of truth
- Assistant::Responder renames the OpenAI-shape history builder
  conversation_history → openai_messages_payload so the kwarg name
  matches the local method name (messages: openai_messages_payload,
  conversation_history: chat_message_records)
- Assistant::Builtin stale-history comment updated to reference both
  builders
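
The deterministic ordering and ID filtering described for MessageFormatter above could look roughly like the following. This is a sketch under assumptions: `ToolCallRow` and `replayable_tool_calls` are hypothetical names, and only nil (not blank) IDs are treated as missing here:

```ruby
# Hypothetical stand-in for a persisted tool call record.
ToolCallRow = Struct.new(:id, :created_at, :provider_call_id, :provider_id, keyword_init: true)

# Drop rows that carry neither provider-side ID, then order by created_at
# with id as a tie-breaker so replays always produce the same sequence.
def replayable_tool_calls(rows)
  rows
    .select { |row| row.provider_call_id || row.provider_id }
    .sort_by { |row| [row.created_at, row.id] }
end
```

Sorting on the `[created_at, id]` pair keeps the order stable even when two tool calls share a timestamp.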

Adds a streaming chat_response test using ad-hoc subclasses of the
SDK event types so the case/when dispatch matches via is_a? without
stubbing class-level === behavior.

* test(ai): add Anthropic tool_use round-trip + multi-tool turn coverage

Addresses @jjmata's "worth confirming" note on PR #1983: tool-use turns
from prior assistant messages must round-trip correctly when retrieved
from the database.

- New `ChatParser → ToolCall::Function → MessageFormatter` test walks
  the full path: Anthropic response with a tool_use block →
  ChatFunctionRequest → ToolCall::Function.from_function_request →
  persisted on the AssistantMessage → MessageFormatter rebuild on the
  next turn. Asserts the original `tool_use.id` is preserved end-to-end
  as both `tool_use.id` and the paired `tool_result.tool_use_id`, and
  that the original `input` hash and serialized result content survive.
- New multi-tool assistant turn test confirms two tool_use blocks on a
  single assistant message render as two tool_use blocks followed by
  two paired tool_result blocks in a single user-role follow-up,
  matching Anthropic's required alternation.
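
As an illustration of the alternation these tests assert, the sketch below rebuilds one assistant turn and its paired user-role follow-up from stored tool calls. `StoredCall` and `tool_turns` are hypothetical names, and plain hashes stand in for whatever objects the real formatter emits:

```ruby
require "json"

# Hypothetical minimal shape for a stored tool call:
# provider-side ID, tool name, input hash, and the tool's result.
StoredCall = Struct.new(:id, :name, :input, :result, keyword_init: true)

# One assistant message holding every tool_use block, followed by a single
# user message holding the matching tool_result blocks in the same order.
def tool_turns(calls)
  tool_uses = calls.map do |call|
    { type: "tool_use", id: call.id, name: call.name, input: call.input }
  end
  tool_results = calls.map do |call|
    { type: "tool_result", tool_use_id: call.id, content: JSON.generate(call.result) }
  end

  [
    { role: "assistant", content: tool_uses },
    { role: "user", content: tool_results }
  ]
end
```

Each `tool_result.tool_use_id` reuses the ID from its `tool_use` block, which is the pairing the round-trip test checks.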

Both tests exercise the existing PR1 code without behavior changes.

* test(ai): require "ostruct" explicitly in Anthropic provider tests

OpenStruct is moving out of Ruby's default load path (warning in 3.4+,
removed in 3.5+). Tests work today because ActiveSupport transitively
loads it, but that's incidental. Match the existing convention in
test/controllers/settings/hostings_controller_test.rb which explicitly
requires ostruct for the same reason.

* fix(ai): sanitize Langfuse warn logs, normalize tool_use.input, dedup history fetch

Addresses three open CodeRabbit findings on PR #1983.

- Provider::Anthropic Langfuse rescue branches no longer include
  `e.full_message` in `Rails.logger.warn`. `full_message` bundles the
  backtrace + cause chain and on some SDK error types includes the
  serialized request/response payload (prompt, model output). Logs
  now report `#{e.class}: #{e.message}` only. Three sites:
  create_langfuse_trace, log_langfuse_generation, upsert_langfuse_trace.
  Note: Provider::Openai has the same pattern (copy-pasted source) —
  harmonization deferred to a follow-up cleanup PR; this commit fixes
  only the Anthropic provider to keep PR scope tight.

- MessageFormatter#parse_arguments now coerces any non-Hash parsed
  result to `{}`. Anthropic's Messages API requires `tool_use.input`
  to be a JSON object (map); a stored ToolCall::Function record whose
  arguments parse to a scalar, bool, or array (corrupt row, legacy
  data, cross-provider bleed) would otherwise produce a payload the
  API rejects. Normal flow stores Hash arguments end-to-end so the
  fix is defensive — adds 2 tests covering scalar/array JSON strings
  and non-String non-Hash inputs.

- Assistant::Responder dedups the chat-history fetch. The previous
  layout fired two near-identical `chat.messages.where(...).includes(
  :tool_calls).ordered` queries per LLM turn (one for the OpenAI-shape
  payload, one for the raw-records kwarg). A new memoized
  `complete_chat_messages` fetches once; `chat_message_records` filters
  out the current message via `Array#reject`, `openai_messages_payload`
  iterates the cached array unchanged. One SQL query per turn instead
  of two. Memoization scope = single Responder instance (per LLM call),
  so cache invalidation is not a concern.
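
The defensive coercion described for parse_arguments above might be sketched as follows. This is an illustration, not the project's code: the method name matches the commit message, but the body, including the rescue for malformed JSON, is an assumption:

```ruby
require "json"

# Accepts a JSON string or an already-parsed value and always returns a Hash.
# Anything that does not parse to a Hash (scalar, bool, array, nil) becomes {}.
def parse_arguments(raw)
  parsed = raw.is_a?(String) ? JSON.parse(raw) : raw
  parsed.is_a?(Hash) ? parsed : {}
rescue JSON::ParserError
  # Assumption: malformed JSON is treated the same as a non-Hash result.
  {}
end
```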

All 4370 tests pass (1 pre-existing libvips env error unrelated).
Rubocop + brakeman clean.

* fix(ci): replace sk-ant- prefixed test placeholders

Pipelock secret scanner pattern-matches `sk-ant-*` as a real Anthropic
API key and fails the PR security-scan check. Test stubs and
ClimateControl env values used `sk-ant-test`, `sk-ant-from-setting`,
`sk-ant-x`, `sk-ant-y` as obvious placeholders, but the scanner does
not care about value entropy.

Switched to `fake-anthropic-key-*` / `fake-token-*` strings so the
scanner stops flagging them. No production code touched, no behavior
change — Provider::Anthropic still accepts any non-blank token.

* feat(ai): add Anthropic batch ops + LLM cost ledger (2/5)

Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.
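
The forced-tool-call pattern from the first bullet can be sketched with plain hashes. `forced_tool_request` and `extract_forced_input` are hypothetical helpers, the `max_tokens` value is an arbitrary assumption, and no SDK call is shown:

```ruby
# Builds request parameters that define one output tool and force the model to call it.
def forced_tool_request(model:, tool_name:, description:, input_schema:, prompt:)
  {
    model: model,
    max_tokens: 1024, # assumed value for illustration
    tools: [
      { name: tool_name, description: description, input_schema: input_schema }
    ],
    tool_choice: { type: "tool", name: tool_name, disable_parallel_tool_use: true },
    messages: [ { role: "user", content: prompt } ]
  }
end

# Pulls the structured result out of the response content blocks,
# raising when the expected tool_use block is missing.
def extract_forced_input(content_blocks, tool_name)
  block = content_blocks.find { |b| b[:type] == "tool_use" && b[:name] == tool_name }
  raise ArgumentError, "no tool_use block for #{tool_name}" unless block

  block[:input]
end
```

Because the structured result arrives as the `input` of the tool_use block, the caller reads a Hash directly instead of parsing free-form text.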

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.

* fix(ai): attribute Bedrock model IDs to anthropic + clean nil enum

- LlmUsage.infer_provider now returns "anthropic" for Bedrock /
  Vertex shaped IDs (anthropic.* and anthropic/*), so cost-ledger
  filtering by provider stays correct even when no per-MTok rate is
  stored. Previously these IDs fell through to the "openai" default.
- AutoCategorizer drops the redundant nil sentinel from the
  category_name enum — the union type [string, null] already permits
  null, and some JSON Schema validators reject nil literals inside
  enum arrays.
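
A minimal sketch of the prefix-based attribution described in the first bullet, assuming only the Anthropic prefixes named in this log and the "openai" default. The real lookup also consults a pricing table and other providers, which this illustration leaves out:

```ruby
# Attribute a model ID to a provider by prefix.
# Covers native claude-* IDs plus Bedrock (anthropic.*) and Vertex (anthropic/*) shapes.
def infer_provider(model)
  id = model.to_s
  return "anthropic" if id.start_with?("claude-", "anthropic.", "anthropic/")

  "openai" # default named in the commit message
end
```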

* test(ai): require "ostruct" in Anthropic batch op tests

Same rationale as the PR1 ostruct fix — explicit require so the tests
don't depend on ActiveSupport's transitive load when Ruby 3.5+ removes
OpenStruct from the default load path.

* fix(llm-usage): include Anthropic cache tokens in estimated_cost

calculate_cost only priced prompt + completion tokens, so estimated_cost
under-reported every cached call — the cache_creation/cache_read columns this PR
added were tracked but never billed. Verified against the Anthropic dashboard: a
cached chat turn billed $0.05 but the ledger recorded $0.038; the gap was exactly
the unpriced cache tokens.

Price them relative to the input rate (Anthropic: cache write 1.25x, read 0.1x)
and thread the cache counts from both recorders (chat + batch). OpenAI rows leave
the columns null (treated as 0), so they're unaffected. Ledger now reproduces the
dashboard ($0.054 for the test turn).
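
The cache-aware pricing can be written out as a short calculation. This is a sketch under assumptions: `estimated_cost` here is a hypothetical standalone function, rates are in dollars per million tokens, and `prompt_tokens` is assumed to exclude the cache token counts:

```ruby
CACHE_WRITE_MULTIPLIER = 1.25 # cache creation priced relative to the input rate
CACHE_READ_MULTIPLIER = 0.1   # cache reads priced relative to the input rate

# Cost in dollars; nil cache counts (for example OpenAI rows) are treated as 0.
def estimated_cost(input_rate:, output_rate:, prompt_tokens:, completion_tokens:,
                   cache_creation_tokens: nil, cache_read_tokens: nil)
  input_equivalent = prompt_tokens +
                     CACHE_WRITE_MULTIPLIER * cache_creation_tokens.to_i +
                     CACHE_READ_MULTIPLIER * cache_read_tokens.to_i

  (input_equivalent * input_rate + completion_tokens * output_rate) / 1_000_000.0
end
```

With the $3/$15 rates quoted earlier in this log, 1,000 prompt tokens, 500 completion tokens, 2,000 cache-write tokens, and 10,000 cache-read tokens, the input-equivalent count is 1,000 + 2,500 + 1,000 = 4,500, which is how unpriced cache tokens leave a gap in the ledger.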

* chore(ai): guard chat usage double-record; flag deferred Anthropic batch wiring

- Hardening: guard the success-path record_llm_usage with
  `unless partial_usage_recorded` so a future change that emits partial usage on
  a normal stream can't silently double-bill (the symptom investigated in the
  #1984 review). No behavior change today — on_partial only fires from the
  mid-stream-error rescue, which re-raises past this line.
- Notice: the family auto-categorize / merchant-detect / merchant-enhance flows
  still hardcode get_provider(:openai). Provider::Anthropic now implements those
  batch ops but they aren't wired into the family flows yet — documented with
  TODOs at each site for the follow-up.

* chore(ai): point family-flow TODOs at tracking issue #2113

* fix(ai): address review findings on cost ledger + categorizer schema

Three AI-review findings on #1984:

- category_name enum omitted null (codex + coderabbit): the prompt + type allow
  Claude to abstain on uncertain transactions, but JSON Schema `enum` restricted
  the value to category names, so null was invalid — forcing miscategorization.
  Append nil to the enum (the consumer already normalizes null -> uncategorized).
- Cache pricing applied to all providers (coderabbit): the 1.25x/0.1x cache
  multipliers are Anthropic-specific. Gate them on provider == "anthropic" so a
  non-Anthropic caller passing cache counts isn't billed with the wrong rates.
- Negative cache-token counts (coderabbit): add DB check constraints
  (cache_*_tokens IS NULL OR >= 0), per the repo's DB-level-validation convention.

Tests: enum includes nil; non-Anthropic cache tokens aren't priced.
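
The enum finding can be illustrated with a small schema sketch. `category_name_schema` and `enum_allows?` are hypothetical helpers, and the membership check only approximates how a JSON Schema validator treats `enum`:

```ruby
require "json"

# Schema fragment for a nullable category: the union type permits null,
# and nil must also appear in the enum for null to be a valid value.
def category_name_schema(category_names)
  { type: ["string", "null"], enum: category_names + [nil] }
end

# Approximation of the enum check: the value must be one of the listed members.
def enum_allows?(schema, value)
  schema[:enum].include?(value)
end
```

Serializing the fragment with `JSON.generate` renders the nil entry as a literal `null` inside the enum array.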

2026-06-01 22:00:48 +02:00

Juan José Mata

f491916411

Track failed LLM API calls in llm_usages table (#360)

* Track failed LLM API calls in llm_usages table

This commit adds comprehensive error tracking for failed LLM API calls:

- Updated LlmUsage model with helper methods to identify failed calls
  and retrieve error details (failed?, http_status_code, error_message)

- Modified Provider::Openai to record failed API calls with error metadata
  including HTTP status codes and error messages in both native and
  generic chat response methods

- Enhanced UsageRecorder concern with record_usage_error method to support
  error tracking for auto-categorization and auto-merchant detection

- Updated LLM usage UI to display failed calls with:
  - Red background highlighting for failed rows
  - Error indicator icon with "Failed" label
  - Interactive tooltip on hover showing error message and HTTP status code

Failed calls are now tracked with zero tokens and null cost, storing
error details in the metadata JSONB column for visibility and debugging.
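
A rough sketch of how such a failed-call row and its helper methods could fit together. `UsageRow` and `failed_usage_row` are hypothetical, and the metadata key names ("error", "http_status_code") are assumptions rather than the project's actual keys:

```ruby
# Hypothetical stand-in for an llm_usages row with a JSONB-like metadata hash.
UsageRow = Struct.new(:total_tokens, :estimated_cost, :metadata, keyword_init: true) do
  # A row counts as failed when its metadata carries an error entry.
  def failed?
    metadata.is_a?(Hash) && metadata.key?("error")
  end

  def http_status_code
    metadata["http_status_code"] if failed?
  end

  def error_message
    metadata["error"] if failed?
  end
end

# Failed calls are stored with zero tokens and a null cost.
def failed_usage_row(message:, status:)
  UsageRow.new(
    total_tokens: 0,
    estimated_cost: nil,
    metadata: { "error" => message, "http_status_code" => status }
  )
end
```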

* Dark mode fixes

---------

Co-authored-by: Claude <noreply@anthropic.com>

2025-11-22 02:15:20 +01:00

soky srm

bb364fab38

LLM cost estimation (#223)

* Password reset back button also after confirmation

Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>

* Implement a filter for category (#215)

- Also implement an is empty/is null condition.

* Implement an LLM cost estimation page

Track costs across all the cost categories: auto categorization, auto merchant detection and chat.
Show warning with estimated cost when running a rule that contains AI.

* Update pricing

* Add Google pricing and fix inferred model everywhere.

* Update app/models/llm_usage.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: soky srm <sokysrm@gmail.com>

* FIX address review

* Linter

* Address review

- Lowered log level
- extracted the duplicated record_usage method into a shared concern

* Update app/controllers/settings/llm_usages_controller.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: soky srm <sokysrm@gmail.com>

* Moved attr_reader out of private

---------

Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
Signed-off-by: soky srm <sokysrm@gmail.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

2025-10-24 00:08:59 +02:00

3 Commits