Files
sure/app/models/provider/anthropic.rb
Guillem Arias Fauste 991cb959c1 feat(ai): Anthropic batch ops + LLM cost ledger (2/5) (#1984)
* feat(ai): add Anthropic provider with chat parity (1/5)

Introduces Provider::Anthropic alongside Provider::Openai, implementing
the LlmConcept chat_response contract over the official anthropic Ruby
SDK. Batch ops, PDF, and RAG land in follow-up PRs.

- Provider::Anthropic uses Messages API for sync and streaming responses
- ChatConfig builds requests with ephemeral prompt-cache markers on the
  system prompt and the last tool definition
- MessageFormatter reconstructs multi-turn history (text + tool_use +
  tool_result blocks) from raw Message records, including the paired
  user-role tool_result turn Anthropic requires after every tool_use
- ChatParser maps Anthropic Message into the shared ChatResponse Data
- Registry, Setting, User, Chat default model wired for ANTHROPIC_*
  envs and Setting.anthropic_*; LLM_PROVIDER selects between providers
- Responder forwards raw conversation_history (Array<Message>) so
  providers without hosted conversation state can rebuild context
- OpenAI provider accepts and ignores the new kwarg (no behavior change)

Tests cover provider init, model gating, MessageFormatter for all turn
shapes, ChatConfig request building (max_tokens, system cache, tool
conversion), ChatParser for text / tool_use / mixed blocks, Registry
discovery, and mocked chat_response success / error / function_request
paths. Live VCR cassettes recorded in a follow-up with a real key.

Stacked PRs: 2/5 batch ops + cost ledger, 3/5 PDF, 4/5 pgvector RAG,
5/5 settings UI + disclosure.

* fix(ai): address PR review on Anthropic provider foundation

Surface fixes raised by Codex + CodeRabbit on PR 1/5:

- Provider::Anthropic#chat_response now accepts (and ignores) a
  `messages:` kwarg. Assistant::Responder passes both `messages:`
  (OpenAI-shape) and `conversation_history:` (raw Message records) for
  cross-provider parity, so the previous signature raised
  ArgumentError on the first chat turn through the Anthropic provider.
- Provider::Anthropic#supports_model? bypasses the `claude` prefix
  gate when a custom base_url is configured, mirroring the OpenAI
  provider. Bedrock-shaped IDs like
  `anthropic.claude-sonnet-4-5-20250929-v1:0` and
  `claude-opus-4@20250514` are otherwise rejected by
  Assistant::Provided#get_model_provider and the chat dies.
- Setting.anthropic_access_token is now in
  EncryptedSettingFields::ENCRYPTED_FIELDS so the Anthropic API key
  is encrypted at rest like every other provider secret. Previously
  plaintext while siblings (openai_access_token, twelve_data_api_key,
  external_assistant_token) were ciphertext.
- Chat.default_model falls back to whichever provider is actually
  configured. Previously, with LLM_PROVIDER=anthropic but no
  Anthropic credentials, the default model resolved to a Claude ID
  that no registered provider supported, so chats failed even when
  OpenAI was fully configured. Adds Provider::{Anthropic,Openai}#configured?
  class methods for the readable callsite.
- Provider::Anthropic.effective_model uses
  `ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model` so the
  Setting lookup is only performed when the env var is absent — the
  previous `ENV.fetch(KEY, default)` evaluated the default arg
  eagerly on every call.
- Provider::Anthropic::ChatConfig#anthropic_input_schema strips both
  `:strict` and `"strict"` keys so JSON-decoded schemas with string
  keys cannot leak the OpenAI-only flag through to Anthropic.

Test coverage added: supports_model? bypass on custom endpoints,
chat_response messages: kwarg compatibility, default_model fallback
in the three credential combinations, configured? against ENV +
Setting, strict-flag stripping for both key types, and a
`Setting.expects(:anthropic_model).never` assertion proving the
ENV-precedence test now exercises the lazy path.

All 4365 tests pass (1 pre-existing libvips env error unrelated).

* test(chat): make default_model tests resilient to ENV model overrides

CodeRabbit flagged on PR review: the new default_model tests asserted
against Provider::*::DEFAULT_MODEL, but Chat.default_model actually
returns Provider::*.effective_model.presence (which reads
OPENAI_MODEL / ANTHROPIC_MODEL from the environment). With either env
var set, the tests would fail intermittently even though routing was
correct.

- New default_model tests now assert against the provider's
  effective_model directly, so they verify the routing decision
  (which provider's value wins) without coupling to the constant.
- Pre-existing "creates with default model" assertions had the same
  brittleness; switch them to compare against Chat.default_model so
  the chosen model is whatever the env / Setting cascade resolves to.

Verified by running `ANTHROPIC_MODEL=claude-haiku-4-5 OPENAI_MODEL=gpt-4o
bin/rails test test/models/chat_test.rb` — 16 runs, 0 failures
(previously 2 pre-existing failures + 0 from the new tests).

* fix(ai): address local review on Anthropic foundation

- Provider::Anthropic#supports_pdf_processing? bypasses prefix gate for
  custom endpoints, mirroring supports_model?
- Provider::Anthropic#initialize raises Error when custom_endpoint? AND
  model.blank?, parity with Provider::Openai
- stream_chat_response captures partial usage on mid-stream errors and
  records it via the new on_partial callback so chat_response can skip
  the duplicate error row in the outer rescue
- safe_accumulated_message swallows the secondary failure when the SDK
  cannot reconstruct a snapshot
- langfuse_client memoizes properly (||= instead of =) so repeated calls
  don't churn Langfuse instances
- MessageFormatter sorts tool_calls by created_at then id so the
  message array is deterministic across replays; skips tool_calls
  missing both provider_call_id and provider_id rather than sending
  `id: nil` and getting rejected by Anthropic
- Setting.anthropic_access_token default falls back through
  ENV["ANTHROPIC_API_KEY"].presence (was missing .presence, so an
  empty-string env value bled through)
- User#openai_configured? / #anthropic_configured? delegate to the
  Provider::* class methods — single source of truth
- Assistant::Responder renames the OpenAI-shape history builder
  conversation_history → openai_messages_payload so the kwarg name
  matches the local method name (messages: openai_messages_payload,
  conversation_history: chat_message_records)
- Assistant::Builtin stale-history comment updated to reference both
  builders

Adds a streaming chat_response test using ad-hoc subclasses of the
SDK event types so the case/when dispatch matches via is_a? without
stubbing class-level === behavior.

* test(ai): add Anthropic tool_use round-trip + multi-tool turn coverage

Addresses @jjmata's "worth confirming" note on PR #1983: tool-use turns
from prior assistant messages must round-trip correctly when retrieved
from the database.

- New `ChatParser → ToolCall::Function → MessageFormatter` test walks
  the full path: Anthropic response with a tool_use block →
  ChatFunctionRequest → ToolCall::Function.from_function_request →
  persisted on the AssistantMessage → MessageFormatter rebuild on the
  next turn. Asserts the original `tool_use.id` is preserved end-to-end
  as both `tool_use.id` and the paired `tool_result.tool_use_id`, and
  that the original `input` hash and serialized result content survive.
- New multi-tool assistant turn test confirms two tool_use blocks on a
  single assistant message render as two tool_use blocks followed by
  two paired tool_result blocks in a single user-role follow-up,
  matching Anthropic's required alternation.

Both tests exercise the existing PR1 code without behavior changes.

* test(ai): require "ostruct" explicitly in Anthropic provider tests

OpenStruct is moving out of Ruby's default load path (warning in 3.4+,
removed in 3.5+). Tests work today because ActiveSupport transitively
loads it, but that's incidental. Match the existing convention in
test/controllers/settings/hostings_controller_test.rb which explicitly
requires ostruct for the same reason.

* fix(ai): sanitize Langfuse warn logs, normalize tool_use.input, dedup history fetch

Addresses three open CodeRabbit findings on PR #1983.

- Provider::Anthropic Langfuse rescue branches no longer include
  `e.full_message` in `Rails.logger.warn`. `full_message` bundles the
  backtrace + cause chain and on some SDK error types includes the
  serialized request/response payload (prompt, model output). Logs
  now report `#{e.class}: #{e.message}` only. Three sites:
  create_langfuse_trace, log_langfuse_generation, upsert_langfuse_trace.
  Note: Provider::Openai has the same pattern (copy-pasted source) —
  harmonization deferred to a follow-up cleanup PR; this commit fixes
  only the Anthropic provider to keep PR scope tight.

- MessageFormatter#parse_arguments now coerces any non-Hash parsed
  result to `{}`. Anthropic's Messages API requires `tool_use.input`
  to be a JSON object (map); a stored ToolCall::Function record whose
  arguments parse to a scalar, bool, or array (corrupt row, legacy
  data, cross-provider bleed) would otherwise produce a payload the
  API rejects. Normal flow stores Hash arguments end-to-end so the
  fix is defensive — adds 2 tests covering scalar/array JSON strings
  and non-String non-Hash inputs.

- Assistant::Responder dedups the chat-history fetch. The previous
  layout fired two near-identical `chat.messages.where(...).includes(
  :tool_calls).ordered` queries per LLM turn (one for the OpenAI-shape
  payload, one for the raw-records kwarg). A new memoized
  `complete_chat_messages` fetches once; `chat_message_records` filters
  out the current message via `Array#reject`, `openai_messages_payload`
  iterates the cached array unchanged. One SQL query per turn instead
  of two. Memoization scope = single Responder instance (per LLM call),
  so cache invalidation is not a concern.

All 4370 tests pass (1 pre-existing libvips env error unrelated).
Rubocop + brakeman clean.

* fix(ci): replace sk-ant- prefixed test placeholders

Pipelock secret scanner pattern-matches `sk-ant-*` as a real Anthropic
API key and fails the PR security-scan check. Test stubs and
ClimateControl env values used `sk-ant-test`, `sk-ant-from-setting`,
`sk-ant-x`, `sk-ant-y` as obvious placeholders, but the scanner does
not care about value entropy.

Switched to `fake-anthropic-key-*` / `fake-token-*` strings so the
scanner stops flagging them. No production code touched, no behavior
change — Provider::Anthropic still accepts any non-blank token.

* feat(ai): add Anthropic batch ops + LLM cost ledger (2/5)

Implements auto_categorize, auto_detect_merchants, and
enhance_provider_merchants on Provider::Anthropic via forced tool calls,
plus the cost-ledger plumbing they need.

- Provider::Anthropic::AutoCategorizer, AutoMerchantDetector,
  ProviderMerchantEnhancer each define a single output tool whose
  input_schema mirrors the desired output, then force the model to call
  it via tool_choice: { type: "tool", name: ..., disable_parallel_tool_use: true }.
  Anthropic guarantees the tool_use.input matches the schema, so there
  is no JSON parsing fragility, no <think> tag stripping, and no
  json_object/json_schema fallback ladders.
- Concerns::UsageRecorder mirrors the OpenAI sibling but persists
  cache_creation_input_tokens / cache_read_input_tokens to dedicated
  columns instead of metadata.
- Migration adds cache_creation_tokens, cache_read_tokens (nullable
  integers) to llm_usages. OpenAI rows leave them null.
- LlmUsage::PRICING gains Claude 4.x rows (opus-4-7 $15/$75, sonnet-4-6
  $3/$15, haiku-4-5 $1/$5 per MTok). infer_provider returns "anthropic"
  for claude-* via the existing exact/prefix lookup.
- Provider::Anthropic#chat_response now persists cache columns directly
  rather than stashing them in metadata.
- 25-transaction batch cap mirrors the OpenAI provider so the cost
  ledger sees the same shape regardless of which provider ran a batch.

Tests cover the forced-tool-call path, null/None normalization,
case-insensitive merchant matching, the missing-tool_use error path,
and Anthropic-specific pricing + provider inference on LlmUsage.

Stacked on #1983 (PR 1/5). 3/5 PDF + vision next.

* fix(ai): attribute Bedrock model IDs to anthropic + clean nil enum

- LlmUsage.infer_provider now returns "anthropic" for Bedrock /
  Vertex shaped IDs (anthropic.* and anthropic/*), so cost-ledger
  filtering by provider stays correct even when no per-MTok rate is
  stored. Previously these IDs fell through to the "openai" default.
- AutoCategorizer drops the redundant nil sentinel from the
  category_name enum — the union type [string, null] already permits
  null, and some JSON Schema validators reject nil literals inside
  enum arrays.

* test(ai): require "ostruct" in Anthropic batch op tests

Same rationale as the PR1 ostruct fix — explicit require so the tests
don't depend on ActiveSupport's transitive load when Ruby 3.5+ removes
OpenStruct from the default load path.

* fix(llm-usage): include Anthropic cache tokens in estimated_cost

calculate_cost only priced prompt + completion tokens, so estimated_cost
under-reported every cached call — the cache_creation/cache_read columns this PR
added were tracked but never billed. Verified against the Anthropic dashboard: a
cached chat turn billed $0.05 but the ledger recorded $0.038; the gap was exactly
the unpriced cache tokens.

Price them relative to the input rate (Anthropic: cache write 1.25x, read 0.1x)
and thread the cache counts from both recorders (chat + batch). OpenAI rows leave
the columns null (treated as 0), so they're unaffected. Ledger now reproduces the
dashboard ($0.054 for the test turn).

* chore(ai): guard chat usage double-record; flag deferred Anthropic batch wiring

- Hardening: guard the success-path record_llm_usage with
  `unless partial_usage_recorded` so a future change that emits partial usage on
  a normal stream can't silently double-bill (the symptom investigated in the
  #1984 review). No behavior change today — on_partial only fires from the
  mid-stream-error rescue, which re-raises past this line.
- Notice: the family auto-categorize / merchant-detect / merchant-enhance flows
  still hardcode get_provider(:openai). Provider::Anthropic now implements those
  batch ops but they aren't wired into the family flows yet — documented with
  TODOs at each site for the follow-up.

* chore(ai): point family-flow TODOs at tracking issue #2113

* fix(ai): address review findings on cost ledger + categorizer schema

Three AI-review findings on #1984:

- category_name enum omitted null (codex + coderabbit): the prompt + type allow
  Claude to abstain on uncertain transactions, but JSON Schema `enum` restricted
  the value to category names, so null was invalid — forcing miscategorization.
  Append nil to the enum (the consumer already normalizes null -> uncategorized).
- Cache pricing applied to all providers (coderabbit): the 1.25x/0.1x cache
  multipliers are Anthropic-specific. Gate them on provider == "anthropic" so a
  non-Anthropic caller passing cache counts isn't billed with the wrong rates.
- Negative cache-token counts (coderabbit): add DB check constraints
  (cache_*_tokens IS NULL OR >= 0), per the repo's DB-level-validation convention.

Tests: enum includes nil; non-Anthropic cache tokens aren't priced.
2026-06-01 22:00:48 +02:00

448 lines
14 KiB
Ruby

class Provider::Anthropic < Provider
include LlmConcept
# Subclass so errors caught in this provider are raised as Provider::Anthropic::Error
Error = Class.new(Provider::Error)
# Supported Anthropic model prefixes
DEFAULT_ANTHROPIC_MODEL_PREFIXES = %w[claude].freeze
DEFAULT_MODEL = "claude-sonnet-4-6"
# All Claude 3.5+ and 4.x models accept native document content blocks.
VISION_CAPABLE_MODEL_PREFIXES = %w[claude].freeze
def self.effective_model
# Use ENV[].presence rather than ENV.fetch(KEY, default) so the Setting
# lookup is only performed when the ENV var is actually absent — otherwise
# the default arg is evaluated eagerly on every call.
configured_model = ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model
configured_model.presence || DEFAULT_MODEL
end
def self.configured?
ENV["ANTHROPIC_ACCESS_TOKEN"].present? ||
ENV["ANTHROPIC_API_KEY"].present? ||
Setting.anthropic_access_token.present?
end
def initialize(access_token, base_url: nil, model: nil)
client_options = { api_key: access_token }
client_options[:base_url] = base_url if base_url.present?
client_options[:timeout] = ENV.fetch("ANTHROPIC_REQUEST_TIMEOUT", 600).to_i
@client = ::Anthropic::Client.new(**client_options)
@base_url = base_url
if custom_endpoint? && model.blank?
raise Error, "Model is required when using a custom Anthropic-compatible endpoint"
end
@default_model = model.presence || DEFAULT_MODEL
end
def supports_model?(model)
# Custom endpoints (Bedrock, Vertex, or other Anthropic-compatible proxies)
# use their own model-ID conventions — e.g. Bedrock IDs look like
# `anthropic.claude-sonnet-4-5-20250929-v1:0`. Mirror the OpenAI provider
# and bypass the prefix gate when the caller has wired a custom base_url.
return true if custom_endpoint?
DEFAULT_ANTHROPIC_MODEL_PREFIXES.any? { |prefix| model.to_s.start_with?(prefix) }
end
def provider_name
custom_endpoint? ? "Custom Anthropic-compatible (#{@base_url})" : "Anthropic"
end
def supported_models_description
if custom_endpoint?
"configured model: #{@default_model}"
else
"models starting with: #{DEFAULT_ANTHROPIC_MODEL_PREFIXES.join(', ')}"
end
end
def custom_endpoint?
@base_url.present?
end
def auto_categorize(transactions: [], user_categories: [], model: "", family: nil, json_mode: nil)
with_provider_response do
raise Error, "Too many transactions to auto-categorize. Max is 25 per request." if transactions.size > 25
if user_categories.blank?
family_id = family&.id || "unknown"
Rails.logger.error("Cannot auto-categorize transactions for family #{family_id}: no categories available")
raise Error, "No categories available for auto-categorization"
end
effective_model = model.presence || @default_model
trace = create_langfuse_trace(
name: "anthropic.auto_categorize",
input: { transactions: transactions, user_categories: user_categories }
)
result = AutoCategorizer.new(
client,
model: effective_model,
transactions: transactions,
user_categories: user_categories,
langfuse_trace: trace,
family: family
).auto_categorize
upsert_langfuse_trace(trace: trace, output: result.map(&:to_h))
result
end
end
def auto_detect_merchants(transactions: [], user_merchants: [], model: "", family: nil, json_mode: nil)
with_provider_response do
raise Error, "Too many transactions to auto-detect merchants. Max is 25 per request." if transactions.size > 25
effective_model = model.presence || @default_model
trace = create_langfuse_trace(
name: "anthropic.auto_detect_merchants",
input: { transactions: transactions, user_merchants: user_merchants }
)
result = AutoMerchantDetector.new(
client,
model: effective_model,
transactions: transactions,
user_merchants: user_merchants,
langfuse_trace: trace,
family: family
).auto_detect_merchants
upsert_langfuse_trace(trace: trace, output: result.map(&:to_h))
result
end
end
def enhance_provider_merchants(merchants: [], model: "", family: nil, json_mode: nil)
with_provider_response do
raise Error, "Too many merchants to enhance. Max is 25 per request." if merchants.size > 25
effective_model = model.presence || @default_model
trace = create_langfuse_trace(
name: "anthropic.enhance_provider_merchants",
input: { merchants: merchants }
)
result = ProviderMerchantEnhancer.new(
client,
model: effective_model,
merchants: merchants,
langfuse_trace: trace,
family: family
).enhance_merchants
upsert_langfuse_trace(trace: trace, output: result.map(&:to_h))
result
end
end
def supports_pdf_processing?(model: @default_model)
return true if custom_endpoint?
VISION_CAPABLE_MODEL_PREFIXES.any? { |prefix| model.to_s.start_with?(prefix) }
end
def process_pdf(pdf_content:, model: "", family: nil)
raise Error, "process_pdf not yet implemented for Provider::Anthropic"
end
def extract_bank_statement(pdf_content:, model: "", family: nil)
raise Error, "extract_bank_statement not yet implemented for Provider::Anthropic"
end
def chat_response(
prompt,
model:,
instructions: nil,
functions: [],
function_results: [],
messages: nil,
conversation_history: [],
streamer: nil,
previous_response_id: nil,
session_id: nil,
user_identifier: nil,
family: nil
)
with_provider_response do
chat_config = ChatConfig.new(
prompt: prompt,
instructions: instructions,
functions: functions,
function_results: function_results,
conversation_history: conversation_history,
default_max_tokens: default_max_tokens
)
request_params = chat_config.build_request(model: model)
trace = create_langfuse_trace(
name: "anthropic.chat_response",
input: { messages: request_params[:messages], system: request_params[:system_] },
session_id: session_id,
user_identifier: user_identifier
)
partial_usage_recorded = false
begin
parsed, usage =
if streamer.present?
stream_chat_response(
streamer: streamer,
request_params: request_params,
on_partial: ->(partial_usage) {
record_llm_usage(family: family, model: model, operation: "chat", usage: partial_usage)
partial_usage_recorded = true
}
)
else
sync_chat_response(request_params: request_params)
end
log_langfuse_generation(
name: "chat_response",
model: model,
input: request_params[:messages],
output: parsed.messages.map(&:output_text).join("\n"),
usage: usage,
trace: trace
)
# Record once. On a normal stream `on_partial` never fires (it only runs
# from stream_chat_response's rescue on a mid-stream error, which
# re-raises past here), so today this is the sole recorder. Guard it
# anyway so a future change that emits partial usage on success can't
# silently double-bill — the symptom we chased in the #1984 review.
record_llm_usage(family: family, model: model, operation: "chat", usage: usage) unless partial_usage_recorded
parsed
rescue => e
log_langfuse_generation(
name: "chat_response",
model: model,
input: request_params[:messages],
error: e,
trace: trace
)
record_llm_usage(family: family, model: model, operation: "chat", error: e) unless partial_usage_recorded
raise
end
end
end
private
attr_reader :client
def default_max_tokens
ENV.fetch("ANTHROPIC_MAX_TOKENS", 4096).to_i
end
def sync_chat_response(request_params:)
raw = client.messages.create(**request_params)
parsed = ChatParser.new(raw).parsed
usage = build_usage_hash(raw.usage)
[ parsed, usage ]
end
def stream_chat_response(streamer:, request_params:, on_partial: nil)
final_message = nil
stream = client.messages.stream(**request_params)
# If `stream.each` raises mid-iteration (network drop, client abort),
# we still want to surface whatever tokens accumulated so the cost
# ledger doesn't lose partial-output billing.
begin
stream.each do |event|
case event
when ::Anthropic::Streaming::TextEvent
streamer.call(
Provider::LlmConcept::ChatStreamChunk.new(type: "output_text", data: event.text, usage: nil)
)
when ::Anthropic::Streaming::MessageStopEvent
final_message = event.message
end
end
rescue => mid_stream_error
partial = safe_accumulated_message(stream)
on_partial&.call(build_usage_hash(partial&.usage)) if partial
raise mid_stream_error
end
final_message ||= safe_accumulated_message(stream)
parsed = ChatParser.new(final_message).parsed
usage = build_usage_hash(final_message.usage)
streamer.call(
Provider::LlmConcept::ChatStreamChunk.new(type: "response", data: parsed, usage: usage)
)
[ parsed, usage ]
end
def safe_accumulated_message(stream)
stream.accumulated_message
rescue StandardError
nil
end
def build_usage_hash(raw_usage)
return {} unless raw_usage
input = raw_usage.input_tokens.to_i
output = raw_usage.output_tokens.to_i
hash = {
"input_tokens" => input,
"output_tokens" => output,
"total_tokens" => input + output
}
if raw_usage.respond_to?(:cache_creation_input_tokens) && raw_usage.cache_creation_input_tokens
hash["cache_creation_input_tokens"] = raw_usage.cache_creation_input_tokens
end
if raw_usage.respond_to?(:cache_read_input_tokens) && raw_usage.cache_read_input_tokens
hash["cache_read_input_tokens"] = raw_usage.cache_read_input_tokens
end
hash
end
def langfuse_client
return unless ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
@langfuse_client ||= Langfuse.new
end
def create_langfuse_trace(name:, input:, session_id: nil, user_identifier: nil)
return unless langfuse_client
langfuse_client.trace(
name: name,
input: input,
session_id: session_id,
user_id: user_identifier,
environment: Rails.env
)
rescue => e
# Sanitized log (class + message only) — `e.full_message` bundles the
# backtrace + cause chain, which on some SDK error types includes the
# serialized request/response payload (model output, user prompt).
Rails.logger.warn("Langfuse trace creation failed: #{e.class}: #{e.message}")
nil
end
def log_langfuse_generation(name:, model:, input:, trace:, output: nil, usage: nil, error: nil)
return unless langfuse_client
generation = trace&.generation(
name: name,
model: model,
input: input
)
if error
generation&.end(
output: { error: error.message, details: error.respond_to?(:details) ? error.details : nil },
level: "ERROR"
)
upsert_langfuse_trace(trace: trace, output: { error: error.message }, level: "ERROR")
else
generation&.end(output: output, usage: usage)
upsert_langfuse_trace(trace: trace, output: output)
end
rescue => e
Rails.logger.warn("Langfuse logging failed: #{e.class}: #{e.message}")
end
def upsert_langfuse_trace(trace:, output:, level: nil)
return unless langfuse_client && trace&.id
payload = { id: trace.id, output: output }
payload[:level] = level if level.present?
langfuse_client.trace(**payload)
rescue => e
Rails.logger.warn("Langfuse trace upsert failed for trace_id=#{trace&.id}: #{e.class}: #{e.message}")
nil
end
def record_llm_usage(family:, model:, operation:, usage: nil, error: nil)
return unless family
if error.present?
http_status_code = extract_http_status_code(error)
family.llm_usages.create!(
provider: "anthropic",
model: model,
operation: operation,
prompt_tokens: 0,
completion_tokens: 0,
total_tokens: 0,
estimated_cost: nil,
metadata: {
error: safe_error_message(error),
http_status_code: http_status_code
}
)
return
end
return unless usage
prompt_tokens = usage["input_tokens"] || 0
completion_tokens = usage["output_tokens"] || 0
total_tokens = usage["total_tokens"] || (prompt_tokens + completion_tokens)
estimated_cost = LlmUsage.calculate_cost(
model: model,
prompt_tokens: prompt_tokens,
completion_tokens: completion_tokens,
cache_creation_tokens: usage["cache_creation_input_tokens"],
cache_read_tokens: usage["cache_read_input_tokens"]
)
family.llm_usages.create!(
provider: "anthropic",
model: model,
operation: operation,
prompt_tokens: prompt_tokens,
completion_tokens: completion_tokens,
total_tokens: total_tokens,
cache_creation_tokens: usage["cache_creation_input_tokens"],
cache_read_tokens: usage["cache_read_input_tokens"],
estimated_cost: estimated_cost,
metadata: {}
)
rescue => e
Rails.logger.error("Failed to record LLM usage: #{e.message}")
end
def extract_http_status_code(error)
if error.respond_to?(:status)
error.status
elsif error.respond_to?(:http_status)
error.http_status
elsif safe_error_message(error) =~ /(\d{3})/
$1.to_i
end
end
def safe_error_message(error)
error&.message
rescue => e
"(message unavailable: #{e.class})"
end
end