sure/test/models/provider/anthropic/message_formatter_test.rb
Guillem Arias Fauste 8251b7e4d6 feat(ai): add Anthropic provider with chat parity (1/5) (#1983)
* feat(ai): add Anthropic provider with chat parity (1/5)

Introduces Provider::Anthropic alongside Provider::Openai, implementing
the LlmConcept chat_response contract over the official anthropic Ruby
SDK. Batch ops, PDF, and RAG land in follow-up PRs.

- Provider::Anthropic uses Messages API for sync and streaming responses
- ChatConfig builds requests with ephemeral prompt-cache markers on the
  system prompt and the last tool definition
- MessageFormatter reconstructs multi-turn history (text + tool_use +
  tool_result blocks) from raw Message records, including the paired
  user-role tool_result turn Anthropic requires after every tool_use
- ChatParser maps Anthropic Message into the shared ChatResponse Data
- Registry, Setting, User, Chat default model wired for ANTHROPIC_*
  envs and Setting.anthropic_*; LLM_PROVIDER selects between providers
- Responder forwards raw conversation_history (Array<Message>) so
  providers without hosted conversation state can rebuild context
- OpenAI provider accepts and ignores the new kwarg (no behavior change)

Tests cover provider init, model gating, MessageFormatter for all turn
shapes, ChatConfig request building (max_tokens, system cache, tool
conversion), ChatParser for text / tool_use / mixed blocks, Registry
discovery, and mocked chat_response success / error / function_request
paths. Live VCR cassettes recorded in a follow-up with a real key.

Stacked PRs: 2/5 batch ops + cost ledger, 3/5 PDF, 4/5 pgvector RAG,
5/5 settings UI + disclosure.

* fix(ai): address PR review on Anthropic provider foundation

Surface fixes raised by Codex + CodeRabbit on PR 1/5:

- Provider::Anthropic#chat_response now accepts (and ignores) a
  `messages:` kwarg. Assistant::Responder passes both `messages:`
  (OpenAI-shape) and `conversation_history:` (raw Message records) for
  cross-provider parity, so the previous signature raised
  ArgumentError on the first chat turn through the Anthropic provider.
- Provider::Anthropic#supports_model? bypasses the `claude` prefix
  gate when a custom base_url is configured, mirroring the OpenAI
  provider. Bedrock-shaped IDs like
  `anthropic.claude-sonnet-4-5-20250929-v1:0` and
  `claude-opus-4@20250514` are otherwise rejected by
  Assistant::Provided#get_model_provider and the chat dies.
- Setting.anthropic_access_token is now in
  EncryptedSettingFields::ENCRYPTED_FIELDS so the Anthropic API key
  is encrypted at rest like every other provider secret. Previously
  plaintext while siblings (openai_access_token, twelve_data_api_key,
  external_assistant_token) were ciphertext.
- Chat.default_model falls back to whichever provider is actually
  configured. Previously, with LLM_PROVIDER=anthropic but no
  Anthropic credentials, the default model resolved to a Claude ID
  that no registered provider supported, so chats failed even when
  OpenAI was fully configured. Adds Provider::{Anthropic,Openai}#configured?
  class methods for the readable callsite.
- Provider::Anthropic.effective_model uses
  `ENV["ANTHROPIC_MODEL"].presence || Setting.anthropic_model` so the
  Setting lookup is only performed when the env var is absent — the
  previous `ENV.fetch(KEY, default)` evaluated the default arg
  eagerly on every call.
- Provider::Anthropic::ChatConfig#anthropic_input_schema strips both
  `:strict` and `"strict"` keys so JSON-decoded schemas with string
  keys cannot leak the OpenAI-only flag through to Anthropic.
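
The strict-flag stripping in the last bullet can be sketched in plain Ruby. This is a minimal illustration, not the actual `anthropic_input_schema` body: `strip_strict` is a hypothetical helper name, and the sketch only removes the top-level key.

```ruby
# Hypothetical helper; the real ChatConfig#anthropic_input_schema may differ.
def strip_strict(schema)
  # Hash#except (Ruby 3.0+) ignores keys that are absent, so a single call
  # covers both symbol-keyed and string-keyed (JSON-decoded) schemas.
  schema.except(:strict, "strict")
end

symbol_keyed = { type: "object", strict: true, properties: {} }
string_keyed = { "type" => "object", "strict" => true, "properties" => {} }

p strip_strict(symbol_keyed)
p strip_strict(string_keyed)
```

Removing both key forms in one call means the result does not depend on whether the schema came from Ruby code or from decoded JSON.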

Test coverage added: supports_model? bypass on custom endpoints,
chat_response messages: kwarg compatibility, default_model fallback
in the three credential combinations, configured? against ENV +
Setting, strict-flag stripping for both key types, and a
`Setting.expects(:anthropic_model).never` assertion proving the
ENV-precedence test now exercises the lazy path.
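
The eager-versus-lazy difference behind that assertion can be reproduced without Rails. The sketch below is an assumption-laden stand-in: `DEMO_MODEL` and `setting_lookup` are invented for illustration, and the empty-string check only approximates ActiveSupport's `.presence`.

```ruby
# Counts how often the fallback is consulted; stands in for a Setting lookup.
lookups = 0
setting_lookup = -> { lookups += 1; "model-from-setting" }

ENV["DEMO_MODEL"] = "model-from-env"

# Eager: Ruby evaluates the default argument before ENV.fetch runs,
# so the fallback is called even though the variable is set.
eager = ENV.fetch("DEMO_MODEL", setting_lookup.call)
eager_lookups = lookups

# Lazy: the right-hand side of || only runs when the left side is nil.
# An empty or whitespace-only value is treated as absent here.
lookups = 0
env_value = ENV["DEMO_MODEL"]
env_value = nil if env_value&.strip&.empty?
lazy = env_value || setting_lookup.call
lazy_lookups = lookups

puts "eager lookups: #{eager_lookups}, lazy lookups: #{lazy_lookups}"
```

Both forms return the env value; only the number of fallback evaluations differs, which is what a `never` expectation on the fallback would detect.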

All 4365 tests pass (1 pre-existing libvips env error unrelated).

* test(chat): make default_model tests resilient to ENV model overrides

CodeRabbit flagged on PR review: the new default_model tests asserted
against Provider::*::DEFAULT_MODEL, but Chat.default_model actually
returns Provider::*.effective_model.presence (which reads
OPENAI_MODEL / ANTHROPIC_MODEL from the environment). With either env
var set, the tests would fail intermittently even though routing was
correct.

- New default_model tests now assert against the provider's
  effective_model directly, so they verify the routing decision
  (which provider's value wins) without coupling to the constant.
- Pre-existing "creates with default model" assertions had the same
  brittleness; switch them to compare against Chat.default_model so
  the chosen model is whatever the env / Setting cascade resolves to.

Verified by running `ANTHROPIC_MODEL=claude-haiku-4-5 OPENAI_MODEL=gpt-4o
bin/rails test test/models/chat_test.rb` — 16 runs, 0 failures
(previously 2 pre-existing failures + 0 from the new tests).

* fix(ai): address local review on Anthropic foundation

- Provider::Anthropic#supports_pdf_processing? bypasses prefix gate for
  custom endpoints, mirroring supports_model?
- Provider::Anthropic#initialize raises Error when custom_endpoint? AND
  model.blank?, parity with Provider::Openai
- stream_chat_response captures partial usage on mid-stream errors and
  records it via the new on_partial callback so chat_response can skip
  the duplicate error row in the outer rescue
- safe_accumulated_message swallows the secondary failure when the SDK
  cannot reconstruct a snapshot
- langfuse_client memoizes properly (||= instead of =) so repeated calls
  don't churn Langfuse instances
- MessageFormatter sorts tool_calls by created_at then id so the
  message array is deterministic across replays; skips tool_calls
  missing both provider_call_id and provider_id rather than sending
  `id: nil` and getting rejected by Anthropic
- Setting.anthropic_access_token default falls back through
  ENV["ANTHROPIC_API_KEY"].presence (was missing .presence, so an
  empty-string env value bled through)
- User#openai_configured? / #anthropic_configured? delegate to the
  Provider::* class methods — single source of truth
- Assistant::Responder renames the OpenAI-shape history builder
  conversation_history → openai_messages_payload so the kwarg name
  matches the local method name (messages: openai_messages_payload,
  conversation_history: chat_message_records)
- Assistant::Builtin stale-history comment updated to reference both
  builders
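
The deterministic ordering and the skip rule from the MessageFormatter bullet above can be illustrated with a small standalone sketch. `ToolCallRow` and `replayable_tool_calls` are hypothetical names, "missing" is taken to mean `nil`, and preferring `provider_call_id` over `provider_id` is an assumption rather than something this commit states.

```ruby
# Hypothetical stand-in for a persisted tool-call record.
ToolCallRow = Struct.new(:id, :created_at, :provider_call_id, :provider_id, keyword_init: true)

def replayable_tool_calls(rows)
  rows
    .select { |row| row.provider_call_id || row.provider_id } # skip rows with no usable id
    .sort_by { |row| [ row.created_at, row.id ] }              # created_at first, id breaks ties
end

t = Time.utc(2026, 1, 1)
rows = [
  ToolCallRow.new(id: 3, created_at: t + 1, provider_call_id: "toolu_c"),
  ToolCallRow.new(id: 2, created_at: t,     provider_call_id: "toolu_b"),
  ToolCallRow.new(id: 4, created_at: t,     provider_call_id: nil, provider_id: nil),
  ToolCallRow.new(id: 1, created_at: t,     provider_id: "toolu_a")
]

ordered_ids = replayable_tool_calls(rows).map { |row| row.provider_call_id || row.provider_id }
p ordered_ids
```

Sorting on a `[created_at, id]` pair keeps the order stable when two calls share a timestamp, so replays of the same history yield the same message array.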

Adds a streaming chat_response test using ad-hoc subclasses of the
SDK event types so the case/when dispatch matches via is_a? without
stubbing class-level === behavior.

* test(ai): add Anthropic tool_use round-trip + multi-tool turn coverage

Addresses @jjmata's "worth confirming" note on PR #1983: tool-use turns
from prior assistant messages must round-trip correctly when retrieved
from the database.

- New `ChatParser → ToolCall::Function → MessageFormatter` test walks
  the full path: Anthropic response with a tool_use block →
  ChatFunctionRequest → ToolCall::Function.from_function_request →
  persisted on the AssistantMessage → MessageFormatter rebuild on the
  next turn. Asserts the original `tool_use.id` is preserved end-to-end
  as both `tool_use.id` and the paired `tool_result.tool_use_id`, and
  that the original `input` hash and serialized result content survive.
- New multi-tool assistant turn test confirms two tool_use blocks on a
  single assistant message render as two tool_use blocks followed by
  two paired tool_result blocks in a single user-role follow-up,
  matching Anthropic's required alternation.
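
The message shape these tests assert on can be sketched as follows. This is not the MessageFormatter implementation: `tool_turns` and its input hashes are hypothetical, and placing the text block after the tool_use blocks simply mirrors what the tests in this file check.

```ruby
require "json"

# Hypothetical input shape: one hash per tool call on the assistant message.
def tool_turns(text, calls)
  tool_uses = calls.map do |c|
    { type: "tool_use", id: c[:id], name: c[:name], input: c[:input] }
  end
  tool_results = calls.map do |c|
    # Each result points back at its tool_use block through tool_use_id.
    { type: "tool_result", tool_use_id: c[:id], content: c[:result].to_json }
  end

  [
    { role: "assistant", content: tool_uses + [ { type: "text", text: text } ] },
    { role: "user", content: tool_results }
  ]
end

turns = tool_turns(
  "Looked up your accounts and holdings.",
  [
    { id: "toolu_a", name: "get_accounts", input: {}, result: [ { "id" => 1 } ] },
    { id: "toolu_b", name: "get_holdings", input: {}, result: [ { "ticker" => "VTI" } ] }
  ]
)
puts JSON.pretty_generate(turns)
```

All tool_result blocks go into a single user-role turn, so the assistant and user roles still alternate regardless of how many tools were called.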

Both tests exercise the existing PR1 code without behavior changes.

* test(ai): require "ostruct" explicitly in Anthropic provider tests

OpenStruct is moving out of Ruby's default load path (warning in 3.4+,
removed in 3.5+). Tests work today because ActiveSupport transitively
loads it, but that's incidental. Match the existing convention in
test/controllers/settings/hostings_controller_test.rb which explicitly
requires ostruct for the same reason.

* fix(ai): sanitize Langfuse warn logs, normalize tool_use.input, dedup history fetch

Addresses three open CodeRabbit findings on PR #1983.

- Provider::Anthropic Langfuse rescue branches no longer include
  `e.full_message` in `Rails.logger.warn`. `full_message` bundles the
  backtrace + cause chain and on some SDK error types includes the
  serialized request/response payload (prompt, model output). Logs
  now report `#{e.class}: #{e.message}` only. Three sites:
  create_langfuse_trace, log_langfuse_generation, upsert_langfuse_trace.
  Note: Provider::Openai has the same pattern (copy-pasted source) —
  harmonization deferred to a follow-up cleanup PR; this commit fixes
  only the Anthropic provider to keep PR scope tight.

- MessageFormatter#parse_arguments now coerces any non-Hash parsed
  result to `{}`. Anthropic's Messages API requires `tool_use.input`
  to be a JSON object (map); a stored ToolCall::Function record whose
  arguments parse to a scalar, bool, or array (corrupt row, legacy
  data, cross-provider bleed) would otherwise produce a payload the
  API rejects. Normal flow stores Hash arguments end-to-end so the
  fix is defensive — adds 2 tests covering scalar/array JSON strings
  and non-String non-Hash inputs.

- Assistant::Responder dedups the chat-history fetch. The previous
  layout fired two near-identical `chat.messages.where(...).includes(
  :tool_calls).ordered` queries per LLM turn (one for the OpenAI-shape
  payload, one for the raw-records kwarg). A new memoized
  `complete_chat_messages` fetches once; `chat_message_records` filters
  out the current message via `Array#reject`, `openai_messages_payload`
  iterates the cached array unchanged. One SQL query per turn instead
  of two. Memoization scope = single Responder instance (per LLM call),
  so cache invalidation is not a concern.
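
The `tool_use.input` normalization described in the second bullet above can be approximated in plain Ruby. This is a sketch under stated assumptions: `normalize_tool_input` is a hypothetical name, and the real `MessageFormatter#parse_arguments` may handle more cases.

```ruby
require "json"

# Returns a Hash for any input; anything that is not a JSON object becomes {}.
def normalize_tool_input(arguments)
  parsed =
    case arguments
    when Hash then arguments
    when String
      begin
        JSON.parse(arguments)
      rescue JSON::ParserError
        nil # empty or malformed JSON falls through to {}
      end
    end

  parsed.is_a?(Hash) ? parsed : {}
end

[ '{"currency":"USD"}', '"hello"', "123", "true", "[1,2,3]", "", [ 1, 2, 3 ], nil ].each do |value|
  puts "#{value.inspect} -> #{normalize_tool_input(value).inspect}"
end
```

The final `is_a?(Hash)` check is the defensive part: valid JSON that decodes to a scalar or array is coerced the same way as unparseable input.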

All 4370 tests pass (1 pre-existing libvips env error unrelated).
Rubocop + brakeman clean.

* fix(ci): replace sk-ant- prefixed test placeholders

Pipelock secret scanner pattern-matches `sk-ant-*` as a real Anthropic
API key and fails the PR security-scan check. Test stubs and
ClimateControl env values used `sk-ant-test`, `sk-ant-from-setting`,
`sk-ant-x`, `sk-ant-y` as obvious placeholders, but the scanner does
not care about value entropy.

Switched to `fake-anthropic-key-*` / `fake-token-*` strings so the
scanner stops flagging them. No production code touched, no behavior
change — Provider::Anthropic still accepts any non-blank token.
2026-05-31 16:11:28 +02:00


require "test_helper"
require "ostruct"

class Provider::Anthropic::MessageFormatterTest < ActiveSupport::TestCase
  test "builds a single user turn from prompt alone" do
    formatter = Provider::Anthropic::MessageFormatter.new(prompt: "hi")
    messages = formatter.build

    assert_equal 1, messages.size
    assert_equal({ role: "user", content: "hi" }, messages.first)
  end

  test "skips empty content from history" do
    history = [ stub_user_message("") ]
    messages = Provider::Anthropic::MessageFormatter.new(prompt: "next", conversation_history: history).build

    assert_equal [ { role: "user", content: "next" } ], messages
  end

  test "renders text-only assistant history with no tool calls" do
    history = [
      stub_user_message("first question"),
      stub_assistant_message("first answer")
    ]
    messages = Provider::Anthropic::MessageFormatter.new(prompt: "second question", conversation_history: history).build

    assert_equal({ role: "user", content: "first question" }, messages[0])
    assert_equal "assistant", messages[1][:role]
    assert_equal [ { type: "text", text: "first answer" } ], messages[1][:content]
    assert_equal({ role: "user", content: "second question" }, messages[2])
  end

  test "renders assistant tool_call history with paired tool_result turn" do
    tool_call = stub_tool_call(
      id: "toolu_1",
      name: "get_net_worth",
      arguments: { "currency" => "USD" },
      result: { "amount" => 12345, "currency" => "USD" }
    )
    assistant = stub_assistant_message("Your net worth is $12,345.", tool_calls: [ tool_call ])
    history = [ stub_user_message("net worth?"), assistant ]
    messages = Provider::Anthropic::MessageFormatter.new(prompt: "anything else?", conversation_history: history).build

    assert_equal({ role: "user", content: "net worth?" }, messages[0])

    assert_equal "assistant", messages[1][:role]
    assert_equal "tool_use", messages[1][:content].first[:type]
    assert_equal "toolu_1", messages[1][:content].first[:id]
    assert_equal "get_net_worth", messages[1][:content].first[:name]
    assert_equal({ "currency" => "USD" }, messages[1][:content].first[:input])
    assert_equal "text", messages[1][:content].last[:type]

    assert_equal "user", messages[2][:role]
    assert_equal "tool_result", messages[2][:content].first[:type]
    assert_equal "toolu_1", messages[2][:content].first[:tool_use_id]
    assert_equal({ "amount" => 12345, "currency" => "USD" }.to_json, messages[2][:content].first[:content])

    assert_equal({ role: "user", content: "anything else?" }, messages[3])
  end

  test "renders in-flight function_results as assistant tool_use + user tool_result" do
    formatter = Provider::Anthropic::MessageFormatter.new(
      prompt: "what is my net worth?",
      function_results: [ {
        call_id: "toolu_42",
        name: "get_net_worth",
        arguments: { "currency" => "USD" }.to_json,
        output: { amount: 99, currency: "USD" }
      } ]
    )
    messages = formatter.build

    assert_equal({ role: "user", content: "what is my net worth?" }, messages[0])

    assert_equal "assistant", messages[1][:role]
    assert_equal "tool_use", messages[1][:content].first[:type]
    assert_equal "toolu_42", messages[1][:content].first[:id]
    assert_equal({ "currency" => "USD" }, messages[1][:content].first[:input])

    assert_equal "user", messages[2][:role]
    assert_equal "tool_result", messages[2][:content].first[:type]
    assert_equal "toolu_42", messages[2][:content].first[:tool_use_id]
    assert_includes messages[2][:content].first[:content], "99"
  end

  # Confirms the round-trip flagged in PR #1983 review: an Anthropic tool_use
  # block returned by the model → ChatFunctionRequest → ToolCall::Function
  # persisted on the AssistantMessage → MessageFormatter rebuild on the next
  # turn produces an Anthropic-compatible history where tool_use_id pairs back
  # to the original block.
  test "ChatParser → ToolCall::Function → MessageFormatter round-trips tool_use_id" do
    anthropic_response = OpenStruct.new(
      id: "msg_abc",
      model: "claude-sonnet-4-6",
      content: [
        OpenStruct.new(type: :tool_use, id: "toolu_round_trip", name: "get_net_worth", input: { "currency" => "USD" })
      ]
    )
    parsed = Provider::Anthropic::ChatParser.new(anthropic_response).parsed
    function_request = parsed.function_requests.first

    persisted_tool_call = ToolCall::Function.from_function_request(
      function_request,
      { "amount" => 12345, "currency" => "USD" }
    )
    assistant = stub_assistant_message("Your net worth is $12,345.", tool_calls: [ persisted_tool_call ])
    history = [ stub_user_message("net worth?"), assistant ]

    rebuilt = Provider::Anthropic::MessageFormatter.new(prompt: "follow-up", conversation_history: history).build
    tool_use_block = rebuilt[1][:content].find { |b| b[:type] == "tool_use" }
    tool_result_block = rebuilt[2][:content].first

    assert_equal "toolu_round_trip", tool_use_block[:id]
    assert_equal "toolu_round_trip", tool_result_block[:tool_use_id]
    assert_equal({ "currency" => "USD" }, tool_use_block[:input])
    assert_equal({ "amount" => 12345, "currency" => "USD" }.to_json, tool_result_block[:content])
  end

  test "renders multi-tool assistant turn with all pairings preserved" do
    tool_a = stub_tool_call(
      id: "toolu_a",
      name: "get_accounts",
      arguments: {},
      result: [ { "id" => 1, "name" => "Checking" } ]
    )
    tool_b = stub_tool_call(
      id: "toolu_b",
      name: "get_holdings",
      arguments: {},
      result: [ { "ticker" => "VTI", "qty" => 10 } ]
    )
    assistant = stub_assistant_message("Looked up your accounts and holdings.", tool_calls: [ tool_a, tool_b ])

    messages = Provider::Anthropic::MessageFormatter.new(
      prompt: "follow-up",
      conversation_history: [ stub_user_message("accounts and holdings?"), assistant ]
    ).build

    tool_uses = messages[1][:content].select { |b| b[:type] == "tool_use" }
    tool_results = messages[2][:content]

    assert_equal 2, tool_uses.size
    assert_equal 2, tool_results.size
    assert_equal [ "toolu_a", "toolu_b" ], tool_uses.map { |b| b[:id] }
    assert_equal [ "toolu_a", "toolu_b" ], tool_results.map { |b| b[:tool_use_id] }

    # Anthropic requires the user turn to follow the assistant turn that used tools
    assert_equal "assistant", messages[1][:role]
    assert_equal "user", messages[2][:role]
  end

  test "parses string arguments and nil outputs gracefully" do
    formatter = Provider::Anthropic::MessageFormatter.new(
      prompt: "go",
      function_results: [ {
        call_id: "toolu_x",
        name: "noop",
        arguments: "",
        output: nil
      } ]
    )
    messages = formatter.build

    assert_equal({}, messages[1][:content].first[:input])
    assert_equal "", messages[2][:content].first[:content]
  end

  # Anthropic's tool_use.input MUST be a JSON object (map). If a stored
  # ToolCall::Function record carries arguments that parse to a scalar or
  # array (corrupt row, legacy data, OpenAI cross-bleed), the formatter
  # must coerce them to `{}` so we don't ship an invalid payload.
  test "coerces non-Hash parsed arguments to empty Hash" do
    [ '"hello"', "123", "true", "[1,2,3]" ].each do |non_object_json|
      formatter = Provider::Anthropic::MessageFormatter.new(
        prompt: "go",
        function_results: [ {
          call_id: "toolu_x",
          name: "noop",
          arguments: non_object_json,
          output: nil
        } ]
      )
      messages = formatter.build

      assert_equal({}, messages[1][:content].first[:input],
        "expected empty Hash for arguments=#{non_object_json.inspect}")
    end
  end

  test "coerces non-Hash non-String arguments to empty Hash" do
    formatter = Provider::Anthropic::MessageFormatter.new(
      prompt: "go",
      function_results: [ {
        call_id: "toolu_x",
        name: "noop",
        arguments: [ 1, 2, 3 ],
        output: nil
      } ]
    )
    messages = formatter.build

    assert_equal({}, messages[1][:content].first[:input])
  end

  private
    def stub_user_message(content)
      msg = UserMessage.new(content: content, ai_model: "claude-sonnet-4-6")
      msg.id = SecureRandom.uuid
      msg
    end

    def stub_assistant_message(content, tool_calls: [])
      msg = AssistantMessage.new(content: content, ai_model: "claude-sonnet-4-6")
      msg.id = SecureRandom.uuid
      msg.stubs(:tool_calls).returns(tool_calls)
      msg
    end

    def stub_tool_call(id:, name:, arguments:, result:)
      tc = ToolCall::Function.new(
        function_name: name,
        function_arguments: arguments,
        function_result: result
      )
      tc.stubs(:provider_call_id).returns(id)
      tc.stubs(:provider_id).returns(id)
      tc
    end
end