Rebase PR #784 and fix OpenAI model/chat regressions (#1384)

* Wire conversation history through OpenAI responses API

* Fix RuboCop hash brace spacing in assistant tests

* Pipelock ignores

* Batch fixes

---------

Co-authored-by: sokiee <sokysrm@gmail.com>

Author: Juan José Mata
Date: 2026-04-15 18:45:24 +02:00
Committed by: GitHub
Parent: 53ea0375db
Commit: 7b2b1dd367
24 changed files with 937 additions and 90 deletions


@@ -28,8 +28,22 @@ TWELVE_DATA_API_KEY =
OPENAI_ACCESS_TOKEN =
OPENAI_URI_BASE =
OPENAI_MODEL =
# OPENAI_REQUEST_TIMEOUT: Request timeout in seconds (default: 60)
# OPENAI_SUPPORTS_PDF_PROCESSING: Set to false for endpoints without vision support (default: true)
# LLM token budget. Applies to ALL outbound LLM calls: chat history,
# auto-categorize, merchant detection, provider enhancer, PDF processing.
# Defaults to Ollama's historical 2048-token baseline so small local models
# work out of the box — raise explicitly for cloud or larger-context models.
# LLM_CONTEXT_WINDOW = 2048 # Total tokens the model will accept
# LLM_MAX_RESPONSE_TOKENS = 512 # Reserved for the model's reply
# LLM_MAX_HISTORY_TOKENS = # Derived if unset (context - response - system_reserve)
# LLM_SYSTEM_PROMPT_RESERVE = 256 # Tokens reserved for the system prompt
# LLM_MAX_ITEMS_PER_CALL = 25 # Upper bound on auto-categorize / merchant batches
# OpenAI-compatible capability flags (custom/self-hosted providers)
# OPENAI_REQUEST_TIMEOUT = 60 # HTTP timeout in seconds; raise for slow local models
# OPENAI_SUPPORTS_PDF_PROCESSING = true # Set to false for endpoints without vision support
# OPENAI_SUPPORTS_RESPONSES_ENDPOINT = # true to force Responses API on custom providers
# LLM_JSON_MODE = # auto | strict | json_object | none
# OpenAI-compatible API endpoint config (example: LM Studio reached from Docker)
# OPENAI_URI_BASE = http://host.docker.internal:1234/
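
The derivation noted in the comments above (LLM_MAX_HISTORY_TOKENS falls back to context minus response minus system reserve) can be written as a small Ruby sketch. This is an illustration only: the LlmBudget module and its method names are assumptions, not code from this commit; only the formula and the documented defaults (2048 / 512 / 256) come from the .env comments.

# Hypothetical sketch; names are assumed, the formula and defaults
# mirror the .env comments above.
module LlmBudget
  module_function

  def context_window
    ENV.fetch("LLM_CONTEXT_WINDOW", "2048").to_i
  end

  def max_response_tokens
    ENV.fetch("LLM_MAX_RESPONSE_TOKENS", "512").to_i
  end

  def system_prompt_reserve
    ENV.fetch("LLM_SYSTEM_PROMPT_RESERVE", "256").to_i
  end

  # LLM_MAX_HISTORY_TOKENS is derived when unset:
  # context - response - system_reserve (clamped at zero)
  def max_history_tokens
    explicit = ENV["LLM_MAX_HISTORY_TOKENS"]
    return explicit.to_i unless explicit.nil? || explicit.empty?

    [context_window - max_response_tokens - system_prompt_reserve, 0].max
  end
end

LlmBudget.max_history_tokens # => 1280 with the documented defaults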