Feature/pdf import transaction rows (#846)

* Add import row generation from PDF extracted data

- Add generate_rows_from_extracted_data method to PdfImport
- Add import! method to create transactions from PDF rows
- Update ProcessPdfJob to generate rows after extraction
- Update configured?, cleaned?, publishable? for PDF workflow
- Add column_keys, required_column_keys, mapping_steps
- Set bank statements to pending status for user review
- Add tests for new functionality

Closes #844

* Add tests for BankStatementExtractor

- Test transaction extraction from PDF content
- Test deduplication across chunk boundaries
- Test amount normalization for various formats
- Test graceful handling of malformed JSON responses
- Test error handling for empty/nil PDF content

* Fix supports_pdf_processing? to validate effective model

The validation was always checking @default_model, but process_pdf
allows overriding the model via parameter. This could cause a
vision-capable override model to be rejected, or a non-vision-capable
override to pass validation only to fail during processing.

Changes:
- supports_pdf_processing? now accepts optional model parameter
- process_pdf passes effective model to validation
- Raise Provider::Openai::Error inside with_provider_response for
  consistent error handling

Addresses review feedback from PR#808

* Fix insert_all! bug: explicitly set import_id

Rails insert_all! on associations does NOT auto-set the foreign key.
Added import_id explicitly and use Import::Row.insert_all! directly.
Also reload rows before counting to ensure accurate count.

* Fix pending status showing as processing for bank statements with rows

When bank statement PDF imports have extracted rows, show a 'Ready for Review'
screen with a link to the confirm path instead of the 'Processing' spinner.

This addresses the PR feedback that users couldn't reach the review flow even
though rows were created.

* Gate publishable? on account.present? to prevent import failure

PDF imports are created without an account, and import! raises if account
is missing. This prevents users from hitting publish and having the job fail.

* Wrap generate_rows_from_extracted_data in transaction for atomicity

- Clear rows and reset count even when no transactions extracted
- Use transaction block to prevent partial updates on failure
- Use mapped_rows.size instead of reload for count

* Localize transactions count string with i18n helper

* Add AccountMapping step for PDF imports when account is nil

PDF imports need account selection before publishing. This adds
Import::AccountMapping to mapping_steps when account is nil,
matching the behavior of TransactionImport and TradeImport.

Addresses PR#846 feedback about account selection for PDF imports.

* Only include CategoryMapping when rows have non-empty categories

PDF extraction doesn't extract categories from bank statements,
so the CategoryMapping step would show empty. Now we only include
CategoryMapping if rows actually have non-empty category values.

This prevents showing an empty mapping step for PDF imports.

* Fix PDF import UI flow and account selection

- Add direct account selection in PDF import UI instead of AccountMapping
- AccountMapping designed for CSV imports with multiple account values
- PDF imports need single account for all transactions
- Add update action and route for imports controller
- Fix controller to handle pdf_import param format from form_with
- Show Publish button when import is publishable (account set)
- Fix stepper nav: Upload/Configure/Clean non-clickable for PDF imports
- Redirect PDF imports from configuration step (auto-configured)
- Improve AI prompt to recognize M-PESA/mobile money as bank statements
- Fix migration ordering for import_rows table columns

* Add guard for invalid account_id in imports#update

Prevents silently clearing account when invalid ID is passed.
Returns error message instead of confusing 'Account saved' notice.

* Localize step names in import nav and add account guard

- Use t() helper for all step names (Upload, Configure, Clean, Map, Confirm)
- Add guard for invalid account_id in imports#update
- Prevents silently clearing account when invalid ID is passed

* Make category column migrations idempotent

Check if columns exist before adding to prevent duplicate column
errors when migrations are re-run with new timestamps.

* Add match_path for PDF import step highlighting

Fixes step detection when path is nil by using separate match_path
for current step highlighting while keeping links disabled.

* Rename category migrations and update to Rails 7.2

- Rename class to EnsureCategoryFieldsOnImportRows to avoid conflicts
- Rename class to EnsureCategoryIconOnImportRows
- Update migration version from 7.1 to 7.2 per guidelines
- Rename files to match class names
- Add match_path for PDF import step highlighting

* Use primary (black) style for Create Account and Save buttons

* Remove match_path from auto-completed PDF steps

Only step 4 (Confirm) needs match_path for active-step detection.
Steps 1-3 are purely informational and always complete.

* Add fallback for document type translation

Handles nil or unexpected document_type values gracefully.
Also removes match_path from auto-completed PDF steps.

* Use index-based step number for mobile indicator

Fixes 'Step 5 of 4' issue when Map step is dynamically removed.

* Fix hostings_controller_test: use blank? instead of nil

Setting returns empty string not nil for unset values.

* Localize step progress label and use design token

* Fix button styling: use design system Tailwind classes

btn--primary and btn--secondary CSS classes don't exist.
Use actual design system classes from DS::Buttonish.

* Fix CRLF line endings in tags_controller_test.rb

---------

Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>
This commit is contained in:
MkDev11
2026-02-02 10:27:02 -05:00
committed by GitHub
parent 146d6203fd
commit 0afdb1d0fd
17 changed files with 515 additions and 53 deletions

View File

@@ -89,7 +89,7 @@ class Settings::HostingsControllerTest < ActionDispatch::IntegrationTest
assert_response :unprocessable_entity
assert_match(/OpenAI model is required/, flash[:alert])
assert_nil Setting.openai_uri_base
assert Setting.openai_uri_base.blank?, "Expected openai_uri_base to remain blank after failed validation"
end
end

View File

@@ -24,3 +24,24 @@ pdf_processed:
status: complete
ai_summary: "This is a bank statement from Chase Bank for the period January 1-31, 2024. It shows 15 transactions with an opening balance of $5,000 and closing balance of $4,500."
document_type: bank_statement
pdf_with_rows:
family: dylan_family
account: checking
type: PdfImport
status: pending
ai_summary: "Bank statement with extracted transactions"
document_type: bank_statement
extracted_data:
transactions:
- date: "2024-01-15"
amount: -50.00
name: "Coffee Shop"
category: "Food & Drink"
notes: "Morning coffee"
- date: "2024-01-16"
amount: 1500.00
name: "Salary"
category: "Income"
notes: ""
rows_count: 2

View File

@@ -6,6 +6,7 @@ class PdfImportTest < ActiveSupport::TestCase
setup do
@import = imports(:pdf)
@processed_import = imports(:pdf_processed)
@import_with_rows = imports(:pdf_with_rows)
end
test "pdf_uploaded? returns false when no file attached" do
@@ -24,27 +25,28 @@ class PdfImportTest < ActiveSupport::TestCase
assert_not @import.uploaded?
end
test "configured? returns true when AI processed" do
assert @processed_import.configured?
test "configured? requires AI processed and rows" do
assert_not @import.configured?
assert_not @processed_import.configured?
assert @import_with_rows.configured?
end
test "cleaned? returns true when AI processed" do
assert @processed_import.cleaned?
test "cleaned? requires configured and valid rows" do
assert_not @import.cleaned?
assert_not @processed_import.cleaned?
end
test "publishable? always returns false for PDF imports" do
test "publishable? requires bank statement with cleaned rows and valid mappings" do
assert_not @import.publishable?
assert_not @processed_import.publishable?
end
test "column_keys returns empty array" do
assert_equal [], @import.column_keys
test "column_keys returns transaction columns" do
assert_equal %i[date amount name category notes], @import.column_keys
end
test "required_column_keys returns empty array" do
assert_equal [], @import.required_column_keys
test "required_column_keys returns date and amount" do
assert_equal %i[date amount], @import.required_column_keys
end
test "document_type validates against allowed types" do
@@ -66,4 +68,68 @@ class PdfImportTest < ActiveSupport::TestCase
@import.process_with_ai_later
end
end
test "generate_rows_from_extracted_data creates import rows" do
import = imports(:pdf_with_rows)
import.rows.destroy_all
import.update_column(:rows_count, 0)
import.generate_rows_from_extracted_data
assert_equal 2, import.rows.count
assert_equal 2, import.rows_count
coffee_row = import.rows.find_by(name: "Coffee Shop")
assert_not_nil coffee_row
assert_equal "-50.0", coffee_row.amount
assert_equal "Food & Drink", coffee_row.category
salary_row = import.rows.find_by(name: "Salary")
assert_not_nil salary_row
assert_equal "1500.0", salary_row.amount
end
test "generate_rows_from_extracted_data does nothing without extracted transactions" do
@import.generate_rows_from_extracted_data
assert_equal 0, @import.rows.count
end
test "extracted_transactions returns transactions from extracted_data" do
assert_equal 2, @import_with_rows.extracted_transactions.size
assert_equal "Coffee Shop", @import_with_rows.extracted_transactions.first["name"]
end
test "extracted_transactions returns empty array when no data" do
assert_equal [], @import.extracted_transactions
end
test "has_extracted_transactions? returns true with transactions" do
assert @import_with_rows.has_extracted_transactions?
end
test "has_extracted_transactions? returns false without transactions" do
assert_not @import.has_extracted_transactions?
end
test "mapping_steps is empty when no categories in rows" do
# PDF imports use direct account selection in UI, not AccountMapping
assert_equal [], @import.mapping_steps
end
test "mapping_steps includes CategoryMapping when rows have categories" do
@import_with_rows.rows.create!(
date: "01/15/2024",
amount: -50.00,
currency: "USD",
name: "Test Transaction",
category: "Groceries"
)
assert_equal [ Import::CategoryMapping ], @import_with_rows.mapping_steps
end
test "mapping_steps does not include AccountMapping even when account is nil" do
# PDF imports handle account selection via direct UI, not mapping system
assert_nil @import.account
assert_not_includes @import.mapping_steps, Import::AccountMapping
end
end

View File

@@ -0,0 +1,197 @@
require "test_helper"
class Provider::Openai::BankStatementExtractorTest < ActiveSupport::TestCase
setup do
@client = mock("openai_client")
@model = "gpt-4.1"
end
test "extracts transactions from PDF content" do
mock_response = {
"choices" => [ {
"message" => {
"content" => {
"bank_name" => "Test Bank",
"account_holder" => "John Doe",
"account_number" => "1234",
"statement_period" => {
"start_date" => "2024-01-01",
"end_date" => "2024-01-31"
},
"opening_balance" => 5000.00,
"closing_balance" => 4500.00,
"transactions" => [
{ "date" => "2024-01-15", "description" => "Coffee Shop", "amount" => -5.50 },
{ "date" => "2024-01-20", "description" => "Salary Deposit", "amount" => 3000.00 }
]
}.to_json
}
} ]
}
@client.expects(:chat).returns(mock_response)
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: "dummy",
model: @model
)
# Mock the PDF text extraction
extractor.stubs(:extract_pages_from_pdf).returns([ "Page 1 bank statement text" ])
result = extractor.extract
assert_equal "Test Bank", result[:bank_name]
assert_equal "John Doe", result[:account_holder]
assert_equal "1234", result[:account_number]
assert_equal 5000.00, result[:opening_balance]
assert_equal 4500.00, result[:closing_balance]
assert_equal 2, result[:transactions].size
first_txn = result[:transactions].first
assert_equal "2024-01-15", first_txn[:date]
assert_equal "Coffee Shop", first_txn[:name]
assert_equal(-5.50, first_txn[:amount])
end
test "handles empty PDF content" do
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: "",
model: @model
)
assert_raises(Provider::Openai::Error) do
extractor.extract
end
end
test "handles nil PDF content" do
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: nil,
model: @model
)
assert_raises(Provider::Openai::Error) do
extractor.extract
end
end
test "deduplicates transactions across chunk boundaries" do
# First chunk response
first_response = {
"choices" => [ {
"message" => {
"content" => {
"bank_name" => "Test Bank",
"account_holder" => "John Doe",
"account_number" => "1234",
"statement_period" => { "start_date" => "2024-01-01", "end_date" => "2024-01-31" },
"opening_balance" => 5000.00,
"closing_balance" => 4500.00,
"transactions" => [
{ "date" => "2024-01-15", "description" => "Coffee Shop", "amount" => -5.50 },
{ "date" => "2024-01-16", "description" => "Grocery Store", "amount" => -50.00 }
]
}.to_json
}
} ]
}
# Second chunk response with duplicate at boundary
second_response = {
"choices" => [ {
"message" => {
"content" => {
"transactions" => [
{ "date" => "2024-01-16", "description" => "Grocery Store", "amount" => -50.00 },
{ "date" => "2024-01-17", "description" => "Gas Station", "amount" => -40.00 }
]
}.to_json
}
} ]
}
@client.expects(:chat).twice.returns(first_response, second_response)
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: "dummy",
model: @model
)
# Mock multiple pages that will create multiple chunks
extractor.stubs(:extract_pages_from_pdf).returns([
"Page 1 " * 500, # ~3500 chars, first chunk
"Page 2 " * 500 # ~3500 chars, second chunk
])
result = extractor.extract
# Should deduplicate the "Grocery Store" transaction at chunk boundary
assert_equal 3, result[:transactions].size
names = result[:transactions].map { |t| t[:name] }
assert_includes names, "Coffee Shop"
assert_includes names, "Grocery Store"
assert_includes names, "Gas Station"
end
test "normalizes transaction amounts" do
mock_response = {
"choices" => [ {
"message" => {
"content" => {
"transactions" => [
{ "date" => "2024-01-15", "description" => "Test 1", "amount" => "-$5.50" },
{ "date" => "2024-01-16", "description" => "Test 2", "amount" => "1,234.56" },
{ "date" => "2024-01-17", "description" => "Test 3", "amount" => -100 }
]
}.to_json
}
} ]
}
@client.expects(:chat).returns(mock_response)
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: "dummy",
model: @model
)
extractor.stubs(:extract_pages_from_pdf).returns([ "Page 1 text" ])
result = extractor.extract
assert_equal(-5.50, result[:transactions][0][:amount])
assert_equal 1234.56, result[:transactions][1][:amount]
assert_equal(-100.0, result[:transactions][2][:amount])
end
test "handles malformed JSON response gracefully" do
mock_response = {
"choices" => [ {
"message" => {
"content" => "This is not valid JSON"
}
} ]
}
@client.expects(:chat).returns(mock_response)
extractor = Provider::Openai::BankStatementExtractor.new(
client: @client,
pdf_content: "dummy",
model: @model
)
extractor.stubs(:extract_pages_from_pdf).returns([ "Page 1 text" ])
result = extractor.extract
# Should return empty transactions on parse error
assert_equal [], result[:transactions]
end
end