Files
sure/app/models/enable_banking_item/importer.rb
0xRozier 005d2fac20 Fix/issue 954 enable banking duplicate transactions (#988)
* fix: deduplicate Enable Banking API transactions with different entry_reference IDs (#954)

Enable Banking API sometimes returns the same logical transaction multiple
times with different entry_reference values in a single response. This causes
duplicate entries because the existing ID-based deduplication treats them as
distinct transactions.

Add content-based deduplication that compares (date, amount, currency,
creditor, debtor, remittance_information, status) to detect and remove these
API-level duplicates before storing them. The first occurrence is kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Enable Banking processor and importer deduplication tests (#954)

Add tests for:
- EnableBankingEntry::Processor: verifies entry_reference fallback for
  external_id, idempotent re-processing, string key handling
- EnableBankingItem::Importer: verifies content-based deduplication removes
  API-level duplicates while preserving legitimate distinct transactions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle nil values in remittance_information array for dedup key (#954)

Call compact and map(&:to_s) before sort.join when remittance_information
is an array, preventing ArgumentError when it contains nil elements.
Also document the known limitation of content-based deduplication
collapsing genuinely distinct identical transactions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add coverage for nil values in remittance_information array (#954)

Verify that deduplication handles remittance_information arrays containing
nil elements without raising ArgumentError, and correctly treats arrays
with different nil positions but same non-nil content as duplicates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prefer transaction_id over content-based dedup to preserve legit duplicates (#954)

When transaction_id is present, use it as the dedup key instead of falling
back to content-based deduplication. This preserves legitimately distinct
transactions with identical content (e.g. two laundromat payments of the
same amount on the same day) while still deduplicating the Enable Banking
duplicate entry_reference issue when transaction_id is nil.

Addresses review feedback from @jjmata about legitimate duplicate
transactions being incorrectly collapsed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use composite key for dedup instead of transaction_id alone (#954)

Per the Enable Banking API docs, transaction_id is not guaranteed to be
unique. Include it as one component of the composite content key rather
than using it as the sole dedup criterion. This preserves transactions
with non-unique transaction_ids but different content, while still
deduplicating true API-level duplicates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add value_date fallback coverage for dedup key (#954)

build_transaction_content_key falls back to value_date when booking_date
is absent. This test exercises that path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: document known limitation of content-based dedup (#954)

When transaction_id is nil for both transactions, pure content comparison
applies, which could theoretically collapse two genuinely distinct
transactions with identical fields. Document this trade-off inline for
future maintainers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add credit_debit_indicator to dedup composite key (#954)

transaction_amount.amount is always positive in the Enable Banking API,
with direction encoded separately in credit_debit_indicator (CRDT/DBIT).
Without it in the composite key, a payment and a same-day refund of the
same amount to the same merchant would produce identical keys, silently
dropping one transaction.

- Add credit_debit_indicator to build_transaction_content_key
- Add test for payment + same-day refund scenario
- Update docstring to document the rationale

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 16:53:30 +01:00

351 lines
15 KiB
Ruby

class EnableBankingItem::Importer
# Maximum number of pagination requests to prevent infinite loops
# Enable Banking typically returns ~100 transactions per page, so 100 pages = ~10,000 transactions
MAX_PAGINATION_PAGES = 100
attr_reader :enable_banking_item, :enable_banking_provider
def initialize(enable_banking_item, enable_banking_provider:)
@enable_banking_item = enable_banking_item
@enable_banking_provider = enable_banking_provider
end
def import
unless enable_banking_item.session_valid?
enable_banking_item.update!(status: :requires_update)
return { success: false, error: "Session expired or invalid", accounts_updated: 0, transactions_imported: 0 }
end
session_data = fetch_session_data
unless session_data
error_msg = @session_error || "Failed to fetch session data"
return { success: false, error: error_msg, accounts_updated: 0, transactions_imported: 0 }
end
# Store raw payload
begin
enable_banking_item.upsert_enable_banking_snapshot!(session_data)
rescue => e
Rails.logger.error "EnableBankingItem::Importer - Failed to store session snapshot: #{e.message}"
end
# Update accounts from session
accounts_updated = 0
accounts_failed = 0
if session_data[:accounts].present?
existing_uids = enable_banking_item.enable_banking_accounts
.joins(:account_provider)
.pluck(:uid)
.map(&:to_s)
# Enable Banking API returns accounts as an array of UIDs (strings) in the session response
# We need to handle both array of strings and array of hashes
session_data[:accounts].each do |account_data|
# Handle both string UIDs and hash objects
# Use identification_hash as the stable identifier across sessions
uid = if account_data.is_a?(String)
account_data
elsif account_data.is_a?(Hash)
(account_data[:identification_hash] || account_data[:uid] || account_data["identification_hash"] || account_data["uid"])&.to_s
else
nil
end
next unless uid.present?
# Only update if this account was previously linked
next unless existing_uids.include?(uid)
begin
# For string UIDs, we don't have account data to update - skip the import_account call
# The account data will be fetched via balances/transactions endpoints
if account_data.is_a?(Hash)
import_account(account_data)
accounts_updated += 1
end
rescue => e
accounts_failed += 1
Rails.logger.error "EnableBankingItem::Importer - Failed to update account #{uid}: #{e.message}"
end
end
end
# Fetch balances and transactions for linked accounts
transactions_imported = 0
transactions_failed = 0
linked_accounts_query = enable_banking_item.enable_banking_accounts.joins(:account_provider).joins(:account).merge(Account.visible)
linked_accounts_query.each do |enable_banking_account|
begin
fetch_and_update_balance(enable_banking_account)
result = fetch_and_store_transactions(enable_banking_account)
if result[:success]
transactions_imported += result[:transactions_count]
else
transactions_failed += 1
end
rescue => e
transactions_failed += 1
Rails.logger.error "EnableBankingItem::Importer - Failed to process account #{enable_banking_account.uid}: #{e.message}"
end
end
result = {
success: accounts_failed == 0 && transactions_failed == 0,
accounts_updated: accounts_updated,
accounts_failed: accounts_failed,
transactions_imported: transactions_imported,
transactions_failed: transactions_failed
}
if !result[:success] && (accounts_failed > 0 || transactions_failed > 0)
parts = []
parts << "#{accounts_failed} #{'account'.pluralize(accounts_failed)} failed" if accounts_failed > 0
parts << "#{transactions_failed} #{'transaction'.pluralize(transactions_failed)} failed" if transactions_failed > 0
result[:error] = parts.join(", ")
end
result
end
private
def extract_friendly_error_message(exception)
[ exception, exception.cause ].compact.each do |ex|
case ex
when SocketError then return "DNS resolution failed: check your network/DNS configuration"
when Net::OpenTimeout, Net::ReadTimeout then return "Connection timed out: the Enable Banking API may be unreachable"
when Errno::ECONNREFUSED then return "Connection refused: the Enable Banking API is unreachable"
end
end
msg = exception.message.to_s
return "DNS resolution failed: check your network/DNS configuration" if msg.include?("getaddrinfo") || msg.match?(/name or service not known/i)
return "Connection timed out: the Enable Banking API may be unreachable" if msg.include?("execution expired") || msg.include?("timeout") || msg.match?(/timed out/i)
return "Connection refused: the Enable Banking API is unreachable" if msg.include?("ECONNREFUSED") || msg.match?(/connection refused/i)
msg
end
def fetch_session_data
enable_banking_provider.get_session(session_id: enable_banking_item.session_id)
rescue Provider::EnableBanking::EnableBankingError => e
if e.error_type == :unauthorized || e.error_type == :not_found
enable_banking_item.update!(status: :requires_update)
end
Rails.logger.error "EnableBankingItem::Importer - Enable Banking API error: #{e.message}"
@session_error = extract_friendly_error_message(e)
nil
rescue => e
Rails.logger.error "EnableBankingItem::Importer - Unexpected error fetching session: #{e.class} - #{e.message}"
@session_error = extract_friendly_error_message(e)
nil
end
def import_account(account_data)
# Use identification_hash as the stable identifier across sessions
uid = account_data[:identification_hash] || account_data[:uid]
enable_banking_account = enable_banking_item.enable_banking_accounts.find_by(uid: uid.to_s)
return unless enable_banking_account
enable_banking_account.upsert_enable_banking_snapshot!(account_data)
enable_banking_account.save!
end
def fetch_and_update_balance(enable_banking_account)
balance_data = enable_banking_provider.get_account_balances(account_id: enable_banking_account.api_account_id)
# Enable Banking returns an array of balances
balances = balance_data[:balances] || []
return if balances.empty?
# Find the most relevant balance (prefer "ITAV" or "CLAV" types)
balance = balances.find { |b| b[:balance_type] == "ITAV" } ||
balances.find { |b| b[:balance_type] == "CLAV" } ||
balances.find { |b| b[:balance_type] == "ITBD" } ||
balances.find { |b| b[:balance_type] == "CLBD" } ||
balances.first
if balance.present?
amount = balance.dig(:balance_amount, :amount) || balance[:amount]
currency = balance.dig(:balance_amount, :currency) || balance[:currency]
if amount.present?
enable_banking_account.update!(
current_balance: amount.to_d,
currency: currency.presence || enable_banking_account.currency
)
end
end
rescue Provider::EnableBanking::EnableBankingError => e
Rails.logger.error "EnableBankingItem::Importer - Error fetching balance for account #{enable_banking_account.uid}: #{e.message}"
end
def fetch_and_store_transactions(enable_banking_account)
start_date = determine_sync_start_date(enable_banking_account)
all_transactions = []
continuation_key = nil
previous_continuation_key = nil
page_count = 0
# Paginate through all transactions with safeguards against infinite loops
loop do
page_count += 1
# Safeguard: prevent infinite loops from excessive pagination
if page_count > MAX_PAGINATION_PAGES
Rails.logger.error(
"EnableBankingItem::Importer - Pagination limit exceeded for account #{enable_banking_account.uid}. " \
"Stopped after #{MAX_PAGINATION_PAGES} pages (#{all_transactions.count} transactions). " \
"Last continuation_key: #{continuation_key.inspect}"
)
break
end
transactions_data = enable_banking_provider.get_account_transactions(
account_id: enable_banking_account.api_account_id,
date_from: start_date,
continuation_key: continuation_key
)
transactions = transactions_data[:transactions] || []
all_transactions.concat(transactions)
previous_continuation_key = continuation_key
continuation_key = transactions_data[:continuation_key]
# Safeguard: detect repeated continuation_key (provider returning same key)
if continuation_key.present? && continuation_key == previous_continuation_key
Rails.logger.error(
"EnableBankingItem::Importer - Repeated continuation_key detected for account #{enable_banking_account.uid}. " \
"Breaking loop after #{page_count} pages (#{all_transactions.count} transactions). " \
"Repeated key: #{continuation_key.inspect}, last response had #{transactions.count} transactions"
)
break
end
break if continuation_key.blank?
end
# Deduplicate API response: Enable Banking sometimes returns the same logical
# transaction with different entry_reference IDs in the same response.
# Remove content-level duplicates before storing. (Issue #954)
all_transactions = deduplicate_api_transactions(all_transactions)
transactions_count = all_transactions.count
if all_transactions.any?
existing_transactions = enable_banking_account.raw_transactions_payload.to_a
existing_ids = existing_transactions.map { |tx|
tx = tx.with_indifferent_access
tx[:transaction_id].presence || tx[:entry_reference].presence
}.compact.to_set
new_transactions = all_transactions.select do |tx|
# Use transaction_id if present, otherwise fall back to entry_reference
tx_id = tx[:transaction_id].presence || tx[:entry_reference].presence
tx_id.present? && !existing_ids.include?(tx_id)
end
if new_transactions.any?
enable_banking_account.upsert_enable_banking_transactions_snapshot!(existing_transactions + new_transactions)
end
end
{ success: true, transactions_count: transactions_count }
rescue Provider::EnableBanking::EnableBankingError => e
Rails.logger.error "EnableBankingItem::Importer - Error fetching transactions for account #{enable_banking_account.uid}: #{e.message}"
{ success: false, transactions_count: 0, error: e.message }
rescue => e
Rails.logger.error "EnableBankingItem::Importer - Unexpected error fetching transactions for account #{enable_banking_account.uid}: #{e.class} - #{e.message}"
{ success: false, transactions_count: 0, error: e.message }
end
# Deduplicate transactions from the Enable Banking API response.
# Some banks return the same logical transaction multiple times with different
# entry_reference IDs. We build a composite content key that includes
# transaction_id (when present) alongside date, amount, currency, creditor,
# debtor, remittance_information, and status. Per the Enable Banking API docs
# transaction_id is not guaranteed to be unique, so it cannot be used as
# the sole dedup criterion. Including it in the composite key preserves
# legitimately distinct transactions with identical content but different
# transaction_ids (e.g. two laundromat payments on the same day). (Issue #954)
def deduplicate_api_transactions(transactions)
seen = {}
duplicates_removed = 0
result = transactions.select do |tx|
tx = tx.with_indifferent_access
key = build_transaction_content_key(tx)
if seen[key]
duplicates_removed += 1
false
else
seen[key] = true
true
end
end
if duplicates_removed > 0
Rails.logger.info(
"EnableBankingItem::Importer - Removed #{duplicates_removed} content-level " \
"duplicate(s) from API response (#{transactions.count}#{result.count} transactions)"
)
end
result
end
# Build a composite key for deduplication. Two transactions with different
# entry_reference values but identical content fields (including
# transaction_id and credit_debit_indicator) are considered duplicates.
# transaction_id is included as one component — not a standalone key —
# because the Enable Banking API docs state it is not guaranteed to be
# unique. credit_debit_indicator (CRDT/DBIT) is included because
# transaction_amount.amount is always positive — without it, a payment
# and a same-day refund of the same amount would produce identical keys.
# Known limitation: when transaction_id is nil for both, pure content
# comparison applies. This means two genuinely distinct transactions
# with identical content (same date, amount, direction, creditor, etc.)
# and no transaction_id would collapse to one. In practice, banks that
# omit transaction_id rarely produce such exact duplicates in the same
# API response; timestamps or remittance info usually differ. (Issue #954)
def build_transaction_content_key(tx)
date = tx[:booking_date].presence || tx[:value_date]
amount = tx.dig(:transaction_amount, :amount).presence || tx[:amount]
currency = tx.dig(:transaction_amount, :currency).presence || tx[:currency]
creditor = tx.dig(:creditor, :name).presence || tx[:creditor_name]
debtor = tx.dig(:debtor, :name).presence || tx[:debtor_name]
remittance = tx[:remittance_information]
remittance_key = remittance.is_a?(Array) ? remittance.compact.map(&:to_s).sort.join("|") : remittance.to_s
status = tx[:status]
tid = tx[:transaction_id]
direction = tx[:credit_debit_indicator]
[ date, amount, currency, creditor, debtor, remittance_key, status, tid, direction ].map(&:to_s).join("\x1F")
end
def determine_sync_start_date(enable_banking_account)
has_stored_transactions = enable_banking_account.raw_transactions_payload.to_a.any?
# Use user-configured sync_start_date if set, otherwise default
user_start_date = enable_banking_item.sync_start_date
if has_stored_transactions
# For incremental syncs, get transactions from 7 days before last sync
if enable_banking_item.last_synced_at
enable_banking_item.last_synced_at.to_date - 7.days
else
30.days.ago.to_date
end
else
# Initial sync: use user's configured date or default to 3 months
user_start_date || 3.months.ago.to_date
end
end
end