Harden SimpleFin sync: retries, safer imports, manual relinking, and data-quality reconciliation (#544)

* Add tests and enhance logic for SimpleFin account synchronization and reconciliation

- Added retry logic with exponential backoff for network errors in `Provider::Simplefin`.
- Introduced tests to verify retry functionality and error handling for rate-limit, server errors, and stale data.
- Updated `SimplefinItem` to detect stale sync status and reconciliation issues.
- Enhanced UI to display stale sync warnings and data integrity notices.
- Improved SimpleFin account matching during updates with multi-tier strategy (ID, fingerprint, fuzzy match).
- Added transaction reconciliation logic to detect data gaps, transaction count drops, and duplicate transaction IDs.
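
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `Provider::Simplefin`: the `RetrySketch` module, its parameters, and the list of retryable errors are assumptions made for the example.

```ruby
require "timeout"

# Hypothetical helper illustrating retries with exponential backoff.
# The names and the set of retryable errors are assumptions, not the provider's real API.
module RetrySketch
  RETRYABLE = [Timeout::Error, Errno::ECONNRESET, Errno::ECONNREFUSED].freeze

  # Yields the attempt number (starting at 1). On a retryable error it waits
  # base_delay * 2**(attempt - 1) seconds and tries again, up to max_attempts.
  # The sleeper is injectable so callers (and tests) can avoid real sleeping.
  def self.with_retries(max_attempts: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
    attempt = 0
    begin
      attempt += 1
      yield attempt
    rescue *RETRYABLE
      raise if attempt >= max_attempts # out of attempts: surface the last error
      sleeper.call(base_delay * (2**(attempt - 1)))
      retry
    end
  end
end
```

With `base_delay: 1` the waits are 1s, 2s, 4s, and so on. A production version would likely add jitter and a cap on the total delay so that a slow upstream cannot stall a sync indefinitely.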

* Introduce `SimplefinConnectionUpdateJob` for asynchronous SimpleFin connection updates

- Moved SimpleFin connection update logic to `SimplefinConnectionUpdateJob` to improve response times by offloading network retries, data fetching, and reconciliation tasks.
- Enhanced SimpleFin account matching with a multi-tier strategy (ID, fingerprint, fuzzy name match).
- Added retry logic and bounded latency for token claim requests in `Provider::Simplefin`.
- Updated tests to cover the new job flow and ensure correct account reconciliation during updates.

* Remove unused SimpleFin account matching logic and improve error handling in `SimplefinConnectionUpdateJob`

- Deleted the multi-tier account matching logic from `SimplefinItemsController` as it is no longer used.
- Enhanced error handling in `SimplefinConnectionUpdateJob` to gracefully handle import failures, ensuring orphaned items can be manually resolved.
- Updated job flow to conditionally set item status based on the success of import operations.

* Fix SimpleFin sync: check both legacy FK and AccountProvider for linked accounts

* Add crypto, checking, savings, and cash account detection; refine subtype selection and linking

- Enhanced `Simplefin::AccountTypeMapper` to include detection for crypto, checking, savings, and standalone cash accounts.
- Improved subtype selection UI with validation and warning indicators for missing selections.
- Updated SimpleFin account linking to handle both legacy FK and `AccountProvider` associations consistently.
- Refined job flow and importer logic for better handling of linked accounts and subtype inference.
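
To illustrate what "linked via either association" means, here is a small sketch using plain structs. The struct names are hypothetical stand-ins for the real models, and the lookup order inside `current_account` (provider link first, legacy FK second) is an assumption for the example rather than something this commit message states.

```ruby
# Hypothetical stand-ins for the real models; only the "either link counts" rule is shown.
AccountSketch = Struct.new(:id)
ProviderLinkSketch = Struct.new(:account)

SimplefinAccountSketch = Struct.new(:account, :account_provider) do
  # Assumed behaviour: use the AccountProvider link if present, else the legacy FK.
  def current_account
    account_provider&.account || account
  end
end

legacy_linked   = SimplefinAccountSketch.new(AccountSketch.new(1), nil)
provider_linked = SimplefinAccountSketch.new(nil, ProviderLinkSketch.new(AccountSketch.new(2)))
unlinked        = SimplefinAccountSketch.new(nil, nil)

sfas = [legacy_linked, provider_linked, unlinked]

# Collect the IDs of every linked account, skipping unlinked ones.
linked_ids = sfas.filter_map { |sfa| sfa.current_account&.id }
```

A query that only joins on the legacy FK would miss `provider_linked` here, which is the gap this change closes.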

* Improve `SimplefinConnectionUpdateJob` and holdings processing logic

- Fixed race condition in `SimplefinConnectionUpdateJob` by moving `destroy_later` calls outside of transactions.
- Updated fuzzy name match logic to use Levenshtein distance for better accuracy.
- Enhanced synthetic ticker generation in holdings processor with hash suffix for uniqueness.
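
For readers unfamiliar with Levenshtein-based matching, a minimal sketch follows. The method names and the `max_ratio` threshold are assumptions chosen for illustration; the actual threshold and normalization used in the job are not stated in this message.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping only two rows.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical matcher: names match when the edit distance is a small
# fraction of the longer name's length (max_ratio is an assumed value).
def fuzzy_name_match?(a, b, max_ratio: 0.2)
  x = a.to_s.downcase.strip
  y = b.to_s.downcase.strip
  return false if x.empty? || y.empty?

  levenshtein(x, y).to_f / [x.length, y.length].max <= max_ratio
end
```

Scaling the distance by name length keeps short names from matching too eagerly: a single edit in a 4-character name is a much larger change than a single edit in a 20-character one.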

* Refine SimpleFin entry processing logic and ensure `extra` data persistence

- Simplified pending flag determination to rely solely on provider-supplied values.
- Fixed potential stale values in `extra` by deep-merging incoming values over the existing ones and persisting the result with `entry.transaction.save!`.
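
A deep merge where incoming values win can be sketched in plain Ruby as below. The helper name and the sample keys are hypothetical; in a Rails app, ActiveSupport's `Hash#deep_merge` provides the same behaviour.

```ruby
# Recursively merge `incoming` into `base`; on conflicts the incoming value wins,
# except when both sides are hashes, in which case they are merged key by key.
def deep_merge_hashes(base, incoming)
  base.merge(incoming) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge_hashes(old_value, new_value)
    else
      new_value
    end
  end
end

# Illustrative data only; the real shape of `extra` is not shown in this commit message.
existing = { "simplefin" => { "pending" => true, "memo" => "coffee" } }
incoming = { "simplefin" => { "pending" => false } }

merged = deep_merge_hashes(existing, incoming)
```

The stale `pending` value is overwritten while the unrelated `memo` key survives, which a shallow `merge` would not guarantee.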

* Replace hardcoded fallback transaction description with localized string

* Refine pending flag logic in SimpleFin processor tests

- Adjusted the test so that pending status is not falsely inferred from missing posted dates.
- Ensured the pending flag is set only when the provider explicitly supplies it.

* Add `has_many :holdings` association to `AccountProvider` with `dependent: :nullify`

---------

Co-authored-by: Josh Waldrep <joshua.waldrep5+github@gmail.com>
LPW
2026-01-05 16:11:47 -05:00
committed by GitHub
parent b3330a318d
commit c12c585a0e
21 changed files with 913 additions and 179 deletions


@@ -15,11 +15,23 @@ class SimplefinItem::Importer
Rails.logger.info "SimplefinItem::Importer - last_synced_at: #{simplefin_item.last_synced_at.inspect}"
Rails.logger.info "SimplefinItem::Importer - sync_start_date: #{simplefin_item.sync_start_date.inspect}"
# Clear stale error and reconciliation stats from previous syncs at the start of a full import
# This ensures the UI doesn't show outdated warnings from old sync runs
if sync.respond_to?(:sync_stats)
sync.update_columns(sync_stats: {
"cleared_at" => Time.current.iso8601,
"import_started" => true
})
end
begin
# Defensive guard: If last_synced_at is set but there are linked accounts
# with no transactions captured yet (typical after a balances-only run),
# force the first full run to use chunked history to backfill.
linked_accounts = simplefin_item.simplefin_accounts.joins(:account)
#
# Check for linked accounts via BOTH legacy FK (accounts.simplefin_account_id) AND
# the new AccountProvider system. An account is "linked" if either association exists.
linked_accounts = simplefin_item.simplefin_accounts.select { |sfa| sfa.current_account.present? }
no_txns_yet = linked_accounts.any? && linked_accounts.all? { |sfa| sfa.raw_transactions_payload.blank? }
if simplefin_item.last_synced_at.nil? || no_txns_yet
@@ -569,7 +581,18 @@ class SimplefinItem::Importer
end
end
attrs[:raw_transactions_payload] = best_by_key.values
merged_transactions = best_by_key.values
attrs[:raw_transactions_payload] = merged_transactions
# NOTE: Reconciliation disabled - it analyzes the SimpleFin API response
# which only contains ~90 days of history, creating misleading "gap" warnings
# that don't reflect actual database state. Re-enable if we improve it to
# compare against database transactions instead of just the API response.
# begin
# reconcile_transactions(simplefin_account, merged_transactions)
# rescue => e
# Rails.logger.warn("SimpleFin: reconciliation failed for sfa=#{simplefin_account.id || account_id}: #{e.class} - #{e.message}")
# end
end
# Track whether incoming holdings are new/changed so we can materialize and refresh balances
@@ -783,4 +806,164 @@ class SimplefinItem::Importer
# Default to 7 days buffer for subsequent syncs
7
end
# Transaction reconciliation: detect potential data gaps or missing transactions
# This helps identify when SimpleFin may not be returning complete data
def reconcile_transactions(simplefin_account, new_transactions)
return if new_transactions.blank?
account_id = simplefin_account.account_id
existing_transactions = simplefin_account.raw_transactions_payload.to_a
reconciliation = { account_id: account_id, issues: [] }
# 1. Check for unexpected transaction count drops
# If we previously had more transactions and now have fewer (after merge),
# something may have been removed upstream
if existing_transactions.any?
existing_count = existing_transactions.size
new_count = new_transactions.size
# After merging, we should have at least as many as before
# A significant drop (>10%) could indicate data loss
if new_count < existing_count
drop_pct = ((existing_count - new_count).to_f / existing_count * 100).round(1)
if drop_pct > 10
reconciliation[:issues] << {
type: "transaction_count_drop",
message: "Transaction count dropped from #{existing_count} to #{new_count} (#{drop_pct}% decrease)",
severity: drop_pct > 25 ? "high" : "medium"
}
end
end
end
# 2. Detect gaps in transaction history
# Look for periods with no transactions that seem unusual
gaps = detect_transaction_gaps(new_transactions)
if gaps.any?
reconciliation[:issues] += gaps.map do |gap|
{
type: "transaction_gap",
message: "No transactions between #{gap[:start_date]} and #{gap[:end_date]} (#{gap[:days]} days)",
severity: gap[:days] > 30 ? "high" : "medium",
gap_start: gap[:start_date],
gap_end: gap[:end_date],
gap_days: gap[:days]
}
end
end
# 3. Check for stale data (most recent transaction is old)
latest_tx_date = extract_latest_transaction_date(new_transactions)
if latest_tx_date.present?
days_since_latest = (Date.current - latest_tx_date).to_i
if days_since_latest > 7
reconciliation[:issues] << {
type: "stale_transactions",
message: "Most recent transaction is #{days_since_latest} days old",
severity: days_since_latest > 14 ? "high" : "medium",
latest_date: latest_tx_date.to_s,
days_stale: days_since_latest
}
end
end
# 4. Check for duplicate transaction IDs (data integrity issue)
duplicate_ids = find_duplicate_transaction_ids(new_transactions)
if duplicate_ids.any?
reconciliation[:issues] << {
type: "duplicate_ids",
message: "Found #{duplicate_ids.size} duplicate transaction ID(s)",
severity: "low",
duplicate_count: duplicate_ids.size
}
end
# Record reconciliation results in stats
if reconciliation[:issues].any?
stats["reconciliation"] ||= {}
stats["reconciliation"][account_id] = reconciliation
# Count issues by severity
high_severity = reconciliation[:issues].count { |i| i[:severity] == "high" }
medium_severity = reconciliation[:issues].count { |i| i[:severity] == "medium" }
if high_severity > 0
stats["reconciliation_warnings"] = stats.fetch("reconciliation_warnings", 0) + high_severity
Rails.logger.warn("SimpleFin reconciliation: #{high_severity} high-severity issue(s) for account #{account_id}")
ActiveSupport::Notifications.instrument(
"simplefin.reconciliation_warning",
item_id: simplefin_item.id,
account_id: account_id,
issues: reconciliation[:issues]
)
end
if medium_severity > 0
stats["reconciliation_notices"] = stats.fetch("reconciliation_notices", 0) + medium_severity
end
persist_stats!
end
reconciliation
end
# Detect gaps in transaction history (periods with no activity)
def detect_transaction_gaps(transactions)
return [] if transactions.blank? || transactions.size < 2
# Extract and sort transaction dates
dates = transactions.map do |tx|
t = tx.with_indifferent_access
posted = t[:posted]
next nil if posted.blank? || posted.to_i <= 0
Time.at(posted.to_i).to_date
end.compact.uniq.sort
return [] if dates.size < 2
gaps = []
min_gap_days = 14 # Only report gaps of 2+ weeks
dates.each_cons(2) do |earlier, later|
gap_days = (later - earlier).to_i
if gap_days >= min_gap_days
gaps << {
start_date: earlier.to_s,
end_date: later.to_s,
days: gap_days
}
end
end
# Limit to top 3 largest gaps to avoid noise
gaps.sort_by { |g| -g[:days] }.first(3)
end
# Extract the most recent transaction date
def extract_latest_transaction_date(transactions)
return nil if transactions.blank?
latest_timestamp = transactions.map do |tx|
t = tx.with_indifferent_access
posted = t[:posted]
posted.to_i if posted.present? && posted.to_i > 0
end.compact.max
latest_timestamp ? Time.at(latest_timestamp).to_date : nil
end
# Find duplicate transaction IDs
def find_duplicate_transaction_ids(transactions)
return [] if transactions.blank?
ids = transactions.map do |tx|
t = tx.with_indifferent_access
t[:id] || t[:fitid]
end.compact
ids.group_by(&:itself).select { |_, v| v.size > 1 }.keys
end
end
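
To make the gap rule concrete, here is a standalone sketch of the same idea run against sample data. It is not the importer method itself: `gaps_sketch` is a hypothetical name, it reads symbol keys only, and it converts timestamps in UTC (an assumption made so the result does not depend on the machine's time zone).

```ruby
require "date"

# Report runs of min_gap_days or more with no posted transactions,
# keeping only the `limit` largest gaps.
def gaps_sketch(transactions, min_gap_days: 14, limit: 3)
  dates = transactions.filter_map do |tx|
    posted = tx[:posted].to_i
    Time.at(posted).utc.to_date if posted > 0
  end.uniq.sort

  gaps = dates.each_cons(2).filter_map do |earlier, later|
    days = (later - earlier).to_i
    { start_date: earlier.to_s, end_date: later.to_s, days: days } if days >= min_gap_days
  end

  gaps.sort_by { |gap| -gap[:days] }.first(limit)
end

sample = [
  { posted: Time.utc(2025, 1, 1, 12).to_i },
  { posted: Time.utc(2025, 1, 5, 12).to_i },
  { posted: Time.utc(2025, 2, 1, 12).to_i },
  { posted: 0 } # ignored: no usable timestamp
]

result = gaps_sketch(sample)
# result contains one gap: 2025-01-05 to 2025-02-01 (27 days)
```

The 4-day gap is below the threshold and is dropped. Because the input is only what the API returned, a quiet but healthy account would be flagged too, which is consistent with the note above about why the call site is disabled.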


@@ -10,7 +10,15 @@ class SimplefinItem::Syncer
# can review and manually link accounts first. This mirrors the historical flow
# users expect: initial 7-day balances snapshot, then full chunked history after linking.
begin
if simplefin_item.simplefin_accounts.joins(:account).count == 0
# Check for linked accounts via BOTH legacy FK (accounts.simplefin_account_id) AND
# the new AccountProvider system. An account is "linked" if either association exists.
linked_via_legacy = simplefin_item.simplefin_accounts.joins(:account).count
linked_via_provider = simplefin_item.simplefin_accounts.joins(:account_provider).count
total_linked = simplefin_item.simplefin_accounts.select { |sfa| sfa.current_account.present? }.count
Rails.logger.info("SimplefinItem::Syncer - linked check: legacy=#{linked_via_legacy}, provider=#{linked_via_provider}, total=#{total_linked}")
if total_linked == 0
sync.update!(status_text: "Discovering accounts (balances only)...") if sync.respond_to?(:status_text)
# Pre-mark the sync as balances_only for runtime only (no persistence)
begin
@@ -52,8 +60,9 @@ class SimplefinItem::Syncer
finalize_setup_counts(sync)
# Process transactions/holdings only for linked accounts
linked_accounts = simplefin_item.simplefin_accounts.joins(:account)
if linked_accounts.any?
# Check both legacy FK and AccountProvider associations
linked_simplefin_accounts = simplefin_item.simplefin_accounts.select { |sfa| sfa.current_account.present? }
if linked_simplefin_accounts.any?
sync.update!(status_text: "Processing transactions and holdings...") if sync.respond_to?(:status_text)
simplefin_item.process_accounts
@@ -77,7 +86,11 @@ class SimplefinItem::Syncer
def finalize_setup_counts(sync)
sync.update!(status_text: "Checking account configuration...") if sync.respond_to?(:status_text)
total_accounts = simplefin_item.simplefin_accounts.count
linked_accounts = simplefin_item.simplefin_accounts.joins(:account)
# Count linked accounts using both legacy FK and AccountProvider associations
linked_count = simplefin_item.simplefin_accounts.count { |sfa| sfa.current_account.present? }
# Unlinked = no legacy FK AND no AccountProvider
unlinked_accounts = simplefin_item.simplefin_accounts
.left_joins(:account, :account_provider)
.where(accounts: { id: nil }, account_providers: { id: nil })
@@ -93,7 +106,7 @@ class SimplefinItem::Syncer
existing = (sync.sync_stats || {})
setup_stats = {
"total_accounts" => total_accounts,
"linked_accounts" => linked_accounts.count,
"linked_accounts" => linked_count,
"unlinked_accounts" => unlinked_accounts.count
}
sync.update!(sync_stats: existing.merge(setup_stats))
@@ -185,7 +198,8 @@ class SimplefinItem::Syncer
window_start = sync.created_at || 30.minutes.ago
window_end = Time.current
account_ids = simplefin_item.simplefin_accounts.joins(:account).pluck("accounts.id")
# Get account IDs via BOTH legacy FK and AccountProvider to ensure we capture all linked accounts
account_ids = simplefin_item.simplefin_accounts.filter_map { |sfa| sfa.current_account&.id }
return {} if account_ids.empty?
tx_scope = Entry.where(account_id: account_ids, source: "simplefin", entryable_type: "Transaction")
@@ -193,14 +207,16 @@ class SimplefinItem::Syncer
tx_updated = tx_scope.where(updated_at: window_start..window_end).where.not(created_at: window_start..window_end).count
tx_seen = tx_imported + tx_updated
holdings_scope = Holding.where(account_id: account_ids)
holdings_processed = holdings_scope.where(created_at: window_start..window_end).count
# Count holdings from raw_holdings_payload (what the sync found) rather than
# the database. Holdings are applied asynchronously via SimplefinHoldingsApplyJob,
# so database counts would always be 0 at this point.
holdings_found = simplefin_item.simplefin_accounts.sum { |sfa| Array(sfa.raw_holdings_payload).size }
{
"tx_imported" => tx_imported,
"tx_updated" => tx_updated,
"tx_seen" => tx_seen,
"holdings_processed" => holdings_processed,
"holdings_found" => holdings_found,
"window_start" => window_start,
"window_end" => window_end
}
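
The payload-based holdings count can be tried in isolation with a small sketch. `SfaPayloadSketch` is a hypothetical stand-in for a SimpleFin account record, and the sample payloads are made up for the example.

```ruby
# Hypothetical stand-in exposing only the attribute the count relies on.
SfaPayloadSketch = Struct.new(:raw_holdings_payload)

accounts = [
  SfaPayloadSketch.new([{ "symbol" => "AAA" }, { "symbol" => "BBB" }]),
  SfaPayloadSketch.new(nil), # account with no holdings payload at all
  SfaPayloadSketch.new([])   # account with an empty payload
]

# Array(nil) is [], so accounts without a payload contribute 0.
holdings_found = accounts.sum { |sfa| Array(sfa.raw_holdings_payload).size }
```

One caveat with this approach: `Array(...)` applied to a Hash returns its key/value pairs, so the count is only meaningful when the payload is stored as an array (or `nil`).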