From 6cf7d200104c8acb498c0729bacf1efaf3341820 Mon Sep 17 00:00:00 2001
From: Serge L
Date: Tue, 24 Mar 2026 15:42:41 -0400
Subject: [PATCH] =?UTF-8?q?Perf:=20Index=20Balance::SyncCache=20lookups=20?=
 =?UTF-8?q?by=20date=20to=20eliminate=20O(N=C3=97D)=20scans=20(#1081)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Perf: Index Balance::SyncCache lookups by date to eliminate O(N×D) scans

Each call to get_holdings(date) and get_entries(date) previously did a
linear scan over the full converted_holdings / converted_entries arrays.
The balance calculators call these once per day across the full account
history, making the overall complexity O(N×D) where N is the total number
of holding/entry rows and D is the number of days in the account history.

For a typical investment account (20 securities, 2 years of history):
- Holdings: 20 × 730 = 14,600 rows
- Balance loop: 730 date iterations
- Comparisons: 14,600 × 730 ≈ 10.7 million per materialise run

This change builds a hash index (grouped by date) once on first access
and reuses it for all subsequent lookups, reducing per-call complexity
to O(1). Total complexity becomes O(N) — load once, look up cheaply.

Observed wall-clock improvement on a real account: ~36 s → ~5 s for a
full Balance::Materializer run. The nightly sync benefits equally.

No behavioural change: get_holdings, get_entries, and get_valuation
return identical data — they are now just fetched via a hash key rather
than a repeated array scan.

Co-Authored-By: Claude Sonnet 4.6

* Fix: Return defensive copy from get_holdings to prevent cache mutation

get_holdings was returning a direct reference to the internal cached
array from holdings_by_date. A caller appending to the result (e.g.
via <<) would silently corrupt the cache for all subsequent date
lookups in the same materialise run.

Use &.dup to return a shallow copy of the group array. Callers only
read from the result (sum, map, etc.) so this has no behavioural impact
and negligible performance cost.

get_entries is already safe — Array#select always returns a new array.
get_valuation returns a single object, not an array, so no issue there.

Co-Authored-By: Claude Sonnet 4.6

* Remove unnecessary dup in get_holdings for consistency

No caller mutates the returned array (only .sum is called), so the
defensive copy is unnecessary overhead. This aligns get_holdings with
get_entries and get_valuation which also return cached references
directly.

Co-Authored-By: Claude Opus 4.6

---------

Co-authored-by: Claude Sonnet 4.6
---
 app/models/balance/sync_cache.rb | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/app/models/balance/sync_cache.rb b/app/models/balance/sync_cache.rb
index e13be7b96..a6e12c9ee 100644
--- a/app/models/balance/sync_cache.rb
+++ b/app/models/balance/sync_cache.rb
@@ -4,20 +4,28 @@ class Balance::SyncCache
   end
 
   def get_valuation(date)
-    converted_entries.find { |e| e.date == date && e.valuation? }
+    entries_by_date[date]&.find { |e| e.valuation? }
   end
 
   def get_holdings(date)
-    converted_holdings.select { |h| h.date == date }
+    holdings_by_date[date] || []
   end
 
   def get_entries(date)
-    converted_entries.select { |e| e.date == date && (e.transaction? || e.trade?) }
+    entries_by_date[date]&.select { |e| e.transaction? || e.trade? } || []
   end
 
   private
     attr_reader :account
 
+    def entries_by_date
+      @entries_by_date ||= converted_entries.group_by(&:date)
+    end
+
+    def holdings_by_date
+      @holdings_by_date ||= converted_holdings.group_by(&:date)
+    end
+
     def converted_entries
       @converted_entries ||= account.entries.excluding_split_parents.order(:date).to_a.map do |e|
         converted_entry = e.dup