Perf: Index Balance::SyncCache lookups by date to eliminate O(N×D) scans (#1081)

* Perf: Index Balance::SyncCache lookups by date to eliminate O(N×D) scans

Each call to get_holdings(date) and get_entries(date) previously did a
linear scan over the full converted_holdings / converted_entries arrays.
The balance calculators call these once per day across the full account
history, making the overall complexity O(N×D) where N is the total number
of holding/entry rows and D is the number of days in the account history.

For a typical investment account (20 securities, 2 years of history):
  - Holdings: 20 × 730 = 14,600 rows
  - Balance loop: 730 date iterations
  - Comparisons: 14,600 × 730 ≈ 10.7 million per materialise run

This change builds a hash index (grouped by date) once on first access and
reuses it for all subsequent lookups, reducing each lookup to an O(1) hash
access (plus, for entries, a filter over that day's rows only).
Total complexity becomes O(N + D): build the index once, look up cheaply.
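
As a rough illustration of the two lookup strategies (a minimal sketch, not
the project's code: the Row struct, the sample data, and the helper names
scan_lookup / index_lookup are hypothetical stand-ins for the converted
holdings and the cache methods):

```ruby
require "date"

# Hypothetical stand-in for a converted holding row.
Row = Struct.new(:date, :security_id, :amount)

start = Date.new(2024, 1, 1)
dates = (0...30).map { |i| start + i }
# 5 securities per day over 30 days => 150 rows.
rows = dates.flat_map { |d| (1..5).map { |s| Row.new(d, s, 100 * s) } }

# Before: every lookup walks the whole array (N comparisons per call).
def scan_lookup(rows, date)
  rows.select { |r| r.date == date }
end

# After: group once (a single pass over N rows), then each lookup
# is a hash access keyed by date.
index = rows.group_by(&:date)

def index_lookup(index, date)
  index[date] || []
end

target  = dates[10]
same    = scan_lookup(rows, target) == index_lookup(index, target)
missing = index_lookup(index, start - 1) # a date with no rows
```

Both strategies return the same rows in the same order, because group_by
keeps rows in their original order within each group; a date with no rows
falls back to an empty array.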

Observed wall-clock improvement on a real account: ~36 s → ~5 s for a full
Balance::Materializer run. The nightly sync benefits equally.

No behavioural change: get_holdings, get_entries, and get_valuation return
identical data — they are now just fetched via a hash key rather than a
repeated array scan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Return defensive copy from get_holdings to prevent cache mutation

get_holdings was returning a direct reference to the internal cached
array from holdings_by_date. A caller appending to the result (e.g.
via <<) would silently corrupt the cache for all subsequent date
lookups in the same materialise run.

Use &.dup to return a shallow copy of the group array. Callers only
read from the result (sum, map, etc.) so this has no behavioural
impact and negligible performance cost.

get_entries is already safe — Array#select always returns a new array.
get_valuation returns a single object, not an array, so no issue there.
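
The hazard described above can be reproduced in isolation. This is a
minimal sketch under assumed data (the cache hash, its string key, and the
integer values are hypothetical, standing in for holdings_by_date and its
holding objects):

```ruby
# Hypothetical cache: one array of holdings per date key.
cache = { "2026-03-01" => [100, 200] }

# Returning the cached array directly: the caller's << lands in the cache.
direct = cache["2026-03-01"] || []
direct << 999
cache_after_direct = cache["2026-03-01"].dup

# Returning a shallow copy: the caller's << stays local to the copy.
cache = { "2026-03-01" => [100, 200] }
copy = cache["2026-03-01"]&.dup || []
copy << 999
cache_after_copy = cache["2026-03-01"]
```

Note that dup is shallow: it protects the array itself, not the objects
inside it, so mutating an element would still be visible through the cache.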

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Remove unnecessary dup in get_holdings for consistency

No caller mutates the returned array (only .sum is called), so the
defensive copy is unnecessary overhead. This aligns get_holdings with
get_entries and get_valuation, neither of which makes an explicit
defensive copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Serge L
2026-03-24 15:42:41 -04:00
committed by GitHub
parent b78944ed6f
commit 6cf7d20010

@@ -4,20 +4,28 @@ class Balance::SyncCache
   end
 
   def get_valuation(date)
-    converted_entries.find { |e| e.date == date && e.valuation? }
+    entries_by_date[date]&.find { |e| e.valuation? }
   end
 
   def get_holdings(date)
-    converted_holdings.select { |h| h.date == date }
+    holdings_by_date[date] || []
   end
 
   def get_entries(date)
-    converted_entries.select { |e| e.date == date && (e.transaction? || e.trade?) }
+    entries_by_date[date]&.select { |e| e.transaction? || e.trade? } || []
   end
 
   private
     attr_reader :account
 
+    def entries_by_date
+      @entries_by_date ||= converted_entries.group_by(&:date)
+    end
+
+    def holdings_by_date
+      @holdings_by_date ||= converted_holdings.group_by(&:date)
+    end
+
     def converted_entries
       @converted_entries ||= account.entries.excluding_split_parents.order(:date).to_a.map do |e|
         converted_entry = e.dup