sure/app/models/holding/reverse_calculator.rb
wps260 c294cbf54b Performance improvements in holding calculation pipeline (#1579)
* Performance improvements in holding calculation pipeline

Investment accounts with large histories were pegging CPU at 100% during
sync. Root cause was a cluster of quadratic and superlinear algorithms in
the inner holding calculation loop. These are replaced with hash lookups (and,
for cost basis, a binary search over sparse snapshots) built from single-pass
indexes over the already-loaded data.

Holding::PortfolioCache - load_prices:

  Three O(SxN) patterns inside the per-security loop:

  1. DB prices: `security.prices.where(...)` fired one SQL query per
     security (N+1). Replaced with a single bulk query before the loop:

       Security::Price.where(security_id: ..., date: ...).group_by(&:security_id)

     70 securities -> 70 queries becomes 1.

  2. Trade prices: `trades.select { |t| t.entryable.security_id == id }`
     scanned the full trades array for every security - O(SxT). Replaced
     with trades_by_security_id, pre-indexed once from the loaded array.

  3. Holding prices: `holdings.select { |h| h.security_id == id }` - same
     O(SxH) pattern. Replaced with holdings_by_security_id.

  Prices are now indexed into prices_by_date and prices_by_date_and_source
  hashes during load_prices, making get_price O(1) instead of scanning the
  flat prices array on every lookup.
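  The indexing pattern can be sketched outside Rails with a plain Struct
  standing in for the Security::Price model (the Price struct and sample
  data below are illustrative, not the real schema):

```ruby
# Minimal sketch: one pass builds every index the per-security loop needs,
# so each subsequent lookup is a hash access instead of an O(N) scan.
Price = Struct.new(:security_id, :date, :price, :source)

prices = [
  Price.new(1, "2024-01-02", 100, "db"),
  Price.new(1, "2024-01-03", 101, "db"),
  Price.new(2, "2024-01-02", 50,  "trade")
]

prices_by_security_id     = prices.group_by(&:security_id)
prices_by_date            = prices.group_by(&:date)
prices_by_date_and_source = prices.group_by { |p| [p.date, p.source] }

prices_by_security_id[1].size                         # => 2
prices_by_date_and_source[["2024-01-02", "db"]].first # => the 100 price
```

  The same single-pass `group_by` replaces the per-security `where` query
  (after one bulk load) and both per-security `select` scans.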

Holding::PortfolioCache - get_trades / get_price:

  - get_trades(date:): `trades.select { |t| t.date == date }` (O(T) scan)
    replaced with trades_by_date hash (O(1)).

  - get_price: two `prices.select { p.date == date ... }.min_by` linear
    scans replaced with direct hash lookups into prices_by_date and
    prices_by_date_and_source.
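  The get_trades change in isolation, with a bare Struct in place of the
  Entry/Trade models (names below are illustrative):

```ruby
# trades_by_date is built once; each get_trades(date:) call is then a
# single hash read instead of an O(T) select over the full array.
TradeEntry = Struct.new(:date, :security_id)

trades = [
  TradeEntry.new("2024-01-02", 1),
  TradeEntry.new("2024-01-02", 2),
  TradeEntry.new("2024-01-03", 1)
]

trades_by_date = trades.group_by(&:date)

# Old: trades.select { |t| t.date == date }  -- O(T) scan per call
trades_by_date["2024-01-02"].size      # => 2
trades_by_date.fetch("2024-09-09", []) # => []
```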

Holding::PortfolioCache - collect_unique_securities:

  `holdings.map(&:security)` traversed the security association on every
  holding record (N+1 if not preloaded). Replaced with a pluck of
  security_ids followed by a single Security.where(id: ...) batch load.

Holding::ForwardCalculator / ReverseCalculator:

  `holdings += build_holdings(...)` allocated a new array copy on every
  iteration - O(N) per day x thousands of days = O(D^2) total allocations.
  Replaced with holdings.concat(...) which appends in place, O(1).
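  The difference is visible in plain Ruby: `+=` rebinds the variable to a
  freshly allocated copy of the whole array, while `concat` mutates the
  receiver in place:

```ruby
holdings = [:a]
original = holdings

holdings += [:b]           # allocates a new array; `original` is untouched
holdings.equal?(original)  # => false

holdings = [:a]
original = holdings

holdings.concat([:b])      # appends in place, no copy of existing elements
holdings.equal?(original)  # => true
```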

Holding::ReverseCalculator - precompute_cost_basis:

  Old: walked every date from account.start_date to Date.current (O(D)),
  writing a cost_basis entry for every security on every date. For an
  account with 2 trades over 9,250 days this wrote ~18,500 hash entries
  and consumed the full date range in the outer loop regardless of trade
  density.

  New: walks only buy trades (O(T)), appending one [date, avg_cost]
  snapshot per trade. cost_basis_for binary-searches the sparse snapshot
  array - O(log T) per lookup. Memory drops from O(DxS) to O(T).
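  The lookup side can be sketched standalone with Range#bsearch in
  find-minimum mode: find the first snapshot dated after the query, then
  step back one (the free-standing `cost_basis_for` below is a simplified
  stand-in for the instance method):

```ruby
require "date"

# Sparse [date, avg_cost] snapshots, one per buy trade, sorted by date.
snapshots = [
  [Date.new(2020, 1, 10), 100.0],
  [Date.new(2023, 6, 1),  120.0]
]

# Last snapshot on or before `date`, or nil if none -- O(log T).
def cost_basis_for(snapshots, date)
  first_after = (0...snapshots.size).bsearch { |i| snapshots[i][0] > date }
  idx = (first_after || snapshots.size) - 1
  idx >= 0 ? snapshots[idx][1] : nil
end

cost_basis_for(snapshots, Date.new(2021, 1, 1)) # => 100.0
cost_basis_for(snapshots, Date.new(2019, 1, 1)) # => nil
```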

Holding::Gapfillable:

  `security_holdings.find { |h| h.date == date }` was called on every
  date in the gapfill range - O(H) per date, O(HxD) total. Replaced with
  security_holdings.index_by(&amp;:date) built once before the loop, making
  each date lookup O(1).
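  (`index_by` is ActiveSupport; in plain Ruby the same index is a `to_h`.)
  A minimal sketch, with a Struct standing in for the Holding model:

```ruby
Holding = Struct.new(:date, :qty)  # illustrative stand-in for the model

security_holdings = [
  Holding.new("2024-01-02", 5),
  Holding.new("2024-01-05", 7)
]

# Built once before the gapfill loop; equivalent to ActiveSupport's
# security_holdings.index_by(&:date).
holdings_by_date = security_holdings.to_h { |h| [h.date, h] }

# Old: security_holdings.find { |h| h.date == date }  -- O(H) per date
holdings_by_date["2024-01-05"].qty # => 7
holdings_by_date["2024-01-03"]     # => nil (a gap to fill)
```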

Holding::Materializer - purge_stale_holdings:

  `account.entries.trades.map { |entry| entry.entryable.security_id }.uniq`
  loaded all trade entry records into Ruby then traversed the entryable
  association on each (N+1). Replaced with account.trades.pluck(:security_id).uniq
  (single SQL query returning only the IDs).

In testing, these changes reduced the sync time of an account with 25 years
of history and 70 securities from about 90 minutes to under 3 minutes.

* Lint fix

* Lint fix

* addressing the open review nits I agreed with:

* return dup'd arrays from PortfolioCache#get_trades so callers can't mutate memoized cache state
* use the precomputed security-id indexes in collect_unique_securities
* keep security-id dedupe in SQL via distinct.pluck(:security_id)
* tighten the DB price preload to select only needed columns
* harden cost-basis assertions with assert_in_delta

* Back out unnecessary AI slop

* Add back dup to trades array returned from memoized hash

trades_by_date[date] returns a live reference into the memoized hash.
Any caller that mutates the result would silently corrupt the cache for
subsequent calls on the same date within the same sync run. Add .dup to
return a shallow copy, matching the safety of the original select path.
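The hazard in isolation, using a hypothetical reduction of
PortfolioCache#get_trades (class and data below are illustrative):

```ruby
# The memoized hash hands out live references unless we dup: a caller
# mutating the returned array would corrupt every later lookup.
class TradeCache
  def initialize(trades)
    @trades_by_date = trades.group_by { |t| t[:date] }
  end

  def get_trades(date:)
    (@trades_by_date[date] || []).dup  # shallow copy shields the cache
  end
end

cache = TradeCache.new([{ date: "2024-01-02", qty: 5 }])

cache.get_trades(date: "2024-01-02").clear  # caller mutates its own copy...
cache.get_trades(date: "2024-01-02").size   # => 1, cache is intact
```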
2026-05-05 01:24:33 +02:00


class Holding::ReverseCalculator
  attr_reader :account, :portfolio_snapshot

  def initialize(account, portfolio_snapshot:, security_ids: nil)
    @account = account
    @portfolio_snapshot = portfolio_snapshot
    @security_ids = security_ids
  end

  def calculate
    Rails.logger.tagged("Holding::ReverseCalculator") do
      precompute_cost_basis
      holdings = calculate_holdings
      Holding.gapfill(holdings)
    end
  end

  private
    # Reverse calculators will use the existing holdings as a source of security ids and prices
    # since it is common for a provider to supply "current day" holdings but not all the historical
    # trades that make up those holdings.
    def portfolio_cache
      @portfolio_cache ||= Holding::PortfolioCache.new(account, use_holdings: true, security_ids: @security_ids)
    end

    def calculate_holdings
      # Start with the portfolio snapshot passed in from the materializer
      current_portfolio = portfolio_snapshot.to_h
      previous_portfolio = {}
      holdings = []

      Date.current.downto(account.start_date).each do |date|
        today_trades = portfolio_cache.get_trades(date: date)
        previous_portfolio = transform_portfolio(current_portfolio, today_trades, direction: :reverse)
        # If current day, always use holding prices (since that's what Plaid gives us). For historical values, use market data (since Plaid doesn't supply historical prices)
        holdings.concat(build_holdings(current_portfolio, date, price_source: date == Date.current ? "holding" : nil))
        current_portfolio = previous_portfolio
      end

      holdings
    end

    def transform_portfolio(previous_portfolio, trade_entries, direction: :forward)
      new_quantities = previous_portfolio.dup

      trade_entries.each do |trade_entry|
        trade = trade_entry.entryable
        security_id = trade.security_id
        qty_change = trade.qty
        qty_change = qty_change * -1 if direction == :reverse
        new_quantities[security_id] = (new_quantities[security_id] || 0) + qty_change
      end

      new_quantities
    end

    def build_holdings(portfolio, date, price_source: nil)
      portfolio.map do |security_id, qty|
        next if @security_ids && !@security_ids.include?(security_id)

        price = portfolio_cache.get_price(security_id, date, source: price_source)
        next if price.nil?

        Holding.new(
          account_id: account.id,
          security_id: security_id,
          date: date,
          qty: qty,
          price: price.price,
          currency: price.currency,
          amount: qty * price.price,
          cost_basis: cost_basis_for(security_id, date)
        )
      end.compact
    end

    def precompute_cost_basis
      @cost_basis_snapshots = Hash.new { |h, k| h[k] = [] }
      tracker = Hash.new { |h, k| h[k] = { total_cost: BigDecimal("0"), total_qty: BigDecimal("0") } }

      portfolio_cache.get_trades.sort_by(&:date).each do |trade_entry|
        trade = trade_entry.entryable
        next unless trade.qty > 0

        security_id = trade.security_id
        trade_price = Money.new(trade.price, trade.currency)

        begin
          converted_price = trade_price.exchange_to(account.currency).amount
        rescue Money::ConversionError
          converted_price = trade.price
        end

        tracker[security_id][:total_cost] += converted_price * trade.qty
        tracker[security_id][:total_qty] += trade.qty

        @cost_basis_snapshots[security_id] << [
          trade_entry.date,
          tracker[security_id][:total_cost] / tracker[security_id][:total_qty]
        ]
      end
    end

    def cost_basis_for(security_id, date)
      snapshots = @cost_basis_snapshots[security_id]
      return nil if snapshots.empty?

      lo, hi, result = 0, snapshots.size - 1, nil
      while lo <= hi
        mid = (lo + hi) / 2
        if snapshots[mid][0] <= date
          result = snapshots[mid][1]
          lo = mid + 1
        else
          hi = mid - 1
        end
      end

      result
    end
end