sure/app/models/holding/materializer.rb
wps260 c294cbf54b Performance improvements in holding calculation pipeline (#1579)
* Performance improvements in holding calculation pipeline

Investment accounts with large histories were pegging CPU at 100% during
sync. Root cause was a cluster of quadratic and superlinear algorithms in
the inner holding calculation loop. Most are replaced with O(1) hash lookups
built from single-pass indexes over the already-loaded data; the rest are
fixed with batched queries, in-place appends, or a binary search.

Holding::PortfolioCache - load_prices:

  Three O(SxN) patterns inside the per-security loop:

  1. DB prices: `security.prices.where(...)` fired one SQL query per
     security (N+1). Replaced with a single bulk query before the loop:

       Security::Price.where(security_id: ..., date: ...).group_by(&:security_id)

     With 70 securities, 70 queries become 1.

  2. Trade prices: `trades.select { |t| t.entryable.security_id == id }`
     scanned the full trades array for every security - O(SxT). Replaced
     with trades_by_security_id, pre-indexed once from the loaded array.

  3. Holding prices: `holdings.select { |h| h.security_id == id }` - same
     O(SxH) pattern. Replaced with holdings_by_security_id.
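The three fixes above share one shape: scan each loaded array once, bucket it by security_id, then do a hash lookup per security instead of a full `select`. A minimal plain-Ruby sketch of that shape follows. The `TradeRow`/`HoldingRow` structs and both helper methods are hypothetical stand-ins (real trades expose `entryable.security_id`, and the real cache also carries dates and sources); the point is only that the indexed version returns the same groups as the scanning version.

```ruby
# Hypothetical stand-ins for the loaded records: only the fields needed here.
TradeRow   = Struct.new(:security_id, :date, :price, keyword_init: true)
HoldingRow = Struct.new(:security_id, :date, :price, keyword_init: true)

trades = [
  TradeRow.new(security_id: 1, date: "2024-01-02", price: 10.0),
  TradeRow.new(security_id: 2, date: "2024-01-02", price: 55.0),
  TradeRow.new(security_id: 1, date: "2024-01-05", price: 11.0)
]
holdings = [
  HoldingRow.new(security_id: 2, date: "2024-01-03", price: 56.0),
  HoldingRow.new(security_id: 3, date: "2024-01-03", price: 7.5)
]

# Old shape: every security rescans both arrays, O(S x T) + O(S x H).
def rows_by_scanning(security_ids, trades, holdings)
  security_ids.to_h do |id|
    [ id, {
      trades: trades.select { |t| t.security_id == id },
      holdings: holdings.select { |h| h.security_id == id }
    } ]
  end
end

# New shape: index each array once, O(T + H), then O(1) bucket lookups.
def rows_by_index(security_ids, trades, holdings)
  trades_by_security_id = trades.group_by(&:security_id)
  holdings_by_security_id = holdings.group_by(&:security_id)

  security_ids.to_h do |id|
    [ id, {
      trades: trades_by_security_id.fetch(id, []),
      holdings: holdings_by_security_id.fetch(id, [])
    } ]
  end
end

ids = [ 1, 2, 3 ]
rows_by_index(ids, trades, holdings) == rows_by_scanning(ids, trades, holdings) # => true
```

`group_by` keeps the original relative order inside each bucket, which is why the two results compare equal element for element. The same idea applies to the bulk `Security::Price.where(...).group_by(&:security_id)` preload.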

  Prices are now indexed into prices_by_date and prices_by_date_and_source
  hashes during load_prices, making get_price O(1) instead of scanning the
  flat prices array on every lookup.

Holding::PortfolioCache - get_trades / get_price:

  - get_trades(date:): `trades.select { |t| t.date == date }` (O(T) scan)
    replaced with trades_by_date hash (O(1)).

  - get_price: two `prices.select { |p| p.date == date ... }.min_by` linear
    scans replaced with direct hash lookups into prices_by_date and
    prices_by_date_and_source.
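A plain-Ruby sketch of the price indexes and the O(1) `get_price` path described above. The commit message does not say what `min_by` ranks on, so the `priority` field (lower is preferred) is an assumption, as are the `PriceRow` struct and the `PriceIndex` class name. Keys are `[security_id, date]` and `[security_id, date, source]` arrays, built in a single pass.

```ruby
# Hypothetical price record; `priority` stands in for whatever ordering the
# real get_price passes to min_by (lower value = preferred source).
PriceRow = Struct.new(:security_id, :date, :source, :priority, :price, keyword_init: true)

class PriceIndex
  def initialize(prices)
    # One pass over the flat price list fills both indexes.
    @prices_by_date = {}
    @prices_by_date_and_source = {}
    prices.each do |p|
      (@prices_by_date[[ p.security_id, p.date ]] ||= []) << p
      (@prices_by_date_and_source[[ p.security_id, p.date, p.source ]] ||= []) << p
    end
  end

  # Hash lookup for the bucket; min_by only runs over the few prices that
  # share this security and date, never over the whole list.
  def get_price(security_id, date, source: nil)
    bucket =
      if source
        @prices_by_date_and_source.fetch([ security_id, date, source ], [])
      else
        @prices_by_date.fetch([ security_id, date ], [])
      end
    bucket.min_by(&:priority)
  end
end

index = PriceIndex.new([
  PriceRow.new(security_id: 1, date: "2024-01-02", source: "db", priority: 0, price: 10.0),
  PriceRow.new(security_id: 1, date: "2024-01-02", source: "trade", priority: 1, price: 10.2),
  PriceRow.new(security_id: 1, date: "2024-01-03", source: "holding", priority: 2, price: 10.5)
])

index.get_price(1, "2024-01-02").price                  # => 10.0
index.get_price(1, "2024-01-02", source: "trade").price # => 10.2
index.get_price(1, "2024-01-04")                        # => nil
```

Using `fetch` with a default, rather than a `Hash.new { ... }` default block, keeps lookups for missing dates from inserting empty buckets into the index.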

Holding::PortfolioCache - collect_unique_securities:

  `holdings.map(&:security)` traversed the security association on every
  holding record (N+1 if not preloaded). Replaced with a pluck of
  security_ids followed by a single Security.where(id: ...) batch load.
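The difference between traversing `holding.security` per record and deduping ids before one batch load can be shown by counting "queries". The `FakeSecurityStore` class below is a hypothetical in-memory stand-in for the association load and for `Security.where(id: ...)`; it is not the real model API. (Per the later review follow-up, the real code takes the ids from the precomputed security-id indexes rather than a pluck.)

```ruby
# Hypothetical store that counts "queries" so both shapes can be compared
# without a database.
class FakeSecurityStore
  attr_reader :query_count

  def initialize(records)
    @records = records
    @query_count = 0
  end

  # Stand-in for one `holding.security` association load: one query each.
  def find(id)
    @query_count += 1
    @records.fetch(id)
  end

  # Stand-in for `Security.where(id: ids)`: one query for the whole batch.
  def where_ids(ids)
    @query_count += 1
    @records.values_at(*ids)
  end
end

HoldingRef = Struct.new(:security_id)

store = FakeSecurityStore.new({ 1 => "AAPL", 2 => "VTI", 3 => "BND" })
holdings = [ 1, 2, 1, 3, 2, 1 ].map { |id| HoldingRef.new(id) }

# Old shape: one association load per holding, then dedupe in Ruby.
per_holding = holdings.map { |h| store.find(h.security_id) }.uniq
queries_old = store.query_count # 6, one per holding

# New shape: dedupe the ids first, then load them in a single batch.
batched = store.where_ids(holdings.map(&:security_id).uniq)
queries_new = store.query_count - queries_old # 1
```

Both shapes yield the same securities; only the number of round trips changes, and in the batched shape it no longer grows with the number of holdings.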

Holding::ForwardCalculator / ReverseCalculator:

  `holdings += build_holdings(...)` allocated a new array and copied every
  row accumulated so far on each iteration: O(N) per day x thousands of
  days = O(D^2) total copying. Replaced with holdings.concat(...), which
  appends in place at amortized cost proportional to the rows appended.
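A small plain-Ruby illustration of the `+=` versus `concat` point: `acc += rows` is sugar for `acc = acc + rows`, which builds a new array each time, while `concat` mutates the receiver. Integers stand in for holding rows; the day and row counts are arbitrary.

```ruby
# Each simulated "day" contributes a few holding rows (plain integers here).
days = 1..1_000
rows_per_day = 3

# `+=` is `acc = acc + rows`: a brand-new array every iteration, with every
# previously accumulated row copied into it.
acc = []
new_arrays = 0
days.each do |day|
  previous = acc
  acc += Array.new(rows_per_day, day)
  new_arrays += 1 unless acc.equal?(previous)
end

# `concat` appends into the same array object; earlier rows are not re-copied.
out = []
out_identity = out
days.each { |day| out.concat(Array.new(rows_per_day, day)) }

puts new_arrays               # 1000
puts out.equal?(out_identity) # true
puts out == acc               # true
```

Both loops produce identical contents; the difference is that the `+=` loop's total copying grows with the square of the number of days.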

Holding::ReverseCalculator - precompute_cost_basis:

  Old: walked every date from account.start_date to Date.current (O(D)),
  writing a cost_basis entry for every security on every date. For an
  account with 2 trades over 9,250 days this wrote ~18,500 hash entries
  and consumed the full date range in the outer loop regardless of trade
  density.

  New: walks only buy trades (O(T)), appending one [date, avg_cost]
  snapshot per trade. cost_basis_for binary-searches the sparse snapshot
  array - O(log T) per lookup. Memory drops from O(DxS) to O(T).
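A sketch of the sparse-snapshot approach in plain Ruby, assuming a weighted average cost over buys only (sells skipped, as the message says only buy trades are walked) and a cost basis that holds from one buy until the next. `BuyTrade` and both method bodies are illustrative, not the real `ReverseCalculator` code. The lookup uses `Array#bsearch_index` in find-minimum mode to locate the first snapshot dated after the target, then steps back one.

```ruby
require "date"

# Hypothetical trade record; negative qty represents a sell.
BuyTrade = Struct.new(:security_id, :date, :qty, :price, keyword_init: true)

# One [date, avg_cost] snapshot per buy trade, per security: O(T) total.
def precompute_cost_basis(trades)
  snapshots = Hash.new { |h, k| h[k] = [] }
  totals = Hash.new { |h, k| h[k] = { qty: 0.0, cost: 0.0 } }

  trades.select { |t| t.qty.positive? }.sort_by(&:date).each do |t|
    running = totals[t.security_id]
    running[:qty] += t.qty
    running[:cost] += t.qty * t.price
    snapshots[t.security_id] << [ t.date, running[:cost] / running[:qty] ]
  end

  snapshots
end

# O(log T): binary search for the last snapshot dated on or before `date`.
def cost_basis_for(snapshots, security_id, date)
  series = snapshots.fetch(security_id, [])
  # First index whose snapshot date is after `date` (nil means none is).
  after = series.bsearch_index { |snap_date, _avg| snap_date > date } || series.size
  after.zero? ? nil : series[after - 1][1]
end

trades = [
  BuyTrade.new(security_id: 1, date: Date.new(2000, 1, 3), qty: 10, price: 100.0),
  BuyTrade.new(security_id: 1, date: Date.new(2010, 6, 1), qty: 10, price: 200.0),
  BuyTrade.new(security_id: 1, date: Date.new(2012, 1, 1), qty: -5, price: 250.0) # sell, skipped
]
snapshots = precompute_cost_basis(trades)

cost_basis_for(snapshots, 1, Date.new(1999, 12, 31)) # => nil (before first buy)
cost_basis_for(snapshots, 1, Date.new(2005, 1, 1))   # => 100.0
cost_basis_for(snapshots, 1, Date.new(2024, 1, 1))   # => 150.0
```

Two buys spanning 25 years produce exactly two snapshots, whereas a per-date walk would have written one entry for every day in between.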

Holding::Gapfillable:

  `security_holdings.find { |h| h.date == date }` was called on every
  date in the gapfill range - O(H) per date, O(HxD) total. Replaced with
  security_holdings.index_by(&:date) built once before the loop, making
  each date lookup O(1).
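A plain-Ruby sketch of the indexed gapfill loop. It assumes gapfilling means carrying the most recent known holding forward across dates that have none, which the message does not spell out; `Snapshot` and `gapfill` are hypothetical names. Core Ruby has no `index_by` (that comes from ActiveSupport), so `to_h` builds the same date-keyed hash here.

```ruby
require "date"

Snapshot = Struct.new(:date, :qty, keyword_init: true)

# Hypothetical gapfill: one row per date in the range, carrying the latest
# known quantity forward; dates before the first holding are skipped.
def gapfill(security_holdings, start_date, end_date)
  by_date = security_holdings.to_h { |h| [ h.date, h ] } # built once: O(H)
  last = nil

  (start_date..end_date).filter_map do |date|
    current = by_date[date] # O(1) instead of security_holdings.find { ... }
    last = current if current
    last && Snapshot.new(date: date, qty: last.qty)
  end
end

known = [
  Snapshot.new(date: Date.new(2024, 1, 1), qty: 5),
  Snapshot.new(date: Date.new(2024, 1, 4), qty: 8)
]
gapfill(known, Date.new(2024, 1, 1), Date.new(2024, 1, 5)).map(&:qty) # => [5, 5, 5, 8, 8]
```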

Holding::Materializer - purge_stale_holdings:

  `account.entries.trades.map { |entry| entry.entryable.security_id }.uniq`
  loaded all trade entry records into Ruby then traversed the entryable
  association on each (N+1). Replaced with account.trades.pluck(:security_id).uniq
  (single SQL query returning only the IDs).

In testing, these changes reduced the sync time of an account with 25 years
of history and 70 securities from about 90 minutes to under 3 minutes.

* Lint fix

* Lint fix

* addressing the open review nits I agreed with:

* return dup'd arrays from PortfolioCache#get_trades so callers can't mutate memoized cache state
* use the precomputed security-id indexes in collect_unique_securities
* keep security-id dedupe in SQL via distinct.pluck(:security_id)
* tighten the DB price preload to select only needed columns
* harden cost-basis assertions with assert_in_delta

* Back out unnecessary AI slop

* Add back dup to trades array returned from memoized hash

trades_by_date[date] returns a live reference into the memoized hash.
Any caller that mutates the result would silently corrupt the cache for
subsequent calls on the same date within the same sync run. Add .dup to
return a shallow copy, matching the safety of the original select path.
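Why the `.dup` matters can be shown with a minimal memoized date index. `TradeDateCache` and `TradeEntry` are hypothetical names for illustration only; the real `PortfolioCache` does more. Note that `dup` is shallow: callers can add or remove elements freely, but mutating a trade object itself would still be visible through the cache.

```ruby
TradeEntry = Struct.new(:date, :security_id)

# Hypothetical memoized index, illustrating the get_trades contract only.
class TradeDateCache
  def initialize(trades)
    @trades_by_date = trades.group_by(&:date)
  end

  # Unsafe: hands out the very array stored inside the memoized hash.
  def get_trades_unsafe(date)
    @trades_by_date.fetch(date, [])
  end

  # Safe: shallow copy, so array-level mutation cannot reach the cache.
  def get_trades(date)
    @trades_by_date.fetch(date, []).dup
  end
end

cache = TradeDateCache.new([ TradeEntry.new("2024-01-02", 1), TradeEntry.new("2024-01-02", 2) ])

safe = cache.get_trades("2024-01-02")
safe.pop
cache.get_trades("2024-01-02").size # => 2, cache untouched

unsafe = cache.get_trades_unsafe("2024-01-02")
unsafe.pop
cache.get_trades("2024-01-02").size # => 1, cache silently corrupted
```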
2026-05-05 01:24:33 +02:00

199 lines
7.5 KiB
Ruby

# "Materializes" holdings (similar to a DB materialized view, but done at the app level)
# into a series of records we can easily query and join with other data.
class Holding::Materializer
  def initialize(account, strategy:, security_ids: nil)
    @account = account
    @strategy = strategy
    @security_ids = security_ids
  end

  def materialize_holdings
    calculate_holdings

    Rails.logger.info("Persisting #{@holdings.size} holdings")
    persist_holdings

    if strategy == :forward && security_ids.nil?
      purge_stale_holdings
    end

    # Clean up only calculated holdings that are directly shadowed by a provider snapshot
    # on the same date/security/currency. Historical calculated rows for provider-linked
    # securities are still needed to derive sane balance charts between sync snapshots.
    cleanup_shadowed_calculated_holdings

    # Also remove calculated rows on the provider's latest snapshot date when those
    # securities are no longer present in the provider payload. This keeps "current"
    # holdings/balance composition aligned with the provider snapshot while preserving
    # older calculated history.
    cleanup_stale_calculated_rows_on_latest_provider_snapshot

    # Reload holdings association to clear any cached stale data
    # This ensures subsequent Balance calculations see the fresh holdings
    account.holdings.reload

    @holdings
  end

  private
    attr_reader :account, :strategy, :security_ids

    def calculate_holdings
      @holdings = calculator.calculate
    end

    def persist_holdings
      return if @holdings.empty?

      current_time = Time.now

      # Load existing holdings to check locked status and source priority
      existing_holdings_map = load_existing_holdings_map

      # Separate holdings into categories based on cost_basis reconciliation
      holdings_to_upsert_with_cost = []
      holdings_to_upsert_without_cost = []

      @holdings.each do |holding|
        key = holding_key(holding)
        existing = existing_holdings_map[key]

        # Skip provider-sourced holdings - they have authoritative data from the provider
        # (e.g., Coinbase, SimpleFIN) and should not be overwritten by calculated holdings
        if existing&.account_provider_id.present?
          Rails.logger.debug(
            "Holding::Materializer - Skipping provider-sourced holding id=#{existing.id} " \
            "security_id=#{existing.security_id} date=#{existing.date}"
          )
          next
        end

        reconciled = Holding::CostBasisReconciler.reconcile(
          existing_holding: existing,
          incoming_cost_basis: holding.cost_basis,
          incoming_source: "calculated"
        )

        base_attrs = holding.attributes
          .slice("date", "currency", "qty", "price", "amount", "security_id")
          .merge("account_id" => account.id, "updated_at" => current_time)

        if existing&.cost_basis_locked?
          # For locked holdings, preserve ALL cost_basis fields
          holdings_to_upsert_without_cost << base_attrs
        elsif reconciled[:should_update] && reconciled[:cost_basis].present?
          # Update with new cost_basis and source
          holdings_to_upsert_with_cost << base_attrs.merge(
            "cost_basis" => reconciled[:cost_basis],
            "cost_basis_source" => reconciled[:cost_basis_source]
          )
        else
          # No cost_basis to set, or existing is better - don't touch cost_basis fields
          holdings_to_upsert_without_cost << base_attrs
        end
      end

      # Upsert with cost_basis updates
      if holdings_to_upsert_with_cost.any?
        account.holdings.upsert_all(
          holdings_to_upsert_with_cost,
          unique_by: %i[account_id security_id date currency]
        )
      end

      # Upsert without cost_basis (preserves existing)
      if holdings_to_upsert_without_cost.any?
        account.holdings.upsert_all(
          holdings_to_upsert_without_cost,
          unique_by: %i[account_id security_id date currency]
        )
      end
    end

    def load_existing_holdings_map
      # Load holdings that might affect reconciliation:
      # - Locked holdings (must preserve their cost_basis)
      # - Holdings with a source (need to check priority)
      # - Provider-sourced holdings (must not be overwritten)
      account.holdings
        .where(cost_basis_locked: true)
        .or(account.holdings.where.not(cost_basis_source: nil))
        .or(account.holdings.where.not(account_provider_id: nil))
        .index_by { |h| holding_key(h) }
    end

    # Remove only calculated holdings that collide with an authoritative provider snapshot
    # on the exact same key. This preserves reverse-calculated history for linked accounts.
    def cleanup_shadowed_calculated_holdings
      deleted_count = account.holdings
        .where(account_provider_id: nil)
        .where(<<~SQL)
          EXISTS (
            SELECT 1
            FROM holdings provider_holdings
            WHERE provider_holdings.account_id = holdings.account_id
              AND provider_holdings.security_id = holdings.security_id
              AND provider_holdings.date = holdings.date
              AND provider_holdings.currency = holdings.currency
              AND provider_holdings.account_provider_id IS NOT NULL
          )
        SQL
        .delete_all

      Rails.logger.info("Cleaned up #{deleted_count} calculated holdings shadowed by provider snapshots") if deleted_count > 0
    end

    def cleanup_stale_calculated_rows_on_latest_provider_snapshot
      provider_snapshot_date = account.latest_provider_holdings_snapshot_date
      return unless provider_snapshot_date

      provider_security_ids = account.holdings
        .where.not(account_provider_id: nil)
        .where(date: provider_snapshot_date)
        .distinct
        .pluck(:security_id)

      scope = account.holdings
        .where(account_provider_id: nil, date: provider_snapshot_date)

      scope = if provider_security_ids.any?
        scope.where.not(security_id: provider_security_ids)
      else
        scope
      end

      deleted_count = scope.delete_all
      Rails.logger.info("Cleaned up #{deleted_count} stale calculated holdings on latest provider snapshot date") if deleted_count > 0
    end

    def holding_key(holding)
      [ holding.account_id || account.id, holding.security_id, holding.date, holding.currency ]
    end

    def purge_stale_holdings
      portfolio_security_ids = account.trades.distinct.pluck(:security_id)

      # Never delete provider-sourced holdings - they're authoritative from the provider
      # If there are no securities in the portfolio, only delete non-provider holdings
      if portfolio_security_ids.empty?
        Rails.logger.info("Clearing non-provider holdings (no securities from trades)")
        account.holdings.where(account_provider_id: nil).delete_all
      else
        # Keep provider holdings and holdings for known securities within date range
        deleted_count = account.holdings
          .where(account_provider_id: nil)
          .delete_by("date < ? OR security_id NOT IN (?)", account.start_date, portfolio_security_ids)

        Rails.logger.info("Purged #{deleted_count} stale holdings") if deleted_count > 0
      end
    end

    def calculator
      if strategy == :reverse
        portfolio_snapshot = Holding::PortfolioSnapshot.new(account)
        Holding::ReverseCalculator.new(account, portfolio_snapshot: portfolio_snapshot, security_ids: security_ids)
      else
        Holding::ForwardCalculator.new(account, security_ids: security_ids)
      end
    end
end
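The branch order in `persist_holdings` decides which upsert batch each calculated row lands in: provider-sourced rows are skipped, locked rows keep their cost basis, and only rows the reconciler approves carry a new one. The plain-Ruby model below mirrors just that decision. `Existing` and `classify` are hypothetical; Rails' `present?` and `cost_basis_locked?` are approximated by truthiness and `!nil?`, which match for IDs, booleans, and numeric cost bases but not for blank strings. The rules inside `Holding::CostBasisReconciler` are not shown in this file, so its result is passed in as a plain hash.

```ruby
# Hypothetical stand-in for an existing holding row.
Existing = Struct.new(:account_provider_id, :cost_basis_locked, keyword_init: true)

# Mirrors the if/elsif/else order in persist_holdings.
# `reconciled` plays the role of Holding::CostBasisReconciler.reconcile's result.
def classify(existing, reconciled)
  return :skip_provider if existing&.account_provider_id
  return :without_cost if existing&.cost_basis_locked
  return :with_cost if reconciled[:should_update] && !reconciled[:cost_basis].nil?
  :without_cost
end

classify(Existing.new(account_provider_id: 7), { should_update: true, cost_basis: 1 }) # => :skip_provider
classify(Existing.new(cost_basis_locked: true), { should_update: true, cost_basis: 1 }) # => :without_cost
classify(nil, { should_update: true, cost_basis: 12.5 })                              # => :with_cost
classify(nil, { should_update: false, cost_basis: 12.5 })                             # => :without_cost
```

Keeping the two batches separate lets the `:without_cost` upsert omit the cost-basis columns entirely, so `upsert_all` leaves whatever the database already holds for them untouched.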