fix(simplefin): treat Vanguard/Fidelity cost_basis as total when needed (#1772)

* fix(simplefin): treat Vanguard/Fidelity cost_basis as total when needed

PR #1692 normalized SimpleFIN holdings cost_basis under the assumption
that the `cost_basis` / `basis` keys carry a per-share value (per the
SimpleFIN spec) and only `total_cost` / `value` carry a total position
cost. Vanguard and Fidelity violate the spec — they populate
`cost_basis` with the *total* (see the payload in #1182). After PR
#1692 those holdings get stored with cost_basis = total, and
Holding#calculate_trend then computes previous = qty × avg_cost, so the
"previous" value is inflated by a factor of qty and an entire
investment account renders a phantom return of roughly -(1 − 1/qty),
i.e. -97% to -99%.

Fix: sanity-check raw cost_basis against the holding's market share
price. Let share_price = market_value / qty; the geometric midpoint
between "raw is per-share" (raw ≈ share_price) and "raw is total"
(raw ≈ qty × share_price) is share_price × √qty. If raw is above the
midpoint it is divided by qty; otherwise it is kept as per-share.
Falls back to the pre-fix behaviour (trust the spec) when market_value
or qty is unavailable, so confidently-correct readings are never made
worse.

Verified against the reported Vanguard payload (qty=139, cost_basis=
22004.40, market_value=22626.42): normalize_cost_basis now returns
$158.31/share, matching 22004.40 / 139, and the phantom -99% return
collapses to a realistic ~+2.8%. Per-share readings ($45 cost on a $50
share price) remain untouched.

Closes #1718. Refs #1182, #1692.

* fixup: replace cost_basis heuristic with institution allowlist

Codex and @EdeAbreu23 flagged a real false-positive in the previous
geometric-midpoint heuristic: a legitimate per-share `cost_basis` on a
holding with a large unrealized loss (e.g. 100 shares with $100/share
basis now worth $5/share) trips `share_price × √qty` and gets divided
to $1/share — corrupting any standards-compliant brokerage with a big
loss.

Adopt @EdeAbreu23's safer shape:
- total_cost / value: always divide by qty (unchanged from #1692).
- cost_basis / basis: keep as-is by default.
- Only divide cost_basis / basis when the holding's SimpleFIN account
  is connected to a known-misbehaving institution. Allowlist starts
  with `vanguard` and `fidelity`, matched case-insensitively against
  the account's stored org name and domain. Easy to extend as more
  brokerages turn up.

Trades a small maintenance cost (curated list) for zero risk of
corrupting compliant providers.

Verified against five scenarios (all expected):
  Vanguard total in cost_basis (allowlist) → +2.83%
  Fidelity total in basis (allowlist)      → +33.33%
  Big-loss per-share (Codex case)          → -95.0%  (preserved)
  Honest per-share, small loss             → +11.11% (unchanged)
  total_cost on any institution            → +11.11% (unchanged)

---------

Co-authored-by: plind-junior <plind-junior@users.noreply.github.com>
This commit is contained in:
plind
2026-05-13 09:17:10 -07:00
committed by GitHub
parent 95f6451b39
commit 834686cffd
2 changed files with 104 additions and 10 deletions

View File

@@ -48,7 +48,7 @@ class SimplefinAccount::Investments::HoldingsProcessor
qty = parse_decimal(any_of(simplefin_holding, %w[shares quantity qty units]))
market_value = parse_decimal(any_of(simplefin_holding, %w[market_value current_value]))
raw_cost_basis, cost_basis_source_key = cost_basis_from(simplefin_holding)
cost_basis = normalize_cost_basis(raw_cost_basis, qty, cost_basis_source_key)
cost_basis = normalize_cost_basis(raw_cost_basis, qty, cost_basis_source_key, institution_reports_total_basis?)
# Derive price from market_value when possible; otherwise fall back to any price field
fallback_price = parse_decimal(any_of(simplefin_holding, %w[purchase_price price unit_price average_cost avg_cost]))
@@ -124,19 +124,50 @@ class SimplefinAccount::Investments::HoldingsProcessor
[ nil, nil ]
end
# Sure stores holding cost_basis as per-share average cost. Some SimpleFIN
# providers expose total position basis via total_cost/value, so normalize only
# when the selected provider field is known to represent total position basis.
def normalize_cost_basis(raw_cost_basis, qty, source_key)
# Sure stores holding cost_basis as per-share average cost. SimpleFIN
# brokerages are inconsistent about which field carries which shape:
#
# - total_cost / value: always a total position cost per the SimpleFIN
# spec and observed payloads; divide by qty unconditionally.
# - cost_basis / basis: the spec calls this per-share, and most
# brokerages comply. Keep these values unchanged by default.
#
# Exception: a small allowlist of brokerages (Vanguard, Fidelity) is
# known to populate cost_basis with the total position cost in violation
# of the spec (#1718, #1182). For those connections only, divide by qty.
#
# An earlier revision of this fix used a magnitude heuristic
# (share_price × √qty midpoint). It was withdrawn because a legitimate
# per-share basis on a holding with a large unrealized loss
# (e.g. 100 shares with basis $100 now worth $5) trips the midpoint and
# gets mis-divided to $1/share — corrupting compliant providers. The
# allowlist trades some manual maintenance for that safety.
def normalize_cost_basis(raw_cost_basis, qty, source_key, total_basis_institution = false)
return nil if raw_cost_basis.nil?
if %w[total_cost value].include?(source_key)
if %w[total_cost value].include?(source_key) ||
(total_basis_institution && %w[cost_basis basis].include?(source_key))
return nil unless qty.to_d.positive?
raw_cost_basis / qty
else
raw_cost_basis
return raw_cost_basis / qty
end
raw_cost_basis
end
# Institutions known to populate the SimpleFIN `cost_basis` / `basis`
# field with the total position cost rather than the per-share value the
# spec requires. Matched as case-insensitive substrings against the
# account's stored org name and domain.
TOTAL_BASIS_INSTITUTIONS = %w[vanguard fidelity].freeze
def institution_reports_total_basis?
org = simplefin_account.respond_to?(:org_data) ? simplefin_account.org_data : nil
return false if org.blank?
candidates = [ org["name"], org[:name], org["domain"], org[:domain] ].compact.map(&:to_s).map(&:downcase)
return false if candidates.empty?
TOTAL_BASIS_INSTITUTIONS.any? { |needle| candidates.any? { |c| c.include?(needle) } }
end
def resolve_security(symbol, description)

View File

@@ -80,6 +80,69 @@ class SimplefinAccount::Investments::HoldingsProcessorTest < ActiveSupport::Test
assert_equal "total_cost", source_key
end
test "cost_basis from a known total-basis institution is divided by qty" do
# Issue #1718 / #1182: Vanguard populates cost_basis with the total
# position cost. When the institution is on the allowlist we divide.
cost_basis = @processor.send(
:normalize_cost_basis,
BigDecimal("22004.40"),
BigDecimal("139.00"),
"cost_basis",
true # institution_reports_total_basis?
)
assert_in_delta 158.30, cost_basis.to_f, 0.01
end
test "basis from a known total-basis institution is divided by qty" do
cost_basis = @processor.send(
:normalize_cost_basis,
BigDecimal("9000.00"),
BigDecimal("200"),
"basis",
true
)
assert_equal BigDecimal("45.00"), cost_basis
end
test "cost_basis from a compliant institution is kept untouched (no false divide)" do
# Codex regression: a legitimate per-share basis on a holding with a
# large unrealized loss (e.g. $100/share basis now worth $5/share) must
# NOT be divided by qty. Per the SimpleFIN spec, cost_basis is per-share
# — only the institution allowlist should override that.
cost_basis = @processor.send(
:normalize_cost_basis,
BigDecimal("100.00"),
BigDecimal("100"),
"cost_basis",
false
)
assert_equal BigDecimal("100.00"), cost_basis
end
test "institution_reports_total_basis? matches Vanguard and Fidelity org metadata" do
cases = {
{ "name" => "Vanguard" } => true,
{ "name" => "VANGUARD BROKERAGE" } => true,
{ "name" => "Fidelity Investments" } => true,
{ "domain" => "vanguard.com" } => true,
{ "domain" => "401k.fidelity.com" } => true,
{ "name" => "Charles Schwab", "domain" => "schwab.com" } => false,
{ "name" => "Chase" } => false,
{} => false
}
cases.each do |org, expected|
account = Struct.new(:org_data).new(org)
processor = SimplefinAccount::Investments::HoldingsProcessor.new(account)
assert_equal expected,
processor.send(:institution_reports_total_basis?),
"org_data #{org.inspect} expected #{expected}"
end
end
test "missing cost basis fields return nil" do
payload = {
"market_value" => "10108.16"