sure/app/models/import.rb
Guillem Arias Fauste 1ddd8bd040 feat(i18n): complete Catalan translations + extract residual hardcoded strings (#1836)
* feat(i18n): complete Catalan translations + extract residual hardcoded strings

CA coverage
- All view/model/breadcrumb/doorkeeper/mailer locale files for ca: 0 missing
  keys (was ~3,400). Translations follow informal "tu" register, sentence case,
  domain glossary (Compte/Saldo/Transacció/Posició/Operació/Pressupost/...).
- Catalan pluralization test: ca uses one/other; mirrors
  test/lib/polish_pluralization_test.rb.
- 8 LanguageTool-flagged grammar fixes applied (Connexió òrfena, Secret de
  l'API, comma-pero, apostrophe elisions, etc.).
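The one/other pluralization rule mentioned above can be illustrated with a plain-Ruby sketch. `CA_PLURAL_RULE`, `pluralize_ca` and the `TRANSACTION_COUNT` entry are hypothetical. This is not rails-i18n's implementation, and it assumes the simple form of the rule, in which only a count of exactly 1 selects `:one`.

```ruby
# Hypothetical one/other plural rule, assumed here for the ca locale.
CA_PLURAL_RULE = ->(count) { count == 1 ? :one : :other }

# Hypothetical locale entry shaped like a one/other leaf in a ca.yml file.
TRANSACTION_COUNT = {
  one: "%{count} transacció",
  other: "%{count} transaccions"
}.freeze

# Picks the plural form for the count, then interpolates %{count} with Kernel#format.
def pluralize_ca(entry, count)
  format(entry.fetch(CA_PLURAL_RULE.call(count)), count: count)
end

pluralize_ca(TRANSACTION_COUNT, 1) # => "1 transacció"
pluralize_ca(TRANSACTION_COUNT, 3) # => "3 transaccions"
```

Under this rule every count except 1, including 0, lands on `:other`. A single string with singular agreement therefore reads wrong for most counts.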

Hardcoded string extraction (also fixes EN parity)
- UI::Account::Chart#title + chart.html.erb view tabs -> UI.account.chart.*
- UI::Account::BalanceReconciliation labels + tooltips ->
  UI.account.balance_reconciliation.{labels,tooltips}.*
- transactions/_transfer_match.html.erb (Auto-matched, A/M, Confirm/Reject
  match, Payment/Transfer is confirmed) -> transactions.transfer_match.*
- AccountOrder labels (Name/Balance asc/desc) -> account_order.* keys with
  fallback to existing hardcoded labels.
- Depository::SUBTYPES surface in account list -> depositories.subtypes.*.*
- User role badge -> users.roles.* (admin / member / super_admin).
- 110+ country names -> countries.* (config/locales/countries.ca.yml).

Breadcrumb locale fix
- Breadcrumbable was a before_action that ran before Localize's around_action
  switched I18n.locale, so default crumbs rendered in EN even when locale=ca.
- Convert to helper_method that defers translation to render-time (when
  I18n.locale is already correct). Add all missing breadcrumb keys to ca + en.
- Layouts switched from @breadcrumbs to breadcrumbs helper.
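The timing bug in the breadcrumb fix can be reproduced without Rails. `FakeI18n`, `EagerCrumbs`, `LazyCrumbs` and `render_crumbs` are hypothetical stand-ins for `I18n`, the old before_action, the new helper_method, and one request. The simulated order is the one described in the breadcrumb notes: the hook runs first, the locale switches second, and the view renders last.

```ruby
# Hypothetical stand-in for I18n: a current locale plus a tiny lookup table.
module FakeI18n
  TRANSLATIONS = { en: { home: "Home" }, ca: { home: "Inici" } }.freeze

  class << self
    attr_accessor :locale

    def t(key)
      TRANSLATIONS.fetch(locale).fetch(key)
    end
  end
end

# Old shape: a before_action-style hook translates and caches the crumbs.
class EagerCrumbs
  def before_action_hook
    @breadcrumbs = [ FakeI18n.t(:home) ]
  end

  attr_reader :breadcrumbs
end

# New shape: a helper-style method translates only when the view calls it.
class LazyCrumbs
  def before_action_hook; end

  def breadcrumbs
    [ FakeI18n.t(:home) ]
  end
end

# One simulated request: default locale, hook, locale switch, render.
def render_crumbs(controller)
  FakeI18n.locale = :en
  controller.before_action_hook
  FakeI18n.locale = :ca
  controller.breadcrumbs
end

render_crumbs(EagerCrumbs.new) # => ["Home"]
render_crumbs(LazyCrumbs.new)  # => ["Inici"]
```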

Locale-aware helpers / formatters
- ApplicationHelper#localized_ordinal: ordinalize that respects ca
  (1r/2n/3r/4t/Nè). Wired into preferences month_start_day select.
- Family#moniker_label / moniker_label_plural: translate the default "Family"/
  "Group" monikers via shared.family_moniker.* with fallback to the family's
  custom override.
- Budget#name: use I18n.l for month_year/short/long instead of strftime("%B %Y")
  so the budget header date follows the active locale.
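The Catalan rule behind `localized_ordinal` can be sketched as follows. The sketch assumes that only 1 to 4 take the irregular suffixes (1r, 2n, 3r, 4t) and that every other number takes `è`. The method name matches the helper, but the `locale:` keyword and both branches are assumptions. The English branch reimplements the usual st/nd/rd/th rule instead of calling ActiveSupport's `ordinalize`, so the block runs without Rails.

```ruby
# Irregular Catalan ordinal suffixes; every other number takes "è".
CA_ORDINAL_SUFFIXES = { 1 => "r", 2 => "n", 3 => "r", 4 => "t" }.freeze

# Hypothetical signature: the real helper reads the active I18n locale.
def localized_ordinal(number, locale: :en)
  if locale == :ca
    "#{number}#{CA_ORDINAL_SUFFIXES.fetch(number, 'è')}"
  else
    english_ordinal(number)
  end
end

# Plain-Ruby English rule: 11-13 always take "th", otherwise use the last digit.
def english_ordinal(number)
  return "#{number}th" if (11..13).cover?(number.abs % 100)

  "#{number}#{{ 1 => 'st', 2 => 'nd', 3 => 'rd' }.fetch(number.abs % 10, 'th')}"
end

localized_ordinal(3, locale: :ca)  # => "3r"
localized_ordinal(21, locale: :ca) # => "21è"
localized_ordinal(21)              # => "21st"
```

Catalan uses `è` for 11, 21 and so on, so the ca branch does not need the teen exception the English branch has.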

Tooling
- script/lt_check_ca.rb: batched LanguageTool checker (premium endpoint when
  LT_USERNAME/LT_API_KEY are set, free fallback otherwise), picky mode,
  motherTongue=en for false-friend detection.
- lib/tasks/i18n_screenshot.rake: dev-only rake to set user.locale=ca and
  role=super_admin on the demo user so the i18n surfaces can be walked.

Out of scope (pre-existing, not introduced here)
- Native browser file input "Choose Files / No file chosen" (browser locale).
- D3.js client-side chart x-axis dates (JS-side Intl.DateTimeFormat needed).
- Sankey/donut labels = seed category names (data, not i18n).
- 2 rails-i18n datetime/errors interpolation warnings inherited from
  config/locales/defaults/ca.yml.

* fix(i18n): apply idiomatic Catalan review (3-agent + native review)

Three parallel review agents flagged 203 findings (31 high / 73 medium / 99 low)
across all 111 ca.yml files. This commit applies the high-severity bugs plus a
curated subset of medium-impact fixes.

Grammar / agreement
- provider_sync_summary.health.stale_pending: `(exclòs)` -> `(exclosa/excloses)`
  to agree with feminine `transacció(s)`.
- accounts.confirm_unlink.warning_no_sync: added reflexive `es` -
  `el compte ja no es sincronitzarà`.
- sophtron_setup_required.heading: `no configurats` -> `sense configurar`
  (avoids broken agreement across "ID" masc. + "clau" fem.).
- admin.sso_providers.form.errors_title: split into one/other pluralization
  keys (en + ca); singular `ha impedit` was wrong for count > 1.

Brand consistency
- IndexaCapital -> Indexa Capital (37 occurrences in one file).
- Lunchflow -> Lunch Flow in two remaining places.

Anglicisms / domain mistranslations
- kraken_items setup_accounts.instructions: `ompliments d'operacions`
  (lit. dental/food fillings) -> `execucions d'operacions`.
- settings kraken_panel.read_only_title: `Sincronització d'intercanvi`
  (swap/trade) -> `Sincronització només de lectura amb l'exchange`.
- transactions convert_to_trade.security_custom + security_not_listed_hint:
  `cotització` (price quote) -> `ticker` (the EN field IS a ticker symbol).
- loans.form.rate_type: `Tipus d'interès` collided with sibling
  interest_rate -> `Modalitat del tipus`.
- brex_items.provider_panel.sandbox_note_html: `L'staging` (broken
  contraction) -> `el staging`.

Idiom traps
- coinbase/binance/kraken wait_for_sync: `acabi de sincronitzar` is
  ambiguous in CA (`acabar de + inf` reads as "has just done X") ->
  `acabi la sincronització`.
- chats.ai_greeting.there: `a tothom` -> `''` (the EN fallback "Hey there"
  is singular; literal CA `tothom` is plural and wrong for 1:1 chat).
- transactions.split_parent_row.split_label: `Divideix` (imperative) is
  wrong as a status badge -> `Divisió` (noun).
- transactions.keep_both (2 occurrences): infinitive `mantenir ambdues` ->
  imperative `mantén-les totes dues` to match the sibling Yes/No buttons.
- rules.clear_ai_cache: `Reinicia` (restart) -> `Buida` (empty/clear),
  which matches the success notice (`s'està netejant`).

Moniker gender breakage (cross-file)
%{moniker} is interpolated downcased from family.moniker_label and may
resolve to feminine `família`/`llar` or masculine `grup`. Strings that
hard-code a gendered article ('al teu %{moniker}', 'aquesta %{moniker}',
'aquest/a %{moniker}') broke on at least one branch. Restructured the
affected sentences to drop the gendered determiner:

- account_sharings.show.no_members
- merchants.family_empty / family_title / provider_empty
- registrations.new.join_family_title
- settings.preferences.show.currencies_subtitle / sharing_subtitle
- simplefin_items.select_existing_account.no_accounts_found
- invitations.new.subtitle
- invitation_mailer.invite_email.subject (mailers/) + body (views/)
- snaptrade_items.providers.snaptrade.free_tier_warning
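The moniker lookup, the gender clash it causes, and the sentinel comparison from the review-feedback fix can be sketched together. `FamilyStub`, `MONIKER_LABELS` and the `locale` argument are hypothetical. The Catalan values are placeholders chosen to show the gender split; they are not the PR's `shared.family_moniker.*` strings.

```ruby
# Hypothetical stand-in for the shared.family_moniker.* lookups.
MONIKER_LABELS = {
  en: { "Family" => "Family", "Group" => "Group" },
  ca: { "Family" => "Família", "Group" => "Grup" }
}.freeze

FamilyStub = Struct.new(:moniker) do
  # Translates the two stock monikers; a custom moniker passes through as-is.
  def moniker_label(locale)
    MONIKER_LABELS.fetch(locale).fetch(moniker, moniker)
  end

  # Branch on the stored sentinel, never on the translated label.
  def group?
    moniker == "Group"
  end
end

group = FamilyStub.new("Group")
label = group.moniker_label(:ca)        # => "Grup"
label_matches = (label == "Group")      # => false, so label-based branching misfires
sentinel_matches = group.group?         # => true
phrase = format("aquesta %{moniker}", moniker: label.downcase) # => "aquesta grup"
```

The last line shows why a hard-coded feminine determiner breaks on the masculine branch.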

Terminology consistency
- models/account_statement/ca.yml attributes aligned with view-side
  forms: `Saldo d'obertura`/`Saldo de tancament` ->
  `Saldo inicial`/`Saldo final`; `Suggeriment de...` -> `Pista de...`.
- account_statements.coverage.status.not_expected:
  `No s'esperava` -> `No previst` (status label, not past action).
- account_statements.index.empty_unmatched: aligned with the section's
  own label `Safata sense aparellar`.
- imports.create.document_provider_not_configured + document_upload_failed:
  `arxiu vectorial` -> `magatzem vectorial` (correct TermCat term).
- coinstats_items blockchain gender: `els blockchains` / `un blockchain` ->
  `les blockchains` / `una blockchain` (feminine per TermCat).
- accounts.account.remove_default: `Treu el predeterminat` ->
  `Treu com a predeterminat` (pairs with sibling `Estableix com a
  predeterminat`).
- accounts.tax_treatments.tax_deferred: `Diferit fiscalment` (lit. calque)
  -> `Tributació diferida` (standard CA tax-accounting term).
- settings.payments.show.currently_on_plan: `Actualment al` ->
  `Actualment al pla:` (was a fragment).

Out of scope (review flagged, not applied here)
- LOW-severity stylistic preferences (Veure vs Mostra, etc.).
- `models/category/ca.yml` default category names — seeded at family
  creation, not via I18n at runtime, so changes wouldn't affect existing
  families.
- `models/period/ca.yml` short labels mixing EN (MTD/YTD) and CA (STD/MA)
  — needs a one-convention decision separately.

* fix(i18n,ca): drop gendered article in period_activity + tighten cash-flow terms

- pages.dashboard.investment_summary.period_activity: 'Activitat del
  %{period}' contracted 'del' = 'de el' (masc.sg.). %{period} resolves
  to mixed forms ('Setmana en curs' fem, 'Últims 30 dies' pl., 'Any en
  curs' apostrophe), so hard-coded 'del' was wrong on most labels.
  Replaced with 'Activitat — %{period}' (em-dash) to skip the
  contraction entirely.
- pages.dashboard.outflows_donut.title / total_outflows: switched from
  bare 'Sortides' / 'Total de sortides' to 'Sortides de caixa' /
  'Total de sortides de caixa' to match TermCat's precise term
  ('sortida de caixa' = cash outflow).

* fix(i18n,ca): rephrase transfer source/destination amount labels

'Import d'origen' / 'Import de destinació' were literal calques of
'Source amount' / 'Destination amount'. In a multi-currency transfer
form (sender/receiver in different currencies) the natural CA pair is
'Import enviat' / 'Import rebut'.

* fix(i18n,ca): 'Dades en brut' -> 'Dades sense processar'

The literal calque of 'Raw data' read as too technical for personal-
finance UI. 'Dades sense processar' is the more natural Catalan
equivalent for raw/unprocessed data files.

* fix(i18n): localize Import col_sep label + separator options

The CSV upload form rendered 'Col sep' (the auto-humanized attribute
name) plus hardcoded English 'Comma (,)' / 'Semicolon (;)' options
from Import::SEPARATORS.

- activerecord.attributes.import.col_sep added (en + ca: 'Column
  separator' / 'Separador de columnes').
- Import.separator_options class method returns translated tuples;
  view switched from Import::SEPARATORS to Import.separator_options.
- activerecord.attributes.import.col_seps.{comma,semicolon} added so
  the option labels follow the active locale.
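The `separator_options` pattern is sketched here in plain Ruby. Labels come from the active locale, but the submitted values stay fixed, so `validates :col_sep, inclusion: { in: SEPARATORS.map(&:last) }` keeps working. `COL_SEP_LABELS` and the `locale` argument are hypothetical stand-ins for `I18n.t("activerecord.attributes.import.col_seps.*")`, and the Catalan labels are placeholders.

```ruby
# Same shape as Import::SEPARATORS: [label, value] pairs.
SEPARATORS = [ [ "Comma (,)", "," ], [ "Semicolon (;)", ";" ] ].freeze

# Hypothetical stand-in for the col_seps.{comma,semicolon} locale keys.
COL_SEP_LABELS = {
  en: { comma: "Comma (,)", semicolon: "Semicolon (;)" },
  ca: { comma: "Coma (,)", semicolon: "Punt i coma (;)" }
}.freeze

# Returns translated [label, value] tuples; only the labels vary by locale.
def separator_options(locale)
  labels = COL_SEP_LABELS.fetch(locale)
  [ [ labels[:comma], "," ], [ labels[:semicolon], ";" ] ]
end

separator_options(:ca).map(&:last) == SEPARATORS.map(&:last) # => true
```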

* fix(i18n,ca): drop moniker apposition in sharing/currencies section titles

- sharing_title 'Compartició de %{moniker}' rendered as 'Compartició
  de Família' (a noun-noun apposition that's odd in CA) -> 'Compartició
  de comptes'.
- sharing_subtitle replaced '%{moniker}' with 'entre els membres' so
  the sentence reads naturally and doesn't depend on moniker gender.
- currencies_title 'Divises de %{moniker}' had the same apposition
  -> 'Divises'. Subtitle no longer references moniker either.

* fix(i18n,ca): keep 'Self Hosting' untranslated

Reverted 'Autoallotjament' / 'autoallotjada' / 'autoallotjats' usages
to the original English 'Self Hosting' (sidebar label, breadcrumbs,
hostings page title, chat assistant settings hint, redis configuration
subheading, LLM usages cost-estimates description).

The brand-style term reads more naturally in EN for technical users
configuring their own deployment.

* fix(i18n,ca): lowercase 'self hosting' (sentence case in labels)

* fix(i18n): extract budget_categories stepper + allocation_progress strings

Hardcoded English strings on the budget category editor:
- 'Setup' / 'Categories' stepper labels in budgets/_budget_nav.html.erb
- 'X% set' / '> 100% set' / 'left to allocate' / 'Budget exceeded by ...'
  in budget_categories/_allocation_progress.erb
- '/m avg' caption + 'Shared' placeholder + 'Leave empty to share
  parent's budget' tooltip in budget_categories/_budget_category_form
  and _uncategorized_budget_category_form

Extracted to:
- budgets.budget_nav.{setup,categories}
- budget_categories.allocation_progress.{percent_set,over_set,left_to_allocate,budget_exceeded_html}
- budget_categories.budget_category_form.{monthly_average,shared_placeholder,shared_title}

CA translations added; EN keys mirror the prior literals.

* chore(i18n): drop translation tooling from PR

These were dev-only helpers used during the Catalan translation pass:

- script/lt_check_ca.rb: LanguageTool API checker (premium/free
  endpoint, picky mode, batching). Useful for ongoing locale QA but
  shouldn't ship in this feature PR.
- lib/tasks/i18n_screenshot.rake: rake task that flips user.locale and
  role on the demo user for walking the i18n surfaces locally.

Both stay available locally; pulled out of the PR scope.

* fix(i18n): apply PR review feedback (CodeRabbit + Codex)

- balance_reconciliation crypto_items: use :end_balance_crypto tooltip
  (was :end_balance_investment). Added new UI.account.balance_reconciliation.tooltips.end_balance_crypto key in en + ca.
- doorkeeper.ca.yml confidentiality.no: was YAML boolean false, now string 'No'.
- views/categories: 'Poor contrast, choose darker color or' continued with hardcoded 'auto-adjust.' button text; extracted to categories.form.auto_adjust key (en + ca).
- imports.create.document_upload_failed: 'a l'magatzem' was broken
  contraction -> 'al magatzem'.
- invitation_mailer body + mailer subject: 'unir-se' -> 'unir-te' (was
  3rd person, should be 2nd to match the rest of the copy).
- 7 strings across mercury_items / sophtron_items / simplefin_items /
  lunchflow_items / brex_items / indexa_capital_items / other_assets:
  'se sincronitzaran' -> 'es sincronitzaran', 'se segueixen' ->
  'es segueixen' (correct reflexive pronoun before consonants).
- settings.providers.status: key was 'false' (YAML-coerced), now 'off'
  to match settings/en.yml status.off used in view lookups.
- sophtron_items.sophtron_setup_required.message: stripped trailing
  blank line from the quoted scalar.
- settings/profiles/show.html.erb: switched 'family_moniker ==
  "Group"' branch checks to 'Current.family&.moniker == "Group"'.
  After Family#moniker_label started returning translated values,
  callers using the display label for branching would render the
  household copy for group families in ca. Compare the stored sentinel
  instead.
- Did not apply CodeRabbit's webauthn 'eliminada' -> 'desada' suggestion:
  the key is wired to the destroy action (verified at
  settings/webauthn_credentials_controller.rb:55), so 'eliminada' is
  correct.
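The doorkeeper `no` fix and the `false` status key both stem from YAML 1.1 scalar resolution. Psych, Ruby's YAML library, still applies it: unquoted `yes`/`no`/`on`/`off`/`true`/`false` (in any case) load as booleans, in keys as well as values. A minimal check with the stdlib `yaml` library follows. The sample keys are illustrative and are not copied from the locale files.

```ruby
require "yaml"

# Unquoted scalars: Psych resolves `no`, `No` and `false` to the boolean false.
UNQUOTED = YAML.safe_load(<<~YML)
  confidentiality:
    no: No
  status:
    false: Disabled
YML

# Quoted scalars stay strings, which string- or symbol-keyed lookups need.
QUOTED = YAML.safe_load(<<~YML)
  confidentiality:
    "no": "No"
  status:
    "false": Disabled
YML

UNQUOTED.dig("confidentiality", false) # => false
QUOTED.dig("confidentiality", "no")    # => "No"
```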
2026-05-19 13:37:10 +02:00


class Import < ApplicationRecord
  MaxRowCountExceededError = Class.new(StandardError)
  MappingError = Class.new(StandardError)

  # Shared CSV upload/content limit for web and API imports, including preflight.
  MAX_CSV_SIZE = 10.megabytes
  MAX_PDF_SIZE = 25.megabytes
  ALLOWED_CSV_MIME_TYPES = %w[text/csv text/plain application/vnd.ms-excel application/csv].freeze
  ALLOWED_PDF_MIME_TYPES = %w[application/pdf].freeze

  DOCUMENT_TYPES = %w[bank_statement credit_card_statement investment_statement financial_document contract other].freeze

  TYPES = %w[TransactionImport TradeImport AccountImport MintImport ActualImport CategoryImport RuleImport PdfImport QifImport SureImport].freeze
  SIGNAGE_CONVENTIONS = %w[inflows_positive inflows_negative]
  SEPARATORS = [ [ "Comma (,)", "," ], [ "Semicolon (;)", ";" ] ].freeze

  def self.separator_options
    [
      [ I18n.t("activerecord.attributes.import.col_seps.comma"), "," ],
      [ I18n.t("activerecord.attributes.import.col_seps.semicolon"), ";" ]
    ]
  end

  NUMBER_FORMATS = {
    "1,234.56" => { separator: ".", delimiter: "," }, # US/UK/Asia
    "1.234,56" => { separator: ",", delimiter: "." }, # Most of Europe
    "1 234,56" => { separator: ",", delimiter: " " }, # French/Scandinavian
    "1,234" => { separator: "", delimiter: "," }      # Zero-decimal currencies like JPY
  }.freeze

  def self.reasonable_date_range
    Date.new(1970, 1, 1)..Date.today.next_year(5)
  end

  def self.max_csv_size
    MAX_CSV_SIZE
  end

  AMOUNT_TYPE_STRATEGIES = %w[signed_amount custom_column].freeze

  belongs_to :family
  belongs_to :account, optional: true

  before_validation :set_default_number_format
  before_validation :ensure_utf8_encoding

  scope :ordered, -> { order(created_at: :desc) }

  enum :status, {
    pending: "pending",
    complete: "complete",
    importing: "importing",
    reverting: "reverting",
    revert_failed: "revert_failed",
    failed: "failed"
  }, validate: true, default: "pending"

  validates :type, inclusion: { in: TYPES }
  validates :amount_type_strategy, inclusion: { in: AMOUNT_TYPE_STRATEGIES }
  validates :col_sep, inclusion: { in: SEPARATORS.map(&:last) }
  validates :signage_convention, inclusion: { in: SIGNAGE_CONVENTIONS }, allow_nil: true
  validates :number_format, presence: true, inclusion: { in: NUMBER_FORMATS.keys }
  validate :custom_column_import_requires_identifier
  validates :rows_to_skip, numericality: { only_integer: true, greater_than_or_equal_to: 0 }
  validate :account_belongs_to_family
  validate :rows_to_skip_within_file_bounds

  has_many :rows, dependent: :destroy
  has_many :mappings, dependent: :destroy
  has_many :accounts, dependent: :destroy
  has_many :entries, dependent: :destroy

  class << self
    def parse_csv_str(csv_str, col_sep: ",")
      CSV.parse(
        (csv_str || "").strip,
        headers: true,
        col_sep: col_sep,
        converters: [ ->(str) { str&.strip } ],
        liberal_parsing: true
      )
    end

    # Attempts to identify the best-matching date format from a list of candidates
    # by trying to parse sample date strings with each format.
    #
    # Returns the strptime format string (e.g. "%m-%d-%Y") that best matches the
    # samples, or the +fallback+ when no candidate can parse any sample.
    #
    # Scoring:
    #   1. Formats that parse ALL samples beat those that only parse some.
    #   2. Among equal parse counts, formats whose parsed dates fall within a
    #      reasonable range (1970..today+5y) score higher.
    def detect_date_format(samples, candidates: Family::DATE_FORMATS.map(&:last), fallback: "%Y-%m-%d")
      return fallback if samples.blank?

      cleaned = samples.map(&:to_s).reject(&:blank?).uniq.first(50)
      return fallback if cleaned.empty?

      reasonable_range = reasonable_date_range

      scored = candidates.map do |fmt|
        parsed_count = 0
        reasonable_count = 0

        cleaned.each do |s|
          begin
            date = Date.strptime(s, fmt)
          rescue Date::Error, ArgumentError
            next
          end
          next unless date

          parsed_count += 1
          reasonable_count += 1 if reasonable_range.cover?(date)
        end

        { format: fmt, parsed: parsed_count, reasonable: reasonable_count }
      end

      # Filter to candidates that parsed at least one sample
      viable = scored.select { |s| s[:parsed] > 0 }
      return fallback if viable.empty?

      best = viable.max_by { |s| [ s[:parsed], s[:reasonable] ] }
      best[:format]
    end
  end

  def publish_later
    raise MaxRowCountExceededError if row_count_exceeded?
    raise "Import is not publishable" unless publishable?

    update! status: :importing

    ImportJob.perform_later(self)
  end

  def publish
    raise MaxRowCountExceededError if row_count_exceeded?

    import!

    family.sync_later

    update! status: :complete
  rescue => error
    update! status: :failed, error: error.message
  end

  def revert_later
    raise "Import is not revertable" unless revertable?

    update! status: :reverting

    RevertImportJob.perform_later(self)
  end

  def revert
    Import.transaction do
      accounts.destroy_all
      entries.destroy_all
    end

    family.sync_later

    update! status: :pending
  rescue => error
    update! status: :revert_failed, error: error.message
  end

  def csv_rows
    @csv_rows ||= parsed_csv
  end

  def csv_headers
    parsed_csv.headers
  end

  def csv_sample
    @csv_sample ||= parsed_csv.first(2)
  end

  def dry_run
    mappings = {
      transactions: rows_count,
      categories: Import::CategoryMapping.for_import(self).creational.count,
      tags: Import::TagMapping.for_import(self).creational.count
    }

    # merge! updates the hash in place. A plain merge returns a new hash that was
    # being discarded, so the accounts count never reached the result.
    if account.nil?
      mappings.merge!(accounts: Import::AccountMapping.for_import(self).creational.count)
    end

    mappings
  end

  def required_column_keys
    []
  end

  # Returns false for import types that don't need CSV column mapping (e.g., PdfImport).
  # Override in subclasses that handle data extraction differently.
  def requires_csv_workflow?
    true
  end

  # Subclasses that require CSV workflow must override this.
  # Non-CSV imports (e.g., PdfImport) can return [].
  def column_keys
    raise NotImplementedError, "Subclass must implement column_keys"
  end

  def generate_rows_from_csv
    rows.destroy_all

    mapped_rows = csv_rows.map.with_index(1) do |row, index|
      {
        source_row_number: index,
        account: csv_value(row, account_col_label, "account", "account_name").to_s,
        date: csv_value(row, date_col_label, "date").to_s,
        qty: sanitize_number(csv_value(row, qty_col_label, "qty", "quantity")).to_s,
        ticker: csv_value(row, ticker_col_label, "ticker").to_s,
        exchange_operating_mic: csv_value(row, exchange_operating_mic_col_label, "exchange_operating_mic").to_s,
        price: sanitize_number(csv_value(row, price_col_label, "price")).to_s,
        amount: sanitize_number(csv_value(row, amount_col_label, "amount", "balance")).to_s,
        currency: (csv_value(row, currency_col_label, "currency") || default_currency).to_s,
        name: (csv_value(row, name_col_label, "name") || default_row_name).to_s,
        category: csv_value(row, category_col_label, "category").to_s,
        tags: csv_value(row, tags_col_label, "tags").to_s,
        entity_type: csv_value(row, entity_type_col_label, "entity_type", "account_type", "type").to_s,
        notes: csv_value(row, notes_col_label, "notes").to_s
      }
    end

    rows.insert_all!(mapped_rows)
    update_column(:rows_count, rows.count)
  end

  def sync_mappings
    transaction do
      mapping_steps.each do |mapping_class|
        mappables_by_key = mapping_class.mappables_by_key(self)

        updated_mappings = mappables_by_key.map do |key, mappable|
          mapping = mappings.find_or_initialize_by(key: key, import: self, type: mapping_class.name)
          mapping.mappable = mappable
          mapping.create_when_empty = key.present? && mappable.nil?
          mapping
        end

        updated_mappings.each { |m| m.save(validate: false) }

        # Prune only this import's stale mappings. An unscoped
        # mapping_class.where.not(...) would also delete other imports' mappings.
        mapping_class.where(import: self).where.not(id: updated_mappings.map(&:id)).destroy_all
      end
    end
  end

  def mapping_steps
    []
  end

  def rows_ordered
    rows.ordered
  end

  def uploaded?
    raw_file_str.present?
  end

  def configured?
    uploaded? && rows_count > 0
  end

  def configured_for_status_detail?
    configured?
  end

  def cleaned?
    configured? && rows.all?(&:valid?)
  end

  def publishable?
    cleaned? && mappings.all?(&:valid?)
  end

  def cleaned_from_validation_stats?(invalid_rows_count:)
    configured? && invalid_rows_count.zero?
  end

  def publishable_from_validation_stats?(invalid_rows_count:)
    cleaned_from_validation_stats?(invalid_rows_count: invalid_rows_count) && mappings.all?(&:valid?)
  end

  def mapping_status_counts
    mappable_ids = mappings.pluck(:mappable_id)

    {
      mappings_count: mappable_ids.size,
      unassigned_mappings_count: mappable_ids.count(&:nil?)
    }
  end

  def revertable?
    complete? || revert_failed?
  end

  def has_unassigned_account?
    mappings.accounts.where(key: "").any?
  end

  def requires_account?
    family.accounts.empty? && has_unassigned_account?
  end

  # Used to optionally pre-fill the configuration for the current import
  def suggested_template
    family.imports
          .complete
          .where(account: account, type: type)
          .order(created_at: :desc)
          .first
  end

  def apply_template!(import_template)
    update!(
      import_template.attributes.slice(
        "date_col_label", "amount_col_label", "name_col_label",
        "category_col_label", "tags_col_label", "account_col_label",
        "qty_col_label", "ticker_col_label", "price_col_label",
        "entity_type_col_label", "notes_col_label", "currency_col_label",
        "date_format", "signage_convention", "number_format",
        "exchange_operating_mic_col_label",
        "rows_to_skip"
      )
    )
  end

  # Returns date formats that can successfully parse the file's date samples,
  # filtered to dates within reasonable_date_range.
  # Result: array of { label:, format:, preview: } hashes.
  # Subclasses should override #raw_date_samples to provide date strings.
  def valid_date_formats_with_preview
    first_sample = raw_date_samples.find(&:present?)
    return [] if first_sample.blank?

    Family::DATE_FORMATS.filter_map do |label, fmt|
      parsed = try_parse_date_sample(first_sample, format: fmt)
      next unless parsed
      next unless self.class.reasonable_date_range.cover?(Date.parse(parsed))

      { label: label, format: fmt, preview: parsed }
    end
  end

  # Returns raw date strings from the import file for format detection/preview.
  # Subclasses should override to extract dates from their specific format.
  def raw_date_samples
    []
  end

  # Attempts to parse a raw date sample with the given strptime format.
  # Returns ISO 8601 date string or nil. Subclasses can override for
  # format-specific normalization (e.g. QIF apostrophe dates).
  def try_parse_date_sample(sample, format:)
    Date.strptime(sample, format).iso8601
  rescue Date::Error, ArgumentError
    nil
  end

  def max_row_count
    10000
  end

  private
    def row_count_exceeded?
      rows_count > max_row_count
    end

    def import!
      # no-op, subclasses can implement for customization of algorithm
    end

    def default_row_name
      "Imported item"
    end

    def default_currency
      account&.currency || family.currency
    end

    def csv_value(row, label, *aliases)
      return if label.blank?

      [ label, *aliases ].each do |candidate|
        header = header_for(candidate)
        next if header.blank?

        value = row[header]
        return value if value.present?
      end

      nil
    end

    def header_for(candidate)
      return if candidate.blank?

      normalized_csv_headers[normalize_header(candidate)]
    end

    def normalized_csv_headers
      @normalized_csv_headers ||= begin
        grouped_headers = Array(csv_headers)
          .filter_map do |header|
            normalized = normalize_header(header)
            next if normalized.blank?

            [ normalized, header ]
          end
          .group_by(&:first)

        duplicate_headers = grouped_headers.values.filter_map do |headers|
          originals = headers.map(&:last).uniq
          originals if originals.many?
        end

        if duplicate_headers.any?
          errors.add(:base, :duplicate_headers, columns: duplicate_headers.map { |headers| headers.join(", ") }.join("; "))
          raise ActiveRecord::RecordInvalid, self
        end

        grouped_headers.transform_values { |headers| headers.first.last }
      end
    end

    def normalize_header(header)
      header.to_s.strip.downcase.gsub(/\*/, "").gsub(/[\s-]+/, "_")
    end

    def parsed_csv
      return @parsed_csv if defined?(@parsed_csv)

      csv_content = raw_file_str || ""
      if rows_to_skip.to_i > 0
        csv_content = csv_content.lines.drop(rows_to_skip).join
      end

      @parsed_csv = self.class.parse_csv_str(csv_content, col_sep: col_sep)
    end

    def sanitize_number(value)
      return "" if value.nil?

      format = NUMBER_FORMATS[number_format]
      return "" unless format

      # Trim surrounding whitespace; the branches below do the format-specific cleanup
      sanitized = value.to_s.strip

      # French/Scandinavian format: the delimiter is a space, so only whitespace is removed
      if format[:delimiter] == " "
        sanitized = sanitized.gsub(/\s+/, "")
      else
        # Drop everything except digits, the delimiter, the separator, and minus signs
        sanitized = sanitized.gsub(/[^\d#{Regexp.escape(format[:delimiter])}#{Regexp.escape(format[:separator])}\-]/, "")

        # Remove the thousands delimiter
        if format[:delimiter].present?
          sanitized = sanitized.gsub(format[:delimiter], "")
        end
      end

      # Replace separator with period for proper float parsing
      if format[:separator].present?
        sanitized = sanitized.gsub(format[:separator], ".")
      end

      # Return empty string if not a valid number
      unless sanitized =~ /\A-?\d+\.?\d*\z/
        return ""
      end

      sanitized
    end

    def set_default_number_format
      self.number_format ||= "1,234.56" # Default to US/UK format
    end

    def custom_column_import_requires_identifier
      return unless amount_type_strategy == "custom_column"

      if amount_type_inflow_value.blank?
        errors.add(:base, I18n.t("imports.errors.custom_column_requires_inflow"))
      end
    end

    # Common encodings to try when UTF-8 detection fails
    # Windows-1250 is prioritized for Central/Eastern European languages
    COMMON_ENCODINGS = [ "Windows-1250", "Windows-1252", "ISO-8859-1", "ISO-8859-2" ].freeze

    def ensure_utf8_encoding
      # Handle nil or empty string first (before checking if changed)
      return if raw_file_str.nil? || raw_file_str.bytesize == 0

      # Only process if the attribute was changed
      # Use will_save_change_to_attribute? which is safer for binary data
      return unless will_save_change_to_raw_file_str?

      # If already valid UTF-8, nothing to do
      begin
        if raw_file_str.encoding == Encoding::UTF_8 && raw_file_str.valid_encoding?
          return
        end
      rescue ArgumentError
        # raw_file_str might have invalid encoding, continue to detection
      end

      # Detect encoding using rchardet
      begin
        require "rchardet"

        detection = CharDet.detect(raw_file_str)
        detected_encoding = detection["encoding"]
        confidence = detection["confidence"]

        # Only convert if we have reasonable confidence in the detection
        if detected_encoding && confidence > 0.75
          # Force encoding and convert to UTF-8
          self.raw_file_str = raw_file_str.force_encoding(detected_encoding).encode("UTF-8", invalid: :replace, undef: :replace)
        else
          # Fallback: try common encodings
          try_common_encodings
        end
      rescue LoadError
        # rchardet not available, fallback to trying common encodings
        try_common_encodings
      rescue ArgumentError, Encoding::CompatibilityError
        # Handle encoding errors by falling back to common encodings
        try_common_encodings
      end
    end

    def try_common_encodings
      COMMON_ENCODINGS.each do |encoding|
        begin
          test = raw_file_str.dup.force_encoding(encoding)

          if test.valid_encoding?
            self.raw_file_str = test.encode("UTF-8", invalid: :replace, undef: :replace)
            return
          end
        rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
          next
        end
      end

      # If nothing worked, force UTF-8 and replace invalid bytes
      self.raw_file_str = raw_file_str.force_encoding("UTF-8").scrub("?")
    end

    def account_belongs_to_family
      return if account.nil?
      return if account.family_id == family_id

      errors.add(:account, "must belong to your family")
    end

    def rows_to_skip_within_file_bounds
      return if raw_file_str.blank?
      return if rows_to_skip.to_i == 0

      line_count = raw_file_str.lines.count
      if rows_to_skip.to_i >= line_count
        errors.add(:rows_to_skip, "must be less than the number of lines in the file (#{line_count})")
      end
    end
end
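To try the `sanitize_number` normalization outside Rails, the method can be ported to plain Ruby. This sketch copies the `NUMBER_FORMATS` table from the model and replaces `present?` with `empty?` checks. `sanitize_amount` and its explicit `number_format` argument are hypothetical, since the model reads `number_format` from the record.

```ruby
# Copied from Import::NUMBER_FORMATS.
NUMBER_FORMATS = {
  "1,234.56" => { separator: ".", delimiter: "," }, # US/UK/Asia
  "1.234,56" => { separator: ",", delimiter: "." }, # Most of Europe
  "1 234,56" => { separator: ",", delimiter: " " }, # French/Scandinavian
  "1,234" => { separator: "", delimiter: "," }      # Zero-decimal currencies like JPY
}.freeze

# Hypothetical standalone port of Import#sanitize_number; the format is passed
# in instead of being read from the record's number_format attribute.
def sanitize_amount(value, number_format)
  return "" if value.nil?

  spec = NUMBER_FORMATS[number_format]
  return "" unless spec

  delimiter = spec[:delimiter]
  separator = spec[:separator]
  sanitized = value.to_s.strip

  if delimiter == " "
    # Space-delimited format: only whitespace is removed.
    sanitized = sanitized.gsub(/\s+/, "")
  else
    # Keep digits, the delimiter, the separator and minus signs, then drop delimiters.
    sanitized = sanitized.gsub(/[^\d#{Regexp.escape(delimiter)}#{Regexp.escape(separator)}\-]/, "")
    sanitized = sanitized.gsub(delimiter, "") unless delimiter.empty?
  end

  # Normalize the decimal separator to a period.
  sanitized = sanitized.gsub(separator, ".") unless separator.empty?

  sanitized.match?(/\A-?\d+\.?\d*\z/) ? sanitized : ""
end

sanitize_amount("$1,234.56", "1,234.56") # => "1234.56"
sanitize_amount("1.234,56", "1.234,56")  # => "1234.56"
sanitize_amount("-1,234", "1,234")       # => "-1234"
```

The space-delimited branch strips only whitespace. A value with a trailing currency symbol, such as `1 234,56 €`, therefore fails the final check and yields an empty string, whereas the other branches discard such characters.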