Compare commits

...

39 Commits

Author SHA1 Message Date
Mike Bridge
a0193dbab6 fix(versioning): replace session-state guard with InvalidRequestError catch
The previous attempt (d0520f6766) was too aggressive: skipping when
parent is in session.dirty/new/deleted bypassed the
persistent-and-clean case the hook EXISTS for. Some upstream code
paths put the dataset in session.dirty *before* this listener fires
(API controllers touching audit fields, etc.), so the
session-membership pre-check made us silently no-op on the very
scenario the hook needs to handle. CI symptom:
test_dataset_column_edit_creates_parent_version showed before=317,
after=317 (parent shadow not written).

Restore the unconditional flag_modified and catch the specific
InvalidRequestError that fires only for the session.new case
(uuid default callable hasn't populated state yet). Other states
fall through to the original behavior:
- persistent + clean → flag_modified succeeds, parent goes dirty,
  Continuum picks it up, SkipUnmodifiedPlugin keeps the row via
  _has_dirty_versioned_children. ✓
- persistent + dirty → flag_modified is harmless (already dirty).
- session.new → InvalidRequestError, skip (parent INSERTs anyway).
- session.deleted → flag_modified may or may not raise; if it does,
  we skip; if not, the delete dominates.
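The state-by-state behaviour above can be sketched in plain Python. This is a hedged model, not the hook itself: `InvalidRequestError` and `flag_modified` are local stand-ins for the SQLAlchemy names (the real call is `sqlalchemy.orm.attributes.flag_modified`), `FakeParent` is hypothetical, and the `uuid`-first column choice follows the W3 note in 40653d52da.

```python
from dataclasses import dataclass, field


class InvalidRequestError(Exception):
    """Stand-in for sqlalchemy.exc.InvalidRequestError."""


@dataclass
class FakeParent:
    """Hypothetical parent: ``loaded`` holds attributes present in state."""

    loaded: dict = field(default_factory=dict)
    modified: set = field(default_factory=set)


def flag_modified(obj: FakeParent, key: str) -> None:
    # Models SQLAlchemy's rule: flagging an attribute absent from the
    # instance state raises instead of marking the object dirty.
    if key not in obj.loaded:
        raise InvalidRequestError(f"Can't flag attribute {key!r} modified")
    obj.modified.add(key)


def pick_flag_column(columns: list[str]) -> str:
    # Prefer a deterministic column; fall back for forks that exclude uuid.
    return "uuid" if "uuid" in columns else columns[0]


def force_parent_dirty(parent: FakeParent, columns: list[str]) -> bool:
    """Return True if the parent was flagged, False if it was skipped."""
    try:
        flag_modified(parent, pick_flag_column(columns))
    except InvalidRequestError:
        # session.new case: the uuid default has not fired yet, and the
        # parent will be INSERTed (and versioned) by this flush anyway.
        return False
    return True
```

Catching the one narrow exception, instead of pre-checking session membership, keeps the persistent-and-clean path unconditional.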

Should unblock test_dataset_column_edit_creates_parent_version,
test_get_version_returns_historical_snapshot_with_children, and
test_restore_with_column_edits_reverts_columns.
2026-05-20 16:47:12 -06:00
Mike Bridge
4c99cd68b6 chore(versioning): apply pre-commit autofixes (ruff + auto-walrus)
- ruff: import sort + E501 reflow on the parent-state guard in
  baseline.py
- ruff format: function-signature collapse and join-chain reflow in
  queries.py
- auto-walrus: two ``entity_kind = …; if … is not None:`` patterns
  in queries.py converted to assignment-expressions
2026-05-20 16:18:55 -06:00
Mike Bridge
0c79581ee9 fix(versioning): retention task FK violation on cross-entity transactions
When one ORM flush touches multiple versioned entities (dashboard +
slice + dataset all save at tx=X), each gets a shadow row sharing
that tx. If only the dashboard is later edited at tx=Y, the
dashboard row at tx=X is closed (end_tx=Y) while slice/dataset rows
stay live at tx=X. Retention then preserves tx=X (slice/dataset are
live there) and prunes tx=Y. The dashboard's closed row at tx=X
survives step 1, then its end_transaction_id=Y trips the FK when
step 2 deletes version_transaction row Y.

Fix: extend the shadow-row delete to also match end_transaction_id
IN tx_ids. Live rows have end_tx=NULL so they're never matched by
either predicate. Closed rows that touch a pruned tx at either
endpoint are pruned together — consistent with retention semantics
(any tx in the row's lifespan is gone, so the row's chain is broken
anyway).
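A toy `sqlite3` reconstruction of the FK failure and the fix, under stated assumptions: table shapes are reduced to the three version columns that matter, transaction ids are hypothetical, and a later dashboard save at Z is added so the dashboard's live row sits outside the pruned set. Only the `end_transaction_id` predicate differs between the two runs.

```python
import sqlite3

SCHEMA = """
CREATE TABLE version_transaction (id INTEGER PRIMARY KEY);
CREATE TABLE dashboards_version (
    id INTEGER,
    transaction_id INTEGER NOT NULL REFERENCES version_transaction(id),
    end_transaction_id INTEGER REFERENCES version_transaction(id),
    PRIMARY KEY (id, transaction_id)
);
CREATE TABLE slices_version (
    id INTEGER,
    transaction_id INTEGER NOT NULL REFERENCES version_transaction(id),
    end_transaction_id INTEGER REFERENCES version_transaction(id),
    PRIMARY KEY (id, transaction_id)
);
"""

X, Y, Z = 1, 2, 3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO version_transaction VALUES (?)", [(X,), (Y,), (Z,)])
    # Dashboard saved at X, edited at Y, edited again at Z (live row at Z).
    conn.executemany(
        "INSERT INTO dashboards_version VALUES (?, ?, ?)",
        [(10, X, Y), (10, Y, Z), (10, Z, None)],
    )
    # Slice saved in the same flush at X and never edited: still live at X.
    conn.execute("INSERT INTO slices_version VALUES (20, ?, NULL)", (X,))
    return conn


def prune(conn: sqlite3.Connection, tx_ids: list[int], match_end_tx: bool) -> None:
    marks = ",".join("?" * len(tx_ids))
    for table in ("dashboards_version", "slices_version"):
        where = f"transaction_id IN ({marks})"
        params = list(tx_ids)
        if match_end_tx:
            # The fix: also drop closed rows whose lifespan ends at a pruned tx.
            where += f" OR end_transaction_id IN ({marks})"
            params += list(tx_ids)
        conn.execute(f"DELETE FROM {table} WHERE {where}", params)
    conn.execute(f"DELETE FROM version_transaction WHERE id IN ({marks})", list(tx_ids))
```

With `match_end_tx=False`, pruning Y leaves the dashboard's `(X, end=Y)` row behind and the final delete raises `sqlite3.IntegrityError`; with the extra predicate the same call succeeds and the live slice row at X is untouched.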

Unblocks test_retention_prunes_old_rows on sqlite, mysql, postgres.
2026-05-20 16:17:21 -06:00
Mike Bridge
d0520f6766 fix(versioning): skip non-clean parents in force-parent-dirty hook
The force-parent-dirty listener was calling attributes.flag_modified
on every parent reachable from a dirty child — including parents
themselves in session.new (e.g. brand-new SqlaTable + brand-new
TableColumns from POST /api/v1/dataset/). flag_modified rejects
unloaded attributes, and a session.new SqlaTable's uuid (default=uuid4
fires at flush time) is unloaded until then. CI caught this with
InvalidRequestError cascading into 422s across dataset creation /
upload / Playwright dataset specs.

The hook is only needed for the persistent-and-clean case (child
edited, parent's own scalars untouched, dropdown otherwise empty).
Anything in session.new will flush anyway; anything in session.dirty
is already flagged; session.deleted shouldn't be touched. Short-
circuit before the flag_modified call.

Unblocks test-sqlite, test-mysql, test-postgres (previous), and
playwright dataset specs.
2026-05-20 16:13:03 -06:00
Mike Bridge
77236afa14 refactor(versioning): apply cross-PR review feedback (#39977 H1/M3/M5)
Three small follow-ups surfaced by aminghadersohi's review of the
SoftDeleteMixin PR (#39977) that apply equally here:

- H1: cache _child_to_parent_registry() with functools.cache. Called
  twice per save flush; mapping depends only on import-time model
  classes, so unbounded cache is the right shape (no invalidation).
- M5: tighten _CHILD_BASELINE_HANDLERS type from dict[str, Any] to
  dict[str, Callable[[Session, Any, int], None]] via a named alias.
  Mypy now catches a future broken handler signature.
- M3/M4: explain the inline-import pattern once in the module
  docstrings of baseline.py and changes.py. Both modules use
  pylint disable=import-outside-toplevel uniformly because they
  load during init_versioning() before mappers are configured;
  the per-callsite "why" comments would just repeat the same
  reason. Module-level explanation + a hint to comment unusual
  cases is the cleaner shape.

M6 (listener placement) doesn't apply — init_versioning() already
runs inside init_app_in_ctx(). M8 (loose OpenAPI schema in
*/api.py docstrings) is real but its own change.
2026-05-20 14:12:02 -06:00
Mike Bridge
9d5a459840 docs(versioning): record why SkipUnmodifiedPlugin doesn't clean up orphan version_transaction rows inline
Extends the existing docstring note ("the orphan is swept by retention")
with the reasoning behind not cleaning it up in the same flush. The
inline-delete is appealing in principle but would couple this plugin
to the change-records listener's buffer state via the ON DELETE
CASCADE on ``version_changes.transaction_id``: both listeners would
have to agree that the flush produced nothing before the version_transaction
row could be dropped safely. The orphan's ~40-byte storage cost +
retention's correct-by-construction handling (orphans have no parent
shadow, so they're never in the "preserve" set) make the coordination
overhead not worth it.

Captures the design decision in the file where the next reader will
look for it.
2026-05-19 18:42:06 -06:00
Mike Bridge
1ac9e50836 tidy(versioning): reading-order shuffle in baseline.py (newspaper-article order)
Pure file shuffle, zero behaviour change. Reorders ``baseline.py`` so it
reads top-down by level of abstraction (newspaper-article rule): the
public entry point at the top, supporting helpers descending below.

Before: 14 private helpers, then ``register_baseline_listener`` at the
bottom. A reader opening the file met the leaf builders first and had
to accumulate context before finding the call site.

After (top-down):

  - Entry point: ``register_baseline_listener`` + inner ``capture_baseline``
  - High-level helpers used by ``capture_baseline``:
      ``_force_parent_dirty_on_child_change``,
      ``_collect_parents_to_baseline``,
      ``_child_to_parent_registry``,
      ``_version_table_for``,
      ``_shadow_row_count``,
      ``_insert_baseline_and_children``
  - Mid-level builders:
      ``_insert_baseline_row``,
      ``_baseline_children_for_parent``
  - Per-entity child handlers + their dispatch table:
      ``_baseline_dataset_children``,
      ``_baseline_dashboard_children``,
      ``_CHILD_BASELINE_HANDLERS``
  - Leaf builders:
      ``_insert_child_baseline_rows``,
      ``_baseline_attached_slices``,
      ``_insert_synthetic_slice_baseline``

Three section-divider comments mark the abstraction levels. The
``_CHILD_BASELINE_HANDLERS`` dict literal stays after its referenced
handlers (module-level literals evaluate at import time and need names
already bound); a comment now flags this constraint.

Function bodies are byte-for-byte unchanged; ``git log -L`` on any
function shows only its relocation. 96 unit tests pass.
2026-05-19 18:42:06 -06:00
Mike Bridge
80b8891e39 tidy(versioning): extract read_row_outside_flush helper
baseline.py:_insert_baseline_row and changes.py:_read_pre_state both
issued the same "read a single row through ``session.connection()``
inside ``with session.no_autoflush:``" pattern. Same five-line block,
same intent ("read the pre-flush state without triggering the in-flight
edit's flush").

Promoted to ``superset.versioning.utils.read_row_outside_flush(session,
table, entity_id)``. Companion to ``single_flush_scope`` — they sit
next to each other in utils.py and frame the two directions of the
"don't autoflush mid-listener" pattern.

Returns ``dict[str, Any]`` (or ``None``) so callers can't accidentally
hold a cursor-bound ``RowMapping`` past the listener boundary. Both
call sites get shorter by ~5 lines.
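The helper's contract (a plain `dict` or `None`, never a cursor-bound row) can be sketched with the standard library. Here a `sqlite3` connection stands in for `session.connection()` and the `with session.no_autoflush:` wrapper is omitted; only the name and return shape come from the message, the body is an assumption.

```python
import sqlite3
from typing import Any, Optional


def read_row_outside_flush(
    conn: sqlite3.Connection, table: str, entity_id: int
) -> Optional[dict[str, Any]]:
    """Read one row by primary key and detach it from the cursor."""
    # ``table`` comes from trusted model metadata, never from user input.
    cursor = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (entity_id,))
    row = cursor.fetchone()
    if row is None:
        return None
    # Copy into a plain dict so nothing cursor-bound escapes the helper.
    columns = [desc[0] for desc in cursor.description]
    return dict(zip(columns, row))
```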

Also picks up Decimal stringification in the changes.py docstring
update (was listed in the W4 commit but the docstring still said
"(datetime, UUID, bytes)" — now matches the implementation).

Behaviour unchanged. 96 unit tests pass.
2026-05-19 18:42:06 -06:00
Mike Bridge
77c373616e tidy(versioning): extract shared helpers between list_versions and get_version
After the SRP split (8c9cf36) put both functions in the same module
~150 lines apart, their overlap became visible: same JOIN of
version_table → version_transaction → ab_user, same baseline-first
ordering, same user-row → ``changed_by`` projection, same lookup
``_ENTITY_KIND_BY_CLASS_NAME.get(model_cls.__name__)``. About 30 lines
of duplication.

Six small helpers extracted at the module top:

- ``_resolve_version_tables(model_cls)`` returns ``(ver_tbl, tx_tbl, user_tbl)``
- ``_version_with_tx_user_join(ver_tbl, tx_tbl, user_tbl)`` builds the join
- ``_baseline_first_ordering(ver_tbl)`` returns the order-by tuple
- ``_user_select_cols(user_tbl)`` returns the user-column list with
  ``user_id`` as the stable label (normalises the prior asymmetry
  where ``list_versions`` labelled it ``user_id`` and ``get_version``
  labelled it ``_user_id`` to dodge a column-name collision — the
  ``user_id`` label collides with neither)
- ``_changed_by_from_row(row)`` projects user columns onto the API shape
- ``_entity_kind_for(model_cls)`` resolves the change-records taxonomy lookup

Both call sites get shorter and read what they do (build query / project
user / build row) rather than how. Behavior unchanged; no test changes.

Also two small inline tidyings while in the file:

- Replace the ternary
  ``changes_by_tx = list_change_records_batch(...) if entity_kind else {}``
  with an explicit two-line if-statement in both functions. The ternary
  buries the decision; the if-statement reads as one thought.
- Inline the one-shot ``meta_cols`` set declaration in ``get_version``
  into the ``if col.name in {...}`` check that uses it three lines later.

Net: about 110 lines → about 80 lines across the two functions, plus
a small helper section at the top.
2026-05-19 18:42:06 -06:00
Mike Bridge
40653d52da refactor(versioning): sqlalchemy-review follow-ups (W1–W8)
Cleanup pass from the SQLAlchemy + migration code review. Eight items,
all in the "warnings / suggestions" tier — no behaviour change visible
to the API, but each closes a real correctness, perf, or maintainability
concern surfaced in review.

baseline.py
- Delete unused ``_get_user_id`` (W1). The function wrapped a broad
  ``except Exception:  # noqa: S110`` swallow that hid bugs; grep
  confirmed no callers anywhere. The legitimate audit-field paths
  (``row.get("changed_by_fk")`` etc.) already drive the
  ``version_transaction.user_id`` write.
- Batch ``_baseline_attached_slices`` from O(N) round-trips to
  three queries (W2): one membership SELECT, one existing-shadow
  SELECT, one bulk live-row SELECT for the missing ids. The previous
  per-slice ``COUNT(*)`` + ``SELECT`` was a measurable first-save
  hotspot on dashboards with many charts. Drops the now-unused
  ``_slice_has_shadow`` helper.
- Pick a stable column name for ``flag_modified`` in
  ``_force_parent_dirty_on_child_change`` (W3). ``uuid`` is on all
  three versioned parent classes and excluded by none, so the
  flagged attribute is deterministic across SQLAlchemy versions /
  mapper-config orders instead of depending on
  ``versioned_column_properties(parent)[0]``. Falls back to the
  first available column for forks that exclude ``uuid``.
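The W2 three-query shape, sketched against a simplified `sqlite3` schema. Column names beyond `dashboard_slices` and `slices_version` (here `slices.slice_name`) are assumptions, and `missing_slice_rows` is a hypothetical name; the real helper inserts shadow rows rather than returning them.

```python
import sqlite3


def missing_slice_rows(conn: sqlite3.Connection, dashboard_id: int) -> list[tuple]:
    """Live rows for the dashboard's charts that still lack a shadow row."""
    # Query 1: which slices belong to the dashboard?
    members = {
        sid
        for (sid,) in conn.execute(
            "SELECT slice_id FROM dashboard_slices WHERE dashboard_id = ?",
            (dashboard_id,),
        )
    }
    if not members:
        return []
    ids = sorted(members)
    marks = ",".join("?" * len(ids))
    # Query 2: which of them already have a shadow row?
    shadowed = {
        sid
        for (sid,) in conn.execute(
            f"SELECT DISTINCT id FROM slices_version WHERE id IN ({marks})", ids
        )
    }
    missing = sorted(members - shadowed)
    if not missing:
        return []
    marks = ",".join("?" * len(missing))
    # Query 3: fetch every missing live row in one round-trip.
    return conn.execute(
        f"SELECT id, slice_name FROM slices WHERE id IN ({marks}) ORDER BY id",
        missing,
    ).fetchall()
```

Very large dashboards may need the `IN` lists chunked, since backends cap the number of bound parameters per statement.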

changes.py
- Add ``Decimal`` handling to ``_jsonable`` (W4) — ``json.dumps``
  rejects ``Decimal``, so any numeric column (e.g. ``SqlMetric.currency``
  contents, or fork/plugin Decimal columns) would crash the bulk
  insert. Stringify rather than ``float()`` to preserve precision;
  the diff engine compares ``from_value`` / ``to_value`` by string
  equality after this coercion so both sides round-trip identically.

queries.py
- Promote the inline ``{0: "baseline", 1: "update", 2: "delete"}``
  dict to module-level ``_OP_TYPE_LABELS`` (W7). The literal was
  duplicated across ``list_versions`` and ``get_version``; the third
  caller is one bug fix away.
- Comment on ``resolve_version_uuid``'s Python-side ``derive_version_uuid``
  loop (W8) — no portable SQL form for UUIDv5 across PostgreSQL /
  MySQL / SQLite, iteration count is bounded by the retention
  window. Flags the place to revisit if retention is ever disabled
  (``=0``) on a heavily-edited entity.
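Why the lookup loops: there is no portable SQL form for UUIDv5 across PostgreSQL, MySQL, and SQLite, so each candidate transaction id is hashed in Python. The namespace value and the `f"{entity_uuid}:{transaction_id}"` name format below are hypothetical; the real `VERSION_UUID_NAMESPACE` and `derive_version_uuid` inputs are not shown in this message.

```python
import uuid
from typing import Iterable, Optional

# Hypothetical namespace; the real VERSION_UUID_NAMESPACE value is not shown here.
VERSION_UUID_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def derive_version_uuid(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    # Assumed name format; uuid5 hashes namespace + name with SHA-1.
    return uuid.uuid5(VERSION_UUID_NAMESPACE, f"{entity_uuid}:{transaction_id}")


def resolve_version_uuid(
    entity_uuid: uuid.UUID,
    version_uuid: uuid.UUID,
    transaction_ids: Iterable[int],
) -> Optional[int]:
    """Map a version_uuid back to its transaction id by re-deriving candidates."""
    # One hash per retained transaction: bounded by the retention window,
    # unbounded if retention is disabled (=0).
    for tx_id in transaction_ids:
        if derive_version_uuid(entity_uuid, tx_id) == version_uuid:
            return tx_id
    return None
```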

migrations/2026-05-01_23-36 (composite-PK)
- Belt-and-braces guard in ``_downgrade_mysql_table`` (W6): asserts
  ``t.name in AFFECTED_TABLES`` before interpolating into the
  backtick-quoted ALTER statements. The invariant was already
  structurally implied (callers iterate ``AFFECTED_TABLES``), but
  making it load-bearing means a future refactor can't slip an
  arbitrary table name through.
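A sketch of the W6 guard. `AFFECTED_TABLES` here is a hypothetical list, `downgrade_mysql_statements` is a hypothetical name, and the DDL string is illustrative only; the point is that identifiers cannot be bound as parameters, so membership is checked before interpolation.

```python
# Hypothetical list; the migration's real AFFECTED_TABLES is not reproduced here.
AFFECTED_TABLES = ("dashboards_version", "slices_version", "tables_version")


def downgrade_mysql_statements(table_name: str) -> list[str]:
    """Build backtick-quoted DDL for one table, refusing unknown names."""
    # Identifiers cannot be bound as query parameters, so check membership
    # before the name is interpolated into SQL text.
    assert table_name in AFFECTED_TABLES, f"unexpected table {table_name!r}"
    # Illustrative statement only; the real downgrade DDL is not shown here.
    return [f"ALTER TABLE `{table_name}` DROP PRIMARY KEY"]
```

`assert` statements are stripped under `python -O`; an explicit `raise` would keep the guard in optimized runs.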

(W5 needed no change: grepping ``tests/`` for ``metadata.create_all``
callers that exercise versioning tables found none. The cascade-FK
gap on ``version_changes.transaction_id`` is already documented
in ``tests/integration_tests/versioning/change_records_tests.py:27-32``.)

62 versioning unit tests pass.
2026-05-19 18:42:06 -06:00
Mike Bridge
59045f8cfe refactor(versioning): split VersionDAO into queries + restore modules
VersionDAO carried five distinct concerns under one class: UUID
derivation, version metadata queries, change-record loading,
single-version snapshot retrieval, and restore orchestration. Under
Bob's "and" test (the clean-code review flagged this as the next
structural fix after the dead-code purge), the ~600-line class could only
be described as "queries about versioned state of one entity AND the
workflow that mutates it."

Splits the read and write sides into purpose-built modules:

- ``superset/versioning/queries.py`` — UUID derivation
  (``VERSION_UUID_NAMESPACE``, ``derive_version_uuid``) + read-side
  helpers (``find_active_by_uuid``, ``current_version_number``,
  ``current_live_transaction_id``, ``current_live_version_uuid``,
  ``list_versions``, ``resolve_version_uuid``, ``get_version``,
  ``list_change_records_batch``). ~475 lines.

- ``superset/versioning/restore.py`` — write-side (``restore_version``,
  ``_stamp_audit_fields_for_restore``, ``_RESTORE_RELATIONS``).
  ~140 lines. Depends only on ``queries.find_active_by_uuid`` and
  ``utils.single_flush_scope``.

- ``superset/daos/version.py`` — collapsed to an ~85-line backward-compat
  façade that re-exports both modules under a single ``VersionDAO``
  class via ``staticmethod`` aliases. The module also re-exports
  ``VERSION_UUID_NAMESPACE`` and ``derive_version_uuid`` at module level
  so the ~10 existing callers (api.py handlers, command classes, the
  ETag emitter, integration tests) don't have to change their imports.
  New code is encouraged to import from the sub-modules directly.

The functions themselves are unchanged byte-for-byte aside from
internal call sites being rewritten from ``VersionDAO.foo`` to the bare
function name (since they now live as module-level functions, not
class methods).

One unit-test mock target moved: ``test_restore_version_returns_none_for_unknown_entity``
now patches ``superset.versioning.restore.find_active_by_uuid`` (the
actual call site) instead of ``VersionDAO.find_active_by_uuid`` (which
is now just an alias).
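A single-file sketch of the façade and of why the mock target moved. The three sections stand in for `queries.py`, `restore.py`, and `daos/version.py`; the `find_active_by_uuid` signature and body are hypothetical.

```python
from typing import Any, Optional


# --- stands in for superset/versioning/queries.py ---
def find_active_by_uuid(model_cls: type, entity_uuid: str) -> Optional[Any]:
    """Hypothetical body: return the live entity, or None if absent."""
    return {"model": model_cls.__name__, "uuid": entity_uuid}


# --- stands in for superset/versioning/restore.py ---
def restore_version(model_cls: type, entity_uuid: str) -> Optional[str]:
    # Looks up the bare module-level name at call time, not VersionDAO's alias.
    entity = find_active_by_uuid(model_cls, entity_uuid)
    return None if entity is None else "restored"


# --- stands in for superset/daos/version.py (backward-compat façade) ---
class VersionDAO:
    find_active_by_uuid = staticmethod(find_active_by_uuid)
    restore_version = staticmethod(restore_version)
```

`mock.patch.object(VersionDAO, "find_active_by_uuid", ...)` swaps only the class attribute, while `restore_version` still resolves the module-level name. Patching the name where it is looked up (in the real tree, `superset.versioning.restore.find_active_by_uuid`) is what reaches the call site.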

Each of the three modules now has one reason to change. When the
sc-103157 soft-delete pass adds the ``deleted_at IS NULL`` filter to
``find_active_by_uuid``, it touches only ``queries.py``. When a
per-entity-type restore Strategy replaces the string-keyed
``_RESTORE_RELATIONS`` dispatch, it touches only ``restore.py``.
2026-05-19 18:42:06 -06:00
Mike Bridge
76bbb18fdb temp(versioning): strip URL params from dashboard restore navigation; regen lockfile
DashboardList demo dropdown previously instructed the user to "Reload
the page to see the change" after a restore. The URL the user
returns to may still carry ``?native_filters_key=…`` /
``permalink_key`` / ``form_data_key`` from a prior session — those
point at server-cached snapshots (in ``key_value`` and the
filter-state cache) captured before the restore. On rehydration the
cached state is merged on top of the restored ``json_metadata``,
masking the rollback (e.g. dashboard-level colour-scheme restore
appears not to take effect).

Replaces the alert + manual reload with a direct ``window.location.href``
navigation to ``/superset/dashboard/<uuid>/`` — drops all URL params,
forcing hydration from the freshly restored DB state.

Also regenerates ``package-lock.json`` to pick up the ``zod 4.4.1 →
4.4.3`` bump that master's ``package.json`` already reflects.

(``temp(versioning)`` prefix per the demo dropdown's status — this
file is not part of V1 scope per ADR-005; the V2 UI SIP owns the
actual restore UI surface.)
2026-05-19 18:42:06 -06:00
Mike Bridge
f4a18cfe98 refactor(versioning): rename find_active_by_uuid public + collapse restore commands onto BaseRestoreVersionCommand
Two coupled clean-code review fixes:

(1) Rename ``VersionDAO._find_active_entity_by_uuid`` →
``find_active_by_uuid``. The leading-underscore + three
``# pylint: disable=protected-access`` suppressions in the restore
commands were the smell of a wrongly-private API. The method is a
perfectly reasonable public DAO operation; dropping the underscore
removes the suppressions.

(2) Collapse ``RestoreChartVersionCommand``, ``RestoreDashboardVersionCommand``,
``RestoreDatasetVersionCommand`` onto a shared
``BaseRestoreVersionCommand`` (``superset/commands/version_restore.py``).
The three classes were textbook copy-paste — identical except for
the model class and three exception types. Each subclass now declares
``model_cls`` + ``not_found_exc`` + ``forbidden_exc`` and overrides
``run()`` with one ``@transaction(reraise=<failed_exc>)``-decorated
line delegating to ``self._do_restore()``. ~80 lines per file →
~45 lines per file; one shared workflow instead of three drift sources.
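A minimal sketch of the template-method shape. It assumes injected `finder`, `can_edit`, and `restore` callables so it runs standalone (the real commands call their collaborators directly), and the exception and model names are hypothetical. Superset's `@transaction` decorator is noted in a comment rather than reproduced.

```python
from typing import Any, Callable, Optional


class ChartNotFoundError(Exception):
    """Hypothetical per-entity exception."""


class ChartForbiddenError(Exception):
    """Hypothetical per-entity exception."""


class Slice:
    """Hypothetical stand-in for the chart model class."""


class BaseRestoreVersionCommand:
    """Shared restore workflow; subclasses declare only per-entity pieces."""

    model_cls: type
    not_found_exc: type[Exception]
    forbidden_exc: type[Exception]

    def __init__(
        self,
        entity_uuid: str,
        version_uuid: str,
        *,
        finder: Callable[[type, str], Optional[Any]],
        can_edit: Callable[[Any], bool],
        restore: Callable[[Any, str], Any],
    ) -> None:
        # Collaborators are injected only so this sketch runs standalone.
        self.entity_uuid = entity_uuid
        self.version_uuid = version_uuid
        self._finder = finder
        self._can_edit = can_edit
        self._restore = restore

    def _do_restore(self) -> Any:
        entity = self._finder(self.model_cls, self.entity_uuid)
        if entity is None:
            raise self.not_found_exc(self.entity_uuid)
        if not self._can_edit(entity):
            raise self.forbidden_exc(self.entity_uuid)
        return self._restore(entity, self.version_uuid)


class RestoreChartVersionCommand(BaseRestoreVersionCommand):
    model_cls = Slice
    not_found_exc = ChartNotFoundError
    forbidden_exc = ChartForbiddenError

    # Real subclasses decorate this with @transaction(reraise=<failed_exc>).
    def run(self) -> Any:
        return self._do_restore()
```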

The api.py imports of ``RestoreChartVersionCommand`` /
``RestoreDashboardVersionCommand`` / ``RestoreDatasetVersionCommand`` are
unchanged — public class names preserved.
2026-05-19 18:42:06 -06:00
Mike Bridge
18abb81fe7 refactor(versioning): purge dataset_snapshots dead code + fix get_version bug
The full-Continuum spike (ADR-004 revised) replaced the JSON-snapshot
restore path with Continuum's native Reverter and removed the
``dataset_snapshots`` / ``dashboard_snapshots`` tables from the
migration chain. Seven VersionDAO methods and two module-level
helpers that read/wrote those tables stayed in the code anyway and
went unused — dead code that looked live.

Worse, ``VersionDAO.get_version`` still read from
``dataset_snapshots`` in its SqlaTable branch. On any environment
where the snapshot tables don't exist (current production behavior),
``GET /api/v1/dataset/<uuid>/versions/<version_uuid>/`` raised
``OperationalError``. The branch is rewritten to read column and
metric state from Continuum's child shadow tables
(``table_columns_version`` / ``sql_metrics_version``) via the
existing ``_shadow_rows_valid_at`` helper.

Deleted:
- ``_deserialize_snapshot_value`` (module helper)
- ``_coerce_snapshot_list`` (module helper)
- ``RESTORE_EXCLUDE_FIELDS`` (constant — only referenced by deleted code
  and a docstring)
- ``VersionDAO._restore_dataset_children``
- ``VersionDAO._parse_slice_ids_json``
- ``VersionDAO._apply_dashboard_slices``
- ``VersionDAO._restore_dashboard_children``
- ``VersionDAO._apply_snapshot_children``

The corresponding ~17 unit tests in
``tests/unit_tests/daos/test_version_dao.py`` are removed alongside.

Stale docstring references in ``versioning/changes.py`` and
``versioning/diff.py`` that pointed at the retired snapshot tables are
also cleaned up.

Also strips an 8-line comment block in ``restore_version`` that
duplicated the docstring of ``_stamp_audit_fields_for_restore``.

Net: −290 lines from ``daos/version.py``; a production-shape bug
fixed; dead code that looked live is gone.
2026-05-19 18:42:06 -06:00
Mike Bridge
9e580c699d refactor(versioning): single_flush_scope context manager + single-revert restore
VersionDAO.restore_version previously called Continuum's Reverter
once per relation in a split-revert loop with flush + expire between
calls. That closed an autoflush race in the Reverter when multiple
relations were reverted at once, but split one logical restore across
multiple Continuum transactions — and once the change-records listener
was wired up, the listener's tx-dedup guard skipped the second pass,
silently dropping child-addition records from version_changes. A
restore that re-added a calculated column would render as an empty
"Baseline" entry in the dropdown.

Replaces the split-revert with a single ``target_version.revert(relations=relations)``
call wrapped in a new ``single_flush_scope(db.session)`` context
manager (``superset/versioning/utils.py``). The context manager
suppresses autoflush inside the block and issues one trailing flush
on clean exit; on exception, the trailing flush is skipped so the
session's normal rollback path handles cleanup. Same autoflush window
closed, one Continuum transaction instead of N, the change-records
listener sees the complete shadow state in one after_flush pass.

The wrapper carries the full autoflush-race / cascade-add rationale
in its docstring so the restore_version call site can be a short
6-line block referencing it.
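One way to express the described semantics, as a sketch: it works against any object exposing an `autoflush` attribute and a `flush()` method, and is not necessarily how the helper in `superset/versioning/utils.py` is written (it may use SQLAlchemy's `session.no_autoflush` instead).

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def single_flush_scope(session: Any) -> Iterator[None]:
    """Suppress autoflush inside the block; flush once on clean exit."""
    previous = session.autoflush
    session.autoflush = False
    try:
        yield
    finally:
        session.autoflush = previous
    # Reached only when the block did not raise: an exception propagates
    # out of ``yield`` and skips this line, leaving cleanup to rollback.
    session.flush()
```

At the call site the restore then reads as `with single_flush_scope(db.session): target_version.revert(relations=relations)`.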

Integration coverage: ``test_restore_emits_full_child_diff_in_one_transaction``.
2026-05-19 18:42:06 -06:00
Mike Bridge
a62d85d798 feat(versioning): force-parent-dirty on versioned-child change
SQLAlchemy doesn't mark a parent as dirty when only its children
(``TableColumn`` / ``SqlMetric`` on ``SqlaTable``) are modified.
Continuum's UnitOfWork only creates operations for entities in
``session.dirty``, so a column-only edit produces shadow rows in
``table_columns_version`` but no parent shadow row in
``tables_version``. ``VersionDAO.list_versions`` queries the parent
shadow, so the version dropdown is empty for child-only saves —
exactly the failure mode reported when "I edited a column description
but no version appeared."

Extends ``register_baseline_listener`` with a new before-flush hook
``_force_parent_dirty_on_child_change`` that walks the existing
``_child_to_parent_registry`` and ``attributes.flag_modified(parent,
<first non-excluded versioned column>)`` whenever a versioned child
is dirty / new / deleted but the parent's own scalars haven't been
touched. The flag puts the parent in ``session.dirty`` so Continuum's
UoW creates a parent UPDATE operation; the resulting shadow row's
scalar columns mirror the previous version (only the children
actually changed), and the row exists to anchor the transaction in
the parent's version chain.

``SkipUnmodifiedPlugin._is_no_op_update`` was updated in this commit's
predecessor to recognize the "scalars match but children dirty" case
via ``_has_dirty_versioned_children`` so the forced parent UPDATE
isn't skipped.

Integration coverage: ``test_dataset_column_edit_creates_parent_version``.
2026-05-19 18:42:06 -06:00
Mike Bridge
8a46573018 feat(versioning): SkipUnmodifiedPlugin audit-key normalize for Dashboard.json_metadata
Continuum's no-op suppression compared post-flush column values
byte-for-byte against the previous live shadow row. For
``Dashboard.json_metadata`` that produced false-positive version rows
on saves where the user authored nothing — the frontend re-stamps
``map_label_colors`` (regenerated from the ``LabelsColorMap``
singleton) on every save, plus ``chart_configuration`` /
``global_chart_configuration`` / ``show_chart_timestamps`` /
``color_namespace`` (derived from the current chart set), so two
consecutive identical saves produce different bytes for the column.
The diff engine already excluded those keys via
``DASHBOARD_JSON_METADATA_AUDIT_KEYS`` when computing change records;
the skip-plugin diverged.

Adds a ``_COLUMN_NORMALIZERS`` registry keyed on
``(class_name, column_name)`` that maps to a per-column normalizer
applied to both pre- and post-image before equating. The first
entry parses ``Dashboard.json_metadata`` as JSON and drops the
audit-key set before comparing. The same registry is the extension
point for analogous transient fields on charts and datasets.
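A sketch of the registry shape. The audit-key set lists only the keys named in this message (the real `DASHBOARD_JSON_METADATA_AUDIT_KEYS` may hold more), and `values_equal` is a hypothetical name for the comparison the plugin performs.

```python
import json
from typing import Any, Callable

# Only the keys named in this commit message; the real
# DASHBOARD_JSON_METADATA_AUDIT_KEYS in superset.versioning.diff may hold more.
AUDIT_KEYS = frozenset(
    {
        "map_label_colors",
        "chart_configuration",
        "global_chart_configuration",
        "show_chart_timestamps",
        "color_namespace",
    }
)


def _normalize_json_metadata(raw: Any) -> Any:
    """Parse the column and drop frontend-restamped audit keys."""
    if not raw:
        return raw
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        return parsed
    return {k: v for k, v in parsed.items() if k not in AUDIT_KEYS}


# (class_name, column_name) -> normalizer applied to both pre- and post-image.
_COLUMN_NORMALIZERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("Dashboard", "json_metadata"): _normalize_json_metadata,
}


def values_equal(class_name: str, column: str, before: Any, after: Any) -> bool:
    """Hypothetical name for the plugin's per-column equality check."""
    normalize = _COLUMN_NORMALIZERS.get((class_name, column), lambda v: v)
    return normalize(before) == normalize(after)
```

Parsing also makes key order irrelevant, so a re-serialised column with identical content compares equal.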

Promotes ``_DASHBOARD_JSON_METADATA_AUDIT_KEYS`` to a public name
(``DASHBOARD_JSON_METADATA_AUDIT_KEYS``) so the skip-plugin can import
it from ``superset.versioning.diff`` without reaching across a
leading-underscore boundary.

Integration coverage: ``test_map_label_colors_only_change_does_not_create_version``.
2026-05-19 18:42:06 -06:00
Mike Bridge
a0546b8a43 fix(importer): use ORM relationship assignment for dashboard_slices
The v1 import pipeline previously wrote dashboard ↔ chart membership
via raw Core DML (``db.session.execute(delete(dashboard_slices)…)`` +
``db.session.execute(insert(dashboard_slices)…)``). With Continuum's
M2M tracker enabled by the versioning feature, those Core writes
emit malformed shadow INSERTs into ``dashboard_slices_version`` —
the tracker can't see the composite-PK columns through the Core
layer and produces rows with only ``(transaction_id, operation_type)``
populated, triggering a ``NOT NULL`` violation on
``(dashboard_id, slice_id)``.

Rewrites both import paths (``ImportAssetsCommand._import`` in
``commands/importers/v1/assets.py`` and ``ImportDashboardsCommand._import``
in ``commands/dashboard/importers/v1/__init__.py``) to use ORM-level
``dashboard.slices = [...]`` reassignment followed by an explicit
``db.session.flush()``. The explicit flush is necessary to land the
M2M rows before any subsequent autoflush fires an inner-flush event
handler that would reset the relationship change (cf. the SAWarning
``Attribute history events accumulated on N previously clean instances
within inner-flush event handlers have been reset``).

The unit tests previously called ``_import`` directly twice in the same
session — production wraps ``run()`` in ``@transaction`` so each invocation
gets its own DB+Continuum transaction. Added ``db.session.commit()`` between
calls in ``test_import_adds_dashboard_charts``,
``test_import_removes_dashboard_charts``, and
``test_dashboard_import_with_overwrite_replaces_charts`` so the tests
mirror production semantics; otherwise the second call's M2M shadow
inserts conflict with the first call's on
``UNIQUE (dashboard_id, slice_id, transaction_id)``.
2026-05-19 18:42:06 -06:00
Mike Bridge
0afeda46a0 temp(versioning): demo version-history dropdowns + French i18n
Adds debug-only ``VersionHistoryDropdown`` widgets to the chart,
dashboard, and dataset list pages so the version surface can be
exercised from the UI during the spike. Each row's actions column
gets a clock-icon dropdown that fetches
``/api/v1/{resource}/<uuid>/versions/`` on click, lists the ten most
recent versions with a
formatted change-log summary, and offers per-version restore via
``POST .../versions/<uuid>/restore``.

Strings are wrapped in ``t('...')`` with placeholder formatting
(e.g. ``t('Added %(kind)s "%(name)s"', { kind, name })``) so
translators can reorder verbs and nouns rather than concatenating
fragments. ``KIND_LABELS`` is a static map keying English layout
kinds (``chart``, ``row``, ``column``, ``tab``, ``markdown``, etc.)
to ``t(...)``-extractable labels. Empty change lists render as
"Baseline" rather than "No changes recorded" since the empty case
is overwhelmingly the ``operation_type=0`` baseline row.

Locale-aware date rendering: ``new Date(iso).toLocaleString(lang)``
where ``lang`` comes from ``document.documentElement.lang`` (set
by ``src/views/App.tsx`` from the bootstrap ``locale``), so dates
follow the user's chosen Superset locale rather than the browser's.

French translations for the new strings are appended to
``superset/translations/fr/LC_MESSAGES/messages.po`` (Ajouté,
Supprimé, Modifié, Version initiale, kind labels, …). Run
``npm run build-translation`` and ``pybabel compile -l fr`` to
regenerate the JSON / MO packs.

This commit is **demo-only** per ADR-005 (V1 is backend-only). It
is intentionally marked ``temp`` so it can be reverted before the
PR splits — the production V1 ships without UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
7ce5f1d0e7 test(versioning): integration tests for SkipUnmodifiedPlugin (FR-026)
Locks in the no-op-suppression behavior implemented by
``SkipUnmodifiedPlugin`` (which lives in ``superset/versioning/factory.py``
shipping with the foundation commit). Five integration tests:

1. Owners-only edit doesn't mint a version row — exercises the
   case where every dirty column is an excluded relationship.
2. Re-save with identical scalar values doesn't mint a row —
   exercises the json_metadata re-serialise path where
   ``set_dash_metadata`` rewrites the column to a different byte
   sequence with identical parsed content; the plugin must compare
   post-flush values against the prior shadow row to detect this.
3. Real scalar change DOES mint a row — guards against the plugin
   over-suppressing.
4. Same assertion on a Slice (covers the ``String`` column path on
   a different entity type).
5. ``json_metadata`` sub-key edit DOES mint a row — covers the
   ``MediumText`` column path past the plugin's content-equality
   check.

Tests are designed so a column-type change in the parent entities
(e.g. flipping ``json_metadata`` from ``MediumText`` to ``JSON``)
will fail one of these if the plugin's Python ``!=`` comparison
breaks for the new type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
9bc95ef819 feat(versioning): ETag helper module + integration tests (T055)
Helper module that derives the strong-validator ``ETag`` value from
an entity's current live ``version_uuid`` and attaches it to a
Flask response. Two functions:

- ``set_version_etag(response, version_uuid)`` — direct path used by
  PUT handlers that already compute ``new_version_uuid`` (see the
  REST API commit two prior). Cheap; no extra query.
- ``set_version_etag_by_uuid(response, model_cls, entity_uuid)`` —
  used by version endpoints that operate on ``entity_uuid``; looks
  up ``entity_id`` then derives ``version_uuid`` via ``VersionDAO``.
  Costs one extra ``SELECT id WHERE uuid = ?``; documented in the
  docstring so callers prefer the cheap variant when they have the
  id already.
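A stand-in sketch of the cheap path, assuming the header value is the `version_uuid` string itself and using a hypothetical `FakeResponse` in place of a Flask response. A strong validator is a quoted opaque string with no `W/` prefix.

```python
from typing import Optional
from uuid import UUID


class FakeResponse:
    """Hypothetical stand-in for a Flask response: only ``headers`` is used."""

    def __init__(self) -> None:
        self.headers: dict[str, str] = {}


def set_version_etag(response: FakeResponse, version_uuid: Optional[UUID]) -> FakeResponse:
    # No live version: leave the header absent rather than sending an empty one.
    if version_uuid is not None:
        # Strong validator: quoted opaque string, no ``W/`` weak prefix.
        response.headers["ETag"] = f'"{version_uuid}"'
    return response
```

On a real Werkzeug/Flask response, `response.set_etag(str(version_uuid))` writes the same quoted, strong form.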

Integration tests cover all three entity types and four endpoint
shapes (entity GET, save PUT, version-list GET, single-version GET)
plus the entity-with-no-versions edge case (header is correctly
absent).

The ETag is wired into the API endpoints in the REST-API commit
(group 3) and the CORS ``expose_headers: ["ETag"]`` ships with the
retention commit (group 4) since both touch ``superset/config.py``.
Locking enforcement (``If-Match`` → 412) is explicitly NOT in this
change — deferred to the follow-up UI SIP per Open Question §7.
``ETag`` is informational in v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
801d58687b feat(versioning): time-based retention via Celery beat (FR-007)
Adds a scheduled Celery task that prunes version history older than
``SUPERSET_VERSION_HISTORY_RETENTION_DAYS`` (default 30; settable
via env var; ``0`` disables retention entirely).

**Task** — ``superset.tasks.version_history_retention.prune_old_versions``:

1. Computes ``cutoff = utcnow() - timedelta(days=N)``.
2. Selects ``version_transaction.id`` rows with ``issued_at <
   cutoff`` and filters out any tx whose parent shadow includes a
   live row (``end_transaction_id IS NULL``). The live row is the
   only preservation rule — closed historical rows including the
   baseline (``operation_type=0``) age out. Per-entity minimum-history
   floor is an open question tracked in ``future-work.md``.
3. Deletes rows owned by surviving txs in each parent shadow
   table (``dashboards_version`` / ``slices_version`` /
   ``tables_version``).
4. Deletes child-shadow rows for the same transactions
   (``table_columns_version`` / ``sql_metrics_version`` /
   ``dashboard_slices_version``).
5. Drops the surviving ``version_transaction`` rows. The
   ``version_changes`` rows cascade via the FK from the previous
   commit.

Idempotent and safely retried on partial failure.
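The pruning order in steps 1 to 5 can be sketched against SQLite using one parent and one child shadow table. The table and column names come from the text above. The trimmed schemas, the ISO-string timestamps, and the function name ``prune_sketch`` are assumptions, so this is not the task's implementation.

```python
import sqlite3
from datetime import datetime, timedelta

# Trusted constants; the real task also covers slices/tables and their children.
PARENT_SHADOWS = ("dashboards_version",)
CHILD_SHADOWS = ("dashboard_slices_version",)


def prune_sketch(conn: sqlite3.Connection, retention_days: int, now: datetime) -> int:
    """Delete history older than the cutoff; return how many transactions were pruned."""
    if retention_days <= 0:  # 0 disables retention entirely
        return 0
    cutoff = (now - timedelta(days=retention_days)).isoformat()  # step 1
    # Step 2: old transactions, minus any that own a live parent row.
    old = {r[0] for r in conn.execute(
        "SELECT id FROM version_transaction WHERE issued_at < ?", (cutoff,))}
    for table in PARENT_SHADOWS:
        old -= {r[0] for r in conn.execute(
            f"SELECT transaction_id FROM {table} WHERE end_transaction_id IS NULL")}
    if not old:
        return 0  # nothing left to prune, so a re-run is a no-op
    ids = sorted(old)
    marks = ",".join("?" * len(ids))  # "?,?,..." placeholders
    with conn:  # one SQLite transaction for steps 3 to 5
        for table in PARENT_SHADOWS + CHILD_SHADOWS:  # steps 3 and 4
            conn.execute(f"DELETE FROM {table} WHERE transaction_id IN ({marks})", ids)
        # Step 5: version_changes rows follow via ON DELETE CASCADE.
        conn.execute(f"DELETE FROM version_transaction WHERE id IN ({marks})", ids)
    return len(ids)
```

SQLite only honours the cascade when ``PRAGMA foreign_keys = ON`` is set on the connection.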

**Schedule** — ``superset/config.py`` adds the task to the default
``CeleryConfig.beat_schedule`` (nightly at 03:00). Operators who
override ``CeleryConfig`` in their ``superset_config.py`` need to
merge this entry — see UPDATING.md.

Also adds ``"expose_headers": ["ETag"]`` to the default
``CORS_OPTIONS`` so cross-origin browser clients can read the
``ETag`` header introduced in the next commit. (Co-located here
because both touch ``superset/config.py``; the ETag mechanism
itself ships in the next commit.)
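Operators who override ``CeleryConfig`` could merge the new entry into ``superset_config.py`` roughly as in the fragment below. The schedule key and the ``task`` name are assumptions, so check the name the task actually registers under. ``crontab`` is Celery's standard schedule helper. An operator who also overrides ``CORS_OPTIONS`` would keep ``ETag`` in ``expose_headers`` the same way.

```python
# superset_config.py (fragment): merge with your existing overrides.
from celery.schedules import crontab


class CeleryConfig:  # keep your existing broker/imports settings alongside this
    beat_schedule = {
        # keep your existing entries (e.g. "reports.scheduler") in this dict
        "version_history_retention.prune_old_versions": {  # hypothetical key
            "task": "version_history_retention.prune_old_versions",  # assumed task name
            "schedule": crontab(minute=0, hour=3),  # nightly at 03:00
        },
    }


CELERY_CONFIG = CeleryConfig

CORS_OPTIONS = {
    # keep your existing CORS settings in this dict
    "expose_headers": ["ETag"],  # lets cross-origin browser clients read ETag
}
```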

**Auto-discovery** — ``superset/tasks/celery_app.py`` adds
``version_history_retention`` to its late-imports so Celery's
auto-discovery picks up the task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
8fe9a8ce4e feat(versioning): REST API endpoints + restore commands
Exposes the version surface as three new endpoints per entity type
(chart, dashboard, dataset), each carrying the standard Superset
decorator stack (``@protect()``, ``@safe``, ``@statsd_metrics``,
``@event_logger.log_this_with_context``) so they appear in FAB's
``action_log`` alongside other audited operations.

| Method | Path | Purpose |
|---|---|---|
| GET  | ``/api/v1/{resource}/<uuid>/versions/`` | List version history (oldest-first; per entry: ``version_uuid``, ``version_number``, ``transaction_id``, ``operation_type``, ``issued_at``, ``changed_by``, ``changes`` array) |
| GET  | ``/api/v1/{resource}/<uuid>/versions/<version_uuid>/`` | Read-only snapshot of the entity at the requested version (scalar fields plus ``columns`` / ``metrics`` for datasets) |
| POST | ``/api/v1/{resource}/<uuid>/versions/<version_uuid>/restore`` | Replay the snapshot onto the live entity via Continuum's ``Reverter`` (non-destructive — produces a new version row stamping the restoring user via the standard save path) |

``<version_uuid>`` is a deterministic ``UUIDv5(entity_uuid,
transaction_id)`` so it's stable across replicas and retention
pruning. Authorisation reuses the resource's existing ``can_write``
permission; workspace admins can list / restore any entity.
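Because the identifier is name-based, any replica can recompute it without a lookup. The minimal sketch below assumes the transaction id is encoded as its decimal string for the UUIDv5 name; the actual encoding is not stated here. ``version_uuid_for`` is a hypothetical name.

```python
import uuid


def version_uuid_for(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    # UUIDv5 hashes (namespace, name) with SHA-1, so the same inputs always give the same UUID.
    # Assumption: the name is str(transaction_id); the real helper may encode it differently.
    return uuid.uuid5(entity_uuid, str(transaction_id))
```

Deriving the value from ``transaction_id`` rather than from a 0-based position is what keeps it stable across pruning. Retention removes older rows, but it never changes a surviving transaction's id.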

**Restore commands** — ``superset/commands/{chart,dashboard,dataset}/
restore_version.py`` wrap ``VersionDAO.restore_version`` in the
standard ``@transaction()`` boundary. The command resolves the
``Reverter`` once per related collection (split-revert pattern, with
``flush + expire`` between calls) so a multi-relation restore
doesn't trip Continuum's autoflush race that would otherwise mark
half the collection as ``state.deleted=True`` mid-revert.

**Save responses** — ``PUT /api/v1/{resource}/<pk>`` is updated to
include ``old_version`` / ``new_version`` (0-based numbers),
``old_transaction_id`` / ``new_transaction_id`` (stable across
pruning), and ``old_version_uuid`` / ``new_version_uuid`` body
fields so callers can correlate a save with its resulting version
row. The ``ETag`` response header added in a later commit builds on
top of this, but the body fields stay: they predate the header
and remain useful for clients that don't read response headers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
f7d73e2e1b feat(versioning): change records + diff engine
Adds a structured per-field change log alongside the foundational
shadow tables. Each save flush emits zero or more ``version_changes``
rows describing what changed relative to the previous version, with
shape ``[{kind, path, from_value, to_value, sequence}]`` keyed to
``version_transaction.id`` (FR-016 .. FR-021).

**Schema** — ``version_changes`` table, FK to ``version_transaction``
with ``ON DELETE CASCADE`` so retention drops dependent records
without explicit cleanup. Composite unique index on
``(transaction_id, entity_kind, entity_id, sequence)`` so the
listener can write monotonically and downstream readers see a
deterministic order.

**Diff engine** (``superset/versioning/diff.py``) — pure-function
diffing of pre-/post-state pairs:

- ``diff_scalar_fields`` for ordinary columns; emits one record per
  changed field with JSON-safe ``from_value`` / ``to_value``.
- ``diff_json_field`` for ``json_metadata`` and ``params``, walking
  the parsed structure and emitting per-sub-key records. Honours
  an ``exclude_keys`` set
  (``_DASHBOARD_JSON_METADATA_AUDIT_KEYS``: ``chart_configuration``,
  ``global_chart_configuration``, ``map_label_colors``,
  ``show_chart_timestamps``, ``color_namespace``;
  ``_CHART_PARAMS_AUDIT_KEYS``) so frontend-stamped sub-keys that
  mutate on every save don't dominate the change log (FR-022).
- ``diff_dashboard_layout`` walks ``position_json`` structurally
  and emits ``[verb, kind, id]`` records (verbs ``add``, ``remove``,
  ``move``, ``edit``; kinds mapped from component types such as
  ``CHART``, ``ROW``, and ``COLUMN`` to English labels) so a UI can
  render "Added chart 'Foo'" without
  re-parsing JSON. ``HEADER_ID`` is suppressed because it duplicates
  the ``dashboard_title`` scalar record.
- ``fold_dashboard_layout_with_chart_changes`` deduplicates layout
  records against M2M / chart-membership records by UUID so an
  add-and-attach doesn't appear twice.
- ``_values_equivalent`` treats ``None`` and ``""`` as equal; this
  matches the save path's habit of normalising nullable strings to
  the empty string.

**Listener** — ``superset/versioning/changes.py`` registers a
``before_flush`` listener that captures pre-state for each dirty
entity and an ``after_flush`` listener that runs the diff engine
against the post-state and writes ``version_changes`` rows under
the resolved ``transaction_id``. Tracks processed transaction ids
on ``session.info`` so re-firings within a single transaction
(autoflush triggered by mid-commit queries) don't double-insert and
trip the unique constraint. Reads child rows via raw SELECT against
``table_columns`` / ``sql_metrics`` rather than ``dataset.columns``
because the live collection is stale during the restore path's raw
DELETE+INSERT cycle.
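The re-firing guard reduces to an idempotency check keyed on the transaction id. The stdlib-only sketch below uses a plain ``dict`` in place of ``session.info`` and a hypothetical ``write_rows`` callback in place of the ``version_changes`` INSERT. The ``info`` key is also hypothetical.

```python
from typing import Callable

_PROCESSED_KEY = "versioning_processed_tx_ids"  # hypothetical session.info key


def write_changes_once(info: dict, transaction_id: int, write_rows: Callable[[int], None]) -> bool:
    """Run write_rows for transaction_id at most once per session; return True if it ran."""
    processed = info.setdefault(_PROCESSED_KEY, set())
    if transaction_id in processed:
        # A later autoflush in the same transaction: rows were already written, so skip
        # instead of violating the (transaction_id, entity_kind, entity_id, sequence) index.
        return False
    write_rows(transaction_id)
    processed.add(transaction_id)  # mark only after a successful write
    return True
```

If transaction ids are never reused, a stale entry left in the set cannot block a later transaction; it only occupies memory until the session ends.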

**Endpoint surface** — ``VersionDAO.list_change_records_batch``
batches the lookup across multiple transactions with a single
``WHERE transaction_id IN (...)`` query so the version-list
endpoint avoids N+1 round-trips. ``list_versions`` / ``get_version``
return entries with a populated ``changes`` array (empty for
``operation_type=0`` baseline rows).
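The single-query batching can be sketched with SQLite and a trimmed ``version_changes`` schema. The column subset, the function name ``fetch_changes_batch``, and the dict-keyed-by-transaction return shape are all assumptions about the DAO.

```python
import sqlite3


def fetch_changes_batch(conn: sqlite3.Connection, tx_ids: list) -> dict:
    """Change records for many transactions in one round-trip, grouped by transaction id."""
    grouped = {tx: [] for tx in tx_ids}  # baseline transactions keep an empty list
    if not tx_ids:
        return grouped
    marks = ",".join("?" * len(tx_ids))  # "?,?,..." placeholders
    rows = conn.execute(
        "SELECT transaction_id, kind, path, from_value, to_value, sequence "
        f"FROM version_changes WHERE transaction_id IN ({marks}) "
        "ORDER BY transaction_id, sequence",
        tx_ids,
    )
    for tx, kind, path, old, new, seq in rows:
        grouped[tx].append(
            {"kind": kind, "path": path, "from_value": old, "to_value": new, "sequence": seq}
        )
    return grouped
```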

**Tests** — ``test_diff.py`` covers the diff engine shape (39
unit cases across scalar, JSON, layout, child-collection, and
fold paths). ``change_records_tests.py`` exercises the listener
end-to-end with realistic save flows. ``perf_validation_tests.py``
is the T044 harness for SC-002/3/4 (list endpoint p95 < 1s,
restore < 3s, save overhead < 50ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
be01e4552c feat(versioning): foundation — Continuum capture + parent/child shadow tables + VersionDAO
Adds SQLAlchemy-Continuum as a dependency and wires it as the
canonical capture mechanism for chart, dashboard, and dataset edits.

**Schema** — three Alembic migrations, leaving the chain at one
foundation revision plus one child-shadow revision:

- ``version_transaction`` (renamed from Continuum's default
  ``transaction``; SQL-reserved-word workaround) carries the per-save
  ``user_id`` / ``issued_at`` and is the join target for all shadow
  rows. Auto-incrementing PK; user_id has no FK so import / Celery /
  CLI saves can write rows without an active Flask user.
- Parent shadow tables for the three entity types:
  ``dashboards_version``, ``slices_version``, ``tables_version``.
- Child shadow tables for dataset children + dashboard M2M:
  ``table_columns_version``, ``sql_metrics_version``,
  ``dashboard_slices_version`` (composite PK on the M2M shadow,
  matching the live ``dashboard_slices`` reshape from
  sc-105349-composite-association-pks).

**Models** — ``Dashboard``, ``Slice``, ``SqlaTable`` (and dataset
children ``TableColumn`` / ``SqlMetric``) gain ``__versioned__``
class attributes. The exclude lists carry both M2M relationships
(``owners``, ``roles``, ``dashboards``) and the ``AuditMixin``
columns (``changed_on`` / ``created_on`` / ``changed_by_fk`` /
``created_by_fk`` plus ``last_saved_at`` / ``last_saved_by_fk``
on ``Slice``) so auto-bumped audit fields cannot trigger a
version row on their own (FR-025).

**Plugins** — ``superset/versioning/factory.py`` ships three
Continuum plugins:

- ``VersionTransactionFactory`` renames the transaction table and
  appends the unconditional ``user_id`` column.
- ``VersioningFlaskPlugin`` sources the acting user from Superset's
  ``g.user`` rather than ``flask_login.current_user`` (Superset's
  JWT auth populates ``g.user`` but leaves ``current_user``
  anonymous on API routes).
- ``SkipUnmodifiedPlugin`` filters Continuum's UPDATE operations,
  marking content-equivalent re-saves as ``processed=True`` so they
  don't mint no-op shadow rows (FR-026; see follow-up commits for
  the test). Lives in this commit because it shares the file with
  the other plugins.

**Save-path glue** — a ``before_flush`` baseline listener
(``superset/versioning/baseline.py``) inserts an ``operation_type=0``
shadow row the first time a pre-existing entity is saved, including
the slice-baseline-under-dashboard pattern that gives the dashboard
M2M shadow a row to join against. ``UpdateDashboardCommand`` wraps
its body in ``no_autoflush`` so ``process_tab_diff`` /
``process_native_filter_diff`` don't fire intermediate flushes that
would mint extra version rows. ``DatasetDAO.update_columns`` is
rewritten as a natural-key upsert keyed on ``column_name`` so child
edits flow through ORM events Continuum sees.
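Keying on ``column_name`` turns a column-list save into per-row decisions, so each child edit surfaces as an individual update, insert, or delete. The stdlib sketch below covers only that planning step, with columns as plain ``dict``s. ``plan_column_upsert`` and its return shape are hypothetical. The sketch also assumes that columns missing from the payload are deleted.

```python
from typing import Any, Dict, List, Tuple

Column = Dict[str, Any]


def plan_column_upsert(
    existing: List[Column], incoming: List[Column]
) -> Tuple[List[Column], List[Column], List[Column]]:
    """Split incoming column payloads into (updates, inserts, deletes) keyed on column_name."""
    by_name = {col["column_name"]: col for col in existing}
    incoming_names = {col["column_name"] for col in incoming}
    updates, inserts = [], []
    for col in incoming:
        current = by_name.get(col["column_name"])
        if current is None:
            inserts.append(col)  # new natural key: INSERT
        elif any(current.get(k) != v for k, v in col.items()):
            updates.append(col)  # same key, changed attributes: UPDATE in place
        # unchanged rows are left alone, so they emit no ORM event
    deletes = [col for col in existing if col["column_name"] not in incoming_names]
    return updates, inserts, deletes
```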

**DAO** — ``superset/daos/version.py`` exposes the read API used by
the version endpoints in the next commits:
``current_version_number`` (0-based index, unstable under retention
pruning), ``current_live_transaction_id`` (stable across pruning),
``current_live_version_uuid`` (deterministic UUIDv5), plus
``list_versions`` / ``get_version`` / ``restore_version`` and a
batch ``list_change_records_batch`` for N+1 avoidance.

**Initialization** — ``superset/initialization/__init__.py`` wires
``init_versioning()`` after ``make_versioned()`` runs and the
versioned mappers are configured. Registers the baseline listener
plus the change-record listener (the latter's body lives in the
next commit but the registration site is here because it shares
the init function).

**Tests** — version-capture and version-list integration tests for
each entity type, plus a ``VersionDAO`` unit test suite. Retention
test uses a backdated ``issued_at`` so it can drive
``_prune_old_versions_impl`` synchronously.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:06 -06:00
Mike Bridge
0a9fa1ac85 feat(scripts): add --dirty-duplicates-pct to seed_junction_load.py
Extends the stress-test seed script with an optional duplicate-row
injection step, used to measure the empirical cost of the migration's
``_dedupe_by_min_id`` phase.

Usage: after running the normal seed at a given scale, add
``--dirty-duplicates-pct 5`` (or any non-zero value) to inject that
percentage of duplicate ``(fk1, fk2)`` rows into each non-UNIQUE
junction (slice_user, dashboard_user, dashboard_roles —
dashboard_slices is skipped because its UNIQUE constraint, present
both pre- and post-migration, rejects duplicates).

Pre-condition: requires the DB to be at the pre-migration revision
(33d7e0e21daa). The post-migration composite PK rejects duplicates,
so attempting to inject on the upgraded schema errors out.

Empirical result on MySQL @ 10M dashboard_slices + ~2.1M other
junction rows + 105K injected duplicates (5% on the 3 non-UNIQUE
tables):
  Upgrade time: 1m 36s vs clean baseline 1m 37s
  → dedupe cost is within measurement noise; the table-scan that
    the migration already performs dominates whether or not
    duplicates exist.

This empirically confirms what the cost-model predicted: the
``_dedupe_by_min_id`` GROUP BY scan is the dominant cost of that
phase, and the actual per-duplicate DELETE is negligible.

NULL-FK injection deliberately skipped — would require altering the
six non-UNIQUE FK columns from NOT NULL back to nullable (the
migration's downgrade keeps them NOT NULL by design), which adds
per-backend ALTER complexity for a code path that's structurally
identical in cost shape (DELETE WHERE col IS NULL is the same scan
shape as the dedupe scan).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
58a1a1a8d1 build(scripts): add stress-test data generator for migration timing
Add ``scripts/seed_junction_load.py``, a backend-agnostic script that
bulk-inserts synthetic parent rows (dashboards, slices, users, roles,
tables, dbs) and many-to-many junction rows for the four largest
association tables targeted by the composite-PK migration:
``dashboard_slices``, ``slice_user``, ``dashboard_user``,
``dashboard_roles``.

Designed for measuring migration runtime at varying scales — run with
a series of size flags (100K / 1M / 5M / 10M for the target table)
and time the migration at each scale to verify the predicted
``O(N log N)`` extrapolation against real numbers.

Properties:
- **Reproducible**: deterministic cross-product walk through parent IDs
  produces a stable pair sequence; re-running is replayable.
- **Idempotent**: re-running with the same target is a no-op; with a
  higher target, only new rows are added.
- **Backend-agnostic**: connects via Superset's standard ``DATABASE_*``
  env vars (or ``SUPERSET__SQLALCHEMY_DATABASE_URI``). Branches on
  dialect for ``BINARY(16)`` vs ``UUID`` vs TEXT/BLOB UUID columns.
- **Batched**: bulk INSERT 10K rows per statement.
- **Per-phase timing**: logs elapsed wall time for the parents phase,
  the junctions phase as a whole, and per junction-table.
- **Avoidance set**: loads existing junction pairs into a Python set
  so re-runs on top of pre-existing data don't collide on the
  uniqueness constraint.

Usage (inside the Superset container):

    docker exec superset-superset-1 \
        /app/.venv/bin/python /app/scripts/seed_junction_load.py \
        --dashboard-slices 1000000

Defaults target a "large multi-team install" shape: 1M
``dashboard_slices``, 100K each ``slice_user`` / ``dashboard_user``,
10K ``dashboard_roles``. Override per-table via flags.

Tested locally on MySQL (the current evaluation stack):
- 200/100/100/50 row mini-run produced expected counts.
- Re-running at the same target is a no-op (idempotent).
- ``--dry-run`` plans without writing.

Junction tables not yet covered (``sqlatable_user``, ``rls_filter_*``,
``report_schedule_user``) are typically small in production and
require additional parent seeding (RLS filters, report schedules)
that wasn't worth the scope here. Adding them is straightforward by
extending ``JUNCTIONS`` and writing the corresponding parent seeder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
fef0a64b21 fix(docker): MySQL examples DB + EXAMPLES_PORT override (sc-105349)
Fix two follow-on issues reported when starting the dev stack with
docker-compose-mysql.yml:

1. ``superset-init`` step 4 (load-examples) fails with
   ``MySQLdb.OperationalError: (2002, "Can't connect to server on 'db'")``
   because the analytics-examples DB connection inherits ``EXAMPLES_PORT=5432``
   (Postgres port) from ``docker/.env``. The override flipped
   ``DATABASE_DIALECT`` to ``mysql+mysqldb`` but left the EXAMPLES_*
   group on Postgres defaults, so the URI became
   ``mysql+mysqldb://examples:examples@db:5432/examples`` — MySQL
   container has no listener on 5432.

   Fix: add ``EXAMPLES_HOST/PORT/DB/USER/PASSWORD`` and a complete
   ``SUPERSET__SQLALCHEMY_EXAMPLES_URI`` to the ``mysql-env`` anchor.

2. The Postgres init scripts under
   ``docker/docker-entrypoint-initdb.d/`` (``cypress-init.sh``,
   ``examples-init.sh``) get mounted on the MySQL container too —
   compose merges volume lists. They invoke ``psql`` which doesn't
   exist in the MySQL image, abort with ``psql: command not found``,
   and prevent the ``examples`` DB from being created.

   Fix: add a MySQL-specific init script
   ``docker/mysql-init/examples-init.sql`` that creates the
   ``examples`` database and user, and mount it at
   ``/docker-entrypoint-initdb.d`` in the override. Compose's
   later-takes-precedence rule on duplicate volume targets displaces
   the Postgres init dir, so the MySQL container only sees the
   MySQL-compatible script.

   (Used a plain duplicate-target mount rather than the ``!override``
   tag because pre-commit's ``check-yaml`` doesn't recognize Compose's
   custom YAML tags.)

Recovery for an existing failed MySQL stack: ``docker compose -f
docker-compose.yml -f docker-compose-mysql.yml down``, then
``docker volume rm superset_db_home_mysql`` (so the new init script
runs on the next fresh boot), then ``up`` again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
7867f30a23 build(docker): add MySQL compose override for dialect-swap evaluation
Adds ``docker-compose-mysql.yml``, a compose-override file that swaps
the default Postgres metadata DB for MySQL 8 with one extra ``-f``
flag:

  docker compose -f docker-compose.yml -f docker-compose-mysql.yml up

Useful for evaluating dialect-specific behaviour (e.g., the runtime
cost of DDL migrations on a deployment whose production metadata DB
is MySQL — the question raised by review feedback on this PR).

Mirrors the connection settings used by CI's ``test-mysql`` shard:
``mysql+mysqldb`` dialect, charset ``utf8mb4`` with ``binary_prefix``.
Host port defaults to 13306 (configurable via ``DATABASE_PORT_MYSQL``)
to avoid colliding with a native MySQL install on 3306.

A separate volume (``db_home_mysql``) keeps MySQL data isolated from
the Postgres ``db_home`` volume, so switching between the two with
``-f`` flag toggles doesn't corrupt either side.

The Postgres-specific init scripts under
``docker/docker-entrypoint-initdb.d/`` are not mounted on the MySQL
service (they are postgres-only). Examples / cypress fixtures still
load via ``superset-init``'s post-startup steps, which run
``superset load-examples`` against whichever metadata DB is in use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
118161b0a0 docs(UPDATING): add MySQL-targeted maintenance-window queries (sc-105349)
Mirror of the PostgreSQL diagnostic queries added in 11148779ed,
adapted for MySQL/InnoDB. One important difference: InnoDB rebuilds
the clustered index on every PK change, so all eight tables undergo
a full table rebuild on MySQL — not just the two that go through
the explicit ``recreate="always"`` path. The lock-window estimate
query is updated to cover all eight rather than just two, and the
"migration_path" column makes the rebuild expectation explicit
("direct ALTER (still rebuilds InnoDB clustered index)").

Other notes:
- ``information_schema.TABLES.TABLE_ROWS`` is an InnoDB estimate,
  analogous to PostgreSQL's ``reltuples``; documented inline.
- ``KEY_COLUMN_USAGE`` carries both sides of the FK in a single
  row on MySQL, so the external-FK pre-flight check is simpler
  than the PostgreSQL version (no joins between three views).
- The aggregated dedupe query is portable standard SQL; included
  verbatim for copy-paste convenience.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
3408a6f6c0 docs(UPDATING): add Postgres-targeted maintenance-window queries (sc-105349)
Add a "Sizing the maintenance window on PostgreSQL" sub-section to the
operator runbook. The simple per-table COUNT/duplicate/NULL queries
that were already there are dialect-portable but only count rows;
operators on PostgreSQL with large deployments need to characterize
the migration's runtime cost before scheduling it.

Adds four diagnostic queries:

- Per-table size, row count (from pg_class.reltuples), and which
  migration path each table will take (recreate-rewrite vs direct
  ALTER). Sizes the work concretely.
- Aggregated duplicate-row roll-up: dup_groups + total rows_dropped
  per table. Replaces eight separate per-table queries with one
  consolidated result for audit/dump-before-apply decisions.
- External-FK pre-flight check (the same one the migration runs at
  upgrade time and aborts on). Lets operators surface any blocking
  external reference ahead of the maintenance window. Should be
  empty on a stock install.
- Lock-window estimate for the two full-rewrite tables, using
  pg_relation_size and a conservative 100 MB/s rewrite throughput
  assumption. The other six use direct ALTER and are dominated by
  composite-index build time (seconds for low-millions-of-rows
  tables).
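The estimate itself is a single division. The worked sketch below assumes "MB" means 1024 * 1024 bytes and takes the byte count that ``pg_relation_size`` reports; the runbook query may round or use decimal megabytes instead. ``rewrite_seconds`` is a hypothetical name.

```python
def rewrite_seconds(relation_bytes: int, mb_per_second: float = 100.0) -> float:
    """Estimated full-rewrite time at a fixed throughput (MB taken as 1024 * 1024 bytes)."""
    return relation_bytes / (mb_per_second * 1024 * 1024)


# A 3 GiB table at the conservative 100 MB/s assumption:
estimate = rewrite_seconds(3 * 1024**3)  # 30.72 seconds
```

By default, ``pg_relation_size`` counts only the table's main fork, so index rebuild time comes on top of this figure.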

Prompted by reviewer feedback on apache/superset#39859 from a large
deployment asking how to size the maintenance window. The original
pre-flight queries are kept for cross-dialect operators (MySQL,
SQLite) since the new queries use PostgreSQL-specific catalog views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
254e826307 fix(migration): rebase down_revision onto 33d7e0e21daa (sc-105349)
CI cypress + playwright shards were red with:

  ERROR [flask_migrate] Error: Multiple head revisions are present
  for given argument 'head'

The recent rebase onto master pulled in
``33d7e0e21daa_add_semantic_layers_and_views.py`` (from PR #37815,
"semantic layer extension"), which had been authored against
``ce6bd21901ab`` as its parent — the same parent our migration
referenced. After the rebase both migrations point at
``ce6bd21901ab``, producing two heads and breaking ``flask db
upgrade head`` for any downstream consumer (CI's Cypress / Playwright
shards spin up a real Superset instance via ``superset db upgrade``,
which is why those shards failed first; the integration shards run
against a precomputed schema and didn't surface this).

Fix: chain our migration after the semantic-layer migration by
pointing ``down_revision`` at ``33d7e0e21daa``. The chain is now
linear:

    ... → ce6bd21901ab → 33d7e0e21daa (semantic layers)
                          → 2bee73611e32 (composite PK, this PR)

Verified with ``superset db heads`` (returns single head
``2bee73611e32``) and the local migration test suite (44 passed,
1 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
9465e3b675 fix(migration): explicit NOT NULL on FK columns for SQLite (sc-105349)
Found by running fresh-install + round-trip against a real SQLite DB:
6 of the 8 affected tables had FK columns that were originally
declared nullable. PostgreSQL and MySQL implicitly promote the
constituent columns of an ``ALTER TABLE ... ADD PRIMARY KEY`` to
``NOT NULL``; SQLite does not. This is a long-standing SQLite quirk:
apart from a single-column ``INTEGER PRIMARY KEY`` (the rowid alias)
or a ``WITHOUT ROWID`` table, a PRIMARY KEY column accepts NULL
unless it is explicitly declared NOT NULL. Result: a fresh SQLite
install would accept
``INSERT INTO dashboard_slices (dashboard_id, slice_id) VALUES (NULL, 5)``
despite both columns being part of the composite PK.
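The quirk is easy to reproduce with Python's built-in ``sqlite3`` module. The two throwaway tables below are trimmed to the FK columns, and their names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite PK without explicit NOT NULL: SQLite still accepts NULL in a key column.
conn.execute(
    "CREATE TABLE loose_pk (dashboard_id INTEGER, slice_id INTEGER, "
    "PRIMARY KEY (dashboard_id, slice_id))"
)
conn.execute("INSERT INTO loose_pk VALUES (NULL, 5)")  # accepted

# With explicit NOT NULL (what alter_column(..., nullable=False) adds), it is rejected.
conn.execute(
    "CREATE TABLE enforced_pk (dashboard_id INTEGER NOT NULL, slice_id INTEGER NOT NULL, "
    "PRIMARY KEY (dashboard_id, slice_id))"
)
try:
    conn.execute("INSERT INTO enforced_pk VALUES (NULL, 5)")
    rejected = False
except sqlite3.IntegrityError:  # "NOT NULL constraint failed"
    rejected = True
```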

Our integration tests previously masked this: the test fixture seeds
columns with ``nullable=False``, so the post-upgrade NOT NULL
assertion passed regardless of whether the migration enforced it.

Fix: add explicit ``batch_op.alter_column(fk, nullable=False)`` for
both FK columns inside the per-table batch_alter_table block. On
PostgreSQL and MySQL this is a no-op (PK already implies NOT NULL);
on SQLite it adds the missing NOT NULL declaration so a fresh
install matches the data-model.md "After" contract.

Verified end-to-end:
- Postgres + MySQL: column shape unchanged (still NOT NULL)
- SQLite fresh install + round-trip: all 8 tables now have NOT NULL
  on FK columns, ``INSERT (NULL, 5)`` correctly rejected with
  IntegrityError on dashboard_slices, dashboard_user, sqlatable_user

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
65a3491861 fix(migration): MySQL downgrade FK + AUTO_INCREMENT (sc-105349)
Two MySQL-only failures in the downgrade path, found by running the
full migration history against a fresh MySQL 8 container:

1. ``MySQLdb.OperationalError: (1553, "Cannot drop index 'PRIMARY':
   needed in a foreign key constraint")``. InnoDB uses the composite
   PK index to back the FK on the leftmost column. The downgrade
   tried to drop the composite PK before dropping the FKs, orphaning
   the FK's backing index. PostgreSQL and SQLite create separate
   indexes for FK columns and don't trip on this.

2. ``Field 'id' doesn't have a default value`` on subsequent INSERT.
   ``sa.Identity(always=False)`` only emits ``AUTO_INCREMENT`` on
   MySQL when the column is created with ``primary_key=True`` — our
   portable path adds the column first then creates the PK separately,
   so MySQL leaves the column without auto-generation. Existing rows
   would all collide on id=0, and future inserts fail for lack of a default.
   Postgres' ``GENERATED BY DEFAULT AS IDENTITY`` and SQLite's
   ``INTEGER PRIMARY KEY`` rowid alias don't have this gap.

Fix: extract ``_downgrade_mysql_table()`` that emits the canonical
MySQL idiom — drop FKs, then a single ALTER combining
``DROP PRIMARY KEY, ADD COLUMN id INT NOT NULL AUTO_INCREMENT,
ADD PRIMARY KEY (id)`` (which backfills existing rows with sequential
ids and preserves AUTO_INCREMENT), restore the redundant UNIQUE on
the two tables that originally had it, and re-add the FKs with their
original names. Postgres and SQLite keep the existing portable
``batch_alter_table`` path.

Raw SQL is unavoidable for the combined-ALTER form; per the
constitution it's allowed for dialect-specific DDL with no SQLA
equivalent, with triple-quoted strings for legibility.

Verified end-to-end: upgrade → downgrade → upgrade against a fresh
MySQL 8 container with INSERT-without-id sanity check showing the
restored ``id`` column auto-increments correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
56c36fde54 fix(migration): drop FKs before recreate on MySQL (sc-105349)
CI test-mysql failed with:

  MySQLdb.OperationalError: (1826, "Duplicate foreign key constraint
  name 'fk_dashboard_slices_slice_id_slices'")

Root cause: MySQL scopes foreign-key constraint names per-database,
not per-table (PostgreSQL and SQLite scope per-table). The
``batch_alter_table(... recreate="always", copy_from=...)`` path
used for ``dashboard_slices`` and ``report_schedule_user`` builds
``_alembic_tmp_<table>`` carrying the original FK names from
``copy_from`` while the original table still holds those names — MySQL
rejects the temp-table creation with ERROR 1826.

Fix: on MySQL only, drop the original FK constraints by name before
the ``batch_alter_table`` runs. The ``copy_from`` re-creates them on
the rebuilt table with their original names, so the post-migration
shape is unchanged. On PostgreSQL and SQLite the original code path
still runs unchanged.

Local SQLite tests (44 passed, 1 skipped) still pass; CI will validate
on MySQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
0d95b41aed refactor(migration): build pre-flight SQL via SQLAlchemy core (review)
Address Beto's review comments on apache/superset#39859: replace
``sa.text(f"...")`` SQL construction in the three pre-flight helpers
(``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``)
with SQLAlchemy core constructs (``sa.delete``, ``sa.select``,
``sa.func``, ``.subquery()``, ``.notin_()``).

A small ``_table_clause()`` helper builds a lightweight ``TableClause``
exposing the columns the queries reference; the three helpers consume
it. Removes all ``# noqa: S608`` comments — they are no longer needed
because there is no string-interpolated SQL.

Verified the compiled SQL is identical on Postgres, MySQL, and SQLite,
including the MySQL ERROR 1093 workaround (the inner aggregation is
wrapped in a derived table via ``.subquery()``, producing
``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``).
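The compiled shape can be exercised directly with ``sqlite3``; the table contents are illustrative. SQLite does not need the derived-table wrapper, but it accepts it, so the same statement text runs on all three backends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slice_user (id INTEGER PRIMARY KEY, user_id INTEGER, slice_id INTEGER)")
conn.executemany(
    "INSERT INTO slice_user (id, user_id, slice_id) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 11, 100), (4, 10, 100)],  # ids 2 and 4 duplicate id 1
)
# Keep MIN(id) per (user_id, slice_id). The inner aggregate sits in a derived table
# (AS keep_min) because MySQL rejects a subquery that reads the DELETE target (ERROR 1093).
conn.execute(
    """
    DELETE FROM slice_user
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT min(id) AS keep_id FROM slice_user GROUP BY user_id, slice_id
        ) AS keep_min
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM slice_user ORDER BY id")]
```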

Also drops the redundant ``f`` prefix on the two non-interpolating
lines of the ``_check_no_external_fks_to_id`` error message.

44 migration tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
6086d9c52a docs(migration): address SQLAlchemy review follow-ups
Four operator-experience improvements from the second review pass:

1. ``TABLES_WITH_NULLABLE_FKS`` is now explicitly documented as an
   informational set that is not consulted at runtime; the comment
   explains the previous ``dashboard_roles`` omission was the bug
   that motivated the always-run cleanup.
2. ``_delete_null_fk_rows`` docstring updated to match the
   "always run" semantics (was still claiming "called only on tables
   in TABLES_WITH_NULLABLE_FKS").
3. ``_check_no_external_fks_to_id`` now documents its scope
   limitation: ``Inspector.get_table_names()`` returns the default
   schema only, so cross-schema FKs in non-standard multi-schema
   PostgreSQL deployments would not be caught. The single-schema
   case (Superset's documented deployment) is fully covered.
4. ``_dedupe_by_min_id`` now logs a sample of up to 10 discarded
   ``(fk1, fk2, id)`` tuples at WARN before deletion, so operators
   can audit which rows the ``MIN(id)`` policy drops. The keep-
   original policy is correct in practice but discards later
   re-grants on ownership tables; the sample makes that visible.
5. ``UPDATING.md`` documents the upgrade/downgrade primary-key
   name divergence (``pk_<table>`` vs ``<table>_pkey``) so
   operators using schema-comparison tools don't mistake it for
   migration drift.

No schema or runtime-behaviour changes. All 44 migration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
cc20fe7cae fix(migration): always run NULL-FK cleanup; correct RLS test parent name
Two cleanups from PR review:

1. ``dashboard_roles.dashboard_id`` was created nullable in revision
   e11ccdd12658 but was missing from ``TABLES_WITH_NULLABLE_FKS``. A
   production database with a stray NULL ``dashboard_id`` row would have
   failed the PK-add with a cryptic constraint violation. Fix by running
   the NULL-FK cleanup on every affected table — it is a no-op DELETE on
   tables whose FK columns are already NOT NULL, and it eliminates the
   risk of further drift in the hardcoded set. ``dashboard_roles`` is
   added to the documentation set, which the runtime no longer consults.

2. The unit-test parent-table name for ``rls_filter_roles`` and
   ``rls_filter_tables`` was ``rls_filter`` (does not exist) instead of
   the real parent ``row_level_security_filters``. Test passes either
   way (the in-memory FK is self-consistent), but the parameter is now
   accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
Mike Bridge
5958e12fc0 refactor(db): composite PK on M2M association tables (sc-105349)
Replace synthetic id INTEGER PRIMARY KEY with composite PRIMARY KEY (fk1, fk2)
on the eight pure-junction tables: dashboard_roles, dashboard_slices,
dashboard_user, report_schedule_user, rls_filter_roles, rls_filter_tables,
slice_user, sqlatable_user. The redundant UNIQUE(fk1, fk2) on dashboard_slices
and report_schedule_user is dropped (subsumed by the new PK).

Migration handles dialect quirks: copy_from for tables with pre-existing
UNIQUE (so SQLite's anonymous-constraint reflection doesn't matter), wrapped-
subquery dedupe for MySQL (ERROR 1093), sa.Identity(always=False) on downgrade
to backfill the restored id column without NOT NULL violations, and distinct
PK names per direction (pk_<table> on upgrade, <table>_pkey on downgrade) to
avoid round-trip index-name collisions on Postgres.

ORM Table() definitions updated to match. UPDATING.md entry added with
operator runbook (BI-tool impact, pre-flight inventory queries, dedupe-row-
loss notice, pg_dump workaround, FK-NOT-NULL downgrade asymmetry note).

Tests: 8 schema-shape assertions (post-upgrade), 8 duplicate-rejection unit
tests, 8 distinct-pair sanity tests, 1 round-trip + idempotency test
(in-memory SQLite via Alembic MigrationContext).

Continuum-restore verification against the new shape is out of scope for this
PR; it is the responsibility of the versioning epic (sc-103156).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 18:42:05 -06:00
70 changed files with 13455 additions and 184 deletions


@@ -24,6 +24,56 @@ assists people when migrating to a new version.
## Next
### Entity version history for charts, dashboards, and datasets
Saves of charts, dashboards, and datasets now automatically produce a version history — browsable and restorable via new API endpoints. No frontend UI in this release; the backend plumbing is the deliverable.
**New endpoints** (per entity type — same pattern for `chart`, `dashboard`, and `dataset`):
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/{resource}/<uuid>/versions/` | List the entity's version history (0-based `version_number`, `version_uuid`, `issued_at`, `changed_by`) |
| `GET` | `/api/v1/{resource}/<uuid>/versions/<version_uuid>/` | Get a single version snapshot (scalar fields at that version; plus `columns` / `metrics` for datasets) |
| `POST` | `/api/v1/{resource}/<uuid>/versions/<version_uuid>/restore` | Restore the entity to the state captured by that version |
`<version_uuid>` is a deterministic `UUIDv5` derived from the entity's UUID and the Continuum transaction id — stable across replicas and retention pruning. Authorisation reuses the resource's existing `can_write` permission; workspace admins can list/restore any entity.
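The exact inputs to the UUIDv5 are not spelled out here. As a minimal sketch, assuming the entity's UUID serves as the namespace and the decimal transaction id as the name (both assumptions; `version_uuid_for` is a hypothetical helper, not Superset's API), Python's standard `uuid.uuid5` shows why such an id is stable across replicas: it is a pure SHA-1 function of its inputs, with no clock or random component.

```python
import uuid


def version_uuid_for(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    """Hypothetical: derive a version id from (entity UUID, Continuum txn id).

    Assumes the entity UUID is the UUIDv5 namespace and the transaction id,
    rendered as a decimal string, is the name. uuid5 is deterministic, so
    every replica computes the same value for the same inputs.
    """
    return uuid.uuid5(entity_uuid, str(transaction_id))


chart_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
v42 = version_uuid_for(chart_uuid, 42)
print(v42, v42.version)
```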
**Version response shape — `changes` array:**
Each entry returned by `GET /api/v1/{resource}/<uuid>/versions/` and `GET .../versions/<version_uuid>/` includes a `changes` array describing what changed relative to the previous version:
```json
"changes": [
{"kind": "field", "path": "slice_name", "from_value": "Old", "to_value": "New"}
]
```
The array is empty for baseline (`operation_type=0`) transactions. `kind` enumerates structured record types (`field`, layout-walker records for dashboards, dataset child diffs for `TableColumn` / `SqlMetric`); `path` is a dotted JSON-pointer-style locator; `from_value` / `to_value` are JSON-safe scalars or compact records.
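For the `field` kind, the entries can be pictured as a key-by-key comparison of two scalar snapshots. The sketch below is an illustration only: `field_changes` is hypothetical, it covers only top-level scalar fields (not the dashboard layout walker or dataset child diffs), and it models a baseline as a version with no previous snapshot.

```python
from typing import Any, Optional


def field_changes(
    previous: Optional[dict[str, Any]], current: dict[str, Any]
) -> list[dict[str, Any]]:
    """Hypothetical: build kind="field" records between two snapshots."""
    if previous is None:
        # Baseline version: nothing to compare against, so no changes.
        return []
    records = []
    # Sorted for a stable, reproducible ordering of the records.
    for path in sorted(previous.keys() | current.keys()):
        old, new = previous.get(path), current.get(path)
        if old != new:
            records.append(
                {"kind": "field", "path": path, "from_value": old, "to_value": new}
            )
    return records


before = {"slice_name": "Old", "viz_type": "table"}
after = {"slice_name": "New", "viz_type": "table"}
print(field_changes(before, after))
```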
**Save-response and ETag headers:**
- Save responses (`PUT /api/v1/{resource}/<pk>`) include `old_version_uuid` and `new_version_uuid` body fields so the client can correlate a save with its resulting version row.
- All entity GETs (`GET /api/v1/{chart,dashboard,dataset}/<pk>`), version-list GETs, single-version GETs, and save responses emit an `ETag: "<version_uuid>"` header reflecting the entity's current live version. The default `CORS_OPTIONS` now sets `expose_headers: ["ETag"]` so cross-origin browser clients can read the header. **No `If-Match` enforcement in v1:** `ETag` is informational; concurrent-edit detection is deferred to a follow-up SIP.
- **Operators overriding `CORS_OPTIONS` in `superset_config.py` MUST include `"expose_headers": ["ETag"]`** (or merge with the default) for cross-origin clients to read the ETag. A bare `CORS_OPTIONS = {"origins": [...]}` will silently drop the expose-headers default.
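A minimal `superset_config.py` override that keeps the header readable might look like the following sketch. `ENABLE_CORS` is Superset's switch for passing `CORS_OPTIONS` to Flask-CORS, and `origins` / `expose_headers` are standard Flask-CORS options; the origin URL is a placeholder to replace with your own.

```python
# superset_config.py (sketch)
ENABLE_CORS = True

CORS_OPTIONS = {
    # Placeholder origin: list the origins of your own browser clients.
    "origins": ["https://bi.example.com"],
    # Browsers hide non-safelisted response headers such as ETag from
    # cross-origin JavaScript unless they are listed here.
    "expose_headers": ["ETag"],
}
```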
**Behaviour changes on save:**
- Every save of a chart, dashboard, or dataset produces one new version row. Rows preserve the full post-save state (scalar fields for all three entity types; `TableColumn` / `SqlMetric` children for datasets; `dashboard_slices` chart membership for dashboards — children versioned via SQLAlchemy-Continuum shadow tables `table_columns_version`, `sql_metrics_version`, and `dashboard_slices_version`).
- The first save of an entity that already exists in the DB but has no version history yet creates a retroactive baseline version, so the UI can show "what this looked like before I edited it."
- Tags, owners, and roles are **not** versioned in v1 (ADR-005). A restore leaves those at their live values.
**New config key:**
| Key | Default | Purpose |
|---|---|---|
| `SUPERSET_VERSION_HISTORY_RETENTION_DAYS` | `30` | Versions older than this many days are pruned by a nightly Celery beat task (`superset.tasks.version_history_retention.prune_old_versions`). Each entity's live row (`end_transaction_id IS NULL`) is always preserved; closed historical rows including the baseline age out with the rest. Set to `0` to disable retention entirely. |
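The retention rule above reduces to a small predicate. The sketch below assumes age is measured from the version's `issued_at` timestamp (an assumption; the task may compare a different timestamp), and `VersionRow` / `is_prunable` are hypothetical names, not the task's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class VersionRow:
    issued_at: datetime
    end_transaction_id: Optional[int]  # None marks the entity's live row


def is_prunable(row: VersionRow, now: datetime, retention_days: int) -> bool:
    """Hypothetical model of SUPERSET_VERSION_HISTORY_RETENTION_DAYS."""
    if retention_days == 0:
        return False  # 0 disables retention entirely
    if row.end_transaction_id is None:
        return False  # the live row is always preserved
    # Closed historical rows (baseline included) age out by timestamp.
    return row.issued_at < now - timedelta(days=retention_days)


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
old_closed = VersionRow(now - timedelta(days=45), end_transaction_id=7)
print(is_prunable(old_closed, now, 30))
```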
**Impact on external integrations:**
- New tables populated on every save — `dashboards_version`, `slices_version`, `tables_version` (parent shadow tables for the three entity types), `table_columns_version`, `sql_metrics_version`, `dashboard_slices_version` (child shadow tables), plus the shared `version_transaction` and `version_changes` tables. External tooling that queries Superset's DB directly will see writes to these tables proportional to save traffic.
- Existing entity endpoints (`GET`/`PUT /api/v1/{chart,dashboard,dataset}/<pk>`) gain an `ETag` response header and the save response gains `old_version_uuid` / `new_version_uuid` body fields. No existing fields are removed or repurposed.
- Version capture is always active — no feature flag.
### Granular Export Controls
A new feature flag `GRANULAR_EXPORT_CONTROLS` introduces three fine-grained permissions that replace the legacy `can_csv` permission:
@@ -310,6 +360,246 @@ See `superset/mcp_service/PRODUCTION.md` for deployment guides.
}
```
### Composite primary keys on many-to-many association tables
The eight M:N association tables listed below have been changed from a synthetic surrogate `id INTEGER PRIMARY KEY` to a composite `PRIMARY KEY (fk1, fk2)` on the two foreign-key columns. The `id` column is dropped, and the two tables that previously carried a redundant `UNIQUE (fk1, fk2)` constraint have that constraint removed (it is now subsumed by the composite primary key).
**Affected tables and their composite-PK column pairs:**
| Table | Composite PK |
|---|---|
| `dashboard_roles` | `(dashboard_id, role_id)` |
| `dashboard_slices` | `(dashboard_id, slice_id)` |
| `dashboard_user` | `(user_id, dashboard_id)` |
| `report_schedule_user` | `(user_id, report_schedule_id)` |
| `rls_filter_roles` | `(role_id, rls_filter_id)` |
| `rls_filter_tables` | `(table_id, rls_filter_id)` |
| `slice_user` | `(user_id, slice_id)` |
| `sqlatable_user` | `(user_id, table_id)` |
**Impact on external readers:** Any BI tool, custom report, backup script, or external integration that references these tables by their old surrogate `id` column (e.g., `SELECT id FROM dashboard_slices WHERE …`, `WHERE dashboard_slices.id IN (…)`) will break. Update such queries to project or filter on the FK pair (`dashboard_id, slice_id`) instead. The FK columns themselves are unchanged.
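A quick way to see the new shape, and why `id`-based queries break, is an in-memory SQLite table with the same composite key as `dashboard_slices`. This is a sketch, not the migration's DDL: it omits the FKs to `dashboards` and `slices` for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Post-migration shape: no surrogate id, composite PK on the FK pair.
conn.execute(
    "CREATE TABLE dashboard_slices ("
    " dashboard_id INTEGER NOT NULL,"
    " slice_id INTEGER NOT NULL,"
    " PRIMARY KEY (dashboard_id, slice_id))"
)
conn.executemany(
    "INSERT INTO dashboard_slices (dashboard_id, slice_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# Project and filter on the FK pair instead of the removed id column.
charts_on_dash_1 = conn.execute(
    "SELECT dashboard_id, slice_id FROM dashboard_slices"
    " WHERE dashboard_id = ? ORDER BY slice_id",
    (1,),
).fetchall()

# An old-style query that references id now fails outright.
try:
    conn.execute("SELECT id FROM dashboard_slices")
    id_query_failed = False
except sqlite3.OperationalError:
    id_query_failed = True

# The composite PK rejects a duplicate (dashboard_id, slice_id) pair.
try:
    conn.execute("INSERT INTO dashboard_slices VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(charts_on_dash_1, id_query_failed, duplicate_rejected)
```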
**Pre-flight inventory queries.** Before applying the upgrade, operators are encouraged to run the queries below against their database to assess what the migration will change. Two classes of pre-existing data are not preserved by the migration: duplicate `(fk1, fk2)` rows (the migration keeps `MIN(id)` and deletes the rest) and rows with `NULL` in either FK column (the migration deletes them, since FK columns are promoted to `NOT NULL` for the composite PK). Compliance- or audit-sensitive operators should also `\copy` (Postgres) or `SELECT … INTO OUTFILE` (MySQL) the affected rows for their own records before upgrading.
```sql
-- Duplicate (fk1, fk2) pairs (the migration will keep MIN(id) per group, delete the rest)
SELECT dashboard_id, role_id, COUNT(*) FROM dashboard_roles GROUP BY dashboard_id, role_id HAVING COUNT(*) > 1;
SELECT dashboard_id, slice_id, COUNT(*) FROM dashboard_slices GROUP BY dashboard_id, slice_id HAVING COUNT(*) > 1;
SELECT user_id, dashboard_id, COUNT(*) FROM dashboard_user GROUP BY user_id, dashboard_id HAVING COUNT(*) > 1;
SELECT user_id, report_schedule_id, COUNT(*) FROM report_schedule_user GROUP BY user_id, report_schedule_id HAVING COUNT(*) > 1;
SELECT role_id, rls_filter_id, COUNT(*) FROM rls_filter_roles GROUP BY role_id, rls_filter_id HAVING COUNT(*) > 1;
SELECT table_id, rls_filter_id, COUNT(*) FROM rls_filter_tables GROUP BY table_id, rls_filter_id HAVING COUNT(*) > 1;
SELECT user_id, slice_id, COUNT(*) FROM slice_user GROUP BY user_id, slice_id HAVING COUNT(*) > 1;
SELECT user_id, table_id, COUNT(*) FROM sqlatable_user GROUP BY user_id, table_id HAVING COUNT(*) > 1;
-- Rows with a NULL in either FK (the migration will delete these)
SELECT COUNT(*) FROM dashboard_roles WHERE dashboard_id IS NULL OR role_id IS NULL;
SELECT COUNT(*) FROM dashboard_slices WHERE dashboard_id IS NULL OR slice_id IS NULL;
SELECT COUNT(*) FROM dashboard_user WHERE user_id IS NULL OR dashboard_id IS NULL;
SELECT COUNT(*) FROM report_schedule_user WHERE user_id IS NULL OR report_schedule_id IS NULL;
SELECT COUNT(*) FROM rls_filter_roles WHERE role_id IS NULL OR rls_filter_id IS NULL;
SELECT COUNT(*) FROM rls_filter_tables WHERE table_id IS NULL OR rls_filter_id IS NULL;
SELECT COUNT(*) FROM slice_user WHERE user_id IS NULL OR slice_id IS NULL;
SELECT COUNT(*) FROM sqlatable_user WHERE user_id IS NULL OR table_id IS NULL;
```
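The two data-loss rules these queries inventory (delete rows with a `NULL` FK, then keep `MIN(id)` per `(fk1, fk2)` group) can be reproduced on a toy pre-migration table. This sqlite3 sketch illustrates the policy only and is not the migration's code. The extra derived table around `MIN(id)` is the wrapped-subquery form MySQL needs to avoid ERROR 1093; SQLite would also accept the unwrapped version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pre-migration shape: surrogate id, nullable FKs, duplicates allowed.
conn.execute(
    "CREATE TABLE slice_user (id INTEGER PRIMARY KEY, user_id INTEGER, slice_id INTEGER)"
)
conn.executemany(
    "INSERT INTO slice_user (id, user_id, slice_id) VALUES (?, ?, ?)",
    [(1, 7, 100), (2, 7, 100), (3, 7, 101), (4, None, 100), (5, 7, 100)],
)

# 1. Rows with a NULL FK cannot survive the NOT NULL promotion.
conn.execute("DELETE FROM slice_user WHERE user_id IS NULL OR slice_id IS NULL")

# 2. Keep the earliest row per (user_id, slice_id); drop later duplicates.
conn.execute(
    "DELETE FROM slice_user WHERE id NOT IN ("
    " SELECT keep_id FROM ("
    "  SELECT MIN(id) AS keep_id FROM slice_user GROUP BY user_id, slice_id"
    " ) AS keepers)"
)

remaining = conn.execute(
    "SELECT id, user_id, slice_id FROM slice_user ORDER BY id"
).fetchall()
print(remaining)
```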
**Sizing the maintenance window on PostgreSQL.** The queries above are dialect-portable but only count rows. Operators on PostgreSQL can run the diagnostic queries below to characterize the migration's runtime cost ahead of time: per-table row count and on-disk size, an aggregated duplicate roll-up, the external-FK pre-flight check (the migration runs the same check and aborts if it returns rows), and a rewrite-time estimate for the two tables that go through the slower full-table-rebuild path.
```sql
-- Per-table size, row count, and which migration path each will take.
-- Two tables ("dashboard_slices", "report_schedule_user") have a
-- redundant UNIQUE constraint that the migration drops via a full
-- table rewrite (op.batch_alter_table(recreate="always")). The other
-- six use direct ALTER TABLE, which is much cheaper.
WITH affected(name, has_unique) AS (
VALUES
('dashboard_roles', false),
('dashboard_slices', true),
('dashboard_user', false),
('report_schedule_user', true),
('rls_filter_roles', false),
('rls_filter_tables', false),
('slice_user', false),
('sqlatable_user', false)
)
SELECT
a.name AS table_name,
CASE WHEN a.has_unique THEN 'recreate (full rewrite)'
ELSE 'direct ALTER' END AS migration_path,
c.reltuples::bigint AS estimated_rows,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
FROM affected a
JOIN pg_class c ON c.relname = a.name AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```
```sql
-- Aggregated duplicate-row roll-up.
-- "dup_groups" is the number of (fk1, fk2) pairs that appear more
-- than once; "rows_dropped" is the total number of rows the
-- migration will delete during the dedupe pass (it keeps MIN(id) per
-- group and discards the rest).
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) - COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id, role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id, dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id, table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC NULLS LAST;
```
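The `rows_dropped` arithmetic is `SUM(c) - COUNT(*)`: each duplicate group keeps exactly one row. The same computation over an in-memory list of pairs, as a hypothetical helper for cross-checking an exported table (note that the SQL yields `NULL` for `rows_dropped` when there are no duplicate groups, where this returns `0`):

```python
from collections import Counter
from typing import Hashable, Iterable


def dup_rollup(pairs: Iterable[tuple[Hashable, Hashable]]) -> tuple[int, int]:
    """Return (dup_groups, rows_dropped) for a list of (fk1, fk2) pairs."""
    counts = Counter(pairs)
    dup_counts = [c for c in counts.values() if c > 1]
    # One row per group survives, so each group loses (c - 1) rows.
    return len(dup_counts), sum(dup_counts) - len(dup_counts)


pairs = [(1, 1), (1, 1), (1, 1), (1, 2), (2, 2), (2, 2)]
print(dup_rollup(pairs))  # (2, 3)
```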
```sql
-- External-FK pre-flight check.
-- The migration runs the equivalent check at upgrade time and aborts
-- if any external FK references one of the soon-to-be-removed `id`
-- columns. Running it ahead of time lets you discover (and migrate)
-- any such reference before the maintenance window. On a stock
-- Superset install this should return zero rows. (Default schema
-- only; multi-schema deployments need to broaden the lookup.)
SELECT
rc.constraint_name,
kcu.table_schema || '.' || kcu.table_name AS referencing_table,
kcu.column_name AS referencing_column,
ccu.table_name AS referenced_table,
ccu.column_name AS referenced_column
FROM information_schema.referential_constraints rc
JOIN information_schema.key_column_usage kcu
ON kcu.constraint_name = rc.constraint_name
AND kcu.constraint_schema = rc.constraint_schema
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = rc.constraint_name
AND ccu.constraint_schema = rc.constraint_schema
WHERE ccu.table_name IN (
'dashboard_roles','dashboard_slices','dashboard_user',
'report_schedule_user','rls_filter_roles','rls_filter_tables',
'slice_user','sqlatable_user')
AND ccu.column_name = 'id';
```
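On SQLite (used by the migration's own in-memory tests), the same question can be answered with `PRAGMA foreign_key_list`. This is a sketch of the idea behind the migration's check, not its implementation; the `audit_log` table in the usage example is hypothetical. An FK declared without an explicit column list references the parent's primary key, which is `id` on the pre-migration schema, so the sketch treats a missing target column as `id` too.

```python
import sqlite3

JUNCTION_TABLES = {
    "dashboard_roles", "dashboard_slices", "dashboard_user",
    "report_schedule_user", "rls_filter_roles", "rls_filter_tables",
    "slice_user", "sqlatable_user",
}


def external_fks_to_id(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (referencing_table, referencing_column, referenced_table)."""
    hits = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for table in tables:
        # Row layout: (id, seq, table, from, to, on_update, on_delete, match).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent, from_col, to_col = fk[2], fk[3], fk[4]
            if parent in JUNCTION_TABLES and to_col in ("id", None):
                hits.append((table, from_col, parent))
    return hits


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dashboard_slices"
    " (id INTEGER PRIMARY KEY, dashboard_id INTEGER, slice_id INTEGER)"
)
conn.execute(
    "CREATE TABLE audit_log"
    " (id INTEGER PRIMARY KEY, ds_id INTEGER REFERENCES dashboard_slices (id))"
)
print(external_fks_to_id(conn))  # [('audit_log', 'ds_id', 'dashboard_slices')]
```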
```sql
-- Lock-window estimate for the two full-rewrite tables.
-- recreate="always" takes ACCESS EXCLUSIVE on the table for the full
-- rewrite. Use heap size combined with your hardware's effective
-- write throughput (~100-200 MB/s on commodity SSD; faster on NVMe)
-- to size the maintenance window. The other six tables use direct
-- ALTER and are dominated by composite-index build time, typically
-- seconds for tables in the low millions of rows.
SELECT
c.relname AS table_name,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_relation_size(c.oid) / 1024 / 1024 AS heap_size_mb,
ROUND(pg_relation_size(c.oid) / 1024 / 1024 / 100.0, 1) AS est_rewrite_seconds_at_100mbs
FROM pg_class c
WHERE c.relname IN ('dashboard_slices', 'report_schedule_user');
```
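The `est_rewrite_seconds_at_100mbs` column is heap size in MiB divided by an assumed throughput. The same arithmetic as a hypothetical helper, handy for bracketing the window at both ends of the quoted 100-200 MB/s range (unlike the SQL, it keeps fractional megabytes instead of truncating them with integer division):

```python
def est_rewrite_seconds(heap_bytes: int, mb_per_s: float = 100.0) -> float:
    """Estimate full-rewrite time for a heap of heap_bytes bytes."""
    heap_mb = heap_bytes / 1024 / 1024
    return round(heap_mb / mb_per_s, 1)


one_gib = 1024**3
# Slow and fast ends of the commodity-SSD range quoted above.
print(est_rewrite_seconds(one_gib, 100), est_rewrite_seconds(one_gib, 200))  # 10.2 5.1
```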
**Sizing the maintenance window on MySQL.** Equivalent diagnostic queries for MySQL/InnoDB. One important difference from PostgreSQL: InnoDB rebuilds the clustered index on every PK change, so *all eight* tables undergo a full table rebuild on MySQL — not just the two that go through the explicit `recreate="always"` path. The lock-window estimate query below therefore covers all eight tables.
```sql
-- Per-table size, row count, and which migration path each will take.
-- TABLE_ROWS is an InnoDB estimate (analogous to PostgreSQL's reltuples);
-- run SELECT COUNT(*) per table for an exact count if needed.
SELECT
TABLE_NAME AS table_name,
CASE WHEN TABLE_NAME IN ('dashboard_slices', 'report_schedule_user')
THEN 'recreate (explicit, drops UNIQUE)'
ELSE 'direct ALTER (still rebuilds InnoDB clustered index)'
END AS migration_path,
TABLE_ROWS AS estimated_rows,
CONCAT(ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1), ' MB') AS total_size,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS heap_size,
CONCAT(ROUND(INDEX_LENGTH / 1024 / 1024, 1), ' MB') AS index_size
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
```
```sql
-- Aggregated duplicate-row roll-up. Same SQL as the PostgreSQL version
-- except for NULLS LAST, which MySQL does not support (MySQL already
-- sorts NULLs last under DESC); included here for copy-paste convenience.
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) - COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id, role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id, dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id, rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id, slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id, table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC;
```
```sql
-- External-FK pre-flight check. KEY_COLUMN_USAGE on MySQL carries
-- both sides of the FK in a single row, so this is simpler than the
-- PostgreSQL version. Should return zero rows on a stock install.
SELECT
CONSTRAINT_NAME,
CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS referencing_table,
COLUMN_NAME AS referencing_column,
REFERENCED_TABLE_NAME AS referenced_table,
REFERENCED_COLUMN_NAME AS referenced_column
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
AND REFERENCED_TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
AND REFERENCED_COLUMN_NAME = 'id';
```
```sql
-- Lock-window estimate for ALL EIGHT tables (InnoDB rebuilds the
-- clustered index on PK change, so even "direct ALTER" is a rewrite).
-- ADD PRIMARY KEY is INPLACE but not LOCK=NONE — it allows concurrent
-- reads but blocks writes. Use heap size combined with your effective
-- rebuild throughput (~100-200 MB/s on commodity SSD; higher on NVMe).
SELECT
TABLE_NAME AS table_name,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS heap_size,
ROUND(DATA_LENGTH / 1024 / 1024, 1) AS heap_size_mb,
ROUND(DATA_LENGTH / 1024 / 1024 / 100.0, 1) AS est_rewrite_seconds_at_100mbs
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY DATA_LENGTH DESC;
```
**Restoring an old `pg_dump` (or equivalent) against the new schema.** A dump taken before the migration carries values for the now-removed `id` column in its table data (`COPY` blocks by default, or `INSERT` statements with `--inserts`). Restoring such a dump against the post-migration schema will fail. The supported workaround is to dump only the schema and reference data, then re-create the M:N associations from application data after restore. For example, pass `--exclude-table-data=<table>` to `pg_dump` once for each of the eight junction tables (e.g. `--exclude-table-data=dashboard_slices`), restore the rest, then run a one-shot script that re-INSERTs `(fk1, fk2)` pairs derived from your application export. Operators who need to restore an old dump verbatim should restore it against a pre-migration Superset and then re-run the upgrade.
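The one-shot re-insert step can be made idempotent by letting the new composite primary key absorb duplicates. A sqlite3 sketch, assuming your application export has already been reduced to `(dashboard_id, slice_id)` tuples (`reinsert_pairs` is hypothetical; on PostgreSQL the equivalent clause is `INSERT ... ON CONFLICT DO NOTHING`, on MySQL `INSERT IGNORE`):

```python
import sqlite3
from typing import Iterable


def reinsert_pairs(conn: sqlite3.Connection, pairs: Iterable[tuple[int, int]]) -> int:
    """Insert (dashboard_id, slice_id) pairs; return how many were new."""
    before = conn.total_changes
    # OR IGNORE skips rows that collide with the composite primary key,
    # so re-running the script (or duplicate export rows) is harmless.
    conn.executemany(
        "INSERT OR IGNORE INTO dashboard_slices (dashboard_id, slice_id) VALUES (?, ?)",
        pairs,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dashboard_slices (dashboard_id INTEGER NOT NULL,"
    " slice_id INTEGER NOT NULL, PRIMARY KEY (dashboard_id, slice_id))"
)
exported = [(1, 10), (1, 11), (1, 10)]  # hypothetical export with a repeat
print(reinsert_pairs(conn, exported), reinsert_pairs(conn, exported))  # 2 0
```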
**Intentional downgrade asymmetry.** The migration's `downgrade()` restores the surrogate `id` column and (for `dashboard_slices` and `report_schedule_user`) the original `UNIQUE (fk1, fk2)` constraint, but it does **not** restore the original `NULL`-allowed state on the FK columns — they remain `NOT NULL`. This is intentional: under SQLAlchemy's `secondary=` semantics, a `NULL` in either FK column of a junction table is meaningless (it cannot participate in either side of the relationship). Operators downgrading are not expected to need this restored. The asymmetry is documented for completeness so that round-trip schema diffs are not mistaken for migration bugs.
**Constraint-name divergence between upgrade and downgrade.** The composite primary key created on upgrade is named `pk_<table>` (the name passed explicitly to `op.create_primary_key("pk_<table>", ...)`), while the surrogate `id` primary key restored on downgrade is named `<table>_pkey` (PostgreSQL's default name for an unnamed `PrimaryKeyConstraint("id")`). The two names alternate so that a round trip (upgrade → downgrade → upgrade) does not collide on a pre-existing constraint name. Operators using schema-comparison tools (e.g. `pg_diff`, `migra`) against a downgraded database may see this as drift relative to a fresh-install schema. It is cosmetic: no application code references either constraint name.
## 6.0.0
- [33055](https://github.com/apache/superset/pull/33055): Upgrades Flask-AppBuilder to 5.0.0. The AUTH_OID authentication type has been deprecated and is no longer available as an option in Flask-AppBuilder. OpenID (OID) is considered a deprecated authentication protocol - if you are using AUTH_OID, you will need to migrate to an alternative authentication method such as OAuth, LDAP, or database authentication before upgrading.
- [34871](https://github.com/apache/superset/pull/34871): Fixed Jest test hanging issue from Ant Design v5 upgrade. MessageChannel is now mocked in test environment to prevent rc-overflow from causing Jest to hang. Test environment only - no production impact.

docker-compose-mysql.yml

@@ -0,0 +1,117 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Compose override that swaps the default Postgres metadata DB for MySQL 8.
# Useful for evaluating dialect-specific behaviour (e.g., DDL-migration
# cost on a deployment whose production metadata DB is MySQL).
#
# Usage:
# docker compose -f docker-compose.yml -f docker-compose-mysql.yml up
# docker compose -f docker-compose.yml -f docker-compose-mysql.yml down
#
# To switch back to Postgres, just drop the second `-f` flag — the MySQL
# data lives in a separate volume (`db_home_mysql`) so neither side is
# corrupted by switching dialects.
#
# Notes:
# - Mirrors the connection settings used by CI's `test-mysql` shard:
# dialect ``mysql+mysqldb``, charset utf8mb4 with binary_prefix.
# - Host port 13306 (configurable via DATABASE_PORT_MYSQL) to avoid
# colliding with a native MySQL install on 3306.
# - The Postgres-specific init scripts under
# docker/docker-entrypoint-initdb.d/ are not mounted (they are
# postgres-only); examples / cypress fixtures still load via
# `superset-init`'s post-startup steps.
# Shared environment override applied to every Superset-side service that
# connects to the metadata DB. ``environment:`` takes precedence over the
# values inherited from the env_file in docker-compose.yml.
x-mysql-env: &mysql-env
DATABASE_DIALECT: mysql+mysqldb
DATABASE_HOST: db
DATABASE_PORT: "3306"
DATABASE_DB: superset
DATABASE_USER: superset
DATABASE_PASSWORD: superset
SQLALCHEMY_DATABASE_URI: "mysql+mysqldb://superset:superset@db:3306/superset?charset=utf8mb4&binary_prefix=true"
# Override the analytics-examples DB connection too. ``EXAMPLES_PORT``
# in docker/.env is hardcoded to 5432 (the Postgres port); without
# this override the examples connection would try MySQL on 5432 and
# fail. The examples user/DB are created by docker/mysql-init/
# examples-init.sql on first MySQL boot.
EXAMPLES_HOST: db
EXAMPLES_PORT: "3306"
EXAMPLES_DB: examples
EXAMPLES_USER: examples
EXAMPLES_PASSWORD: examples
SUPERSET__SQLALCHEMY_EXAMPLES_URI: "mysql+mysqldb://examples:examples@db:3306/examples?charset=utf8mb4&binary_prefix=true"
services:
db:
image: mysql:8.0
environment:
MYSQL_DATABASE: superset
MYSQL_USER: superset
MYSQL_PASSWORD: superset
MYSQL_ROOT_PASSWORD: root
# The original 5432 port mapping is harmless on a MySQL container
# (nothing listens on 5432 inside it) but we add 13306->3306 so the
# MySQL port is reachable from the host without colliding with a
# native MySQL on 3306. Compose merges port lists.
ports:
- "127.0.0.1:${DATABASE_PORT_MYSQL:-13306}:3306"
# Override the init-scripts mount by re-binding the same target path
# to a MySQL-compatible directory. Compose merges volume lists by
# target path; later definitions win on conflict, so this displaces
# the Postgres-specific ``./docker/docker-entrypoint-initdb.d`` mount
# from docker-compose.yml. Without this, MySQL would try to run
# ``cypress-init.sh`` (which invokes ``psql``, not in the MySQL
# image), abort the init phase, and never create the ``examples``
# database. Add the MySQL data volume separately.
volumes:
- db_home_mysql:/var/lib/mysql
- ./docker/mysql-init:/docker-entrypoint-initdb.d
command:
- --default-authentication-plugin=caching_sha2_password
- --character-set-server=utf8mb4
- --collation-server=utf8mb4_0900_ai_ci
healthcheck:
test: ["CMD-SHELL", "mysqladmin ping -h localhost -uroot -proot --silent"]
interval: 5s
timeout: 5s
retries: 20
superset:
environment: *mysql-env
superset-init:
environment: *mysql-env
superset-worker:
environment: *mysql-env
superset-worker-beat:
environment: *mysql-env
superset-node:
environment: *mysql-env
superset-tests-worker:
environment: *mysql-env
volumes:
db_home_mysql:
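The explicit `SQLALCHEMY_DATABASE_URI` in the override repeats the `DATABASE_*` values, so editing one without the other is an easy mistake. A hypothetical consistency check (not part of the compose file or Superset) that rebuilds the URI from those values:

```python
from urllib.parse import quote_plus


def mysql_uri_from_env(env: dict[str, str]) -> str:
    """Hypothetical: rebuild the metadata-DB URI from DATABASE_* values."""
    # Percent-encode credentials so characters such as '@' or '/' in a
    # password cannot break the URI.
    user = quote_plus(env["DATABASE_USER"])
    password = quote_plus(env["DATABASE_PASSWORD"])
    return (
        f"{env['DATABASE_DIALECT']}://{user}:{password}"
        f"@{env['DATABASE_HOST']}:{env['DATABASE_PORT']}/{env['DATABASE_DB']}"
        "?charset=utf8mb4&binary_prefix=true"
    )


compose_env = {
    "DATABASE_DIALECT": "mysql+mysqldb",
    "DATABASE_HOST": "db",
    "DATABASE_PORT": "3306",
    "DATABASE_DB": "superset",
    "DATABASE_USER": "superset",
    "DATABASE_PASSWORD": "superset",
}
print(mysql_uri_from_env(compose_env))
```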


@@ -0,0 +1,32 @@
-- Licensed to the Apache Software Foundation (ASF) under one
-- or more contributor license agreements. See the NOTICE file
-- distributed with this work for additional information
-- regarding copyright ownership. The ASF licenses this file
-- to you under the Apache License, Version 2.0 (the
-- "License"); you may not use this file except in compliance
-- with the License. You may obtain a copy of the License at
--
-- http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the License is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-- KIND, either express or implied. See the License for the
-- specific language governing permissions and limitations
-- under the License.
-- MySQL counterpart to docker/docker-entrypoint-initdb.d/examples-init.sh.
-- Creates the analytics-examples database and user that Superset's
-- ``load-examples`` command writes to. Mounted by docker-compose-mysql.yml
-- at /docker-entrypoint-initdb.d/ so the MySQL image's first-boot
-- entrypoint runs it automatically. (The Postgres init scripts under
-- docker/docker-entrypoint-initdb.d/ are NOT mounted on the MySQL
-- service — they invoke psql, which doesn't exist in the MySQL image.)
CREATE DATABASE IF NOT EXISTS examples
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci;
CREATE USER IF NOT EXISTS 'examples'@'%' IDENTIFIED BY 'examples';
GRANT ALL PRIVILEGES ON examples.* TO 'examples'@'%';
FLUSH PRIVILEGES;


@@ -100,6 +100,7 @@ dependencies = [
"simplejson>=3.15.0",
"slack_sdk>=3.19.0, <4",
"sqlalchemy>=1.4, <2",
"sqlalchemy-continuum>=1.6.0, <2.0.0",
"sqlalchemy-utils>=0.38.0, <0.43", # expanding lowerbound to work with pydoris
"sqlglot>=28.10.0, <29",
# newer pandas needs 0.9+


@@ -409,7 +409,10 @@ sqlalchemy==1.4.54
# flask-sqlalchemy
# marshmallow-sqlalchemy
# shillelagh
# sqlalchemy-continuum
# sqlalchemy-utils
sqlalchemy-continuum==1.6.0
# via apache-superset (pyproject.toml)
sqlalchemy-utils==0.42.0
# via
# apache-superset (pyproject.toml)


@@ -976,9 +976,14 @@ sqlalchemy==1.4.54
# marshmallow-sqlalchemy
# shillelagh
# sqlalchemy-bigquery
# sqlalchemy-continuum
# sqlalchemy-utils
sqlalchemy-bigquery==1.15.0
# via apache-superset
sqlalchemy-continuum==1.6.0
# via
# -c requirements/base-constraint.txt
# apache-superset
sqlalchemy-utils==0.42.0
# via
# -c requirements/base-constraint.txt


@@ -0,0 +1,682 @@
#!/usr/bin/env python3
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# ----------------------------------------------------------------------
# Stress-test data generator for the composite-PK migration (sc-105349).
#
# Bulk-inserts synthetic parent rows and many-to-many junction rows for
# the eight association tables that the composite-PK migration touches.
# Useful for measuring migration runtime at varying scales — run this at
# 100K / 1M / 5M / 10M rows and time the migration at each scale to
# verify the O(N log N) extrapolation.
#
# Idempotent: rerunning with the same target is a no-op; rerunning with
# a higher target adds rows up to the new total. Batched bulk INSERTs
# (10K rows per statement) make it fast on Postgres, MySQL, and SQLite.
#
# Usage (inside the Superset container):
#
# docker exec superset-superset-1 \
# /app/.venv/bin/python /app/scripts/seed_junction_load.py \
# --dashboard-slices 1000000 \
# --slice-user 100000 \
# --dashboard-user 100000
#
# Run with no flags for the defaults shown below. Use ``--dry-run`` to
# print the planned inserts without writing anything.
#
# The script connects via Superset's standard ``DATABASE_*`` env vars
# (or ``SUPERSET__SQLALCHEMY_DATABASE_URI`` if set), so it works
# automatically inside the Superset container regardless of which
# metadata DB backend is in use.

from __future__ import annotations

import argparse
import logging
import os
import sys
import time
from contextlib import contextmanager
from typing import Iterator
from uuid import uuid4

import sqlalchemy as sa
from sqlalchemy.engine import Connection, Engine

logger = logging.getLogger("seed_junction_load")

# Bulk INSERT batch size. Larger values = fewer statements but more memory.
BATCH = 10_000

# Default per-junction-table target row counts, tuned to mimic the shape
# of a large multi-team Superset install. Override via CLI flags.
DEFAULTS: dict[str, int] = {
    "dashboard_slices": 1_000_000,
    "slice_user": 100_000,
    "dashboard_user": 100_000,
    "dashboard_roles": 10_000,
}

# (junction_table, fk1_col, fk2_col, parent1_table, parent2_table)
# Both FK columns reference the parents' ``id`` columns; (fk1, fk2) pairs
# are generated from the cross-product of the parents' existing IDs.
JUNCTIONS: list[tuple[str, str, str, str, str]] = [
    ("dashboard_slices", "dashboard_id", "slice_id", "dashboards", "slices"),
    ("slice_user", "user_id", "slice_id", "ab_user", "slices"),
    ("dashboard_user", "user_id", "dashboard_id", "ab_user", "dashboards"),
    ("dashboard_roles", "dashboard_id", "role_id", "dashboards", "ab_role"),
]

# Junction tables that originally carried ``UNIQUE(fk1, fk2)`` and therefore
# cannot accept duplicate ``(fk1, fk2)`` pairs even on the pre-migration
# (downgrade) schema. The other JUNCTIONS allow duplicates pre-migration.
JUNCTIONS_WITH_UNIQUE: set[str] = {"dashboard_slices", "report_schedule_user"}


# ----------------------------------------------------------------------
# Connection setup
# ----------------------------------------------------------------------


def build_engine() -> Engine:
    """Build a SQLAlchemy engine from Superset env vars."""
    if uri := os.environ.get("SUPERSET__SQLALCHEMY_DATABASE_URI"):
        logger.info("Using SUPERSET__SQLALCHEMY_DATABASE_URI from env")
        return sa.create_engine(uri)
    try:
        dialect = os.environ["DATABASE_DIALECT"]
        user = os.environ["DATABASE_USER"]
        password = os.environ["DATABASE_PASSWORD"]
        host = os.environ["DATABASE_HOST"]
        port = os.environ["DATABASE_PORT"]
        db = os.environ["DATABASE_DB"]
    except KeyError as exc:
        sys.exit(
            f"Missing env var {exc}; either set DATABASE_DIALECT/USER/PASSWORD/"
            f"HOST/PORT/DB or SUPERSET__SQLALCHEMY_DATABASE_URI before running."
        )
    # URL.create keeps the password verbatim, so characters such as "@" or
    # "/" need no percent-encoding; a plain f-string URI would mis-parse them.
    url = sa.engine.URL.create(
        drivername=dialect,
        username=user,
        password=password,
        host=host,
        port=int(port),
        database=db,
    )
    logger.info(
        "Built URI from DATABASE_* env vars (dialect=%s, host=%s)", dialect, host
    )
    return sa.create_engine(url)


# ----------------------------------------------------------------------
# Helpers
# ----------------------------------------------------------------------


def uuid_value(dialect_name: str) -> bytes | str:
    """Return a UUID in the form the active dialect expects.

    MySQL stores UUIDs as ``BINARY(16)`` (16 raw bytes); Postgres has a
    native ``UUID`` type that accepts strings; SQLite stores them as
    BLOB/TEXT and accepts either. Branching here keeps the seed script
    backend-agnostic without depending on Superset's custom column types.
    """
    if dialect_name.startswith("mysql"):
        return uuid4().bytes
    return str(uuid4())


@contextmanager
def time_phase(name: str) -> Iterator[None]:
    """Log elapsed wall time for a named phase."""
    start = time.monotonic()
    logger.info("[%s] starting", name)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        logger.info("[%s] done in %.2fs", name, elapsed)


def count_rows(conn: Connection, table: str) -> int:
    return conn.scalar(sa.text(f"SELECT COUNT(*) FROM {table}")) or 0  # noqa: S608


def existing_ids(conn: Connection, table: str, limit: int | None = None) -> list[int]:
    sql = f"SELECT id FROM {table} ORDER BY id"  # noqa: S608
    if limit is not None:
        sql += f" LIMIT {limit}"
    return [row[0] for row in conn.execute(sa.text(sql))]


# ----------------------------------------------------------------------
# Parent seeders
#
# Each function ensures the named parent table has at least ``target``
# rows by inserting synthetic ones with minimal-but-valid columns.
# Apart from ``seed_dbs`` (which returns the id it ensured exists), they
# return nothing; subsequent code reads back IDs via ``existing_ids``.
# ----------------------------------------------------------------------


def seed_dashboards(conn: Connection, target: int, dry_run: bool) -> None:
    current = count_rows(conn, "dashboards")
    if current >= target:
        logger.info(
            "dashboards: %d rows (target %d) — no insert needed", current, target
        )
        return
    needed = target - current
    logger.info("dashboards: %d → %d (+%d)", current, target, needed)
    if dry_run:
        return
    dialect = conn.engine.dialect.name
    sql = sa.text(
        "INSERT INTO dashboards (uuid, dashboard_title, slug, published) "
        "VALUES (:uuid, :title, :slug, :published)"
    )
    for batch_start in range(0, needed, BATCH):
        rows = [
            {
                "uuid": uuid_value(dialect),
                "title": f"seed_dashboard_{current + i}",
                "slug": f"seed-dashboard-{current + i}-{uuid4().hex[:8]}",
                "published": False,
            }
            for i in range(batch_start, min(batch_start + BATCH, needed))
        ]
        conn.execute(sql, rows)
        logger.info(" dashboards: inserted %d / %d", batch_start + len(rows), needed)


def seed_dbs(conn: Connection, dry_run: bool) -> int:
    """Ensure at least one row exists in ``dbs`` (parent of ``tables``).

    Returns the id to use as ``database_id`` when seeding ``tables``."""
    ids = existing_ids(conn, "dbs", limit=1)
    if ids:
        return ids[0]
    if dry_run:
        return -1  # placeholder
    dialect = conn.engine.dialect.name
    logger.info("dbs: inserting one synthetic database (no rows present)")
    conn.execute(
        sa.text(
            "INSERT INTO dbs (uuid, database_name, sqlalchemy_uri, expose_in_sqllab) "
            "VALUES (:uuid, :name, :uri, :expose)"
        ),
        {
            "uuid": uuid_value(dialect),
            "name": f"seed_db_{uuid4().hex[:8]}",
            "uri": "sqlite:///seed.db",
            "expose": False,
        },
    )
    return existing_ids(conn, "dbs", limit=1)[0]


def seed_tables(conn: Connection, target: int, dry_run: bool) -> None:
    current = count_rows(conn, "tables")
    if current >= target:
        logger.info("tables: %d rows (target %d) — no insert needed", current, target)
        return
    needed = target - current
    logger.info("tables: %d → %d (+%d)", current, target, needed)
    if dry_run:
        return
    database_id = seed_dbs(conn, dry_run=False)
    dialect = conn.engine.dialect.name
    sql = sa.text(
        "INSERT INTO tables (uuid, table_name, database_id) "
        "VALUES (:uuid, :name, :db_id)"
    )
    for batch_start in range(0, needed, BATCH):
        rows = [
            {
                "uuid": uuid_value(dialect),
                "name": f"seed_table_{current + i}",
                "db_id": database_id,
            }
            for i in range(batch_start, min(batch_start + BATCH, needed))
        ]
        conn.execute(sql, rows)
        logger.info(" tables: inserted %d / %d", batch_start + len(rows), needed)


def seed_slices(conn: Connection, target: int, dry_run: bool) -> None:
    current = count_rows(conn, "slices")
    if current >= target:
        logger.info("slices: %d rows (target %d) — no insert needed", current, target)
        return
    needed = target - current
    logger.info("slices: %d → %d (+%d)", current, target, needed)
    if dry_run:
        return
    # Slices reference tables.id; ensure at least one ``tables`` row exists
    # so the FK is satisfiable (datasource_id is nullable but we set it for
    # realism). The migration test doesn't care, but a real Superset that
    # re-renders these slices does.
    seed_tables(conn, target=1, dry_run=False)
    table_id = existing_ids(conn, "tables", limit=1)[0]
    dialect = conn.engine.dialect.name
    sql = sa.text(
        "INSERT INTO slices "
        "(uuid, slice_name, datasource_id, datasource_type, viz_type) "
        "VALUES (:uuid, :name, :ds_id, :ds_type, :viz)"
    )
    for batch_start in range(0, needed, BATCH):
        rows = [
            {
                "uuid": uuid_value(dialect),
                "name": f"seed_slice_{current + i}",
                "ds_id": table_id,
                "ds_type": "table",
                "viz": "table",
            }
            for i in range(batch_start, min(batch_start + BATCH, needed))
        ]
        conn.execute(sql, rows)
        logger.info(" slices: inserted %d / %d", batch_start + len(rows), needed)


def seed_users(conn: Connection, target: int, dry_run: bool) -> None:
    current = count_rows(conn, "ab_user")
    if current >= target:
        logger.info("ab_user: %d rows (target %d) — no insert needed", current, target)
        return
    needed = target - current
    logger.info("ab_user: %d → %d (+%d)", current, target, needed)
    if dry_run:
        return
    sql = sa.text(
        "INSERT INTO ab_user (first_name, last_name, username, email, active) "
        "VALUES (:first, :last, :username, :email, :active)"
    )
    for batch_start in range(0, needed, BATCH):
        rows = [
            {
                "first": "seed",
                "last": f"user_{current + i}",
                "username": f"seed_user_{current + i}_{uuid4().hex[:8]}",
                "email": f"seed_user_{current + i}_{uuid4().hex[:8]}@example.invalid",
                "active": True,
            }
            for i in range(batch_start, min(batch_start + BATCH, needed))
        ]
        conn.execute(sql, rows)
        logger.info(" ab_user: inserted %d / %d", batch_start + len(rows), needed)


def seed_roles(conn: Connection, target: int, dry_run: bool) -> None:
    current = count_rows(conn, "ab_role")
    if current >= target:
        logger.info("ab_role: %d rows (target %d) — no insert needed", current, target)
        return
    needed = target - current
    logger.info("ab_role: %d → %d (+%d)", current, target, needed)
    if dry_run:
        return
    sql = sa.text("INSERT INTO ab_role (name) VALUES (:name)")
    for batch_start in range(0, needed, BATCH):
        rows = [
            {"name": f"seed_role_{current + i}_{uuid4().hex[:8]}"}
            for i in range(batch_start, min(batch_start + BATCH, needed))
        ]
        conn.execute(sql, rows)
        logger.info(" ab_role: inserted %d / %d", batch_start + len(rows), needed)


# ----------------------------------------------------------------------
# Junction seeder
# ----------------------------------------------------------------------


def _load_existing_pairs(
    conn: Connection, junction: str, fk1_col: str, fk2_col: str
) -> set[tuple[int, int]]:
    """Load existing ``(fk1, fk2)`` pairs from a junction table into a set.

    Used so the seeder can skip them when generating new pairs (junction
    tables enforce uniqueness on the FK pair). On 64-bit CPython each pair
    costs roughly 100+ bytes (the 2-tuple, its set slot, and the int
    objects), so 10M existing pairs need on the order of 1 GB: workable on
    a dev machine, but worth watching. The junction / column names come
    from ``JUNCTIONS``, not user input, so the f-string interpolation is
    safe.
    """
    sql_text = f"SELECT {fk1_col}, {fk2_col} FROM {junction}"  # noqa: S608
    return {(row[0], row[1]) for row in conn.execute(sa.text(sql_text))}


def _generate_new_pairs(
    p1_ids: list[int],
    p2_ids: list[int],
    existing_pairs: set[tuple[int, int]],
) -> Iterator[tuple[int, int]]:
    """Yield ``(fk1, fk2)`` pairs from the parent1 × parent2 cross-product
    that are not already in ``existing_pairs``."""
    for fk1 in p1_ids:
        for fk2 in p2_ids:
            if (fk1, fk2) not in existing_pairs:
                yield (fk1, fk2)


def seed_junction(
    conn: Connection,
    junction: str,
    fk1_col: str,
    fk2_col: str,
    parent1: str,
    parent2: str,
    target: int,
    dry_run: bool,
) -> None:
    """Bulk-insert junction rows up to ``target`` rows total.

    Generates ``(fk1, fk2)`` pairs by walking the cross-product of
    parent1 IDs × parent2 IDs in row-major order, skipping pairs that
    already exist. Walking the cross-product deterministically keeps
    the script replayable: re-running with the same target is a no-op,
    and re-running with a higher target appends new pairs in a stable
    order regardless of how many runs preceded.
    """
    current = count_rows(conn, junction)
    if current >= target:
        logger.info(
            "%s: %d rows (target %d) — no insert needed", junction, current, target
        )
        return
    needed = target - current
    logger.info("%s: %d → %d (+%d)", junction, current, target, needed)
    if dry_run:
        return
    p1_ids = existing_ids(conn, parent1)
    p2_ids = existing_ids(conn, parent2)
    max_pairs = len(p1_ids) * len(p2_ids)
    if max_pairs < target:
        sys.exit(
            f"Cannot reach {target} rows in {junction}: "
            f"only {max_pairs} unique pairs available "
            f"({len(p1_ids)} × {len(p2_ids)}). "
            f"Increase parent targets and rerun."
        )
    existing_pairs: set[tuple[int, int]] = (
        _load_existing_pairs(conn, junction, fk1_col, fk2_col) if current > 0 else set()
    )
    if existing_pairs:
        logger.info(
            " %s: loaded %d existing pairs into avoidance set",
            junction,
            len(existing_pairs),
        )
    insert_sql = sa.text(
        f"INSERT INTO {junction} ({fk1_col}, {fk2_col}) "  # noqa: S608
        f"VALUES (:fk1, :fk2)"
    )
    inserted = 0
    batch: list[dict[str, int]] = []
    for fk1, fk2 in _generate_new_pairs(p1_ids, p2_ids, existing_pairs):
        batch.append({"fk1": fk1, "fk2": fk2})
        inserted += 1
        if len(batch) == BATCH or inserted == needed:
            conn.execute(insert_sql, batch)
            logger.info(" %s: inserted %d / %d", junction, inserted, needed)
            batch = []
        if inserted == needed:
            return
    if inserted < needed:
        sys.exit(
            f"Ran out of unique pairs at {inserted}/{needed} for {junction} "
            f"(parents have {len(p1_ids)} × {len(p2_ids)} = {max_pairs} pairs, "
            f"{len(existing_pairs)} already present)"
        )


# ----------------------------------------------------------------------
# Orchestration
# ----------------------------------------------------------------------


def required_parent_count(target_pairs: int, other_parent: int) -> int:
    """How many rows we need in this parent so that
    (this_parent × other_parent) ≥ target_pairs."""
    if other_parent == 0:
        # Bootstrapping: assume we'll create at least 1
        other_parent = 1
    return -(-target_pairs // other_parent)  # ceil(target_pairs / other_parent)


def _compute_parent_requirements(targets: dict[str, int]) -> dict[str, int]:
    """For each parent table, return the minimum row count needed so that
    parent1 × parent2 ≥ target for every junction it participates in.

    Allocates floor(sqrt(target)) + 1 rows to each of a junction's two
    parents, which guarantees (floor(sqrt(target)) + 1)² > target. A parent
    shared by several junctions gets the largest of those requirements.
    The junction seeder then walks the cross-product to produce the target
    number of unique pairs.
    """
    parent_req: dict[str, int] = {}
    for junction, _, _, p1, p2 in JUNCTIONS:
        target = targets.get(junction, 0)
        if target == 0:
            continue
        sqrt_n = int(target**0.5) + 1
        parent_req[p1] = max(parent_req.get(p1, 0), sqrt_n)
        parent_req[p2] = max(parent_req.get(p2, 0), sqrt_n)
    return parent_req


def _seed_parents(conn: Connection, parent_req: dict[str, int], dry_run: bool) -> None:
    """Seed parent tables in dependency order.

    Independent parents (ab_user, ab_role) go first, then dashboards, then
    slices and tables. Slices need a ``tables`` row, which in turn needs a
    ``dbs`` row; ``seed_slices`` and ``seed_tables`` create those on demand.
    """
    if "ab_user" in parent_req:
        seed_users(conn, parent_req["ab_user"], dry_run)
    if "ab_role" in parent_req:
        seed_roles(conn, parent_req["ab_role"], dry_run)
    if "dashboards" in parent_req:
        seed_dashboards(conn, parent_req["dashboards"], dry_run)
    if "slices" in parent_req:
        seed_slices(conn, parent_req["slices"], dry_run)
    if "tables" in parent_req:
        seed_tables(conn, parent_req["tables"], dry_run)


def _seed_all_junctions(
    conn: Connection, targets: dict[str, int], dry_run: bool
) -> None:
    for junction, fk1, fk2, p1, p2 in JUNCTIONS:
        target = targets.get(junction, 0)
        if target == 0:
            continue
        with time_phase(f"junction:{junction}"):
            seed_junction(conn, junction, fk1, fk2, p1, p2, target, dry_run)


def inject_duplicates(
    conn: Connection,
    junction: str,
    fk1_col: str,
    fk2_col: str,
    pct: float,
    dry_run: bool,
) -> None:
    """Insert duplicate ``(fk1, fk2)`` rows on a non-UNIQUE junction table.

    Used to stress-test the migration's ``_dedupe_by_min_id`` phase, which
    is otherwise a no-op on cleanly-seeded data. Computes ``count =
    current_rows * pct / 100`` and re-inserts the ``(fk1, fk2)`` pairs of
    the first ``count`` rows ordered by ``id``. Each sampled pair gets one
    duplicate; when ``pct`` exceeds 100 the sample wraps around and some
    pairs get more than one. The migration's dedupe should then find and
    delete all of them.

    **Pre-condition: the table must NOT have UNIQUE on (fk1, fk2)**, i.e.,
    the schema must be the pre-migration shape (after running
    ``superset db downgrade``). On the post-migration schema the composite
    PK rejects the duplicate INSERTs and this function errors out.
    """
    if pct == 0:
        return
    current = count_rows(conn, junction)
    count = int(current * pct / 100)
    if count == 0:
        logger.info(
            "%s: 0 duplicates to inject (current=%d, pct=%g)",
            junction,
            current,
            pct,
        )
        return
    logger.info(
        "%s: injecting %d duplicate rows (%g%% of %d existing)",
        junction,
        count,
        pct,
        current,
    )
    if dry_run:
        return
    select_sql = sa.text(
        f"SELECT {fk1_col}, {fk2_col} FROM {junction} ORDER BY id LIMIT :n"  # noqa: S608
    )
    sample = conn.execute(select_sql, {"n": count}).fetchall()
    if not sample:
        logger.warning("%s: no rows to duplicate (table is empty)", junction)
        return
    insert_sql = sa.text(
        f"INSERT INTO {junction} ({fk1_col}, {fk2_col}) "  # noqa: S608
        f"VALUES (:fk1, :fk2)"
    )
    inserted = 0
    while inserted < count:
        batch: list[dict[str, int]] = []
        while len(batch) < BATCH and inserted < count:
            row = sample[inserted % len(sample)]
            batch.append({"fk1": row[0], "fk2": row[1]})
            inserted += 1
        conn.execute(insert_sql, batch)
        logger.info(" %s: injected %d / %d duplicates", junction, inserted, count)


def _inject_dirty_data(conn: Connection, dirty_pct: float, dry_run: bool) -> None:
    """Inject duplicate rows on every non-UNIQUE seeded junction.

    The two tables that originally carried ``UNIQUE(fk1, fk2)`` are
    skipped because their composite-PK successor (and their pre-migration
    UNIQUE constraint) both reject duplicate inserts.
    """
    if dirty_pct == 0:
        return
    for junction, fk1, fk2, _, _ in JUNCTIONS:
        if junction in JUNCTIONS_WITH_UNIQUE:
            logger.info(
                "%s: skipping duplicate injection (table has UNIQUE on FK pair)",
                junction,
            )
            continue
        with time_phase(f"dirty:{junction}"):
            inject_duplicates(conn, junction, fk1, fk2, dirty_pct, dry_run)


def run(targets: dict[str, int], dry_run: bool, dirty_duplicates_pct: float) -> None:
    engine = build_engine()
    with engine.begin() as conn:
        parent_req = _compute_parent_requirements(targets)
        logger.info("Required parent row counts: %s", parent_req)
        with time_phase("parents"):
            _seed_parents(conn, parent_req, dry_run)
        with time_phase("junctions"):
            _seed_all_junctions(conn, targets, dry_run)
        if dirty_duplicates_pct > 0:
            with time_phase("dirty-duplicates"):
                _inject_dirty_data(conn, dirty_duplicates_pct, dry_run)


# ----------------------------------------------------------------------
# CLI
# ----------------------------------------------------------------------


def main() -> None:
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    for table, default in DEFAULTS.items():
        parser.add_argument(
            f"--{table.replace('_', '-')}",
            type=int,
            default=default,
            help=f"target row count for {table} (default: {default:,})",
        )
    parser.add_argument(
        "--dry-run",
        "-n",
        action="store_true",
        help="print planned inserts without writing to the DB",
    )
    parser.add_argument(
        "--dirty-duplicates-pct",
        type=float,
        default=0,
        help=(
            "after seeding distinct pairs, inject this percentage of duplicate "
            "rows on each non-UNIQUE junction (slice_user, dashboard_user, "
            "dashboard_roles). Stress-tests the migration's _dedupe_by_min_id "
            "phase. Requires the DB to be at the pre-migration revision "
            "(33d7e0e21daa) — the post-migration composite PK rejects "
            "duplicates and this will error. Default: 0 (no duplicates)."
        ),
    )
    parser.add_argument(
        "--verbose",
        "-v",
        action="store_true",
        help="increase log verbosity",
    )
    args = parser.parse_args()
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%H:%M:%S",
    )
    targets = {table: getattr(args, table) for table in DEFAULTS}
    logger.info("Targets: %s", targets)
    logger.info("Dry run: %s", args.dry_run)
    logger.info("Dirty duplicates pct: %g", args.dirty_duplicates_pct)
    with time_phase("total"):
        run(
            targets,
            dry_run=args.dry_run,
            dirty_duplicates_pct=args.dirty_duplicates_pct,
        )


if __name__ == "__main__":
main()
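
The sizing rule in ``_compute_parent_requirements`` and the replayable cross-product walk in ``seed_junction`` can be checked without a database. The following is a minimal pure-Python model under stated assumptions, not the script itself: ``rows_per_parent``, ``new_pairs``, and ``extend_to`` are hypothetical helpers, a plain list stands in for the junction table, and ``math.isqrt`` replaces the script's ``int(target**0.5)`` to sidestep float rounding.

```python
import math
from itertools import islice, product
from typing import Iterator


def rows_per_parent(target_pairs: int) -> int:
    # Same rule as the script, floor(sqrt(N)) + 1 rows per parent, so the
    # cross-product has (floor(sqrt(N)) + 1) ** 2 > N distinct pairs.
    return math.isqrt(target_pairs) + 1


def new_pairs(
    p1_ids: list[int], p2_ids: list[int], existing: set[tuple[int, int]]
) -> Iterator[tuple[int, int]]:
    # Row-major walk of p1 x p2 that skips pairs the table already holds.
    for pair in product(p1_ids, p2_ids):
        if pair not in existing:
            yield pair


def extend_to(
    table: list[tuple[int, int]], p1_ids: list[int], p2_ids: list[int], target: int
) -> list[tuple[int, int]]:
    # Hypothetical stand-in for seed_junction: append new pairs until the
    # "table" (a plain list here) holds ``target`` rows.
    needed = target - len(table)
    if needed <= 0:
        return table  # already at or above target: no-op
    fresh = list(islice(new_pairs(p1_ids, p2_ids, set(table)), needed))
    if len(fresh) < needed:
        raise ValueError("not enough unique pairs; grow the parent tables")
    return table + fresh


n = rows_per_parent(10)  # isqrt(10) + 1 == 4, and 4 * 4 == 16 >= 10
parents1 = list(range(1, n + 1))  # [1, 2, 3, 4]
parents2 = [10 * i for i in range(1, n + 1)]  # [10, 20, 30, 40]

first = extend_to([], parents1, parents2, 5)
second = extend_to(first, parents1, parents2, 8)
print(first)
print(second == extend_to([], parents1, parents2, 8))  # True
```

Re-running ``extend_to`` at the same target returns the table unchanged, and extending from 5 to 8 rows in two steps yields exactly the rows a single run to 8 would; that is the replayability the ``seed_junction`` docstring describes.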

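The ``--dirty-duplicates-pct`` path exists to give the migration's ``_dedupe_by_min_id`` phase real work. That function's implementation is not part of this diff, so the sketch below assumes the common "keep the lowest ``id`` in each ``(fk1, fk2)`` group" pattern and runs it against an in-memory SQLite table shaped like a pre-migration ``slice_user`` (surrogate ``id`` plus the two FKs, no UNIQUE). ``inject`` mirrors ``inject_duplicates``; ``dedupe_by_min_id`` here is a hypothetical stand-in, not the migration's code.

```python
import sqlite3


def make_table(conn: sqlite3.Connection) -> None:
    # Pre-migration shape assumed here: surrogate id, no UNIQUE on the FK pair.
    conn.execute(
        "CREATE TABLE slice_user ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id INTEGER NOT NULL,"
        " slice_id INTEGER NOT NULL)"
    )


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM slice_user").fetchone()[0]


def inject(conn: sqlite3.Connection, pct: float) -> int:
    # Like inject_duplicates: re-insert the pairs of the first ``n`` rows by
    # id, wrapping around the sample when pct > 100.
    n = int(count(conn) * pct / 100)
    sample = conn.execute(
        "SELECT user_id, slice_id FROM slice_user ORDER BY id LIMIT ?", (n,)
    ).fetchall()
    rows = [sample[i % len(sample)] for i in range(n)] if sample else []
    conn.executemany("INSERT INTO slice_user (user_id, slice_id) VALUES (?, ?)", rows)
    return len(rows)


def dedupe_by_min_id(conn: sqlite3.Connection) -> int:
    # Keep the lowest id per (user_id, slice_id); delete every other copy.
    cur = conn.execute(
        "DELETE FROM slice_user WHERE id NOT IN ("
        " SELECT MIN(id) FROM slice_user GROUP BY user_id, slice_id)"
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
make_table(conn)
conn.executemany(
    "INSERT INTO slice_user (user_id, slice_id) VALUES (?, ?)",
    [(u, s) for u in range(10) for s in range(10)],
)
added = inject(conn, 20)  # 20 duplicates on top of 100 distinct rows
removed = dedupe_by_min_id(conn)  # deletes exactly those 20
print(added, removed, count(conn))
```

Calling ``inject`` with a percentage above 100 exercises the wrap-around case: some pairs pick up several copies, and the min-id rule still leaves exactly one row per pair, with the original (lowest) ids surviving.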

@@ -37,7 +37,7 @@ export const useDashboardMetadataBar = (dashboardInfo: DashboardInfo) => {
type: MetadataType.Owner as const,
createdBy: getOwnerName(dashboardInfo.created_by) || t('Not available'),
owners:
- dashboardInfo.owners.length > 0
+ dashboardInfo.owners && dashboardInfo.owners.length > 0
? dashboardInfo.owners.map(getOwnerName)
: t('None'),
createdOn: dashboardInfo.created_on_delta_humanized,


@@ -75,7 +75,7 @@ export const useLanguageMenuItems = ({
type: 'submenu' as const,
label: (
<span className="f16" aria-label={t('Languages')}>
- <i className={`flag ${languages[locale]?.flag ?? 'us'}`} />
+ <i className={`flag ${languages[locale]?.flag ?? ''}`} />
</span>
),
icon: <Icons.CaretDownOutlined iconSize="xs" />,


@@ -0,0 +1,347 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
// TEMP: Demo aid for sc-103156 entity-versioning. Lets a user open a
// dropdown of recent versions on a chart and restore one. Not part
// of the merged feature scope (ADR-005 limits v1 to backend); revert
// before pushing the versioning branch.
import { useState, useCallback } from 'react';
import { SupersetClient } from '@superset-ui/core';
import { t } from '@apache-superset/core/translation';
import { Dropdown, Tooltip, Icons } from '@superset-ui/core/components';
interface Change {
kind: string;
path: string[];
from_value: unknown;
to_value: unknown;
}
interface ChangedBy {
id: number;
username: string;
first_name: string;
last_name: string;
}
interface Version {
version_uuid: string;
version_number: number;
transaction_id: number;
operation_type: string;
issued_at: string;
changed_by: ChangedBy | null;
changes: Change[];
}
interface Props {
chartUuid: string;
onRestored?: () => void;
}
// Layout-record path verbs (set by ``diff_dashboard_layout`` on the
// backend): path = [verb, kind, id]. The formatter is kept identical
// across the three debug widgets, so the chart and dataset dropdowns
// recognise these paths too, even though they don't normally produce
// layout records.
const LAYOUT_VERBS = new Set(['add', 'remove', 'move', 'edit']);
// Localized labels for the kinds emitted by the backend (layout walker
// + dataset child diff). Defined statically so xgettext can extract them.
const KIND_LABELS: Record<string, string> = {
chart: t('chart'),
row: t('row'),
column: t('column'),
tab: t('tab'),
tabs: t('tabs'),
header: t('header'),
markdown: t('markdown'),
divider: t('divider'),
metric: t('metric'),
};
const localizedKind = (k: string): string => KIND_LABELS[k] ?? k;
function summarizeChange(c: Change): string {
if (c.path.length === 3 && LAYOUT_VERBS.has(String(c.path[0]))) {
const verb = String(c.path[0]);
const kind = localizedKind(String(c.path[1]));
const payload =
((c.to_value ?? c.from_value) as { name?: string } | null) ?? null;
const name = payload?.name;
if (verb === 'add') {
return name
? t('Added %(kind)s "%(name)s"', { kind, name })
: t('Added %(kind)s', { kind });
}
if (verb === 'remove') {
return name
? t('Removed %(kind)s "%(name)s"', { kind, name })
: t('Removed %(kind)s', { kind });
}
if (verb === 'move') {
return name
? t('Moved %(kind)s "%(name)s"', { kind, name })
: t('Moved %(kind)s', { kind });
}
return name
? t('Edited %(kind)s "%(name)s"', { kind, name })
: t('Edited %(kind)s', { kind });
}
const isAdd = c.from_value == null && c.to_value != null;
const isRemove = c.from_value != null && c.to_value == null;
if (c.path.length === 2 && (c.kind === 'column' || c.kind === 'metric')) {
const kind = localizedKind(c.kind);
const name = String(c.path[1]);
if (isAdd) return t('Added %(kind)s "%(name)s"', { kind, name });
if (isRemove) return t('Removed %(kind)s "%(name)s"', { kind, name });
return t('Changed %(kind)s "%(name)s"', { kind, name });
}
if (c.path[0] === 'slices') {
const id = String(c.path[1] ?? '');
if (isAdd) return t('Added chart %(id)s', { id }).trim();
if (isRemove) return t('Removed chart %(id)s', { id }).trim();
return t('Changed chart %(id)s', { id }).trim();
}
if (c.kind === 'field') {
const fieldName = String(c.path[c.path.length - 1]);
const fieldLabel: string =
fieldName === 'dashboard_title'
? t('title')
: fieldName === 'slice_name'
? t('chart name')
: fieldName === 'table_name'
? t('table name')
: fieldName;
const isShortScalar =
c.to_value !== null &&
c.to_value !== undefined &&
(typeof c.to_value === 'string' ||
typeof c.to_value === 'number' ||
typeof c.to_value === 'boolean') &&
String(c.to_value).length <= 80;
if (!isAdd && !isRemove && isShortScalar) {
return t('Changed %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isRemove) {
return t('Cleared %(field)s', { field: fieldLabel });
}
if (isAdd && isShortScalar) {
return t('Set %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isAdd) return t('Added %(field)s', { field: fieldLabel });
return t('Changed %(field)s', { field: fieldLabel });
}
const kind = localizedKind(c.kind);
if (c.path.length) {
const detail = String(c.path[c.path.length - 1]);
if (isAdd) return t('Added %(kind)s %(detail)s', { kind, detail });
if (isRemove) return t('Removed %(kind)s %(detail)s', { kind, detail });
return t('Changed %(kind)s %(detail)s', { kind, detail });
}
if (isAdd) return t('Added %(kind)s', { kind });
if (isRemove) return t('Removed %(kind)s', { kind });
return t('Changed %(kind)s', { kind });
}
function formatChangeTitle(changes: Change[]): string {
if (!changes.length) return t('Baseline');
const first = summarizeChange(changes[0]);
if (changes.length === 1) return first;
return t('%(first)s (+%(more)s more)', {
first,
more: changes.length - 1,
});
}
function formatUser(by: ChangedBy | null): string {
if (!by) return t('system');
if (by.first_name || by.last_name) {
return `${by.first_name ?? ''} ${by.last_name ?? ''}`.trim();
}
return by.username;
}
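function formatDate(iso: string): string {
  try {
    // Match the Superset locale set in src/views/App.tsx on
    // ``document.documentElement.lang`` rather than the browser default.
    const lang = document.documentElement.lang || undefined;
    const date = new Date(iso);
    // ``new Date`` does not throw on bad input; it yields an Invalid Date.
    if (Number.isNaN(date.getTime())) return iso;
    return date.toLocaleString(lang);
  } catch {
    // toLocaleString throws a RangeError for a malformed locale tag.
    return iso;
  }
}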
export default function VersionHistoryDropdown({
chartUuid,
onRestored,
}: Props) {
const [versions, setVersions] = useState<Version[] | null>(null);
const [loading, setLoading] = useState(false);
const loadVersions = useCallback(async () => {
setLoading(true);
try {
const { json } = await SupersetClient.get({
endpoint: `/api/v1/chart/${chartUuid}/versions/`,
});
const result = (json as { result: Version[] }).result || [];
// Newest first (API returns oldest-first)
setVersions([...result].reverse().slice(0, 20));
} catch (e) {
console.error('Failed to load versions', e);
setVersions([]);
} finally {
setLoading(false);
}
}, [chartUuid]);
const handleRestore = useCallback(
async (version: Version) => {
const summary = formatChangeTitle(version.changes);
if (
// eslint-disable-next-line no-alert
!window.confirm(
t(
'Restore this chart to version %(num)s (%(summary)s)? This will overwrite the current state.',
{ num: version.version_number, summary },
),
)
) {
return;
}
try {
await SupersetClient.post({
endpoint: `/api/v1/chart/${chartUuid}/versions/${version.version_uuid}/restore`,
});
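// eslint-disable-next-line no-alert
window.alert(t('Restored. Reload the page to see the change.'));
// Drop the cached list so the next open refetches the history instead
// of showing the pre-restore versions.
setVersions(null);
if (onRestored) onRestored();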
} catch (e) {
console.error('Restore failed', e);
// eslint-disable-next-line no-alert
window.alert(t('Restore failed — see browser console for details.'));
}
},
[chartUuid, onRestored],
);
const items = (() => {
if (loading) {
return [{ key: 'loading', label: t('Loading…'), disabled: true }];
}
if (!versions) {
return [
{ key: 'empty', label: t('Click to load versions'), disabled: true },
];
}
if (versions.length === 0) {
return [{ key: 'empty', label: t('No versions yet'), disabled: true }];
}
// versions is already newest-first, so [0] is the live/current version.
return versions.map((v, idx) => {
const isCurrent = idx === 0;
return {
key: String(v.transaction_id),
// antd's `disabled: true` greys the item and blocks default
// click handling; combined with the inner div NOT having an
// onClick when current, the row becomes informational only.
disabled: isCurrent,
label: (
<div
style={{ minWidth: 280, lineHeight: 1.4, padding: '4px 0' }}
onClick={isCurrent ? undefined : () => handleRestore(v)}
>
<div style={{ fontWeight: 600 }}>
#{v.version_number} {formatChangeTitle(v.changes)}
{isCurrent && (
<span
style={{
marginLeft: 8,
fontWeight: 400,
fontSize: 12,
opacity: 0.7,
}}
>
{t('(current)')}
</span>
)}
</div>
<div style={{ fontSize: 12, opacity: 0.75 }}>
{formatUser(v.changed_by)} · {formatDate(v.issued_at)}
</div>
{v.changes.length > 1 && (
<ul
style={{
margin: '4px 0 0 18px',
padding: 0,
fontSize: 12,
opacity: 0.85,
listStyle: 'disc',
}}
>
{v.changes.slice(0, 5).map((c, i) => (
<li key={i}>{summarizeChange(c)}</li>
))}
{v.changes.length > 5 && (
<li style={{ opacity: 0.6 }}>
{t('+%(n)s more', { n: v.changes.length - 5 })}
</li>
)}
</ul>
)}
</div>
),
};
});
})();
return (
<Dropdown
trigger={['click']}
menu={{ items }}
onOpenChange={open => {
if (open && versions === null && !loading) loadVersions();
}}
>
<Tooltip
id="version-history-tooltip"
title={t('Version history (demo)')}
placement="bottom"
>
<span role="button" tabIndex={0} className="action-button">
<Icons.HistoryOutlined iconSize="l" />
</span>
</Tooltip>
</Dropdown>
);
}


@@ -84,6 +84,8 @@ import { QueryObjectColumns } from 'src/views/CRUD/types';
import { WIDER_DROPDOWN_WIDTH } from 'src/components/ListView/utils';
import { Tag } from 'src/components/Tag';
import { datasetLabel } from 'src/features/semanticLayers/label';
+ // TEMP: sc-103156 versioning demo. Revert before any commit.
+ import VersionHistoryDropdown from './VersionHistoryDropdown';
const FlexRowContainer = styled.div`
align-items: center;
@@ -576,6 +578,13 @@ function ChartList(props: ChartListProps) {
)}
</ConfirmStatusChange>
)}
+ {/* TEMP: sc-103156 versioning demo. Revert before any commit. */}
+ {original.uuid && canEdit && (
+ <VersionHistoryDropdown
+ chartUuid={original.uuid}
+ onRestored={() => refreshData()}
+ />
+ )}
</StyledActions>
);
},


@@ -0,0 +1,363 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
// TEMP: Demo aid for sc-103156 entity-versioning. Lets a user open a
// dropdown of recent versions on a dashboard and restore one. Not part
// of the merged feature scope (ADR-005 limits v1 to backend); revert
// before pushing the versioning branch.
import { useState, useCallback } from 'react';
import { SupersetClient } from '@superset-ui/core';
import { t } from '@apache-superset/core/translation';
import { Dropdown, Tooltip, Icons } from '@superset-ui/core/components';
interface Change {
kind: string;
path: string[];
from_value: unknown;
to_value: unknown;
}
interface ChangedBy {
id: number;
username: string;
first_name: string;
last_name: string;
}
interface Version {
version_uuid: string;
version_number: number;
transaction_id: number;
operation_type: string;
issued_at: string;
changed_by: ChangedBy | null;
changes: Change[];
}
interface Props {
dashboardUuid: string;
onRestored?: () => void;
}
// Layout-record path verbs (set by ``diff_dashboard_layout`` on the
// backend): path = [verb, kind, id].
const LAYOUT_VERBS = new Set(['add', 'remove', 'move', 'edit']);
// Localized labels for the kinds emitted by the backend (layout walker
// + dataset child diff). Defined statically so xgettext can extract them.
const KIND_LABELS: Record<string, string> = {
chart: t('chart'),
row: t('row'),
column: t('column'),
tab: t('tab'),
tabs: t('tabs'),
header: t('header'),
markdown: t('markdown'),
divider: t('divider'),
metric: t('metric'),
};
const localizedKind = (k: string): string => KIND_LABELS[k] ?? k;
function summarizeChange(c: Change): string {
// Layout record (dashboard): path = [verb, kind, id], with payload
// carrying ``name`` / ``chartId`` etc.
if (c.path.length === 3 && LAYOUT_VERBS.has(String(c.path[0]))) {
const verb = String(c.path[0]);
const kind = localizedKind(String(c.path[1]));
const payload =
((c.to_value ?? c.from_value) as { name?: string } | null) ?? null;
const name = payload?.name;
if (verb === 'add') {
return name
? t('Added %(kind)s "%(name)s"', { kind, name })
: t('Added %(kind)s', { kind });
}
if (verb === 'remove') {
return name
? t('Removed %(kind)s "%(name)s"', { kind, name })
: t('Removed %(kind)s', { kind });
}
if (verb === 'move') {
return name
? t('Moved %(kind)s "%(name)s"', { kind, name })
: t('Moved %(kind)s', { kind });
}
return name
? t('Edited %(kind)s "%(name)s"', { kind, name })
: t('Edited %(kind)s', { kind });
}
const isAdd = c.from_value == null && c.to_value != null;
const isRemove = c.from_value != null && c.to_value == null;
// Dataset child: path = [columns | metrics, <name>]. ``kind`` is
// ``column`` / ``metric`` so we can rebuild a readable summary.
if (c.path.length === 2 && (c.kind === 'column' || c.kind === 'metric')) {
const kind = localizedKind(c.kind);
const name = String(c.path[1]);
if (isAdd) return t('Added %(kind)s "%(name)s"', { kind, name });
if (isRemove) return t('Removed %(kind)s "%(name)s"', { kind, name });
return t('Changed %(kind)s "%(name)s"', { kind, name });
}
// Slice membership (mostly folded into layout records server-side,
// but may still appear if the layout walk didn't catch a chart).
if (c.path[0] === 'slices') {
const id = String(c.path[1] ?? '');
if (isAdd) return t('Added chart %(id)s', { id }).trim();
if (isRemove) return t('Removed chart %(id)s', { id }).trim();
return t('Changed chart %(id)s', { id }).trim();
}
// Scalar field record: path = [field_name] or [json_field, sub_key].
if (c.kind === 'field') {
const fieldName = String(c.path[c.path.length - 1]);
// Friendly labels for the most user-visible fields.
const fieldLabel: string =
fieldName === 'dashboard_title'
? t('title')
: fieldName === 'slice_name'
? t('chart name')
: fieldName === 'table_name'
? t('table name')
: fieldName;
// If the new value is a short primitive (string/number/bool), show
// "Changed <field> to <value>" — much more useful than just naming
// the field. Long strings, dicts and arrays fall through to the
// generic verb-only summary.
const isShortScalar =
c.to_value !== null &&
c.to_value !== undefined &&
(typeof c.to_value === 'string' ||
typeof c.to_value === 'number' ||
typeof c.to_value === 'boolean') &&
String(c.to_value).length <= 80;
if (!isAdd && !isRemove && isShortScalar) {
return t('Changed %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isRemove) {
return t('Cleared %(field)s', { field: fieldLabel });
}
if (isAdd && isShortScalar) {
return t('Set %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isAdd) return t('Added %(field)s', { field: fieldLabel });
return t('Changed %(field)s', { field: fieldLabel });
}
// Fallback: kind plus the trailing path segment (if any).
const kind = localizedKind(c.kind);
if (c.path.length) {
const detail = String(c.path[c.path.length - 1]);
if (isAdd) return t('Added %(kind)s %(detail)s', { kind, detail });
if (isRemove) return t('Removed %(kind)s %(detail)s', { kind, detail });
return t('Changed %(kind)s %(detail)s', { kind, detail });
}
if (isAdd) return t('Added %(kind)s', { kind });
if (isRemove) return t('Removed %(kind)s', { kind });
return t('Changed %(kind)s', { kind });
}
function formatChangeTitle(changes: Change[]): string {
if (!changes.length) return t('Baseline');
const first = summarizeChange(changes[0]);
if (changes.length === 1) return first;
return t('%(first)s (+%(more)s more)', {
first,
more: changes.length - 1,
});
}
function formatUser(by: ChangedBy | null): string {
if (!by) return t('system');
if (by.first_name || by.last_name) {
return `${by.first_name ?? ''} ${by.last_name ?? ''}`.trim();
}
return by.username;
}
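function formatDate(iso: string): string {
// ``new Date()`` does not throw on malformed input; it returns an
// Invalid Date, so check for that explicitly.
const date = new Date(iso);
if (Number.isNaN(date.getTime())) return iso;
try {
// Match the Superset locale set in src/views/App.tsx on
// ``document.documentElement.lang`` rather than the browser default.
const lang = document.documentElement.lang || undefined;
return date.toLocaleString(lang);
} catch {
// ``toLocaleString`` throws a RangeError for a malformed locale tag.
return iso;
}
}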
export default function VersionHistoryDropdown({
dashboardUuid,
onRestored,
}: Props) {
const [versions, setVersions] = useState<Version[] | null>(null);
const [loading, setLoading] = useState(false);
const loadVersions = useCallback(async () => {
setLoading(true);
try {
const { json } = await SupersetClient.get({
endpoint: `/api/v1/dashboard/${dashboardUuid}/versions/`,
});
const result = (json as { result: Version[] }).result || [];
// Newest first (API returns oldest-first)
setVersions([...result].reverse().slice(0, 20));
} catch (e) {
console.error('Failed to load versions', e);
setVersions([]);
} finally {
setLoading(false);
}
}, [dashboardUuid]);
const handleRestore = useCallback(
async (version: Version) => {
const summary = formatChangeTitle(version.changes);
if (
// eslint-disable-next-line no-alert
!window.confirm(
t(
'Restore this dashboard to version %(num)s (%(summary)s)? This will overwrite the current state.',
{ num: version.version_number, summary },
),
)
) {
return;
}
try {
await SupersetClient.post({
endpoint: `/api/v1/dashboard/${dashboardUuid}/versions/${version.version_uuid}/restore`,
});
onRestored?.();
// Navigate to the dashboard with no URL params. A previous
// ``?native_filters_key=…`` (or ``permalink_key`` / ``form_data_key``)
// points at a server-cached snapshot from before the restore;
// the next page hydration would merge it on top of the freshly
// restored ``json_metadata`` and effectively mask the rollback
// (e.g. dashboard-level colour scheme changes don't appear).
// A clean URL forces hydration from the restored DB state.
window.location.href = `/superset/dashboard/${dashboardUuid}/`;
} catch (e) {
console.error('Restore failed', e);
// eslint-disable-next-line no-alert
window.alert(t('Restore failed — see browser console for details.'));
}
},
[dashboardUuid, onRestored],
);
const items = (() => {
if (loading) {
return [{ key: 'loading', label: t('Loading…'), disabled: true }];
}
if (!versions) {
return [
{ key: 'empty', label: t('Click to load versions'), disabled: true },
];
}
if (versions.length === 0) {
return [{ key: 'empty', label: t('No versions yet'), disabled: true }];
}
// versions is already newest-first, so [0] is the live/current version.
return versions.map((v, idx) => {
const isCurrent = idx === 0;
return {
key: String(v.transaction_id),
// antd's `disabled: true` greys the item and blocks default
// click handling; combined with the inner div NOT having an
// onClick when current, the row becomes informational only.
disabled: isCurrent,
label: (
<div
style={{ minWidth: 280, lineHeight: 1.4, padding: '4px 0' }}
onClick={isCurrent ? undefined : () => handleRestore(v)}
>
<div style={{ fontWeight: 600 }}>
#{v.version_number} {formatChangeTitle(v.changes)}
{isCurrent && (
<span
style={{
marginLeft: 8,
fontWeight: 400,
fontSize: 12,
opacity: 0.7,
}}
>
{t('(current)')}
</span>
)}
</div>
<div style={{ fontSize: 12, opacity: 0.75 }}>
{formatUser(v.changed_by)} · {formatDate(v.issued_at)}
</div>
{v.changes.length > 1 && (
<ul
style={{
margin: '4px 0 0 18px',
padding: 0,
fontSize: 12,
opacity: 0.85,
listStyle: 'disc',
}}
>
{v.changes.slice(0, 5).map((c, i) => (
<li key={i}>{summarizeChange(c)}</li>
))}
{v.changes.length > 5 && (
<li style={{ opacity: 0.6 }}>
{t('+%(n)s more', { n: v.changes.length - 5 })}
</li>
)}
</ul>
)}
</div>
),
};
});
})();
return (
<Dropdown
trigger={['click']}
menu={{ items }}
onOpenChange={open => {
if (open && versions === null && !loading) loadVersions();
}}
>
<Tooltip
id="version-history-tooltip"
title={t('Version history (demo)')}
placement="bottom"
>
<span role="button" tabIndex={0} className="action-button">
<Icons.HistoryOutlined iconSize="l" />
</span>
</Tooltip>
</Dropdown>
);
}
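Both demo dropdowns treat a change record as layout-shaped when its `path` is `[verb, kind, id]` with a verb from `LAYOUT_VERBS`, and label it with the payload's `name` when one is present. The sketch below isolates that rule plus the `(+N more)` title format so it can be checked on its own. `fmt` is a hypothetical stand-in for `t()` that only interpolates `%(key)s` placeholders and translates nothing; the real component also keeps one complete template per verb so translators see whole sentences, which this sketch folds into a verb lookup for brevity.

```typescript
interface Change {
  kind: string;
  path: string[];
  from_value: unknown;
  to_value: unknown;
}

const LAYOUT_VERBS = new Set(['add', 'remove', 'move', 'edit']);
const VERB_LABELS: Record<string, string> = {
  add: 'Added',
  remove: 'Removed',
  move: 'Moved',
  edit: 'Edited',
};

// Hypothetical stand-in for t(): fills %(key)s placeholders, no translation.
function fmt(template: string, params: Record<string, string | number>): string {
  return template.replace(/%\((\w+)\)s/g, (_m: string, key: string) =>
    String(params[key] ?? ''),
  );
}

// Returns null for records that are not layout-shaped.
function summarizeLayoutChange(c: Change): string | null {
  if (c.path.length !== 3 || !LAYOUT_VERBS.has(c.path[0])) return null;
  const [verb, kind] = c.path;
  // Prefer the new payload; removals only carry the old one.
  const payload = (c.to_value ?? c.from_value) as { name?: string } | null;
  const name = payload?.name;
  const label = VERB_LABELS[verb];
  return name
    ? fmt('%(label)s %(kind)s "%(name)s"', { label, kind, name })
    : fmt('%(label)s %(kind)s', { label, kind });
}

// First change's summary, plus a count of the remaining changes.
function formatChangeTitle(
  changes: Change[],
  summarize: (c: Change) => string,
): string {
  if (!changes.length) return 'Baseline';
  const first = summarize(changes[0]);
  if (changes.length === 1) return first;
  return fmt('%(first)s (+%(more)s more)', { first, more: changes.length - 1 });
}
```

The `kind` values used in the check below are illustrative; the layout branch only reads `path` and the payload.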

@@ -77,6 +77,8 @@ import { UserWithPermissionsAndRoles } from 'src/types/bootstrapTypes';
import { findPermission } from 'src/utils/findPermission';
import { navigateTo } from 'src/utils/navigationUtils';
import { WIDER_DROPDOWN_WIDTH } from 'src/components/ListView/utils';
// TEMP: sc-103156 versioning demo. Revert before any commit.
import VersionHistoryDropdown from './VersionHistoryDropdown';
const PAGE_SIZE = 25;
const PASSWORDS_NEEDED_MESSAGE = t(
@@ -122,6 +124,10 @@ const Actions = styled.div`
const DASHBOARD_COLUMNS_TO_FETCH = [
'id',
// TEMP: sc-103156 versioning demo. The version-history dropdown
// calls /api/v1/dashboard/<uuid>/versions/, so the row needs `uuid`.
// Revert this entry along with the dropdown component.
'uuid',
'dashboard_title',
'published',
'url',
@@ -504,6 +510,13 @@ function DashboardList(props: DashboardListProps) {
)}
</ConfirmStatusChange>
)}
{/* TEMP: sc-103156 versioning demo. Revert before any commit. */}
{original.uuid && canEdit && (
<VersionHistoryDropdown
dashboardUuid={original.uuid}
onRestored={() => refreshData()}
/>
)}
</Actions>
);
},

@@ -0,0 +1,343 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
// TEMP: Demo aid for sc-103156 entity-versioning. Lets a user open a
// dropdown of recent versions on a dataset and restore one. Not part
// of the merged feature scope (ADR-005 limits v1 to backend); revert
// before pushing the versioning branch.
import { useState, useCallback } from 'react';
import { SupersetClient } from '@superset-ui/core';
import { t } from '@apache-superset/core/translation';
import { Dropdown, Tooltip, Icons } from '@superset-ui/core/components';
interface Change {
kind: string;
path: string[];
from_value: unknown;
to_value: unknown;
}
interface ChangedBy {
id: number;
username: string;
first_name: string;
last_name: string;
}
interface Version {
version_uuid: string;
version_number: number;
transaction_id: number;
operation_type: string;
issued_at: string;
changed_by: ChangedBy | null;
changes: Change[];
}
interface Props {
datasetUuid: string;
onRestored?: () => void;
}
// Layout-record path verbs (set by ``diff_dashboard_layout`` on the
// backend): path = [verb, kind, id]. Same shape across the three
// debug widgets so chart/dataset dropdowns also recognise them — even
// though they don't normally produce layout records, the formatter
// stays uniform.
const LAYOUT_VERBS = new Set(['add', 'remove', 'move', 'edit']);
// Localized labels for the kinds emitted by the backend (layout walker
// + dataset child diff). Defined statically so xgettext can extract them.
const KIND_LABELS: Record<string, string> = {
chart: t('chart'),
row: t('row'),
column: t('column'),
tab: t('tab'),
tabs: t('tabs'),
header: t('header'),
markdown: t('markdown'),
divider: t('divider'),
metric: t('metric'),
};
const localizedKind = (k: string): string => KIND_LABELS[k] ?? k;
function summarizeChange(c: Change): string {
if (c.path.length === 3 && LAYOUT_VERBS.has(String(c.path[0]))) {
const verb = String(c.path[0]);
const kind = localizedKind(String(c.path[1]));
const payload =
((c.to_value ?? c.from_value) as { name?: string } | null) ?? null;
const name = payload?.name;
if (verb === 'add') {
return name
? t('Added %(kind)s "%(name)s"', { kind, name })
: t('Added %(kind)s', { kind });
}
if (verb === 'remove') {
return name
? t('Removed %(kind)s "%(name)s"', { kind, name })
: t('Removed %(kind)s', { kind });
}
if (verb === 'move') {
return name
? t('Moved %(kind)s "%(name)s"', { kind, name })
: t('Moved %(kind)s', { kind });
}
return name
? t('Edited %(kind)s "%(name)s"', { kind, name })
: t('Edited %(kind)s', { kind });
}
const isAdd = c.from_value == null && c.to_value != null;
const isRemove = c.from_value != null && c.to_value == null;
if (c.path.length === 2 && (c.kind === 'column' || c.kind === 'metric')) {
const kind = localizedKind(c.kind);
const name = String(c.path[1]);
if (isAdd) return t('Added %(kind)s "%(name)s"', { kind, name });
if (isRemove) return t('Removed %(kind)s "%(name)s"', { kind, name });
return t('Changed %(kind)s "%(name)s"', { kind, name });
}
if (c.path[0] === 'slices') {
const id = String(c.path[1] ?? '');
if (isAdd) return t('Added chart %(id)s', { id }).trim();
if (isRemove) return t('Removed chart %(id)s', { id }).trim();
return t('Changed chart %(id)s', { id }).trim();
}
if (c.kind === 'field') {
const fieldName = String(c.path[c.path.length - 1]);
const fieldLabel: string =
fieldName === 'dashboard_title'
? t('title')
: fieldName === 'slice_name'
? t('chart name')
: fieldName === 'table_name'
? t('table name')
: fieldName;
const isShortScalar =
c.to_value !== null &&
c.to_value !== undefined &&
(typeof c.to_value === 'string' ||
typeof c.to_value === 'number' ||
typeof c.to_value === 'boolean') &&
String(c.to_value).length <= 80;
if (!isAdd && !isRemove && isShortScalar) {
return t('Changed %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isRemove) {
return t('Cleared %(field)s', { field: fieldLabel });
}
if (isAdd && isShortScalar) {
return t('Set %(field)s to "%(value)s"', {
field: fieldLabel,
value: String(c.to_value),
});
}
if (isAdd) return t('Added %(field)s', { field: fieldLabel });
return t('Changed %(field)s', { field: fieldLabel });
}
const kind = localizedKind(c.kind);
if (c.path.length) {
const detail = String(c.path[c.path.length - 1]);
if (isAdd) return t('Added %(kind)s %(detail)s', { kind, detail });
if (isRemove) return t('Removed %(kind)s %(detail)s', { kind, detail });
return t('Changed %(kind)s %(detail)s', { kind, detail });
}
if (isAdd) return t('Added %(kind)s', { kind });
if (isRemove) return t('Removed %(kind)s', { kind });
return t('Changed %(kind)s', { kind });
}
function formatChangeTitle(changes: Change[]): string {
if (!changes.length) return t('Baseline');
const first = summarizeChange(changes[0]);
if (changes.length === 1) return first;
return t('%(first)s (+%(more)s more)', {
first,
more: changes.length - 1,
});
}
function formatUser(by: ChangedBy | null): string {
if (!by) return t('system');
if (by.first_name || by.last_name) {
return `${by.first_name ?? ''} ${by.last_name ?? ''}`.trim();
}
return by.username;
}
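function formatDate(iso: string): string {
// ``new Date()`` does not throw on malformed input; it returns an
// Invalid Date, so check for that explicitly.
const date = new Date(iso);
if (Number.isNaN(date.getTime())) return iso;
try {
// Match the Superset locale set in src/views/App.tsx on
// ``document.documentElement.lang`` rather than the browser default.
const lang = document.documentElement.lang || undefined;
return date.toLocaleString(lang);
} catch {
// ``toLocaleString`` throws a RangeError for a malformed locale tag.
return iso;
}
}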
export default function VersionHistoryDropdown({
datasetUuid,
onRestored,
}: Props) {
const [versions, setVersions] = useState<Version[] | null>(null);
const [loading, setLoading] = useState(false);
const loadVersions = useCallback(async () => {
setLoading(true);
try {
const { json } = await SupersetClient.get({
endpoint: `/api/v1/dataset/${datasetUuid}/versions/`,
});
const result = (json as { result: Version[] }).result || [];
// Newest first (API returns oldest-first)
setVersions([...result].reverse().slice(0, 20));
} catch (e) {
console.error('Failed to load versions', e);
setVersions([]);
} finally {
setLoading(false);
}
}, [datasetUuid]);
const handleRestore = useCallback(
async (version: Version) => {
const summary = formatChangeTitle(version.changes);
if (
// eslint-disable-next-line no-alert
!window.confirm(
t(
'Restore this dataset to version %(num)s (%(summary)s)? This will overwrite the current state.',
{ num: version.version_number, summary },
),
)
) {
return;
}
try {
await SupersetClient.post({
endpoint: `/api/v1/dataset/${datasetUuid}/versions/${version.version_uuid}/restore`,
});
// eslint-disable-next-line no-alert
window.alert(t('Restored. Reload the page to see the change.'));
if (onRestored) onRestored();
} catch (e) {
console.error('Restore failed', e);
// eslint-disable-next-line no-alert
window.alert(t('Restore failed — see browser console for details.'));
}
},
[datasetUuid, onRestored],
);
const items = (() => {
if (loading) {
return [{ key: 'loading', label: t('Loading…'), disabled: true }];
}
if (!versions) {
return [
{ key: 'empty', label: t('Click to load versions'), disabled: true },
];
}
if (versions.length === 0) {
return [{ key: 'empty', label: t('No versions yet'), disabled: true }];
}
return versions.map((v, idx) => {
const isCurrent = idx === 0;
return {
key: String(v.transaction_id),
disabled: isCurrent,
label: (
<div
style={{ minWidth: 280, lineHeight: 1.4, padding: '4px 0' }}
onClick={isCurrent ? undefined : () => handleRestore(v)}
>
<div style={{ fontWeight: 600 }}>
#{v.version_number} {formatChangeTitle(v.changes)}
{isCurrent && (
<span
style={{
marginLeft: 8,
fontWeight: 400,
fontSize: 12,
opacity: 0.7,
}}
>
{t('(current)')}
</span>
)}
</div>
<div style={{ fontSize: 12, opacity: 0.75 }}>
{formatUser(v.changed_by)} · {formatDate(v.issued_at)}
</div>
{v.changes.length > 1 && (
<ul
style={{
margin: '4px 0 0 18px',
padding: 0,
fontSize: 12,
opacity: 0.85,
listStyle: 'disc',
}}
>
{v.changes.slice(0, 5).map((c, i) => (
<li key={i}>{summarizeChange(c)}</li>
))}
{v.changes.length > 5 && (
<li style={{ opacity: 0.6 }}>
{t('+%(n)s more', { n: v.changes.length - 5 })}
</li>
)}
</ul>
)}
</div>
),
};
});
})();
return (
<Dropdown
trigger={['click']}
menu={{ items }}
onOpenChange={open => {
if (open && versions === null && !loading) loadVersions();
}}
>
<Tooltip
id="version-history-tooltip"
title={t('Version history (demo)')}
placement="bottom"
>
<span role="button" tabIndex={0} className="action-button">
<Icons.HistoryOutlined iconSize="l" />
</span>
</Tooltip>
</Dropdown>
);
}
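Outside layout records, the formatter classifies each change purely from the nullness of `from_value`/`to_value`, and only quotes a new value inline when it is a primitive of at most 80 characters. A condensed, untranslated sketch of the dataset-relevant branches (column/metric children and scalar fields) follows. It omits `t()` and the friendly field labels such as `table name`, and the record values in the check are made up for illustration.

```typescript
interface Change {
  kind: string;
  path: string[];
  from_value: unknown;
  to_value: unknown;
}

type ChangeShape = 'add' | 'remove' | 'change';

// Same null-based rule as the widget: a missing old value means an
// addition, a missing new value means a removal, anything else is an edit.
function classify(c: Change): ChangeShape {
  if (c.from_value == null && c.to_value != null) return 'add';
  if (c.from_value != null && c.to_value == null) return 'remove';
  return 'change';
}

// Only short primitives are worth quoting in a one-line menu entry.
function isShortScalar(v: unknown, maxLen = 80): boolean {
  return (
    (typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean') &&
    String(v).length <= maxLen
  );
}

function summarizeDatasetChange(c: Change): string {
  const shape = classify(c);
  // Dataset child record: path = [columns | metrics, <name>].
  if (c.path.length === 2 && (c.kind === 'column' || c.kind === 'metric')) {
    const verb =
      shape === 'add' ? 'Added' : shape === 'remove' ? 'Removed' : 'Changed';
    return `${verb} ${c.kind} "${c.path[1]}"`;
  }
  // Scalar field record: the last path segment names the field.
  if (c.kind === 'field') {
    const field = c.path[c.path.length - 1];
    if (shape === 'remove') return `Cleared ${field}`;
    if (isShortScalar(c.to_value)) {
      const verb = shape === 'add' ? 'Set' : 'Changed';
      return `${verb} ${field} to "${String(c.to_value)}"`;
    }
    return shape === 'add' ? `Added ${field}` : `Changed ${field}`;
  }
  return `Changed ${c.kind}`;
}
```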

@@ -99,6 +99,8 @@ import { useSelector } from 'react-redux';
import { QueryObjectColumns } from 'src/views/CRUD/types';
import { WIDER_DROPDOWN_WIDTH } from 'src/components/ListView/utils';
import type { BootstrapData } from 'src/types/bootstrapTypes';
// TEMP: sc-103156 versioning demo. Revert before any commit.
import VersionHistoryDropdown from './VersionHistoryDropdown';
const SEMANTIC_LAYERS_FLAG = 'SEMANTIC_LAYERS' as FeatureFlag;
type DatasetExtra = {
@@ -165,6 +167,7 @@ type Dataset = {
source_type?: 'database' | 'semantic_layer';
explore_url: string;
id: number;
uuid?: string;
owners: Array<Owner>;
schema: string | null;
table_name: string;
@@ -936,6 +939,13 @@ const DatasetList: FunctionComponent<DatasetListProps> = ({
</span>
</Tooltip>
)}
{/* TEMP: sc-103156 versioning demo. Revert before any commit. */}
{original.uuid && canEdit && (
<VersionHistoryDropdown
datasetUuid={original.uuid}
onRestored={() => refreshData()}
/>
)}
</Actions>
);
},
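With this hunk all three list views mount a version dropdown, and the dashboard and dataset widgets differ only in their prop name and the resource segment of the URL. If the demo were ever promoted, one option would be a single widget parameterized by resource. The helpers below are hypothetical (the names are illustrative, not existing Superset APIs): they build the three URL shapes `ChartRestApi` exposes, assuming the dashboard and dataset APIs follow the same route layout as the chart one, and reproduce the newest-first trimming both widgets apply to the oldest-first list response.

```typescript
// Resource segments seen in this change: /api/v1/{chart,dashboard,dataset}/...
type VersionedResource = 'chart' | 'dashboard' | 'dataset';

const versionsBase = (resource: VersionedResource, entityUuid: string): string =>
  `/api/v1/${resource}/${entityUuid}/versions`;

// GET: full history, oldest first.
function listVersionsUrl(resource: VersionedResource, entityUuid: string): string {
  return `${versionsBase(resource, entityUuid)}/`;
}

// GET: read-only snapshot at one version.
function getVersionUrl(
  resource: VersionedResource,
  entityUuid: string,
  versionUuid: string,
): string {
  return `${versionsBase(resource, entityUuid)}/${versionUuid}/`;
}

// POST: restore. This route has no trailing slash.
function restoreVersionUrl(
  resource: VersionedResource,
  entityUuid: string,
  versionUuid: string,
): string {
  return `${versionsBase(resource, entityUuid)}/${versionUuid}/restore`;
}

// The list API returns oldest-first; the widgets show the newest `limit`
// entries. Copy first so the caller's array is not reversed in place.
function newestFirst<T>(versions: readonly T[], limit = 20): T[] {
  return [...versions].reverse().slice(0, limit);
}
```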

@@ -130,6 +130,9 @@ class ChartRestApi(BaseSupersetModelRestApi):
"screenshot",
"cache_screenshot",
"warm_up_cache",
"list_versions",
"get_version",
"restore_version",
}
class_permission_name = "Chart"
method_permission_name = MODEL_API_RW_METHOD_PERMISSION_MAP
@@ -308,7 +311,13 @@ class ChartRestApi(BaseSupersetModelRestApi):
try:
dash = ChartDAO.get_by_id_or_uuid(id_or_uuid)
result = self.chart_get_response_schema.dump(dash)
-return self.response(200, result=result)
from superset.daos.version import VersionDAO
from superset.versioning.etag import set_version_etag
return set_version_etag(
self.response(200, result=result),
VersionDAO.current_live_version_uuid(Slice, dash.id, dash.uuid),
)
except ChartNotFoundError:
return self.response_404()
@@ -415,6 +424,34 @@ class ChartRestApi(BaseSupersetModelRestApi):
type: number
result:
$ref: '#/components/schemas/{{self.__class__.__name__}}.put'
old_version:
type: integer
nullable: true
description: >-
0-based version_number of the live row before this
update. Unstable under retention pruning — see
old_transaction_id for a stable identifier.
new_version:
type: integer
nullable: true
description: >-
0-based version_number of the newly-live row after
this update. Can equal old_version when no
versioned column changed, or when retention
pruning dropped an older closed row in the same
commit.
old_transaction_id:
type: integer
nullable: true
description: Continuum transaction_id of the live
row before this update. Stable across pruning.
new_transaction_id:
type: integer
nullable: true
description: Continuum transaction_id of the live
row after this update. Differs from
old_transaction_id when the update produced a new
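version row.
old_version_uuid:
type: string
format: uuid
nullable: true
description: UUID of the live version row before
this update.
new_version_uuid:
type: string
format: uuid
nullable: true
description: UUID of the live version row after
this update.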
400:
$ref: '#/components/responses/400'
401:
@@ -433,9 +470,43 @@ class ChartRestApi(BaseSupersetModelRestApi):
# This validates custom Schema with custom validations
except ValidationError as error:
return self.response_400(message=error.messages)
# pylint: disable=import-outside-toplevel
from superset.daos.version import VersionDAO
from superset.extensions import db as _db
pre_chart = _db.session.query(Slice).filter(Slice.id == pk).one_or_none()
old_version = VersionDAO.current_version_number(Slice, pk)
old_transaction_id = VersionDAO.current_live_transaction_id(Slice, pk)
old_version_uuid = (
VersionDAO.current_live_version_uuid(Slice, pk, pre_chart.uuid)
if pre_chart is not None
else None
)
try:
changed_model = UpdateChartCommand(pk, item).run()
-response = self.response(200, id=changed_model.id, result=item)
new_version = VersionDAO.current_version_number(Slice, changed_model.id)
new_transaction_id = VersionDAO.current_live_transaction_id(
Slice, changed_model.id
)
new_version_uuid = VersionDAO.current_live_version_uuid(
Slice, changed_model.id, changed_model.uuid
)
response = self.response(
200,
id=changed_model.id,
result=item,
old_version=old_version,
new_version=new_version,
old_transaction_id=old_transaction_id,
new_transaction_id=new_transaction_id,
old_version_uuid=str(old_version_uuid) if old_version_uuid else None,
new_version_uuid=str(new_version_uuid) if new_version_uuid else None,
)
from superset.versioning.etag import set_version_etag
set_version_etag(response, new_version_uuid)
except ChartNotFoundError:
response = self.response_404()
except ChartForbiddenError:
@@ -1199,3 +1270,223 @@ class ChartRestApi(BaseSupersetModelRestApi):
)
command.run()
return self.response(200, message="OK")
@expose("/<uuid_str>/versions/", methods=("GET",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.list_versions",
log_to_statsd=False,
)
def list_versions(self, uuid_str: str) -> Response:
"""List version history for a chart.
---
get:
summary: Return the version history for a chart
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Chart UUID
responses:
200:
description: Version history ordered by oldest first
content:
application/json:
schema:
type: object
properties:
result:
type: array
items:
type: object
count:
type: integer
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
versions = VersionDAO.list_versions(Slice, entity_uuid)
if versions is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=versions, count=len(versions)),
Slice,
entity_uuid,
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/",
methods=("GET",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.get_version", # noqa: E501
log_to_statsd=False,
)
def get_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Return the chart's state at a specific version.
---
get:
summary: Read-only snapshot of the chart at a given version
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Chart UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: Version UUID as returned by the list endpoint
responses:
200:
description: Snapshot of the chart at the target version
content:
application/json:
schema:
type: object
properties:
result:
type: object
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
snapshot = VersionDAO.get_version(Slice, entity_uuid, version_uuid)
if snapshot is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=snapshot), Slice, entity_uuid
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/restore",
methods=("POST",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.restore_version"
), # noqa: E501
log_to_statsd=False,
)
def restore_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Restore a chart to a previous version.
---
post:
summary: Revert a chart to an earlier version (non-destructive)
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Chart UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: >-
Version UUID as returned by the list-versions endpoint.
Stable across retention pruning.
responses:
200:
description: Chart was restored
content:
application/json:
schema:
type: object
properties:
message:
type: string
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
422:
$ref: '#/components/responses/422'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.commands.chart.restore_version import (
RestoreChartVersionCommand,
)
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
try:
RestoreChartVersionCommand(entity_uuid, version_uuid).run()
except ChartNotFoundError:
return self.response_404()
except ChartForbiddenError:
return self.response_403()
except ChartUpdateFailedError as ex:
logger.error("Error restoring chart version: %s", ex)
return self.response_422(message=str(ex))
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, message="OK"), Slice, entity_uuid
)
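The PUT schema notes that `new_version` can equal `old_version` even when the commit did something (for example, retention pruning dropped an older closed row), while the transaction ids are stable across pruning and differ when the update produced a new version row. A client that wants to know whether a save created history should therefore compare transaction ids. A minimal sketch, assuming the six nullable fields arrive at the top level of the JSON body, as the `self.response(...)` call passes them; the interface and function names are hypothetical.

```typescript
// Versioning fields returned by the chart PUT endpoint (all nullable).
interface ChartPutVersionFields {
  old_version: number | null;
  new_version: number | null;
  old_transaction_id: number | null;
  new_transaction_id: number | null;
  old_version_uuid: string | null;
  new_version_uuid: string | null;
}

// true: a new version row was written; false: the live row is unchanged;
// null: versioning info is missing, so the question cannot be answered.
function producedNewVersion(r: ChartPutVersionFields): boolean | null {
  if (r.old_transaction_id == null || r.new_transaction_id == null) {
    return null;
  }
  // Compare transaction ids rather than version numbers: version_number
  // is not stable under retention pruning.
  return r.old_transaction_id !== r.new_transaction_id;
}
```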

@@ -0,0 +1,49 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Command that restores a chart to a previous version."""
from __future__ import annotations
from functools import partial
from superset.commands.chart.exceptions import (
ChartForbiddenError,
ChartNotFoundError,
ChartUpdateFailedError,
)
from superset.commands.version_restore import BaseRestoreVersionCommand
from superset.models.slice import Slice
from superset.utils.decorators import on_error, transaction
class RestoreChartVersionCommand(BaseRestoreVersionCommand):
"""Revert a chart to a previous version.
The restore is non-destructive: it produces a new version row (authored
by the restoring user), so prior versions remain in the history and the
change is itself reversible. ``@transaction`` wraps :meth:`run` so the
commit that fires Continuum's ``after_flush`` hook — the one that writes
the new version row — is bound to this command's lifecycle.
"""
model_cls = Slice
not_found_exc = ChartNotFoundError
forbidden_exc = ChartForbiddenError
@transaction(on_error=partial(on_error, reraise=ChartUpdateFailedError))
def run(self) -> Slice:
return self._do_restore()
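The docstring's claim that a restore is non-destructive and itself reversible follows from treating a restore as an ordinary write. The toy in-memory model below illustrates only that property. It is not how `BaseRestoreVersionCommand` or Continuum store rows: it addresses versions by 0-based number instead of the version UUID the API uses, and it ignores retention pruning.

```typescript
interface VersionRow<T> {
  versionNumber: number;
  author: string;
  state: T;
}

class VersionHistory<T> {
  private rows: VersionRow<T>[] = [];

  // Every write, including a restore, appends exactly one row.
  record(state: T, author: string): VersionRow<T> {
    const row: VersionRow<T> = { versionNumber: this.rows.length, author, state };
    this.rows.push(row);
    return row;
  }

  restore(versionNumber: number, author: string): VersionRow<T> {
    const target = this.rows.find(r => r.versionNumber === versionNumber);
    if (!target) throw new Error(`unknown version ${versionNumber}`);
    // Nothing is deleted or rewritten: the old state is copied forward
    // as a new row authored by the restoring user.
    return this.record(target.state, author);
  }

  current(): T | undefined {
    return this.rows[this.rows.length - 1]?.state;
  }

  get length(): number {
    return this.rows.length;
  }
}
```

Undoing a restore is then just another restore, this time to the row that was live before it.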

@@ -22,7 +22,7 @@ from typing import Any
from marshmallow import Schema
from sqlalchemy.orm import Session # noqa: F401
-from sqlalchemy.sql import delete, select
from sqlalchemy.sql import select
from superset import db
from superset.charts.schemas import ImportV1ChartSchema
@@ -47,6 +47,7 @@ from superset.datasets.schemas import ImportV1DatasetSchema
from superset.extensions import feature_flag_manager
from superset.migrations.shared.native_filters import migrate_dashboard
from superset.models.dashboard import Dashboard, dashboard_slices
from superset.models.slice import Slice
from superset.themes.schemas import ImportV1ThemeSchema
logger = logging.getLogger(__name__)
@@ -167,8 +168,18 @@ class ImportDashboardsCommand(ImportModelsCommand):
)
# import dashboards
#
# Dashboard → charts associations go through the ORM relationship
# (``dashboard.slices = [...]``) rather than Core
# ``delete()``/``insert()`` on the ``dashboard_slices`` table.
# Bulk DML via Core would emit a malformed INSERT into
# ``dashboard_slices_version`` (missing the composite-PK columns)
# because SQLAlchemy-Continuum's M2M tracker can't see per-row
# column values when the DELETE/INSERT goes through the Core
# layer. The same pattern is applied in
# ``superset/commands/importers/v1/assets.py`` and the spike's
# ``DatasetDAO.update_columns`` rewrite.
dashboards: list[Dashboard] = []
-dashboard_chart_ids: list[tuple[int, int]] = []
for file_name, config in configs.items():
if file_name.startswith("dashboards/"):
config = update_id_refs(config, chart_ids, dataset_info)
@@ -183,16 +194,9 @@ class ImportDashboardsCommand(ImportModelsCommand):
dashboard = import_dashboard(config, overwrite=overwrite)
dashboards.append(dashboard)
# When overwriting, first delete all existing chart relationships
# so the dashboard is replaced rather than merged
if overwrite:
db.session.execute(
delete(dashboard_slices).where(
dashboard_slices.c.dashboard_id == dashboard.id
)
)
# Collect chart IDs to associate with this dashboard
# Resolve the dashboard's chart membership from the imported
# position_json and apply it to the ORM relationship.
target_chart_ids: list[int] = []
for uuid in find_chart_uuids(config["position"]):
if uuid not in chart_ids:
continue
@@ -201,7 +205,31 @@ class ImportDashboardsCommand(ImportModelsCommand):
overwrite
or (dashboard.id, chart_id) not in existing_relationships
):
dashboard_chart_ids.append((dashboard.id, chart_id))
target_chart_ids.append(chart_id)
if overwrite:
# Replace the dashboard's chart membership entirely.
dashboard.slices = (
db.session.query(Slice)
.filter(Slice.id.in_(target_chart_ids))
.all()
if target_chart_ids
else []
)
# Flush eagerly so the M2M rows land in
# ``dashboard_slices`` before any subsequent
# autoflush fires an inner-flush event handler
# that would reset the relationship change.
db.session.flush()
elif target_chart_ids:
# Append only the new associations to existing ones.
new_slices = (
db.session.query(Slice)
.filter(Slice.id.in_(target_chart_ids))
.all()
)
dashboard.slices = list(dashboard.slices) + new_slices
db.session.flush()
# Handle tags using import_tag function
if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):
@@ -215,14 +243,6 @@ class ImportDashboardsCommand(ImportModelsCommand):
db.session,
)
# set ref in the dashboard_slices table
if dashboard_chart_ids:
values = [
{"dashboard_id": dashboard_id, "slice_id": chart_id}
for (dashboard_id, chart_id) in dashboard_chart_ids
]
db.session.execute(dashboard_slices.insert(), values)
# Migrate any filter-box charts to native dashboard filters.
for dashboard in dashboards:
migrate_dashboard(dashboard)

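As a rough illustration of the chart-membership rule in ``ImportDashboardsCommand`` above, here is a minimal pure-Python model of it. ``resolve_membership`` and its parameters are hypothetical: plain chart ids stand in for ``Slice`` objects, and the returned list stands in for the value assigned to ``dashboard.slices``. No ORM, flush, or Continuum behaviour is modelled.

```python
from collections.abc import Iterable, Mapping


def resolve_membership(
    dashboard_id: int,
    position_uuids: Iterable[str],
    chart_ids: Mapping[str, int],
    existing: set[tuple[int, int]],
    current_slices: list[int],
    overwrite: bool,
) -> list[int]:
    """Return the chart ids a dashboard should hold after import (hypothetical model)."""
    target: list[int] = []
    for uuid in position_uuids:
        if uuid not in chart_ids:
            continue  # chart referenced in the position JSON but not imported
        chart_id = chart_ids[uuid]
        if chart_id in target:
            continue  # the real code's ``Slice.id.in_(...)`` query also de-duplicates
        if overwrite or (dashboard_id, chart_id) not in existing:
            target.append(chart_id)
    if overwrite:
        # Mirrors ``dashboard.slices = <query result>``: membership is replaced.
        return target
    # Mirrors ``dashboard.slices = list(dashboard.slices) + new_slices``.
    return current_slices + target
```

With ``overwrite=False`` the collection only gains entries; it never drops and re-adds an association that already exists.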
@@ -0,0 +1,46 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Command that restores a dashboard to a previous version."""
from __future__ import annotations
from functools import partial
from superset.commands.dashboard.exceptions import (
DashboardForbiddenError,
DashboardNotFoundError,
DashboardUpdateFailedError,
)
from superset.commands.version_restore import BaseRestoreVersionCommand
from superset.models.dashboard import Dashboard
from superset.utils.decorators import on_error, transaction
class RestoreDashboardVersionCommand(BaseRestoreVersionCommand):
"""Revert a dashboard (including its chart associations) to a previous
version. See
:class:`superset.commands.chart.restore_version.RestoreChartVersionCommand`
for the general contract.
"""
model_cls = Dashboard
not_found_exc = DashboardNotFoundError
forbidden_exc = DashboardForbiddenError
@transaction(on_error=partial(on_error, reraise=DashboardUpdateFailedError))
def run(self) -> Dashboard:
return self._do_restore()

@@ -59,23 +59,31 @@ class UpdateDashboardCommand(UpdateMixin, BaseCommand):
def run(self) -> Model:
self.validate()
assert self._model is not None
self.process_tab_diff()
self.process_native_filter_diff()
# Suppress autoflush during the update body so that Continuum's
# before_flush baseline listener does not fire mid-operation while
# the session is only partially populated.
with db.session.no_autoflush:
self.process_tab_diff()
self.process_native_filter_diff()
# Update tags
if (tags := self._properties.pop("tags", None)) is not None:
update_tags(ObjectType.dashboard, self._model.id, self._model.tags, tags)
# Update tags
if (tags := self._properties.pop("tags", None)) is not None:
update_tags(
ObjectType.dashboard, self._model.id, self._model.tags, tags
)
# Re-serialize position_json to escape 4-byte Unicode characters
if position_json := self._properties.get("position_json"):
self._properties["position_json"] = json.dumps(json.loads(position_json))
# Re-serialize position_json to escape 4-byte Unicode characters
if position_json := self._properties.get("position_json"):
self._properties["position_json"] = json.dumps(
json.loads(position_json)
)
dashboard = DashboardDAO.update(self._model, self._properties)
if self._properties.get("json_metadata"):
DashboardDAO.set_dash_metadata(
dashboard,
data=json.loads(self._properties.get("json_metadata", "{}")),
)
dashboard = DashboardDAO.update(self._model, self._properties)
if self._properties.get("json_metadata"):
DashboardDAO.set_dash_metadata(
dashboard,
data=json.loads(self._properties.get("json_metadata", "{}")),
)
return dashboard
def validate(self) -> None:

@@ -0,0 +1,47 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Command that restores a dataset (and its columns/metrics) to a
previous version."""
from __future__ import annotations
from functools import partial
from superset.commands.dataset.exceptions import (
DatasetForbiddenError,
DatasetNotFoundError,
DatasetUpdateFailedError,
)
from superset.commands.version_restore import BaseRestoreVersionCommand
from superset.connectors.sqla.models import SqlaTable
from superset.utils.decorators import on_error, transaction
class RestoreDatasetVersionCommand(BaseRestoreVersionCommand):
"""Revert a dataset (and its columns + metrics) to a previous version.
See
:class:`superset.commands.chart.restore_version.RestoreChartVersionCommand`
for the general contract.
"""
model_cls = SqlaTable
not_found_exc = DatasetNotFoundError
forbidden_exc = DatasetForbiddenError
@transaction(on_error=partial(on_error, reraise=DatasetUpdateFailedError))
def run(self) -> SqlaTable:
return self._do_restore()

@@ -19,7 +19,6 @@ from typing import Any, Optional
from marshmallow import Schema
from marshmallow.exceptions import ValidationError
from sqlalchemy.sql import delete, insert
from superset import db
from superset.charts.schemas import ImportV1ChartSchema
@@ -49,7 +48,7 @@ from superset.datasets.schemas import ImportV1DatasetSchema
from superset.extensions import feature_flag_manager
from superset.migrations.shared.native_filters import migrate_dashboard
from superset.models.core import Database
from superset.models.dashboard import Dashboard, dashboard_slices
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
from superset.models.sql_lab import SavedQuery
from superset.queries.saved_queries.schemas import ImportV1SavedQuerySchema
@@ -165,23 +164,33 @@ class ImportAssetsCommand(BaseCommand):
dashboard = import_dashboard(config, overwrite=overwrite)
# set ref in the dashboard_slices table
dashboard_chart_ids: list[dict[str, int]] = []
# Use ORM-level reassignment instead of Core
# delete()/insert() so SQLAlchemy-Continuum's M2M tracker
# sees per-row changes through the ORM. Bulk DML via Core
# would emit a malformed INSERT into
# ``dashboard_slices_version`` (missing the composite-PK
# columns) — see the parallel rewrite in
# ``DatasetDAO.update_columns`` and the test-factory's
# ``delete_dashboard_slices_associations`` for the same
# reason.
slice_ids: list[int] = []
for uuid in find_chart_uuids(config["position"]):
if uuid not in chart_ids:
break
chart_id = chart_ids[uuid]
dashboard_chart_id = {
"dashboard_id": dashboard.id,
"slice_id": chart_id,
}
dashboard_chart_ids.append(dashboard_chart_id)
slice_ids.append(chart_ids[uuid])
db.session.execute(
delete(dashboard_slices).where(
dashboard_slices.c.dashboard_id == dashboard.id
)
dashboard.slices = (
db.session.query(Slice).filter(Slice.id.in_(slice_ids)).all()
if slice_ids
else []
)
db.session.execute(insert(dashboard_slices).values(dashboard_chart_ids))
# Flush eagerly so the M2M rows land in
# ``dashboard_slices`` before any subsequent autoflush
# fires an inner-flush event handler that would reset
# the relationship change (cf. the SAWarning at
# ``superset/models/helpers.py`` re. "attribute history
# events accumulated ... have been reset").
db.session.flush()
# Handle tags using import_tag function
if feature_flag_manager.is_feature_enabled("TAGGING_SYSTEM"):

@@ -0,0 +1,97 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Shared base for the per-entity restore-version commands.
The three concrete commands (:mod:`superset.commands.chart.restore_version`,
:mod:`superset.commands.dashboard.restore_version`,
:mod:`superset.commands.dataset.restore_version`) differ only in:
* the model class they operate on
* the per-entity ``NotFoundError`` / ``ForbiddenError`` / ``UpdateFailedError``
triplet they raise
Everything else (lookup, ownership check, version-uuid resolution,
restore dispatch, transactional boundary) is identical. The base
defines the workflow; each subclass sets ``model_cls``,
``not_found_exc`` and ``forbidden_exc`` as class attributes and
decorates :meth:`run` with ``@transaction`` so that failures are
re-raised as its ``UpdateFailedError``.
"""
from __future__ import annotations
import logging
from typing import Any
from uuid import UUID
from superset import security_manager
from superset.commands.base import BaseCommand
from superset.daos.version import VersionDAO
from superset.exceptions import SupersetSecurityException
logger = logging.getLogger(__name__)
class BaseRestoreVersionCommand(BaseCommand):
"""Workflow for a non-destructive version restore on one entity.
Subclasses declare the model class plus the entity-specific
not-found and forbidden exception classes; they also decorate :meth:`run` with
``@transaction(on_error=partial(on_error, reraise=<their UpdateFailedError>))``
so the transactional commit boundary maps to the right HTTP-level
error on failure.
"""
#: Subclass overrides — the versioned model class (``Slice`` /
#: ``Dashboard`` / ``SqlaTable``).
model_cls: type
#: Subclass overrides — exception classes raised on the matching
#: failure modes. ``not_found_exc`` covers both "no such entity"
#: and "version_uuid not on this entity"; the API handler maps
#: either to HTTP 404. ``forbidden_exc`` covers the row-level
#: ownership denial; the handler maps it to HTTP 403.
not_found_exc: type[Exception]
forbidden_exc: type[Exception]
def __init__(self, entity_uuid: UUID, version_uuid: UUID) -> None:
self._uuid = entity_uuid
self._version_uuid = version_uuid
def _do_restore(self) -> Any:
"""The actual restore work — call from a ``@transaction``-decorated
:meth:`run` in each subclass."""
self.validate()
version_number = VersionDAO.resolve_version_uuid(
self.model_cls, self._uuid, self._version_uuid
)
if version_number is None:
raise self.not_found_exc()
entity = VersionDAO.restore_version(
self.model_cls, self._uuid, version_number
)
if entity is None:
# Race: entity deleted between validate() and now.
raise self.not_found_exc()
return entity
def validate(self) -> None:
entity = VersionDAO.find_active_by_uuid(self.model_cls, self._uuid)
if entity is None:
raise self.not_found_exc()
try:
security_manager.raise_for_ownership(entity)
except SupersetSecurityException as ex:
raise self.forbidden_exc() from ex

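The division of labour between ``BaseRestoreVersionCommand`` and its subclasses is a template-method pattern. The sketch below reproduces its control flow (validate, resolve the version UUID, restore, re-check for a race) against an in-memory store. ``FakeVersionStore``, ``RestoreSketch``, ``RestoreChartSketch`` and the two exception classes are hypothetical; the ownership check is a plain owner comparison rather than ``security_manager.raise_for_ownership``, and the ``@transaction`` wrapper is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import ClassVar
from uuid import UUID


class ChartNotFound(Exception):
    """Hypothetical stand-in for ChartNotFoundError (mapped to HTTP 404)."""


class ChartForbidden(Exception):
    """Hypothetical stand-in for ChartForbiddenError (mapped to HTTP 403)."""


@dataclass
class FakeVersionStore:
    """Hypothetical in-memory stand-in for VersionDAO."""

    owners: dict[UUID, str] = field(default_factory=dict)  # entity -> owner
    versions: dict[UUID, dict[UUID, int]] = field(default_factory=dict)

    def find_active_by_uuid(self, entity_uuid: UUID) -> str | None:
        return self.owners.get(entity_uuid)

    def resolve_version_uuid(self, entity_uuid: UUID, version_uuid: UUID) -> int | None:
        return self.versions.get(entity_uuid, {}).get(version_uuid)

    def restore_version(self, entity_uuid: UUID, version_number: int) -> int | None:
        # None models the race where the entity vanished after validate().
        return version_number if entity_uuid in self.owners else None


class RestoreSketch:
    """Base workflow; subclasses only choose the exception classes."""

    not_found_exc: ClassVar[type[Exception]]
    forbidden_exc: ClassVar[type[Exception]]

    def __init__(
        self, store: FakeVersionStore, user: str, entity_uuid: UUID, version_uuid: UUID
    ) -> None:
        self.store = store
        self.user = user
        self.entity_uuid = entity_uuid
        self.version_uuid = version_uuid

    def validate(self) -> None:
        owner = self.store.find_active_by_uuid(self.entity_uuid)
        if owner is None:
            raise self.not_found_exc()
        if owner != self.user:  # simplified ownership check
            raise self.forbidden_exc()

    def run(self) -> int:
        self.validate()
        number = self.store.resolve_version_uuid(self.entity_uuid, self.version_uuid)
        if number is None:
            raise self.not_found_exc()  # version UUID does not belong to this entity
        restored = self.store.restore_version(self.entity_uuid, number)
        if restored is None:
            raise self.not_found_exc()  # entity deleted between validate() and now
        return restored


class RestoreChartSketch(RestoreSketch):
    not_found_exc = ChartNotFound
    forbidden_exc = ChartForbidden
```

Adding another entity type means one more three-line subclass; the workflow itself lives in one place.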
@@ -1164,7 +1164,11 @@ CORS_OPTIONS: dict[Any, Any] = {
"origins": [
"https://tile.openstreetmap.org",
"https://tile.osm.ch",
]
],
# Make the entity-version-history `ETag` header readable by cross-origin
# browser clients. Without this, `fetch()` callers cannot read the header
# even when CORS is otherwise permissive.
"expose_headers": ["ETag"],
}
# Sanitizes the HTML content used in markdowns to allow its rendering in a safe manner.
@@ -1340,6 +1344,17 @@ DATETIME_FORMAT_DETECTION_SAMPLE_SIZE = 1000
# The limit for the Superset Meta DB when the feature flag ENABLE_SUPERSET_META_DB is on
SUPERSET_META_DB_LIMIT: int | None = 1000
# Retention window (days) for entity version history. Version rows
# whose owning ``version_transaction.issued_at`` is more than this many
# days old are pruned by the ``version_history.prune_old_versions``
# Celery beat task (registered below in ``CeleryConfig.beat_schedule``).
# The live row (``end_transaction_id IS NULL``) and baseline rows
# (``operation_type=0``) are never pruned. ``0`` disables pruning.
# Read from the environment variable of the same name (default: 30).
SUPERSET_VERSION_HISTORY_RETENTION_DAYS: int = int(
os.environ.get("SUPERSET_VERSION_HISTORY_RETENTION_DAYS", "30")
)
# Adds a warning message on sqllab save query and schedule query modals.
SQLLAB_SAVE_WARNING_MESSAGE = None
SQLLAB_SCHEDULE_WARNING_MESSAGE = None
@@ -1404,6 +1419,13 @@ class CeleryConfig: # pylint: disable=too-few-public-methods
"task": "reports.prune_log",
"schedule": crontab(minute=0, hour=0),
},
# Entity version-history retention. Daily at 03:00; the task
# itself short-circuits when SUPERSET_VERSION_HISTORY_RETENTION_DAYS
# is 0 (disabled).
"version_history.prune_old_versions": {
"task": "version_history.prune_old_versions",
"schedule": crontab(minute=0, hour=3),
},
# Uncomment to enable pruning of the query table
# "prune_query": {
# "task": "prune_query",

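The pruning rule stated in the ``SUPERSET_VERSION_HISTORY_RETENTION_DAYS`` comment can be read as a per-row predicate. This is a sketch of that rule only, assuming a simplified ``VersionRow`` record and a hypothetical ``should_prune`` helper; it is not the ``version_history.prune_old_versions`` task, whose implementation is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VersionRow:
    issued_at: datetime  # version_transaction.issued_at of the owning transaction
    end_transaction_id: int | None  # None marks the live row
    operation_type: int  # 0 marks a baseline row


def should_prune(row: VersionRow, retention_days: int, now: datetime) -> bool:
    if retention_days <= 0:
        return False  # 0 disables pruning (negatives treated the same: an assumption)
    if row.end_transaction_id is None or row.operation_type == 0:
        return False  # live and baseline rows are always kept
    return row.issued_at < now - timedelta(days=retention_days)
```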
@@ -872,6 +872,15 @@ class TableColumn(AuditMixinNullable, ImportExportMixin, CertificationMixin, Mod
__tablename__ = "table_columns"
__table_args__ = (UniqueConstraint("table_id", "column_name"),)
# SPIKE (sc-103156-versioning-full-continuum-spike): Continuum-versioned
# again, with audit-field exclusions to suppress the per-column-per-save
# noise rows that ADR-004 flagged as Failure 3. ``changed_on`` refreshes
# on every parent dataset save even when the column itself wasn't user-
# edited; capturing it produced one shadow row per column per save with
# no user signal.
__versioned__: dict[str, Any] = {
"exclude": ["changed_on", "created_on", "changed_by_fk", "created_by_fk"]
}
id = Column(Integer, primary_key=True)
column_name = Column(String(255), nullable=False)
@@ -1117,6 +1126,10 @@ class SqlMetric(AuditMixinNullable, ImportExportMixin, CertificationMixin, Model
__tablename__ = "sql_metrics"
__table_args__ = (UniqueConstraint("table_id", "metric_name"),)
# SPIKE: same audit-field exclusions as TableColumn (see above).
__versioned__: dict[str, Any] = {
"exclude": ["changed_on", "created_on", "changed_by_fk", "created_by_fk"]
}
id = Column(Integer, primary_key=True)
metric_name = Column(String(255), nullable=False)
@@ -1212,9 +1225,18 @@ class SqlMetric(AuditMixinNullable, ImportExportMixin, CertificationMixin, Model
sqlatable_user = DBTable(
"sqlatable_user",
metadata,
Column("id", Integer, primary_key=True),
Column("user_id", Integer, ForeignKey("ab_user.id", ondelete="CASCADE")),
Column("table_id", Integer, ForeignKey("tables.id", ondelete="CASCADE")),
Column(
"user_id",
Integer,
ForeignKey("ab_user.id", ondelete="CASCADE"),
primary_key=True,
),
Column(
"table_id",
Integer,
ForeignKey("tables.id", ondelete="CASCADE"),
primary_key=True,
),
)
@@ -1245,6 +1267,23 @@ class SqlaTable(
owner_class = security_manager.user_model
__tablename__ = "tables"
# Exclude M2M association relationships: Continuum only captures FK columns on
# association INSERTs (not the auto-increment id), which breaks the NOT NULL PK.
# deleted_at exclusion will be added when sc-103157 (soft delete) is merged (T043).
# Audit columns are auto-bumped on every save. Excluding them lets
# Continuum's is_modified() return False on no-op saves (e.g. owners-only
# edits) so we don't create empty version rows. version_transaction.user_id
# / issued_at preserve "who/when".
__versioned__: dict[str, Any] = {
"exclude": [
"owners",
"row_level_security_filters",
"changed_on",
"created_on",
"changed_by_fk",
"created_by_fk",
]
}
# Note this uniqueness constraint is not part of the physical schema, i.e., it does
# not exist in the migrations, but is required by `import_from_dict` to ensure the
@@ -1373,7 +1412,7 @@ class SqlaTable(
name = escape(self.name)
url = escape(self.explore_url)
anchor = f'<a target="_blank" href="{url}">{name}</a>'
return Markup(anchor)
return Markup(anchor) # noqa: S704
def get_catalog_perm(self) -> str | None:
"""Returns catalog permission if present, database one otherwise."""
@@ -2147,17 +2186,25 @@ sa.event.listen(SqlaTable, "after_delete", SqlaTable.after_delete)
RLSFilterRoles = DBTable(
"rls_filter_roles",
metadata,
Column("id", Integer, primary_key=True),
Column("role_id", Integer, ForeignKey("ab_role.id"), nullable=False),
Column("rls_filter_id", Integer, ForeignKey("row_level_security_filters.id")),
Column("role_id", Integer, ForeignKey("ab_role.id"), primary_key=True),
Column(
"rls_filter_id",
Integer,
ForeignKey("row_level_security_filters.id"),
primary_key=True,
),
)
RLSFilterTables = DBTable(
"rls_filter_tables",
metadata,
Column("id", Integer, primary_key=True),
Column("table_id", Integer, ForeignKey("tables.id")),
Column("rls_filter_id", Integer, ForeignKey("row_level_security_filters.id")),
Column("table_id", Integer, ForeignKey("tables.id"), primary_key=True),
Column(
"rls_filter_id",
Integer,
ForeignKey("row_level_security_filters.id"),
primary_key=True,
),
)

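To see why the ``exclude`` lists on ``TableColumn``, ``SqlMetric`` and ``SqlaTable`` matter, the sketch below compares two attribute snapshots while ignoring the audit columns. ``changed_versioned_fields`` and ``AUDIT_EXCLUDES`` are hypothetical names, and this is not Continuum's ``is_modified()``; an empty result corresponds to the no-op save that the comments say should not produce a version row.

```python
from collections.abc import Iterable, Mapping
from typing import Any

# Same column names as the ``exclude`` lists in the models above.
AUDIT_EXCLUDES = frozenset({"changed_on", "created_on", "changed_by_fk", "created_by_fk"})


def changed_versioned_fields(
    before: Mapping[str, Any],
    after: Mapping[str, Any],
    exclude: Iterable[str] = AUDIT_EXCLUDES,
) -> set[str]:
    """Return the keys whose values differ, ignoring excluded columns."""
    skip = set(exclude)
    keys = (before.keys() | after.keys()) - skip
    return {k for k in keys if before.get(k) != after.get(k)}
```

A parent dataset save that only bumps ``changed_on`` yields an empty set, so no per-column shadow row would carry user signal.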
@@ -174,6 +174,9 @@ MODEL_API_RW_METHOD_PERMISSION_MAP = {
"put_filters": "write",
"put_colors": "write",
"sync_permissions": "write",
"list_versions": "write",
"get_version": "write",
"restore_version": "write",
}
EXTRA_FORM_DATA_APPEND_KEYS = {

@@ -275,6 +275,88 @@ class DatasetDAO(BaseDAO[SqlaTable]):
return super().update(item, attributes)
@classmethod
def _validate_column_date_formats(
cls, property_columns: list[dict[str, Any]]
) -> None:
for column in property_columns:
if column.get("python_date_format") is None:
continue
if not DatasetDAO.validate_python_date_format(column["python_date_format"]):
raise ValueError(
"python_date_format is an invalid date/timestamp format."
)
@classmethod
def _override_columns(
cls, model: SqlaTable, property_columns: list[dict[str, Any]]
) -> None:
"""Replace columns by natural key (``column_name``) — update in place
rather than delete-and-reinsert.
SPIKE (sc-103156-versioning-full-continuum-spike): the previous
delete-and-reinsert pattern produced overlapping shadow rows in
``table_columns_version`` (the same ``column_name`` had a DELETE
shadow at tx N alongside an INSERT shadow at tx N for a fresh PK).
Continuum's ``Reverter`` couldn't unwind this on restore: its flush
ordering inserts the historical row before deleting the live one,
hitting the ``UNIQUE (table_id, column_name)`` constraint mid-flush
(ADR-004 Failure 1).
The natural-key upsert keeps PKs stable across metadata refresh.
Continuum captures only real field changes; new columns get plain
INSERT shadows; removed columns get plain DELETE shadows. No
natural-key collisions, so Reverter can restore cleanly.
Behaviour change vs. the previous implementation: PKs of unchanged
columns are preserved. Charts that reference columns by their
``id`` continue to work across a metadata refresh — previously
such references would be invalidated.
"""
existing_by_name = {c.column_name: c for c in model.columns}
incoming_by_name = {p["column_name"]: p for p in property_columns}
# Update columns present in both: in-place setattr.
for name, col in existing_by_name.items():
if name in incoming_by_name:
for key, value in incoming_by_name[name].items():
setattr(col, key, value)
# Insert columns present only in incoming.
for name, properties in incoming_by_name.items():
if name not in existing_by_name:
db.session.add(TableColumn(**{**properties, "table_id": model.id}))
# Delete columns present only in existing.
for name, col in existing_by_name.items():
if name not in incoming_by_name:
db.session.delete(col)
@classmethod
def _upsert_columns(
cls, model: SqlaTable, property_columns: list[dict[str, Any]]
) -> None:
columns_by_id = {column.id: column for column in model.columns}
property_columns_by_id = {
properties["id"]: properties
for properties in property_columns
if "id" in properties
}
for properties in property_columns:
if "id" not in properties:
db.session.add(TableColumn(**{**properties, "table_id": model.id}))
for properties in property_columns_by_id.values():
col = columns_by_id[properties["id"]]
for key, value in properties.items():
setattr(col, key, value)
ids_to_keep = property_columns_by_id.keys()
for col in model.columns:
if col.id not in ids_to_keep:
db.session.delete(col)
@classmethod
def update_columns(
cls,
@@ -290,64 +372,15 @@ class DatasetDAO(BaseDAO[SqlaTable]):
- If a column Dict does not have an `id` then we create a new metric.
- If there are extra columns on the metadata db that are not defined on the List
then we delete.
Uses individual ORM operations (not bulk) so that SQLAlchemy-Continuum
can capture each row change in the version history.
"""
for column in property_columns:
if (
"python_date_format" in column
and column["python_date_format"] is not None
):
if not DatasetDAO.validate_python_date_format(
column["python_date_format"]
):
raise ValueError(
"python_date_format is an invalid date/timestamp format."
)
cls._validate_column_date_formats(property_columns)
if override_columns:
db.session.query(TableColumn).filter(
TableColumn.table_id == model.id
).delete(synchronize_session="fetch")
db.session.bulk_insert_mappings(
TableColumn,
[
{**properties, "table_id": model.id}
for properties in property_columns
],
)
cls._override_columns(model, property_columns)
else:
columns_by_id = {column.id: column for column in model.columns}
property_columns_by_id = {
properties["id"]: properties
for properties in property_columns
if "id" in properties
}
db.session.bulk_insert_mappings(
TableColumn,
[
{**properties, "table_id": model.id}
for properties in property_columns
if "id" not in properties
],
)
db.session.bulk_update_mappings(
TableColumn,
[
{**columns_by_id[properties["id"]].__dict__, **properties}
for properties in property_columns_by_id.values()
],
)
db.session.query(TableColumn).filter(
TableColumn.id.in_(
{column.id for column in model.columns}
- property_columns_by_id.keys()
)
).delete(synchronize_session="fetch")
cls._upsert_columns(model, property_columns)
@classmethod
def update_metrics(
@@ -363,6 +396,9 @@ class DatasetDAO(BaseDAO[SqlaTable]):
- If a metric Dict does not have an `id` then we create a new metric.
- If there are extra metrics on the metadata db that are not defined on the List
then we delete.
Uses individual ORM operations (not bulk) so that SQLAlchemy-Continuum
can capture each row change in the version history.
"""
metrics_by_id = {metric.id: metric for metric in model.metrics}
@@ -373,28 +409,22 @@ class DatasetDAO(BaseDAO[SqlaTable]):
if "id" in properties
}
db.session.bulk_insert_mappings(
SqlMetric,
[
{**properties, "table_id": model.id}
for properties in property_metrics
if "id" not in properties
],
)
# Insert new metrics
for properties in property_metrics:
if "id" not in properties:
db.session.add(SqlMetric(**{**properties, "table_id": model.id}))
db.session.bulk_update_mappings(
SqlMetric,
[
{**metrics_by_id[properties["id"]].__dict__, **properties}
for properties in property_metrics_by_id.values()
],
)
# Update existing metrics
for properties in property_metrics_by_id.values():
metric = metrics_by_id[properties["id"]]
for key, value in properties.items():
setattr(metric, key, value)
db.session.query(SqlMetric).filter(
SqlMetric.id.in_(
{metric.id for metric in model.metrics} - property_metrics_by_id.keys()
)
).delete(synchronize_session="fetch")
# Delete removed metrics
ids_to_keep = property_metrics_by_id.keys()
for metric in model.metrics:
if metric.id not in ids_to_keep:
db.session.delete(metric)
@classmethod
def find_dataset_column(cls, dataset_id: int, column_id: int) -> TableColumn | None:

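The natural-key strategy in ``DatasetDAO._override_columns`` boils down to a three-way split of column names. The sketch below computes that split without touching the ORM; ``ColumnPlan`` and ``plan_override`` are hypothetical names. Because every name lands in exactly one bucket, a single save never yields both a DELETE and an INSERT shadow for the same ``column_name``, which is the collision the docstring attributes to the old delete-and-reinsert path.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnPlan:
    update: dict[str, dict[str, Any]] = field(default_factory=dict)
    insert: list[dict[str, Any]] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan_override(
    existing_names: Sequence[str], incoming: Sequence[Mapping[str, Any]]
) -> ColumnPlan:
    incoming_by_name = {p["column_name"]: dict(p) for p in incoming}
    existing = set(existing_names)
    plan = ColumnPlan()
    for name in existing_names:
        if name in incoming_by_name:
            plan.update[name] = incoming_by_name[name]  # keep the PK, setattr fields
        else:
            plan.delete.append(name)  # gone from the metadata: plain DELETE
    for name, props in incoming_by_name.items():
        if name not in existing:
            plan.insert.append(props)  # new column: plain INSERT
    return plan
```

The real method applies ``update`` via ``setattr``, ``insert`` via ``db.session.add(TableColumn(...))`` and ``delete`` via ``db.session.delete``.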
superset/daos/version.py

@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Backward-compat façade for the entity-versioning DAO surface.
The actual implementation lives in :mod:`superset.versioning.queries`
(read side: list/get/resolve/find/UUID derivation) and
:mod:`superset.versioning.restore` (write side: restore + audit
stamping). This module re-exports both under a single ``VersionDAO``
class plus the module-level UUID helpers so existing callers keep
working without changes.
New code should import from the versioning sub-modules directly.
"""
from __future__ import annotations
from superset.versioning.queries import (
current_live_transaction_id,
current_live_version_uuid,
current_version_number,
derive_version_uuid,
find_active_by_uuid,
get_version,
list_change_records_batch,
list_versions,
resolve_version_uuid,
VERSION_UUID_NAMESPACE,
_get_version_count,
)
from superset.versioning.queries import (
derive_version_uuid as _derive_version_uuid, # noqa: F401
)
from superset.versioning.restore import (
restore_version,
_RESTORE_RELATIONS,
_stamp_audit_fields_for_restore,
)
# Re-exports for ``from superset.daos.version import …`` consumers.
__all__ = [
"VERSION_UUID_NAMESPACE",
"VersionDAO",
"derive_version_uuid",
]
class VersionDAO:
"""Thin façade over :mod:`superset.versioning.queries` and
:mod:`superset.versioning.restore`.
Preserved as a single namespace for ergonomic access from API
handlers and command classes; the underlying functions are
importable directly from their respective sub-modules.
"""
# --- read side (queries.py) -------------------------------------------
find_active_by_uuid = staticmethod(find_active_by_uuid)
_get_version_count = staticmethod(_get_version_count)
current_version_number = staticmethod(current_version_number)
current_live_transaction_id = staticmethod(current_live_transaction_id)
current_live_version_uuid = staticmethod(current_live_version_uuid)
list_change_records_batch = staticmethod(list_change_records_batch)
list_versions = staticmethod(list_versions)
resolve_version_uuid = staticmethod(resolve_version_uuid)
get_version = staticmethod(get_version)
# --- write side (restore.py) ------------------------------------------
_RESTORE_RELATIONS = _RESTORE_RELATIONS
restore_version = staticmethod(restore_version)
_stamp_audit_fields_for_restore = staticmethod(_stamp_audit_fields_for_restore)

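The façade re-exports ``derive_version_uuid`` and ``VERSION_UUID_NAMESPACE`` without showing how they work. Assuming the derivation follows the common name-based ``uuid5`` pattern over the entity UUID and a Continuum transaction id (the actual inputs and namespace value are not in this diff), a version UUID would be deterministic, as this sketch shows. ``EXAMPLE_NAMESPACE`` and ``derive_version_uuid_sketch`` are hypothetical.

```python
import uuid

# Hypothetical namespace; the real VERSION_UUID_NAMESPACE value is not shown here.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/version-history")


def derive_version_uuid_sketch(entity_uuid: uuid.UUID, transaction_id: int) -> uuid.UUID:
    """Deterministically map (entity, transaction) to a UUID (assumed scheme)."""
    return uuid.uuid5(EXAMPLE_NAMESPACE, f"{entity_uuid}:{transaction_id}")
```

Determinism is what would let a client keep a version UUID across requests, unlike the 0-based ``version_number`` that the update endpoint documents as unstable under pruning.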
@@ -252,6 +252,9 @@ class DashboardRestApi(CustomTagsOptimizationMixin, BaseSupersetModelRestApi):
"put_chart_customizations",
"put_colors",
"export_as_example",
"list_versions",
"get_version",
"restore_version",
}
resource_name = "dashboard"
allow_browser_login = True
@@ -522,7 +525,13 @@ class DashboardRestApi(CustomTagsOptimizationMixin, BaseSupersetModelRestApi):
add_extra_log_payload(
dashboard_id=dash.id, action=f"{self.__class__.__name__}.get"
)
return self.response(200, result=result)
from superset.daos.version import VersionDAO
from superset.versioning.etag import set_version_etag
return set_version_etag(
self.response(200, result=result),
VersionDAO.current_live_version_uuid(Dashboard, dash.id, dash.uuid),
)
@expose("/<id_or_slug>/datasets", methods=("GET",))
@protect()
@@ -787,6 +796,34 @@ class DashboardRestApi(CustomTagsOptimizationMixin, BaseSupersetModelRestApi):
$ref: '#/components/schemas/{{self.__class__.__name__}}.put'
last_modified_time:
type: number
old_version:
type: integer
nullable: true
description: >-
0-based version_number of the live row before this
update. Unstable under retention pruning — see
old_transaction_id for a stable identifier.
new_version:
type: integer
nullable: true
description: >-
0-based version_number of the newly-live row after
this update. Can equal old_version when no
versioned column changed, or when retention
pruning dropped an older closed row in the same
commit.
old_transaction_id:
type: integer
nullable: true
description: Continuum transaction_id of the live
row before this update. Stable across pruning.
new_transaction_id:
type: integer
nullable: true
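description: Continuum transaction_id of the live
row after this update. Differs from
old_transaction_id when the update produced a new
version row.
old_version_uuid:
type: string
format: uuid
nullable: true
description: Version UUID of the live row before
this update.
new_version_uuid:
type: string
format: uuid
nullable: true
description: Version UUID of the live row after
this update; also used to set the ETag response
header.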
400:
$ref: '#/components/responses/400'
401:
@@ -805,17 +842,49 @@ class DashboardRestApi(CustomTagsOptimizationMixin, BaseSupersetModelRestApi):
# This validates custom Schema with custom validations
except ValidationError as error:
return self.response_400(message=error.messages)
# pylint: disable=import-outside-toplevel
from superset.daos.version import VersionDAO
from superset.extensions import db as _db
pre_dashboard = (
_db.session.query(Dashboard).filter(Dashboard.id == pk).one_or_none()
)
old_version = VersionDAO.current_version_number(Dashboard, pk)
old_transaction_id = VersionDAO.current_live_transaction_id(Dashboard, pk)
old_version_uuid = (
VersionDAO.current_live_version_uuid(Dashboard, pk, pre_dashboard.uuid)
if pre_dashboard is not None
else None
)
try:
changed_model = UpdateDashboardCommand(pk, item).run()
last_modified_time = changed_model.changed_on.replace(
microsecond=0
).timestamp()
new_version = VersionDAO.current_version_number(Dashboard, changed_model.id)
new_transaction_id = VersionDAO.current_live_transaction_id(
Dashboard, changed_model.id
)
new_version_uuid = VersionDAO.current_live_version_uuid(
Dashboard, changed_model.id, changed_model.uuid
)
response = self.response(
200,
id=changed_model.id,
result=item,
last_modified_time=last_modified_time,
old_version=old_version,
new_version=new_version,
old_transaction_id=old_transaction_id,
new_transaction_id=new_transaction_id,
old_version_uuid=str(old_version_uuid) if old_version_uuid else None,
new_version_uuid=str(new_version_uuid) if new_version_uuid else None,
)
from superset.versioning.etag import set_version_etag
set_version_etag(response, new_version_uuid)
except DashboardNotFoundError:
response = self.response_404()
except DashboardForbiddenError:
@@ -2205,3 +2274,223 @@ class DashboardRestApi(CustomTagsOptimizationMixin, BaseSupersetModelRestApi):
).timestamp(),
},
)
@expose("/<uuid_str>/versions/", methods=("GET",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.list_versions",
log_to_statsd=False,
)
def list_versions(self, uuid_str: str) -> Response:
"""List version history for a dashboard.
---
get:
summary: Return the version history for a dashboard
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dashboard UUID
responses:
200:
description: Version history, ordered oldest first
content:
application/json:
schema:
type: object
properties:
result:
type: array
items:
type: object
count:
type: integer
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
versions = VersionDAO.list_versions(Dashboard, entity_uuid)
if versions is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=versions, count=len(versions)),
Dashboard,
entity_uuid,
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/",
methods=("GET",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.get_version", # noqa: E501
log_to_statsd=False,
)
def get_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Return the dashboard's state at a specific version.
---
get:
summary: Read-only snapshot of the dashboard at a given version
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dashboard UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: Version UUID as returned by the list endpoint
responses:
200:
description: Snapshot of the dashboard at the target version
content:
application/json:
schema:
type: object
properties:
result:
type: object
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
snapshot = VersionDAO.get_version(Dashboard, entity_uuid, version_uuid)
if snapshot is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=snapshot), Dashboard, entity_uuid
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/restore",
methods=("POST",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.restore_version"
), # noqa: E501
log_to_statsd=False,
)
def restore_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Restore a dashboard to a previous version.
---
post:
summary: Revert a dashboard to an earlier version (non-destructive)
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dashboard UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: >-
Version UUID as returned by the list-versions endpoint.
Stable across retention pruning.
responses:
200:
description: Dashboard was restored
content:
application/json:
schema:
type: object
properties:
message:
type: string
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
422:
$ref: '#/components/responses/422'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.commands.dashboard.restore_version import (
RestoreDashboardVersionCommand,
)
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
try:
RestoreDashboardVersionCommand(entity_uuid, version_uuid).run()
except DashboardNotFoundError:
return self.response_404()
except DashboardForbiddenError:
return self.response_403()
except DashboardUpdateFailedError as ex:
logger.error("Error restoring dashboard version: %s", ex)
return self.response_422(message=str(ex))
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, message="OK"), Dashboard, entity_uuid
)
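The dashboard PUT response now reports both version numbers and Continuum transaction ids. The schema notes that version numbers can shift under retention pruning while transaction ids stay stable, so a client that wants to know whether a save actually produced a new version row would most likely compare the transaction ids. The helper below is a hypothetical client-side sketch (``produced_new_version`` is not part of Superset), assuming the fields sit at the top level of the JSON body, as passed to ``self.response(...)``.

```python
from __future__ import annotations

from typing import Any, Mapping


def produced_new_version(payload: Mapping[str, Any]) -> bool:
    """Return True if a PUT response indicates a new version row was written.

    Hypothetical client helper: compares Continuum transaction ids rather
    than ``old_version`` / ``new_version``, because version numbers can
    shift under retention pruning while transaction ids do not.
    """
    old_tx = payload.get("old_transaction_id")
    new_tx = payload.get("new_transaction_id")
    if new_tx is None:
        # No live version row after the update: nothing new to report.
        return False
    # ``old_tx`` is None when the entity had no prior history; that
    # also counts as a new version row.
    return old_tx != new_tx
```

A save that touched no versioned column leaves both ids equal, even if ``new_version`` happens to match ``old_version`` for an unrelated reason.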

@@ -111,6 +111,9 @@ class DatasetRestApi(BaseSupersetModelRestApi):
"get_or_create_dataset",
"warm_up_cache",
"get_drill_info",
"list_versions",
"get_version",
"restore_version",
}
list_columns = [
"id",
@@ -410,6 +413,40 @@ class DatasetRestApi(BaseSupersetModelRestApi):
type: number
result:
$ref: '#/components/schemas/{{self.__class__.__name__}}.put'
old_version:
type: integer
nullable: true
description: >-
0-based version_number of the live row before this
update (null if the dataset had no prior history).
Matches the ``version_number`` field of the list
versions endpoint. Unstable under retention
pruning — see ``old_transaction_id`` for a stable
identifier.
new_version:
type: integer
nullable: true
description: >-
0-based version_number of the newly-live row after
this update. Can equal ``old_version`` when no
versioned column changed, or when retention
pruning dropped an older closed row in the same
commit.
old_transaction_id:
type: integer
nullable: true
description: >-
Continuum transaction_id of the live row before
this update. Stable across retention pruning.
new_transaction_id:
type: integer
nullable: true
description: >-
Continuum transaction_id of the live row after
this update. When this differs from
``old_transaction_id`` the update produced a new
version row (regardless of whether ``new_version``
changed).
400:
$ref: '#/components/responses/400'
401:
@@ -433,11 +470,47 @@ class DatasetRestApi(BaseSupersetModelRestApi):
# This validates custom Schema with custom validations
except ValidationError as error:
return self.response_400(message=error.messages)
# pylint: disable=import-outside-toplevel
from superset.daos.version import VersionDAO
from superset.extensions import db as _db
pre_dataset = (
_db.session.query(SqlaTable).filter(SqlaTable.id == pk).one_or_none()
)
old_version = VersionDAO.current_version_number(SqlaTable, pk)
old_transaction_id = VersionDAO.current_live_transaction_id(SqlaTable, pk)
old_version_uuid = (
VersionDAO.current_live_version_uuid(SqlaTable, pk, pre_dataset.uuid)
if pre_dataset is not None
else None
)
try:
changed_model = UpdateDatasetCommand(pk, item, override_columns).run()
if override_columns:
RefreshDatasetCommand(pk).run()
-response = self.response(200, id=changed_model.id, result=item)
new_version = VersionDAO.current_version_number(SqlaTable, changed_model.id)
new_transaction_id = VersionDAO.current_live_transaction_id(
SqlaTable, changed_model.id
)
new_version_uuid = VersionDAO.current_live_version_uuid(
SqlaTable, changed_model.id, changed_model.uuid
)
response = self.response(
200,
id=changed_model.id,
result=item,
old_version=old_version,
new_version=new_version,
old_transaction_id=old_transaction_id,
new_transaction_id=new_transaction_id,
old_version_uuid=str(old_version_uuid) if old_version_uuid else None,
new_version_uuid=str(new_version_uuid) if new_version_uuid else None,
)
from superset.versioning.etag import set_version_etag
set_version_etag(response, new_version_uuid)
except DatasetNotFoundError:
response = self.response_404()
except DatasetForbiddenError:
@@ -706,8 +779,9 @@ class DatasetRestApi(BaseSupersetModelRestApi):
@safe
@statsd_metrics
@event_logger.log_this_with_context(
-action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
-".detect_datetime_formats",
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.detect_datetime_formats"
),
log_to_statsd=False,
)
def detect_datetime_formats(self, pk: int) -> Response:
@@ -788,8 +862,9 @@ class DatasetRestApi(BaseSupersetModelRestApi):
@safe
@statsd_metrics
@event_logger.log_this_with_context(
-action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
-f".related_objects",
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.related_objects"
),
log_to_statsd=False,
)
def related_objects(self, id_or_uuid: str) -> Response:
@@ -1045,8 +1120,9 @@ class DatasetRestApi(BaseSupersetModelRestApi):
@safe
@statsd_metrics
@event_logger.log_this_with_context(
-action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
-f".get_or_create_dataset",
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.get_or_create_dataset"
),
log_to_statsd=False,
)
def get_or_create_dataset(self) -> Response:
@@ -1258,7 +1334,13 @@ class DatasetRestApi(BaseSupersetModelRestApi):
except SupersetTemplateException as ex:
return self.response(ex.status, message=str(ex))
-return self.response(200, **response)
from superset.daos.version import VersionDAO
from superset.versioning.etag import set_version_etag
return set_version_etag(
self.response(200, **response),
VersionDAO.current_live_version_uuid(SqlaTable, table.id, table.uuid),
)
@expose("/<int:pk>/drill_info/", methods=("GET",))
@protect()
@@ -1266,9 +1348,9 @@ class DatasetRestApi(BaseSupersetModelRestApi):
@safe
@statsd_metrics
@event_logger.log_this_with_context(
-action=lambda self,
-*args,
-**kwargs: f"{self.__class__.__name__}.get_drill_info",
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.get_drill_info"
),
log_to_statsd=False,
)
def get_drill_info(self, pk: int, **kwargs: Any) -> Response:
@@ -1403,3 +1485,227 @@ class DatasetRestApi(BaseSupersetModelRestApi):
raise template_exception from ex
return data
@expose("/<uuid_str>/versions/", methods=("GET",))
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.list_versions",
log_to_statsd=False,
)
def list_versions(self, uuid_str: str) -> Response:
"""List version history for a dataset.
---
get:
summary: Return the version history for a dataset
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dataset UUID
responses:
200:
description: Version history, ordered oldest first
content:
application/json:
schema:
type: object
properties:
result:
type: array
items:
type: object
count:
type: integer
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
versions = VersionDAO.list_versions(SqlaTable, entity_uuid)
if versions is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=versions, count=len(versions)),
SqlaTable,
entity_uuid,
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/",
methods=("GET",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.get_version", # noqa: E501
log_to_statsd=False,
)
def get_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Return the dataset's state at a specific version.
---
get:
summary: Read-only snapshot of the dataset at a given version
description: >-
Returns the dataset's scalar fields plus reconstructed
``columns`` and ``metrics`` lists as they were at the target
version. Does not modify live state.
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dataset UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: Version UUID as returned by the list endpoint
responses:
200:
description: Snapshot of the dataset at the target version
content:
application/json:
schema:
type: object
properties:
result:
type: object
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.daos.version import VersionDAO
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
snapshot = VersionDAO.get_version(SqlaTable, entity_uuid, version_uuid)
if snapshot is None:
return self.response_404()
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, result=snapshot), SqlaTable, entity_uuid
)
@expose(
"/<uuid_str>/versions/<version_uuid_str>/restore",
methods=("POST",),
)
@protect()
@safe
@statsd_metrics
@event_logger.log_this_with_context(
action=lambda self, *args, **kwargs: (
f"{self.__class__.__name__}.restore_version"
), # noqa: E501
log_to_statsd=False,
)
def restore_version(self, uuid_str: str, version_uuid_str: str) -> Response:
"""Restore a dataset to a previous version.
---
post:
summary: Revert a dataset to an earlier version (non-destructive)
parameters:
- in: path
schema:
type: string
format: uuid
name: uuid_str
description: Dataset UUID
- in: path
schema:
type: string
format: uuid
name: version_uuid_str
description: >-
Version UUID as returned by the list-versions endpoint.
Stable across retention pruning.
responses:
200:
description: Dataset was restored
content:
application/json:
schema:
type: object
properties:
message:
type: string
400:
$ref: '#/components/responses/400'
401:
$ref: '#/components/responses/401'
403:
$ref: '#/components/responses/403'
404:
$ref: '#/components/responses/404'
422:
$ref: '#/components/responses/422'
"""
# pylint: disable=import-outside-toplevel
from uuid import UUID
from superset.commands.dataset.restore_version import (
RestoreDatasetVersionCommand,
)
try:
entity_uuid = UUID(uuid_str)
except ValueError:
return self.response_400(message="Invalid UUID")
try:
version_uuid = UUID(version_uuid_str)
except ValueError:
return self.response_400(message="Invalid version UUID")
try:
RestoreDatasetVersionCommand(entity_uuid, version_uuid).run()
except DatasetNotFoundError:
return self.response_404()
except DatasetForbiddenError:
return self.response_403()
except DatasetUpdateFailedError as ex:
logger.error("Error restoring dataset version: %s", ex)
return self.response_422(message=str(ex))
from superset.versioning.etag import set_version_etag_by_uuid
return set_version_etag_by_uuid(
self.response(200, message="OK"), SqlaTable, entity_uuid
)
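The list, snapshot, and restore routes share one URL layout, and each handler parses its path segments with ``UUID(...)`` and returns a 400 on ``ValueError``. Below is a minimal sketch of a client-side path builder that applies the same validation up front. The ``/api/v1/<resource>/`` prefix and the ``versions_url`` name are assumptions for illustration; they are not taken from this diff.

```python
from __future__ import annotations

from uuid import UUID

# Assumption: the dashboard and dataset APIs are mounted under
# /api/v1/<resource>/ with resource names "dashboard" and "dataset".
RESOURCES = frozenset({"dashboard", "dataset"})


def versions_url(
    resource: str,
    entity_uuid: str,
    version_uuid: str | None = None,
    restore: bool = False,
) -> str:
    """Build a path for the list, get, or restore version endpoints."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    # UUID() raises ValueError on malformed input, mirroring the handlers,
    # which answer 400 when ``UUID(uuid_str)`` fails.
    base = f"/api/v1/{resource}/{UUID(entity_uuid)}/versions/"
    if version_uuid is None:
        if restore:
            raise ValueError("restore requires a version UUID")
        return base  # GET: list versions
    path = f"{base}{UUID(version_uuid)}/"  # GET: one snapshot
    # POST: restore (the route has no trailing slash after "restore").
    return f"{path}restore" if restore else path
```

Normalizing through ``UUID`` also lowercases and hyphenates the identifiers, so the same entity always maps to the same path.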

@@ -146,6 +146,31 @@ cache_manager = CacheManager()
celery_app = celery.Celery()
csrf = CSRFProtect()
db = get_sqla_class()()
# make_versioned() MUST be called immediately after db is constructed and before
# any versioned model class is defined. Continuum registers SQLAlchemy mapper
# event listeners (``instrument_class`` / ``after_configured``) at call time;
# models mapped before this call never receive those events and are silently
# skipped.
from sqlalchemy_continuum import ( # noqa: E402
make_versioned,
versioning_manager as _continuum_manager,
)
from superset.versioning.factory import ( # noqa: E402
SkipUnmodifiedPlugin,
VersioningFlaskPlugin,
VersionTransactionFactory,
)
# Rename the transaction table from "transaction" (SQL reserved word) to
# "version_transaction" via the custom factory before make_versioned() fires.
_continuum_manager.transaction_cls = VersionTransactionFactory()
make_versioned(
user_cls=None,
plugins=[VersioningFlaskPlugin(), SkipUnmodifiedPlugin()],
options={"strategy": "validity"},
)
_event_logger: dict[str, Any] = {}
encrypted_field_factory = EncryptedFieldFactory()
event_logger = LocalProxy(lambda: _event_logger.get("event_logger"))

@@ -608,6 +608,61 @@ class SupersetAppInitializer: # pylint: disable=too-many-public-methods
# Surface exceptions during initialization of extensions
print(ex)
def init_versioning(self) -> None:
"""Register SQLAlchemy-Continuum baseline and retention listeners.
Must be called after all versioned model classes have been imported so
that VERSIONED_MODELS can be populated and configure_mappers() has run.
"""
from sqlalchemy.orm import Session # noqa: F401
from sqlalchemy_continuum import version_class
from superset.connectors.sqla.models import SqlaTable
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
from superset.versioning.baseline import (
register_baseline_listener,
VERSIONED_MODELS,
)
# Note: previously this block called ``configure_mappers()`` before
# importing the snapshot modules, believing their Table declarations
# needed ``version_transaction`` to exist. That's not actually the
# case — the snapshot tables reference ``version_transaction.id``
# only at the DB level (via the migration); the SQLAlchemy Table
# objects here intentionally declare ``transaction_id`` as a plain
# ``BigInteger`` without a FK to avoid the resolution dependency.
# Removing the global ``configure_mappers()`` avoids eagerly
# resolving relationships in other unrelated models (notably
# Flask-AppBuilder's AuditMixin on classes like Tag, whose
# ``created_by`` primaryjoin only resolves under specific class
# registry states in SQLAlchemy 1.4).
from superset.versioning.changes import ( # noqa: E402
register_change_record_listener,
)
# All versioned models — Dashboard / Slice / SqlaTable plus their
# children (TableColumn / SqlMetric) and the dashboard_slices
# M2M — go through Continuum's shadow tables. The JSON-snapshot
# path that previously backed dataset / dashboard child diffs
# has been removed (sc-103156 spike: full Continuum).
for model_cls in (Dashboard, Slice, SqlaTable):
try:
version_class(model_cls) # ensure Continuum wired this model
VERSIONED_MODELS.append(model_cls)
except Exception: # pylint: disable=broad-except # noqa: S110
pass
register_baseline_listener()
register_change_record_listener()
# Retention is time-based and runs out-of-band as a Celery beat
# task — see ``superset/tasks/version_history_retention.py``
# and the ``version_history.prune_old_versions`` entry in
# ``CELERYBEAT_SCHEDULE`` (``superset/config.py``). The previous
# synchronous after_commit listener was retired so retention
# work doesn't add latency to user saves.
def init_app_in_ctx(self) -> None:
"""
Runs init logic in the context of the app
@@ -634,6 +689,9 @@ class SupersetAppInitializer: # pylint: disable=too-many-public-methods
self.init_all_dependencies_and_extensions()
# Must run after all versioned models are imported and mappers configured.
self.init_versioning()
def check_secret_key(self) -> None:
def log_default_secret_key_warning() -> None:
top_banner = 80 * "-" + "\n" + 36 * " " + "WARNING\n" + 80 * "-"
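Retention runs as a Celery beat task, and the shadow tables carry FKs from both ``transaction_id`` and ``end_transaction_id`` to ``version_transaction.id`` (see the migration that follows). One consequence is that a ``version_transaction`` row can be deleted only once no surviving row in any shadow table references it, even when that transaction touched several entities. ``version_changes`` rows do not block the delete because that FK is ``ON DELETE CASCADE``. The sketch below illustrates the check in plain Python; ``ShadowRow`` and ``deletable_transactions`` are hypothetical and do not describe the actual task in ``superset/tasks/version_history_retention.py``.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ShadowRow:
    """One surviving row in a *_version shadow table (hypothetical model)."""

    table: str
    transaction_id: int
    end_transaction_id: int | None


def deletable_transactions(
    candidate_ids: Iterable[int], surviving_rows: Iterable[ShadowRow]
) -> set[int]:
    """Return the candidate transaction ids that no surviving row references.

    Both ``transaction_id`` and ``end_transaction_id`` are FKs to
    ``version_transaction.id``, so a transaction row is safe to delete only
    when no row in *any* shadow table points at it through either column.
    """
    referenced: set[int] = set()
    for row in surviving_rows:
        referenced.add(row.transaction_id)
        if row.end_transaction_id is not None:
            referenced.add(row.end_transaction_id)
    return set(candidate_ids) - referenced
```

Passing rows from every shadow table, not just the entity being pruned, is what keeps a cross-entity transaction from being deleted while another table still references it.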

@@ -0,0 +1,258 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""add_entity_version_history_tables
Create SQLAlchemy-Continuum shadow tables for Dashboards, Charts, and Datasets,
plus the version_transaction audit table. Shadow tables for TableColumns,
SqlMetrics, and the dashboard_slices association are added by a follow-up
migration.
These tables store the full history of changes to each entity using Continuum's
validity strategy: the current version row has end_transaction_id = NULL.
Revision ID: 56cd24c07170
Revises: 2bee73611e32
Create Date: 2026-04-20 00:00:00.000000
"""
from __future__ import annotations
import sqlalchemy as sa
from alembic import op
from sqlalchemy_utils import UUIDType
revision = "56cd24c07170"
# Stacked on sc-105349-composite-association-pks (2bee73611e32) so the
# Continuum shadow tables this migration creates can mirror the
# composite-PK shape of the live association tables. If sc-105349
# is removed from the stack, this should be reverted to "ce6bd21901ab".
down_revision = "2bee73611e32"
def upgrade() -> None:
bind = op.get_bind()
# version_transaction — audit log for each versioning event.
# Continuum emits `nextval('version_transaction_id_seq')` on every INSERT,
# so the sequence must exist before the table on Postgres. Other dialects
# skip this block, and SQLAlchemy ignores the ``sa.Sequence`` below on
# backends without sequence support (SQLite/MySQL auto-increment natively).
if bind.dialect.name == "postgresql":
op.execute("CREATE SEQUENCE IF NOT EXISTS version_transaction_id_seq")
op.create_table(
"version_transaction",
sa.Column(
"id",
sa.BigInteger(),
sa.Sequence("version_transaction_id_seq"),
primary_key=True,
autoincrement=True,
nullable=False,
),
sa.Column("issued_at", sa.DateTime(), nullable=True),
sa.Column("remote_addr", sa.String(50), nullable=True),
sa.Column("user_id", sa.Integer(), nullable=True),
)
if bind.dialect.name == "postgresql":
op.execute(
"ALTER SEQUENCE version_transaction_id_seq OWNED BY version_transaction.id"
)
# dashboards_version
op.create_table(
"dashboards_version",
sa.Column("uuid", UUIDType(binary=True), nullable=True),
sa.Column("created_on", sa.DateTime(), nullable=True),
sa.Column("changed_on", sa.DateTime(), nullable=True),
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("dashboard_title", sa.String(500), nullable=True),
sa.Column("position_json", sa.Text(), nullable=True),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("css", sa.Text(), nullable=True),
sa.Column("theme_id", sa.Integer(), nullable=True),
sa.Column("certified_by", sa.Text(), nullable=True),
sa.Column("certification_details", sa.Text(), nullable=True),
sa.Column("json_metadata", sa.Text(), nullable=True),
sa.Column("slug", sa.String(255), nullable=True),
sa.Column("published", sa.Boolean(), nullable=True),
sa.Column("is_managed_externally", sa.Boolean(), nullable=True),
sa.Column("external_url", sa.Text(), nullable=True),
sa.Column("created_by_fk", sa.Integer(), nullable=True),
sa.Column("changed_by_fk", sa.Integer(), nullable=True),
sa.Column("transaction_id", sa.BigInteger(), nullable=False),
sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
sa.Column("operation_type", sa.SmallInteger(), nullable=False),
sa.PrimaryKeyConstraint("id", "transaction_id"),
sa.ForeignKeyConstraint(
["transaction_id"],
["version_transaction.id"],
name="fk_dashboards_version_transaction_id",
),
sa.ForeignKeyConstraint(
["end_transaction_id"],
["version_transaction.id"],
name="fk_dashboards_version_end_transaction_id",
),
)
op.create_index(
"ix_dashboards_version_end_transaction_id",
"dashboards_version",
["end_transaction_id"],
)
op.create_index(
"ix_dashboards_version_operation_type",
"dashboards_version",
["operation_type"],
)
op.create_index(
"ix_dashboards_version_transaction_id",
"dashboards_version",
["transaction_id"],
)
# slices_version (Charts)
op.create_table(
"slices_version",
sa.Column("uuid", UUIDType(binary=True), nullable=True),
sa.Column("created_on", sa.DateTime(), nullable=True),
sa.Column("changed_on", sa.DateTime(), nullable=True),
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("slice_name", sa.String(250), nullable=True),
sa.Column("datasource_id", sa.Integer(), nullable=True),
sa.Column("datasource_type", sa.String(200), nullable=True),
sa.Column("datasource_name", sa.String(2000), nullable=True),
sa.Column("viz_type", sa.String(250), nullable=True),
sa.Column("params", sa.Text(), nullable=True),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("cache_timeout", sa.Integer(), nullable=True),
sa.Column("perm", sa.String(1000), nullable=True),
sa.Column("schema_perm", sa.String(1000), nullable=True),
sa.Column("catalog_perm", sa.String(1000), nullable=True),
sa.Column("last_saved_at", sa.DateTime(), nullable=True),
sa.Column("last_saved_by_fk", sa.Integer(), nullable=True),
sa.Column("certified_by", sa.Text(), nullable=True),
sa.Column("certification_details", sa.Text(), nullable=True),
sa.Column("is_managed_externally", sa.Boolean(), nullable=True),
sa.Column("external_url", sa.Text(), nullable=True),
sa.Column("created_by_fk", sa.Integer(), nullable=True),
sa.Column("changed_by_fk", sa.Integer(), nullable=True),
sa.Column("transaction_id", sa.BigInteger(), nullable=False),
sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
sa.Column("operation_type", sa.SmallInteger(), nullable=False),
sa.PrimaryKeyConstraint("id", "transaction_id"),
sa.ForeignKeyConstraint(
["transaction_id"],
["version_transaction.id"],
name="fk_slices_version_transaction_id",
),
sa.ForeignKeyConstraint(
["end_transaction_id"],
["version_transaction.id"],
name="fk_slices_version_end_transaction_id",
),
)
op.create_index(
"ix_slices_version_end_transaction_id",
"slices_version",
["end_transaction_id"],
)
op.create_index(
"ix_slices_version_operation_type",
"slices_version",
["operation_type"],
)
op.create_index(
"ix_slices_version_transaction_id",
"slices_version",
["transaction_id"],
)
# tables_version (SqlaTable / Datasets)
op.create_table(
"tables_version",
sa.Column("uuid", UUIDType(binary=True), nullable=True),
sa.Column("created_on", sa.DateTime(), nullable=True),
sa.Column("changed_on", sa.DateTime(), nullable=True),
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("default_endpoint", sa.Text(), nullable=True),
sa.Column("is_featured", sa.Boolean(), nullable=True),
sa.Column("filter_select_enabled", sa.Boolean(), nullable=True),
sa.Column("offset", sa.Integer(), nullable=True),
sa.Column("cache_timeout", sa.Integer(), nullable=True),
sa.Column("params", sa.String(1000), nullable=True),
sa.Column("perm", sa.String(1000), nullable=True),
sa.Column("schema_perm", sa.String(1000), nullable=True),
sa.Column("catalog_perm", sa.String(1000), nullable=True),
sa.Column("is_managed_externally", sa.Boolean(), nullable=True),
sa.Column("external_url", sa.Text(), nullable=True),
sa.Column("table_name", sa.String(250), nullable=True),
sa.Column("main_dttm_col", sa.String(250), nullable=True),
sa.Column("currency_code_column", sa.String(250), nullable=True),
sa.Column("database_id", sa.Integer(), nullable=True),
sa.Column("fetch_values_predicate", sa.Text(), nullable=True),
sa.Column("schema", sa.String(255), nullable=True),
sa.Column("catalog", sa.String(256), nullable=True),
sa.Column("sql", sa.Text(), nullable=True),
sa.Column("is_sqllab_view", sa.Boolean(), nullable=True),
sa.Column("template_params", sa.Text(), nullable=True),
sa.Column("extra", sa.Text(), nullable=True),
sa.Column("normalize_columns", sa.Boolean(), nullable=True),
sa.Column("always_filter_main_dttm", sa.Boolean(), nullable=True),
sa.Column("folders", sa.JSON(), nullable=True),
sa.Column("created_by_fk", sa.Integer(), nullable=True),
sa.Column("changed_by_fk", sa.Integer(), nullable=True),
sa.Column("transaction_id", sa.BigInteger(), nullable=False),
sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
sa.Column("operation_type", sa.SmallInteger(), nullable=False),
sa.PrimaryKeyConstraint("id", "transaction_id"),
sa.ForeignKeyConstraint(
["transaction_id"],
["version_transaction.id"],
name="fk_tables_version_transaction_id",
),
sa.ForeignKeyConstraint(
["end_transaction_id"],
["version_transaction.id"],
name="fk_tables_version_end_transaction_id",
),
)
op.create_index(
"ix_tables_version_end_transaction_id",
"tables_version",
["end_transaction_id"],
)
op.create_index(
"ix_tables_version_operation_type",
"tables_version",
["operation_type"],
)
op.create_index(
"ix_tables_version_transaction_id",
"tables_version",
["transaction_id"],
)
def downgrade() -> None:
op.drop_table("tables_version")
op.drop_table("slices_version")
op.drop_table("dashboards_version")
op.drop_table("version_transaction")
bind = op.get_bind()
if bind.dialect.name == "postgresql":
op.execute("DROP SEQUENCE IF EXISTS version_transaction_id_seq")
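Under Continuum's validity strategy, each shadow row covers the half-open range of transactions from ``transaction_id`` up to, but not including, ``end_transaction_id``, and the live row is the one whose ``end_transaction_id`` is NULL. The sketch below models those lookups in memory for one entity's rows. ``VersionRow`` and the helper names are hypothetical, and deriving a 0-based ``version_number`` from position in ``transaction_id`` order is an assumption consistent with the API's note that version numbers are unstable under pruning.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VersionRow:
    """Bookkeeping columns of one shadow-table row for a single entity."""

    transaction_id: int
    end_transaction_id: int | None  # NULL marks the current (live) version
    title: str  # stand-in for the mirrored entity columns


def live_row(rows: Sequence[VersionRow]) -> VersionRow | None:
    """Return the row whose validity range is still open."""
    return next((r for r in rows if r.end_transaction_id is None), None)


def row_at(rows: Sequence[VersionRow], tx: int) -> VersionRow | None:
    """Return the row valid at ``tx``: transaction_id <= tx < end_transaction_id."""
    for r in rows:
        if r.transaction_id <= tx and (
            r.end_transaction_id is None or tx < r.end_transaction_id
        ):
            return r
    return None


def version_number(rows: Sequence[VersionRow], row: VersionRow) -> int:
    """0-based position of ``row`` when rows are ordered by transaction_id."""
    ordered = sorted(rows, key=lambda r: r.transaction_id)
    return ordered.index(row)
```

Dropping the oldest row shifts ``version_number`` for every remaining row, while the ``transaction_id`` values do not move.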

@@ -0,0 +1,130 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""add_version_changes_table
Creates ``version_changes``, a field-level diff log keyed to a
(transaction, entity) pair. Each row describes one atomic change
(one field or one child-collection element) that occurred to one
entity during a save. Phase-2 UI will render these rows into
human-readable summaries via the frontend translator.
Shape:
(id, transaction_id, entity_kind, entity_id,
sequence, kind, path, from_value, to_value)
- ``transaction_id`` joins to ``version_transaction`` with ON DELETE
CASCADE, so when retention pruning deletes a ``version_transaction``
row, its change records are dropped automatically.
- ``entity_kind`` identifies which model type the record is about
(``"chart"`` / ``"dashboard"`` / ``"dataset"``). Required because
a single Continuum transaction can touch more than one versioned
entity (import pipelines, bulk operations, fixture loads), and the
API needs to filter a given entity's records precisely.
- ``entity_id`` is the entity's primary key — joins to ``slices.id``
/ ``dashboards.id`` / ``tables.id`` depending on ``entity_kind``.
- ``sequence`` orders records within one ``(transaction_id, entity_kind,
entity_id)`` triple; deterministic replay applies ``set(state, path,
to_value)`` in ascending sequence order.
- ``kind`` is indexed for the Phase-2 "filter history by change type"
query (``WHERE kind = 'filter'``).
- ``path``, ``from_value``, ``to_value`` are JSON because they are
inherently structured (arrays of segments, scalar or object values).
See spec FR-016..FR-021 and data-model.md §``version_changes``.
Revision ID: e1f3c5a7b9d0
Revises: 56cd24c07170
Create Date: 2026-04-24 10:00:00.000000
"""

from __future__ import annotations

import sqlalchemy as sa
from alembic import op

revision = "e1f3c5a7b9d0"
down_revision = "56cd24c07170"


def upgrade() -> None:
    op.create_table(
        "version_changes",
        sa.Column(
            "id",
            sa.BigInteger(),
            primary_key=True,
            autoincrement=True,
            nullable=False,
        ),
        sa.Column(
            "transaction_id",
            sa.BigInteger(),
            sa.ForeignKey("version_transaction.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column(
            "entity_kind",
            sa.String(length=32),
            nullable=False,
        ),
        sa.Column(
            "entity_id",
            sa.Integer(),
            nullable=False,
        ),
        sa.Column(
            "sequence",
            sa.SmallInteger(),
            nullable=False,
        ),
        sa.Column(
            "kind",
            sa.String(length=32),
            nullable=False,
        ),
        sa.Column("path", sa.JSON(), nullable=False),
        sa.Column("from_value", sa.JSON(), nullable=True),
        sa.Column("to_value", sa.JSON(), nullable=True),
        sa.UniqueConstraint(
            "transaction_id",
            "entity_kind",
            "entity_id",
            "sequence",
            name="uq_version_changes_tx_entity_sequence",
        ),
    )
    op.create_index(
        "ix_version_changes_kind",
        "version_changes",
        ["kind"],
    )
    op.create_index(
        "ix_version_changes_transaction_id",
        "version_changes",
        ["transaction_id"],
    )
    op.create_index(
        "ix_version_changes_entity",
        "version_changes",
        ["entity_kind", "entity_id"],
    )


def downgrade() -> None:
    op.drop_table("version_changes")
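The ``version_changes`` docstring above describes replay as ``set(state, path, to_value)`` applied in ascending ``sequence``. Replay is scoped to one transaction and one entity, which the unique constraint identifies by ``(transaction_id, entity_kind, entity_id)``. The following is a minimal sketch of that replay, not code from this PR. It assumes that ``path`` is a list of dict keys and list indices, and that records arrive as plain dicts shaped like ``version_changes`` rows. ``set_path``, ``replay`` and the sample records are hypothetical.

```python
from __future__ import annotations

from typing import Any


def set_path(state: Any, path: list[Any], value: Any) -> None:
    """Assign ``value`` at ``path`` inside nested dicts/lists.

    Missing intermediate dict keys are created as empty dicts; list
    segments must already exist (assumption: paths never grow lists).
    """
    if not path:
        raise ValueError("an empty path would replace the whole state")
    node = state
    for segment in path[:-1]:
        node = node[segment] if isinstance(node, list) else node.setdefault(segment, {})
    node[path[-1]] = value


def replay(
    state: dict[str, Any],
    records: list[dict[str, Any]],
    transaction_id: int,
    entity_kind: str,
    entity_id: int,
) -> dict[str, Any]:
    """Apply one entity's change records from one transaction, in sequence order."""
    # One transaction may touch several entities, so filter on the full
    # (transaction_id, entity_kind, entity_id) triple before ordering.
    mine = [
        r
        for r in records
        if r["transaction_id"] == transaction_id
        and r["entity_kind"] == entity_kind
        and r["entity_id"] == entity_id
    ]
    for record in sorted(mine, key=lambda r: r["sequence"]):
        set_path(state, record["path"], record["to_value"])
    return state


records = [
    {"transaction_id": 7, "entity_kind": "chart", "entity_id": 42, "sequence": 1,
     "path": ["params", "row_limit"], "from_value": None, "to_value": 100},
    {"transaction_id": 7, "entity_kind": "chart", "entity_id": 42, "sequence": 0,
     "path": ["slice_name"], "from_value": "Old", "to_value": "New"},
    # Same transaction and id, different entity kind: ignored for chart 42.
    {"transaction_id": 7, "entity_kind": "dashboard", "entity_id": 42, "sequence": 0,
     "path": ["dashboard_title"], "from_value": "A", "to_value": "B"},
]
chart = replay({"slice_name": "Old", "params": {"viz_type": "table"}}, records, 7, "chart", 42)
```

The third record shows why ``entity_kind`` is part of the key. Without it, the dashboard's record would be replayed onto chart 42, because both share an ``entity_id`` within the same transaction.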

@@ -0,0 +1,217 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""spike_add_child_continuum_shadow_tables
SPIKE: Adds the three shadow tables that Continuum auto-registers when
``__versioned__`` is re-applied to ``TableColumn`` / ``SqlMetric`` and
``slices`` is removed from the ``Dashboard.__versioned__`` exclude list:
- table_columns_version
- sql_metrics_version
- dashboard_slices_version (M2M association version)
Each follows the same shape as the existing parent shadow tables (mirrored
columns from the live table, plus ``transaction_id`` / ``end_transaction_id``
/ ``operation_type`` Continuum bookkeeping columns, with FKs to
``version_transaction.id``).
Generated by hand because the current Continuum + Alembic-autogenerate
interaction trips on the renamed ``transaction`` -> ``version_transaction``
table key (KeyError lookups in ``table_key_to_table``); the existing
``add_entity_version_history_tables`` migration was also hand-massaged
around this issue. The column inventory was sourced from
``version_class(TableColumn).__table__`` / ``SqlMetric`` / the
``dashboard_slices_version`` association entry in Continuum's metadata
(see spike-continuum-restore.md).
Revision ID: f7a2b3c4d5e6
Revises: e1f3c5a7b9d0
Create Date: 2026-04-30 18:00:00.000000
"""

from __future__ import annotations

import sqlalchemy as sa
from alembic import op
from sqlalchemy_utils import UUIDType

revision = "f7a2b3c4d5e6"
down_revision = "e1f3c5a7b9d0"


def upgrade() -> None:
    # ------------------------------------------------------------------
    # table_columns_version
    # ------------------------------------------------------------------
    op.create_table(
        "table_columns_version",
        sa.Column("uuid", UUIDType(binary=True), nullable=True),
        sa.Column("id", sa.Integer(), nullable=False),
        sa.Column("column_name", sa.String(255), nullable=True),
        sa.Column("verbose_name", sa.String(1024), nullable=True),
        sa.Column("is_active", sa.Boolean(), nullable=True),
        sa.Column("type", sa.Text(), nullable=True),
        sa.Column("advanced_data_type", sa.String(255), nullable=True),
        sa.Column("groupby", sa.Boolean(), nullable=True),
        sa.Column("filterable", sa.Boolean(), nullable=True),
        sa.Column("description", sa.Text(), nullable=True),
        sa.Column("table_id", sa.Integer(), nullable=True),
        sa.Column("is_dttm", sa.Boolean(), nullable=True),
        sa.Column("expression", sa.Text(), nullable=True),
        sa.Column("python_date_format", sa.String(255), nullable=True),
        sa.Column("datetime_format", sa.String(100), nullable=True),
        sa.Column("extra", sa.Text(), nullable=True),
        sa.Column("transaction_id", sa.BigInteger(), nullable=False),
        sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
        sa.Column("operation_type", sa.SmallInteger(), nullable=False),
        sa.PrimaryKeyConstraint("id", "transaction_id"),
        sa.ForeignKeyConstraint(
            ["transaction_id"],
            ["version_transaction.id"],
            name="fk_table_columns_version_transaction_id",
        ),
        sa.ForeignKeyConstraint(
            ["end_transaction_id"],
            ["version_transaction.id"],
            name="fk_table_columns_version_end_transaction_id",
        ),
    )
    op.create_index(
        "ix_table_columns_version_end_transaction_id",
        "table_columns_version",
        ["end_transaction_id"],
    )
    op.create_index(
        "ix_table_columns_version_operation_type",
        "table_columns_version",
        ["operation_type"],
    )
    op.create_index(
        "ix_table_columns_version_transaction_id",
        "table_columns_version",
        ["transaction_id"],
    )
    # ------------------------------------------------------------------
    # sql_metrics_version
    # ------------------------------------------------------------------
    op.create_table(
        "sql_metrics_version",
        sa.Column("uuid", UUIDType(binary=True), nullable=True),
        sa.Column("id", sa.Integer(), nullable=False),
        sa.Column("metric_name", sa.String(255), nullable=True),
        sa.Column("verbose_name", sa.String(1024), nullable=True),
        sa.Column("metric_type", sa.String(32), nullable=True),
        sa.Column("description", sa.Text(), nullable=True),
        sa.Column("d3format", sa.String(128), nullable=True),
        sa.Column("currency", sa.JSON(), nullable=True),
        sa.Column("warning_text", sa.Text(), nullable=True),
        sa.Column("table_id", sa.Integer(), nullable=True),
        sa.Column("expression", sa.Text(), nullable=True),
        sa.Column("extra", sa.Text(), nullable=True),
        sa.Column("transaction_id", sa.BigInteger(), nullable=False),
        sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
        sa.Column("operation_type", sa.SmallInteger(), nullable=False),
        sa.PrimaryKeyConstraint("id", "transaction_id"),
        sa.ForeignKeyConstraint(
            ["transaction_id"],
            ["version_transaction.id"],
            name="fk_sql_metrics_version_transaction_id",
        ),
        sa.ForeignKeyConstraint(
            ["end_transaction_id"],
            ["version_transaction.id"],
            name="fk_sql_metrics_version_end_transaction_id",
        ),
    )
    op.create_index(
        "ix_sql_metrics_version_end_transaction_id",
        "sql_metrics_version",
        ["end_transaction_id"],
    )
    op.create_index(
        "ix_sql_metrics_version_operation_type",
        "sql_metrics_version",
        ["operation_type"],
    )
    op.create_index(
        "ix_sql_metrics_version_transaction_id",
        "sql_metrics_version",
        ["transaction_id"],
    )
    # ------------------------------------------------------------------
    # dashboard_slices_version (M2M association)
    #
    # The live ``dashboard_slices`` table is reshaped by sc-105349 to a
    # composite PK on ``(dashboard_id, slice_id)`` — no surrogate ``id``.
    # Continuum auto-mirrors the live columns into the shadow Table at
    # ``make_versioned()`` time, so the shadow's SQLAlchemy metadata
    # also has no ``id``. The DB shadow PK is the natural composite key
    # plus Continuum's bookkeeping (``transaction_id``, ``operation_type``);
    # ``operation_type`` is included because a single transaction can in
    # principle produce both INSERT and DELETE shadows for the same
    # ``(dashboard_id, slice_id)`` pair (slice removed and re-added in
    # one save).
    #
    # If sc-105349 is removed from the stack, the live table reverts to
    # carrying its surrogate ``id`` and this migration would need to
    # match — see ``spike-continuum-restore.md`` "Branch maintenance".
    # ------------------------------------------------------------------
    op.create_table(
        "dashboard_slices_version",
        sa.Column("dashboard_id", sa.Integer(), nullable=False),
        sa.Column("slice_id", sa.Integer(), nullable=False),
        sa.Column("transaction_id", sa.BigInteger(), nullable=False),
        sa.Column("end_transaction_id", sa.BigInteger(), nullable=True),
        sa.Column("operation_type", sa.SmallInteger(), nullable=False),
        sa.PrimaryKeyConstraint(
            "dashboard_id", "slice_id", "transaction_id", "operation_type"
        ),
        sa.ForeignKeyConstraint(
            ["transaction_id"],
            ["version_transaction.id"],
            name="fk_dashboard_slices_version_transaction_id",
        ),
        sa.ForeignKeyConstraint(
            ["end_transaction_id"],
            ["version_transaction.id"],
            name="fk_dashboard_slices_version_end_transaction_id",
        ),
    )
    op.create_index(
        "ix_dashboard_slices_version_end_transaction_id",
        "dashboard_slices_version",
        ["end_transaction_id"],
    )
    op.create_index(
        "ix_dashboard_slices_version_operation_type",
        "dashboard_slices_version",
        ["operation_type"],
    )
    op.create_index(
        "ix_dashboard_slices_version_transaction_id",
        "dashboard_slices_version",
        ["transaction_id"],
    )


def downgrade() -> None:
    op.drop_table("dashboard_slices_version")
    op.drop_table("sql_metrics_version")
    op.drop_table("table_columns_version")
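Each shadow row records two transactions. ``transaction_id`` is the transaction that wrote the row. ``end_transaction_id`` is the transaction that superseded it, and stays NULL while the row is current. Under Continuum's validity strategy, as we understand it, the state as of transaction ``T`` is the set of rows with ``transaction_id <= T`` whose ``end_transaction_id`` is NULL or greater than ``T``, excluding DELETE markers.

The following is a sketch against a trimmed-down ``table_columns_version`` in SQLite. It makes three assumptions:
- The operation codes (0 insert, 1 update, 2 delete) match ``sqlalchemy_continuum``'s ``Operation`` constants.
- ``columns_as_of`` is a hypothetical helper.
- The sample rows are invented.

```python
from __future__ import annotations

import sqlite3

# Assumed to mirror sqlalchemy_continuum's Operation constants.
INSERT, UPDATE, DELETE = 0, 1, 2


def columns_as_of(conn: sqlite3.Connection, table_id: int, tx: int) -> list[tuple[int, str]]:
    """Return (id, column_name) for a dataset's columns as of transaction ``tx``."""
    return conn.execute(
        """
        SELECT id, column_name
        FROM table_columns_version
        WHERE table_id = ?
          AND transaction_id <= ?
          AND (end_transaction_id IS NULL OR end_transaction_id > ?)
          AND operation_type != ?
        ORDER BY id
        """,
        (table_id, tx, tx, DELETE),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Only the columns this query needs; the migrated table mirrors table_columns.
conn.execute(
    """
    CREATE TABLE table_columns_version (
        id INTEGER NOT NULL,
        column_name TEXT,
        table_id INTEGER,
        transaction_id INTEGER NOT NULL,
        end_transaction_id INTEGER,
        operation_type INTEGER NOT NULL,
        PRIMARY KEY (id, transaction_id)
    )
    """
)
conn.executemany(
    "INSERT INTO table_columns_version VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "ds", 5, 1, 3, INSERT),           # created in tx 1, superseded in tx 3
        (2, "revenue", 5, 1, 2, INSERT),      # created in tx 1, superseded in tx 2
        (2, "revenue", 5, 2, None, DELETE),   # delete marker written by tx 2
        (1, "event_ts", 5, 3, None, UPDATE),  # rename written by tx 3
    ],
)
```

Asking for transactions 1, 2 and 3 in turn shows the column set before the delete, after the delete, and after the rename.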

@@ -0,0 +1,462 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""composite_pk_association_tables
Replace the unused synthetic ``id INTEGER PRIMARY KEY`` on eight many-to-many
association tables with a composite primary key on the two FK columns. Drops
the now-redundant ``UniqueConstraint(fk1, fk2)`` on the two tables that
already carry one. Pre-flight: deletes rows with NULL FK values (six tables
allow them today) and any duplicate ``(fk1, fk2)`` rows.
Motivated by SQLAlchemy-Continuum issue #129 (M2M restore against junction
tables with surrogate PKs); also closes the data-integrity hole where six
of the eight tables lacked DB-level uniqueness.
Revision ID: 2bee73611e32
Revises: 33d7e0e21daa
Create Date: 2026-05-01 23:36:34.050058
"""

import logging
from typing import NamedTuple

import sqlalchemy as sa
from alembic import op
from sqlalchemy import inspect
from sqlalchemy.engine import Connection

# revision identifiers, used by Alembic.
revision = "2bee73611e32"
down_revision = "33d7e0e21daa"

logger = logging.getLogger("alembic.env")


class AssociationTable(NamedTuple):
    """A junction table being converted from surrogate-id PK to composite-FK PK."""

    name: str
    fk1: str
    fk2: str


# Order is alphabetical by table name; deterministic for review and bisection.
AFFECTED_TABLES: list[AssociationTable] = [
    AssociationTable("dashboard_roles", "dashboard_id", "role_id"),
    AssociationTable("dashboard_slices", "dashboard_id", "slice_id"),
    AssociationTable("dashboard_user", "user_id", "dashboard_id"),
    AssociationTable("report_schedule_user", "user_id", "report_schedule_id"),
    AssociationTable("rls_filter_roles", "role_id", "rls_filter_id"),
    AssociationTable("rls_filter_tables", "table_id", "rls_filter_id"),
    AssociationTable("slice_user", "user_id", "slice_id"),
    AssociationTable("sqlatable_user", "user_id", "table_id"),
]

# These two tables already declare ``UniqueConstraint(fk1, fk2)`` in the model;
# the composite PK subsumes it, so the migration drops the redundant constraint.
TABLES_WITH_PRE_EXISTING_UNIQUE: set[str] = {
    "dashboard_slices",
    "report_schedule_user",
}

# Documentation set: tables whose FK columns are nullable in their original
# create_table migrations (``dashboard_roles.dashboard_id`` from revision
# e11ccdd12658 is the most recent addition). ``report_schedule_user`` is the
# only affected table created with both FK columns ``NOT NULL`` and is
# intentionally absent here. This set is no longer consulted at runtime — the
# upgrade now runs the NULL-FK cleanup on every affected table because the
# DELETE is a cheap no-op when the columns are already NOT NULL, and that
# eliminates the risk of bugs from this set going stale (the
# ``dashboard_roles`` omission caught in PR review was exactly that bug).
TABLES_WITH_NULLABLE_FKS: set[str] = {
    "dashboard_roles",
    "dashboard_slices",
    "dashboard_user",
    "rls_filter_roles",
    "rls_filter_tables",
    "slice_user",
    "sqlatable_user",
}


def _check_no_external_fks_to_id(conn: Connection) -> None:
    """Raise ``RuntimeError`` if any foreign key in the database references one
    of the eight junction-table ``id`` columns. Uses SQLAlchemy's ``Inspector``
    for dialect-agnostic introspection across PostgreSQL, MySQL, and SQLite.

    Scope limitation: ``Inspector.get_table_names()`` returns tables in the
    connection's default schema only. On PostgreSQL deployments where Superset
    metadata lives in a non-default schema, or on multi-schema deployments
    that allow cross-schema FKs, an external FK in another schema would not
    be detected. This is acceptable for the standard single-schema
    deployment that Superset documents; operators with multi-schema
    metadata should run the equivalent inventory query against
    ``information_schema.referential_constraints`` themselves before
    applying.
    """
    affected = {t.name for t in AFFECTED_TABLES}
    insp = inspect(conn)
    for table_name in insp.get_table_names():
        if table_name in affected:
            continue
        for fk in insp.get_foreign_keys(table_name):
            if fk["referred_table"] in affected and "id" in fk["referred_columns"]:
                # Reflected FKs always carry a "name" key, which may be None,
                # so ``.get("name", default)`` would never use the default.
                raise RuntimeError(
                    f"Cannot drop synthetic id from {fk['referred_table']}: "
                    f"external FK {fk.get('name') or '<unnamed>'} on {table_name} "
                    f"references {fk['referred_table']}({fk['referred_columns']}). "
                    "Drop or migrate the referencing FK before applying this "
                    "migration."
                )


def _table_clause(t: AssociationTable) -> sa.sql.expression.TableClause:
    """Build a lightweight SQLAlchemy ``TableClause`` for ``t`` exposing the
    columns the helper queries reference (``id``, ``fk1``, ``fk2``). Used so
    that the dedupe / cleanup / assert SQL can be expressed via SQLAlchemy
    core constructs rather than via string interpolation."""
    return sa.table(t.name, sa.column("id"), sa.column(t.fk1), sa.column(t.fk2))


def _delete_null_fk_rows(conn: Connection, t: AssociationTable) -> int:
    """Delete rows where ``t.fk1`` or ``t.fk2`` is NULL on ``t.name``.

    Returns the deletion count. Required because primary-key columns must be
    NOT NULL; the PK-add downstream would fail with a cryptic constraint
    violation if any NULL-FK rows survived. Run unconditionally on every
    affected table — see ``TABLES_WITH_NULLABLE_FKS`` above for the rationale.
    """
    tbl = _table_clause(t)
    stmt = sa.delete(tbl).where(sa.or_(tbl.c[t.fk1].is_(None), tbl.c[t.fk2].is_(None)))
    result = conn.execute(stmt)
    n = result.rowcount or 0
    if n:
        logger.warning(
            "Deleted %d row(s) with NULL FK from %s before composite-PK promotion",
            n,
            t.name,
        )
    return n


def _dedupe_by_min_id(conn: Connection, t: AssociationTable) -> int:
    """Delete duplicate ``(t.fk1, t.fk2)`` rows from ``t.name`` keeping ``MIN(id)``.

    Returns the deletion count. The ``NOT IN`` argument is wrapped in an
    extra ``SELECT keep_id FROM (...) AS keep_min`` derived table because
    MySQL rejects ``DELETE FROM t WHERE id NOT IN (SELECT MIN(id) FROM t
    GROUP BY ...)`` with ERROR 1093 unless the inner SELECT is materialized
    through a derived table. SQLAlchemy's ``.subquery("keep_min")`` produces
    that wrap.

    Logs a sample (up to 10) of the discarded ``(fk1, fk2, id)`` tuples at
    WARN before deletion, so operators can audit which rows are dropped.
    The "keep ``MIN(id)``" policy preserves the original row, which is
    correct in practice but discards any later, semantically-identical
    re-grants.
    """
    tbl = _table_clause(t)
    keep_min = (
        sa.select(sa.func.min(tbl.c.id).label("keep_id"))
        .group_by(tbl.c[t.fk1], tbl.c[t.fk2])
        .subquery("keep_min")
    )
    keep_ids = sa.select(keep_min.c.keep_id)
    discarded = tbl.c.id.notin_(keep_ids)
    sample_stmt = (
        sa.select(tbl.c[t.fk1], tbl.c[t.fk2], tbl.c.id).where(discarded).limit(10)
    )
    sample = list(conn.execute(sample_stmt))
    delete_stmt = sa.delete(tbl).where(discarded)
    result = conn.execute(delete_stmt)
    n = result.rowcount or 0
    if n:
        logger.warning(
            "Deduped %d duplicate row(s) from %s; sample of discarded "
            "(%s, %s, id) tuples (up to 10): %s",
            n,
            t.name,
            t.fk1,
            t.fk2,
            sample,
        )
    return n


def _assert_no_duplicates(conn: Connection, t: AssociationTable) -> None:
    """Raise ``RuntimeError`` if any ``(t.fk1, t.fk2)`` duplicate group remains.

    Called after ``_dedupe_by_min_id`` to surface silent dialect-dependent
    dedupe failures (e.g., a MySQL syntax issue) as an actionable error
    before the PK-add fires with a less-helpful constraint-violation message.
    """
    tbl = _table_clause(t)
    duplicate_groups = (
        sa.select(sa.literal(1))
        .select_from(tbl)
        .group_by(tbl.c[t.fk1], tbl.c[t.fk2])
        .having(sa.func.count() > 1)
        .subquery("duplicate_groups")
    )
    count_stmt = sa.select(sa.func.count()).select_from(duplicate_groups)
    if remaining := conn.scalar(count_stmt) or 0:
        raise RuntimeError(
            f"Dedupe failed for {t.name}: {remaining} duplicate "
            f"({t.fk1}, {t.fk2}) groups remain after _dedupe_by_min_id. "
            f"Check the dedupe SQL for dialect {conn.dialect.name}."
        )


def _build_pre_upgrade_table(
    insp: sa.engine.reflection.Inspector, t: AssociationTable
) -> sa.Table:
    """Build a ``Table`` object representing the pre-upgrade schema of ``t``,
    explicitly *without* any redundant ``UniqueConstraint(t.fk1, t.fk2)``.

    Used as ``copy_from`` to ``batch_alter_table`` so the rebuilt table
    omits the unnamed UNIQUE constraint deterministically across dialects
    (SQLite reflects unnamed UNIQUEs with ``name=None``, defeating the
    standard ``batch_op.drop_constraint(name)`` path).

    Reflects column types and FK targets (with original FK constraint names
    preserved) from the live database; only the redundant UNIQUE is omitted.
    """
    md = sa.MetaData()
    fks_for_col: dict[str, list[dict]] = {}
    for fk in insp.get_foreign_keys(t.name):
        for col_name in fk["constrained_columns"]:
            fks_for_col.setdefault(col_name, []).append(fk)
    cols: list[sa.Column] = []
    for c in insp.get_columns(t.name):
        col_kwargs = {"nullable": c.get("nullable", True)}
        if c["name"] == "id":
            col_kwargs["primary_key"] = True
            col_kwargs["autoincrement"] = True
        fk_args = []
        for fk in fks_for_col.get(c["name"], []):
            idx = fk["constrained_columns"].index(c["name"])
            target = f"{fk['referred_table']}.{fk['referred_columns'][idx]}"
            options = {}
            if fk.get("options", {}).get("ondelete"):
                options["ondelete"] = fk["options"]["ondelete"]
            if fk.get("name"):
                options["name"] = fk["name"]
            fk_args.append(sa.ForeignKey(target, **options))
        cols.append(sa.Column(c["name"], c["type"], *fk_args, **col_kwargs))
    return sa.Table(t.name, md, *cols)


def upgrade() -> None:
    conn = op.get_bind()
    _check_no_external_fks_to_id(conn)
    insp = inspect(conn)
    for t in AFFECTED_TABLES:
        # Run NULL-FK cleanup unconditionally: it is a no-op DELETE on tables
        # whose FK columns are already NOT NULL (cheap), and skipping it on a
        # table whose FK was nullable would leave the PK-add to fail with a
        # cryptic constraint violation. Cf. ``TABLES_WITH_NULLABLE_FKS`` above
        # for documentation of which tables are known to have nullable FKs.
        _delete_null_fk_rows(conn, t)
        _dedupe_by_min_id(conn, t)
        _assert_no_duplicates(conn, t)
        # For the two tables with a pre-existing redundant UNIQUE
        # (``dashboard_slices``, ``report_schedule_user``) build an explicit
        # ``copy_from`` Table that omits the UNIQUE; this deterministically
        # drops it across all dialects, including SQLite where unnamed
        # constraints reflect with ``name=None`` and can't be dropped by
        # name. For the other six tables, reflection-based default
        # ``batch_alter_table`` (auto-detect) is fine since there's no
        # UNIQUE to drop. On PostgreSQL/MySQL, direct ALTER avoids the
        # temp-table index-name collision; on SQLite, the auto-detect picks
        # ``recreate=True`` because PK changes need it.
        if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
            # MySQL ERROR 1826: foreign-key constraint names are unique
            # per-database, not per-table. ``recreate="always"`` builds
            # ``_alembic_tmp_<table>`` with the original FK names from
            # ``copy_from``, but the original table still holds those
            # names until it's dropped, which fails on MySQL with
            # ``Duplicate foreign key constraint name``. PostgreSQL and
            # SQLite scope FK names per-table, so the recreate path
            # works there as-is. Drop the original FKs by name first
            # on MySQL; ``copy_from`` re-creates them on the rebuilt
            # table with their original names.
            if conn.dialect.name == "mysql":
                for fk in insp.get_foreign_keys(t.name):
                    if fk_name := fk.get("name"):
                        op.drop_constraint(fk_name, t.name, type_="foreignkey")
            with op.batch_alter_table(
                t.name,
                recreate="always",
                copy_from=_build_pre_upgrade_table(insp, t),
            ) as batch_op:
                batch_op.drop_column("id")
                batch_op.create_primary_key(f"pk_{t.name}", [t.fk1, t.fk2])
                # SQLite quirk: composite PRIMARY KEY does not promote the
                # constituent columns to NOT NULL (only ``INTEGER PRIMARY
                # KEY`` does). PostgreSQL and MySQL implicitly promote the
                # PK columns to NOT NULL when the constraint is added,
                # so the explicit ``alter_column`` is a no-op on those
                # backends but enforces the post-upgrade contract on
                # SQLite. Without it, ``INSERT (NULL, 5)`` would succeed
                # on SQLite despite the columns being part of the PK.
                batch_op.alter_column(t.fk1, existing_type=sa.Integer, nullable=False)
                batch_op.alter_column(t.fk2, existing_type=sa.Integer, nullable=False)
        else:
            with op.batch_alter_table(t.name) as batch_op:
                batch_op.drop_column("id")
                batch_op.create_primary_key(f"pk_{t.name}", [t.fk1, t.fk2])
                # See comment above re: SQLite composite-PK NOT NULL quirk.
                batch_op.alter_column(t.fk1, existing_type=sa.Integer, nullable=False)
                batch_op.alter_column(t.fk2, existing_type=sa.Integer, nullable=False)


def downgrade() -> None:
    # Inverse order: undo upgrade transformations from last-applied to
    # first-applied. Within each table, drop the composite PK, restore the
    # surrogate ``id`` column, and re-add the original ``UNIQUE`` constraint
    # on the two tables that previously carried one.
    #
    # Note: FK columns remain NOT NULL after downgrade (intentional asymmetry;
    # see UPDATING.md). Restoring the original nullable state would require
    # an explicit ``alter_column`` per FK per table for no operator value;
    # junction-table NULL FKs were always meaningless under ``secondary=``
    # semantics.
    #
    # The downgrade names the restored PK ``<table>_pkey`` (matching Postgres'
    # default constraint-naming convention, which was the original constraint
    # name before this migration ran) so a downgrade-then-upgrade round-trip
    # doesn't collide on the upgrade's ``pk_<table>`` name.
    #
    # Adding a NOT NULL ``id`` column to a table with existing rows requires
    # a default that fires on the existing rows. ``sa.Identity()`` (PostgreSQL
    # 10+) and ``sa.Sequence`` (with explicit nextval) both backfill existing
    # rows during ALTER TABLE; bare ``autoincrement=True`` does not.
    # ``Identity`` is the modern choice for this path. MySQL has no identity
    # columns and gets no ``AUTO_INCREMENT`` from ``Identity`` on a column
    # add, so MySQL is routed through ``_downgrade_mysql_table`` instead.
    conn = op.get_bind()
    insp = inspect(conn)
    is_mysql = conn.dialect.name == "mysql"
    for t in reversed(AFFECTED_TABLES):
        if is_mysql:
            _downgrade_mysql_table(insp, t)
        else:
            with op.batch_alter_table(t.name) as batch_op:
                batch_op.drop_constraint(f"pk_{t.name}", type_="primary")
                batch_op.add_column(
                    sa.Column(
                        "id",
                        sa.Integer,
                        sa.Identity(always=False),
                        nullable=False,
                    )
                )
                batch_op.create_primary_key(f"{t.name}_pkey", ["id"])
                if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
                    batch_op.create_unique_constraint(
                        f"uq_{t.name}_{t.fk1}_{t.fk2}", [t.fk1, t.fk2]
                    )


def _downgrade_mysql_table(
    insp: sa.engine.reflection.Inspector, t: AssociationTable
) -> None:
    """MySQL-specific downgrade for one table.

    Two MySQL quirks force a dialect-specific path here:

    1. **ERROR 1553 — ``Cannot drop index 'PRIMARY': needed in a foreign
       key constraint``**. InnoDB uses the composite PK index to back the
       FK on the leftmost column. Dropping the PK before the FKs orphans
       that backing index. PostgreSQL and SQLite create separate indexes
       for FK columns and don't need this dance. We drop the FKs first
       and re-add them after the structural change.
    2. **``Identity(always=False)`` on a non-PK column add does not emit
       ``AUTO_INCREMENT`` on MySQL.** SQLAlchemy 1.4 only emits
       ``AUTO_INCREMENT`` when the column has both ``Identity()`` and
       ``primary_key=True`` at create time. Our portable path adds the
       column first, then creates the PK separately — which works on
       Postgres (the column gets ``GENERATED BY DEFAULT AS IDENTITY``)
       and SQLite (``INTEGER PRIMARY KEY`` becomes a rowid alias) but
       leaves MySQL without auto-generation, so existing rows can't be
       backfilled and future ``INSERT`` statements fail with
       ``Field 'id' doesn't have a default value``. The combined
       ``DROP PRIMARY KEY, ADD COLUMN AUTO_INCREMENT, ADD PRIMARY KEY``
       in a single ALTER statement is the canonical MySQL idiom: MySQL
       backfills existing rows with sequential values and the column
       remains auto-incrementing for future inserts.

    Raw SQL is unavoidable here — there is no SQLAlchemy core equivalent
    for the combined-ALTER form, and the constitution allows raw SQL for
    dialect-specific DDL with no programmatic equivalent (preferring
    triple-quoted strings for legibility).

    Belt-and-braces guard: ``t.name`` is interpolated as a backtick-quoted
    identifier in the ALTER statements below. The value comes from
    ``AFFECTED_TABLES`` (a module-level literal), so SQL injection is
    structurally precluded. The explicit ``allowed`` check here makes
    that invariant load-bearing rather than implicit, so a future
    refactor that loosens the call-site can't slip past review.
    """
    allowed = {a.name for a in AFFECTED_TABLES}
    if t.name not in allowed:
        raise RuntimeError(
            f"Refusing to ALTER unknown table {t.name!r}: "
            f"only AFFECTED_TABLES entries may flow through this path."
        )
    fks = insp.get_foreign_keys(t.name)
    for fk in fks:
        if fk_name := fk.get("name"):
            op.execute(f"ALTER TABLE `{t.name}` DROP FOREIGN KEY `{fk_name}`")
    op.execute(
        f"""
        ALTER TABLE `{t.name}`
        DROP PRIMARY KEY,
        ADD COLUMN id INT NOT NULL AUTO_INCREMENT,
        ADD PRIMARY KEY (id)
        """
    )
    if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
        op.execute(
            f"""
            ALTER TABLE `{t.name}`
            ADD UNIQUE INDEX `uq_{t.name}_{t.fk1}_{t.fk2}`
            (`{t.fk1}`, `{t.fk2}`)
            """
        )
    for fk in fks:
        ondelete = fk.get("options", {}).get("ondelete")
        ondelete_clause = f" ON DELETE {ondelete}" if ondelete else ""
        local_cols = ", ".join(f"`{c}`" for c in fk["constrained_columns"])
        ref_cols = ", ".join(f"`{c}`" for c in fk["referred_columns"])
        op.execute(
            f"""
            ALTER TABLE `{t.name}`
            ADD CONSTRAINT `{fk["name"]}`
            FOREIGN KEY ({local_cols})
            REFERENCES `{fk["referred_table"]}` ({ref_cols})
            {ondelete_clause}
            """
        )
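The upgrade runs three steps per table:
1. Delete rows with a NULL FK.
2. Keep ``MIN(id)`` for each ``(fk1, fk2)`` pair.
3. Promote the pair to the primary key.

The following reproduces that sequence on one table with the stdlib ``sqlite3`` module, as a sketch only. It hand-writes the SQL that the migration builds with SQLAlchemy Core. It rebuilds the table manually, roughly as ``batch_alter_table`` does on SQLite. It also omits the FK references to ``ab_user`` / ``slices``. ``promote_to_composite_pk`` and the sample rows are hypothetical.

```python
import sqlite3


def promote_to_composite_pk(conn: sqlite3.Connection) -> None:
    # 1. PK columns must be NOT NULL, so rows with a NULL FK cannot survive.
    conn.execute("DELETE FROM slice_user WHERE user_id IS NULL OR slice_id IS NULL")
    # 2. Keep the lowest id of each (user_id, slice_id) group. The inner
    #    SELECT sits in a derived table, the shape MySQL needs to avoid
    #    ERROR 1093; SQLite accepts it as well.
    conn.execute(
        """
        DELETE FROM slice_user WHERE id NOT IN (
            SELECT keep_id FROM (
                SELECT MIN(id) AS keep_id FROM slice_user GROUP BY user_id, slice_id
            ) AS keep_min
        )
        """
    )
    # 3. SQLite cannot change a primary key in place: copy into a new table
    #    with the composite PK and explicit NOT NULL, then swap names.
    conn.executescript(
        """
        CREATE TABLE slice_user_new (
            user_id INTEGER NOT NULL,
            slice_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, slice_id)
        );
        INSERT INTO slice_user_new (user_id, slice_id)
            SELECT user_id, slice_id FROM slice_user;
        DROP TABLE slice_user;
        ALTER TABLE slice_user_new RENAME TO slice_user;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slice_user (id INTEGER PRIMARY KEY, user_id INTEGER, slice_id INTEGER)")
conn.executemany(
    "INSERT INTO slice_user (id, user_id, slice_id) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 11, 100), (4, None, 100), (5, 10, 101), (6, 10, 100)],
)
promote_to_composite_pk(conn)
pairs = conn.execute("SELECT user_id, slice_id FROM slice_user ORDER BY 1, 2").fetchall()
```

After promotion, inserting an existing pair fails with ``sqlite3.IntegrityError``. That is the DB-level uniqueness the migration docstring says six of the eight tables previously lacked.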

@@ -35,7 +35,6 @@ from sqlalchemy import (
    String,
    Table,
    Text,
    UniqueConstraint,
)
from sqlalchemy.engine.base import Connection
from sqlalchemy.orm import relationship, subqueryload
@@ -93,37 +92,53 @@ sqla.event.listen(User, "after_insert", copy_dashboard)
dashboard_slices = Table(
    "dashboard_slices",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dashboard_id", Integer, ForeignKey("dashboards.id", ondelete="CASCADE")),
    Column("slice_id", Integer, ForeignKey("slices.id", ondelete="CASCADE")),
    UniqueConstraint("dashboard_id", "slice_id"),
    Column(
        "dashboard_id",
        Integer,
        ForeignKey("dashboards.id", ondelete="CASCADE"),
        primary_key=True,
    ),
    Column(
        "slice_id",
        Integer,
        ForeignKey("slices.id", ondelete="CASCADE"),
        primary_key=True,
    ),
)
dashboard_user = Table(
    "dashboard_user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("ab_user.id", ondelete="CASCADE")),
    Column("dashboard_id", Integer, ForeignKey("dashboards.id", ondelete="CASCADE")),
    Column(
        "user_id",
        Integer,
        ForeignKey("ab_user.id", ondelete="CASCADE"),
        primary_key=True,
    ),
    Column(
        "dashboard_id",
        Integer,
        ForeignKey("dashboards.id", ondelete="CASCADE"),
        primary_key=True,
    ),
)
DashboardRoles = Table(
    "dashboard_roles",
    metadata,
    Column("id", Integer, primary_key=True),
    Column(
        "dashboard_id",
        Integer,
        ForeignKey("dashboards.id", ondelete="CASCADE"),
        nullable=False,
        primary_key=True,
    ),
    Column(
        "role_id",
        Integer,
        ForeignKey("ab_role.id", ondelete="CASCADE"),
        nullable=False,
        primary_key=True,
    ),
)
@@ -132,6 +147,27 @@ class Dashboard(CoreDashboard, AuditMixinNullable, ImportExportMixin):
    """The dashboard object!"""
    __tablename__ = "dashboards"
    # deleted_at exclusion will be added when sc-103157 (soft delete) is merged (T043).
    # SPIKE (sc-103156-versioning-full-continuum-spike): ``slices`` removed from
    # the exclude list so Continuum auto-creates an association version table
    # for ``dashboard_slices`` and ``Reverter(relations=["slices"])`` can
    # restore chart membership. Owners / roles stay excluded — access metadata,
    # not user-authored content (ADR-005).
    # Audit columns (changed_on/created_on/changed_by_fk/created_by_fk) are
    # auto-bumped by AuditMixin on every save; excluding them lets Continuum's
    # is_modified() return False on no-op saves (e.g. owners-only edits) so we
    # don't create empty version rows. version_transaction.user_id /
    # issued_at preserve "who/when" without per-row duplication.
    __versioned__: dict[str, Any] = {
        "exclude": [
            "owners",
            "roles",
            "changed_on",
            "created_on",
            "changed_by_fk",
            "created_by_fk",
        ]
    }
    id = Column(Integer, primary_key=True)
    dashboard_title = Column(String(500))
    position_json = Column(utils.MediumText())
@@ -221,7 +257,7 @@ class Dashboard(CoreDashboard, AuditMixinNullable, ImportExportMixin):
    @renders("dashboard_title")
    def dashboard_link(self) -> Markup:
        title = escape(self.dashboard_title or "<empty>")
        return Markup(f'<a href="{self.url}">{title}</a>')
        return Markup(f'<a href="{self.url}">{title}</a>')  # noqa: S704
    @property
    def digest(self) -> str | None:
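In the rewritten ``dashboard_slices`` table, both FK columns carry ``primary_key=True``. SQLAlchemy's ``Column`` defaults ``nullable`` to ``False`` for primary-key columns, so ``create_all`` emits ``NOT NULL`` for them without an explicit flag. That matters on SQLite. Unlike PostgreSQL and MySQL, SQLite accepts NULLs in the composite primary key of an ordinary rowid table unless the columns are declared ``NOT NULL``. This is the quirk the migration's explicit ``alter_column(..., nullable=False)`` calls guard against. The following stdlib-only demonstration uses hypothetical table names:

```python
import sqlite3


def accepts_null_pk(conn: sqlite3.Connection, table: str) -> bool:
    """Try to insert (NULL, 5) and report whether SQLite accepted it."""
    try:
        conn.execute(f"INSERT INTO {table} (dashboard_id, slice_id) VALUES (NULL, 5)")
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
# Composite PK only: SQLite's legacy behaviour lets NULL through.
conn.execute(
    "CREATE TABLE pk_only (dashboard_id INTEGER, slice_id INTEGER, "
    "PRIMARY KEY (dashboard_id, slice_id))"
)
# Composite PK plus explicit NOT NULL, matching the post-migration contract.
conn.execute(
    "CREATE TABLE pk_not_null (dashboard_id INTEGER NOT NULL, "
    "slice_id INTEGER NOT NULL, PRIMARY KEY (dashboard_id, slice_id))"
)
```

``accepts_null_pk(conn, "pk_only")`` returns ``True``, while ``accepts_null_pk(conn, "pk_not_null")`` raises a NOT NULL violation inside the helper and returns ``False``.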

@@ -534,14 +534,23 @@ class ImportExportMixin(UUIDMixin):
    def reset_ownership(self) -> None:
        """object will belong to the user the current user"""
        # make sure the object doesn't have relations to a user
        # it will be filled by appbuilder on save
        self.created_by = None
        self.changed_by = None
        # flask global context might not exist (in cli or tests for example)
        # Reset the audit pointers. When a Flask request context is
        # available we explicitly stamp the current user, otherwise we
        # leave the attributes unset so Flask-AppBuilder's column
        # defaults fill them in on save. An explicit assignment is
        # required because once the ``created_by`` / ``changed_by``
        # relationships are configured (which happens eagerly on models
        # registered with SQLAlchemy-Continuum), setting them to
        # ``None`` propagates to the FK column and suppresses the
        # ``default=`` callable.
        self.owners = []
        if g and hasattr(g, "user"):
        if g and hasattr(g, "user") and g.user:
            self.created_by = g.user
            self.changed_by = g.user
            self.owners = [g.user]
        else:
            self.created_by = None
            self.changed_by = None
    @property
    def params_dict(self) -> dict[Any, Any]:
@@ -610,7 +619,7 @@ class AuditMixinNullable(AuditMixin):
    @renders("changed_on")
    def changed_on_(self) -> Markup:
        return Markup(f'<span class="no-wrap">{self.changed_on}</span>')
        return Markup(f'<span class="no-wrap">{self.changed_on}</span>')  # noqa: S704
    @renders("changed_on")
    def changed_on_delta_humanized(self) -> str:
@@ -654,7 +663,7 @@ class AuditMixinNullable(AuditMixin):
    @renders("changed_on")
    def modified(self) -> Markup:
        return Markup(f'<span class="no-wrap">{self.changed_on_humanized}</span>')
        return Markup(f'<span class="no-wrap">{self.changed_on_humanized}</span>')  # noqa: S704
class QueryResult:  # pylint: disable=too-few-public-methods

@@ -58,9 +58,18 @@ metadata = Model.metadata # pylint: disable=no-member
slice_user = Table(
    "slice_user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("ab_user.id", ondelete="CASCADE")),
    Column("slice_id", Integer, ForeignKey("slices.id", ondelete="CASCADE")),
    Column(
        "user_id",
        Integer,
        ForeignKey("ab_user.id", ondelete="CASCADE"),
        primary_key=True,
    ),
    Column(
        "slice_id",
        Integer,
        ForeignKey("slices.id", ondelete="CASCADE"),
        primary_key=True,
    ),
)
logger = logging.getLogger(__name__)
@@ -73,6 +82,28 @@ class Slice( # pylint: disable=too-many-public-methods
query_context_factory: QueryContextFactory | None = None
__tablename__ = "slices"
# query_context is excluded: it is a cached/regenerated field, not user-authored.
# deleted_at exclusion will be added when sc-103157 (soft delete) is merged (T043).
# Exclude M2M association relationships: Continuum only captures FK columns on
# association INSERTs (not the auto-increment id), which breaks the NOT NULL PK.
# Ownership changes are administrative metadata, not user-authored content.
# Audit / save-marker columns are auto-bumped on every save. Excluding
# them lets Continuum's is_modified() return False on no-op saves
# (e.g. owners-only edits) so we don't create empty version rows.
# version_transaction.user_id / issued_at preserve "who/when".
__versioned__: dict[str, Any] = {
"exclude": [
"query_context",
"owners",
"dashboards",
"changed_on",
"created_on",
"changed_by_fk",
"created_by_fk",
"last_saved_at",
"last_saved_by_fk",
]
}
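The exclude list works because only changes to non-excluded columns count as a modification. Below is a minimal, hypothetical sketch of that rule in plain Python. It is not Continuum's `is_modified()`, and `has_versioned_change` and the sample values are illustrative. An owners-only save that bumps only `changed_on` / `changed_by_fk` produces no versioned change, while a rename does.

```python
from typing import Any, Mapping

# Mirrors the Slice ``__versioned__["exclude"]`` list shown above.
SLICE_EXCLUDE = frozenset(
    {
        "query_context", "owners", "dashboards", "changed_on", "created_on",
        "changed_by_fk", "created_by_fk", "last_saved_at", "last_saved_by_fk",
    }
)


def has_versioned_change(
    before: Mapping[str, Any], after: Mapping[str, Any], exclude: frozenset
) -> bool:
    """Return True if any non-excluded column differs between snapshots."""
    keys = (set(before) | set(after)) - exclude
    return any(before.get(k) != after.get(k) for k in keys)


before = {"slice_name": "Sales", "changed_on": "2026-05-01", "changed_by_fk": 1}
# Owners-only edit: only the audit columns are bumped on save.
owners_only = {**before, "changed_on": "2026-05-20", "changed_by_fk": 2}
renamed = {**owners_only, "slice_name": "Revenue"}

print(has_versioned_change(before, owners_only, SLICE_EXCLUDE))  # False
print(has_versioned_change(before, renamed, SLICE_EXCLUDE))  # True
```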
id = Column(Integer, primary_key=True)
slice_name = Column(String(250))
datasource_id = Column(Integer)
@@ -322,7 +353,7 @@ class Slice( # pylint: disable=too-many-public-methods
@property
def slice_link(self) -> Markup:
name = escape(self.chart)
return Markup(f'<a href="{self.url}">{name}</a>')
return Markup(f'<a href="{self.url}">{name}</a>') # noqa: S704
@property
def icons(self) -> str:


@@ -101,20 +101,18 @@ class ReportSourceFormat(StrEnum):
report_schedule_user = Table(
"report_schedule_user",
metadata,
Column("id", Integer, primary_key=True),
Column(
"user_id",
Integer,
ForeignKey("ab_user.id", ondelete="CASCADE"),
nullable=False,
primary_key=True,
),
Column(
"report_schedule_id",
Integer,
ForeignKey("report_schedule.id", ondelete="CASCADE"),
nullable=False,
primary_key=True,
),
UniqueConstraint("user_id", "report_schedule_id"),
)


@@ -34,7 +34,7 @@ flask_app = create_app()
# Need to import late, as the celery_app will have been setup by "create_app()"
# ruff: noqa: E402, F401
# pylint: disable=wrong-import-position, unused-import
from . import cache, scheduler
from . import cache, scheduler, version_history_retention
# Export the celery app globally for Celery (as run on the cmd line) to find
app = celery_app


@@ -0,0 +1,259 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Celery task: prune old entity-version history.
Retention is time-based. The task deletes parent + child shadow rows
owned by ``version_transaction`` rows whose ``issued_at`` is older
than ``SUPERSET_VERSION_HISTORY_RETENTION_DAYS`` (default 30, env
overridable, ``0`` to disable).
One preservation rule, applied per parent shadow:
* **Live** (``end_transaction_id IS NULL``) — never pruned.
Baseline rows (``operation_type = 0``) and any closed historical row
are subject to the same retention window as everything else. An
entity that hasn't been edited within the window has only its live
row remaining; the historical chain (including the synthetic
baseline) ages out.
If any parent shadow row written in a transaction is still live, the
whole transaction is preserved (along with its child shadows and
``version_changes`` rows). Otherwise, all of the transaction's shadow
rows are deleted and the ``version_transaction`` row itself is
dropped; its ``version_changes`` rows cascade via the FK.
Registered via ``CELERYBEAT_SCHEDULE`` in ``superset/config.py``.
Idempotent: a second run prunes nothing.
"""
from __future__ import annotations
import logging
from datetime import datetime, timedelta
from typing import Any
import sqlalchemy as sa
from flask import current_app
from superset.extensions import celery_app, db
logger = logging.getLogger(__name__)
def _resolve_shadow_tables() -> tuple[list[sa.Table], list[sa.Table], sa.Table | None]:
"""Resolve the (parent, child, m2m) shadow Table objects from
Continuum's mapper registry.
Returns:
(parent_tables, child_tables, dashboard_slices_version_table)
``dashboard_slices_version`` is M2M-tracked by Continuum and lives
in metadata under that name (Continuum auto-creates the Table; it
isn't registered as a versioned class). Returned separately because
it doesn't follow the parent/child class shape.
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import version_class
from superset.connectors.sqla.models import SqlaTable, SqlMetric, TableColumn
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
parent_tables: list[sa.Table] = []
for cls in (Dashboard, Slice, SqlaTable):
try:
parent_tables.append(version_class(cls).__table__)
except Exception: # pylint: disable=broad-except # noqa: S112
continue
child_tables: list[sa.Table] = []
for cls in (TableColumn, SqlMetric):
try:
child_tables.append(version_class(cls).__table__)
except Exception: # pylint: disable=broad-except # noqa: S112
continue
metadata = parent_tables[0].metadata if parent_tables else None
m2m_table = (
metadata.tables.get("dashboard_slices_version")
if metadata is not None
else None
)
return parent_tables, child_tables, m2m_table
def _candidate_transaction_ids(
conn: sa.engine.Connection,
cutoff: datetime,
parent_tables: list[sa.Table],
) -> list[int]:
"""Find ``version_transaction.id`` values that are eligible to
prune: ``issued_at < cutoff`` AND not currently the live row of
any versioned entity.
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import versioning_manager
tx_table = versioning_manager.transaction_cls.__table__
candidate_ids = [
row[0]
for row in conn.execute(
sa.select(tx_table.c.id).where(tx_table.c.issued_at < cutoff)
)
]
if not candidate_ids:
return []
# Build the set of transaction ids whose parent shadow includes a
# live row (``end_transaction_id IS NULL``). Those transactions
# represent the current state of an entity and must be preserved
# regardless of age.
preserved_ids: set[int] = set()
for ptbl in parent_tables:
for row in conn.execute(
sa.select(ptbl.c.transaction_id)
.where(ptbl.c.transaction_id.in_(candidate_ids))
.where(ptbl.c.end_transaction_id.is_(None))
.distinct()
):
preserved_ids.add(row[0])
return [tx_id for tx_id in candidate_ids if tx_id not in preserved_ids]
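Candidate selection reduces to a set difference: take the transactions older than the cutoff, then remove any transaction that still owns a live parent shadow row. Below is a minimal in-memory sketch of the same logic. It assumes parent shadow rows can be modelled as `(transaction_id, end_transaction_id)` tuples. The real function instead runs SQL against `version_transaction` and each parent shadow table.

```python
from datetime import datetime, timedelta
from typing import Optional


def candidate_transaction_ids(
    issued_at: dict[int, datetime],
    parent_shadow_rows: list[tuple[int, Optional[int]]],
    cutoff: datetime,
) -> list[int]:
    """Return tx ids older than *cutoff* that own no live parent row."""
    candidates = [tx for tx, ts in sorted(issued_at.items()) if ts < cutoff]
    # A shadow row is live when its end_transaction_id is None.
    preserved = {tx for tx, end_tx in parent_shadow_rows if end_tx is None}
    return [tx for tx in candidates if tx not in preserved]


now = datetime(2026, 5, 20)
issued = {
    1: now - timedelta(days=90),
    2: now - timedelta(days=60),
    3: now - timedelta(days=1),
}
# One entity edited at tx 1, 2 and 3: rows 1 and 2 are closed, row 3 is live.
rows = [(1, 2), (2, 3), (3, None)]
print(candidate_transaction_ids(issued, rows, now - timedelta(days=30)))  # [1, 2]
```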
def _delete_for_transactions(
conn: sa.engine.Connection,
tables: list[sa.Table],
tx_ids: list[int],
) -> int:
"""Delete shadow rows in *tables* whose lifespan touches a pruned
transaction — either ``transaction_id`` (created at) or
``end_transaction_id`` (closed at) is in *tx_ids*. Returns total
rowcount across all tables.
The ``end_transaction_id`` predicate is required to keep referential
integrity when transactions span multiple entities. A flush that
saves dashboard + slice + dataset at the same ``tx=X`` produces
three shadow rows sharing that tx. If only the dashboard is later
edited, at ``tx=Y`` and again at ``tx=Z``, its row at ``tx=X`` is
closed (``end_tx=Y``) and its row at ``tx=Y`` is closed
(``end_tx=Z``), while the slice/dataset rows stay live at ``tx=X``.
Retention preserves ``tx=X`` (slice/dataset are live there) and,
once ``tx=Y`` ages out, prunes it. Without the ``end_tx`` predicate,
the dashboard's closed row at ``tx=X`` survives the shadow-row
delete, and its ``end_transaction_id=Y`` then violates the FK when
the ``version_transaction`` row ``Y`` is deleted.
Live rows are never matched by either predicate
(``end_transaction_id IS NULL`` is not ``IN`` anything; live rows'
``transaction_id`` is preserved by construction in
:func:`_candidate_transaction_ids`).
"""
if not tx_ids:
return 0
total = 0
for tbl in tables:
result = conn.execute(
sa.delete(tbl).where(
sa.or_(
tbl.c.transaction_id.in_(tx_ids),
tbl.c.end_transaction_id.in_(tx_ids),
)
)
)
total += result.rowcount or 0
return total
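The cross-entity scenario from the docstring can be reproduced with the standard-library `sqlite3` module and `PRAGMA foreign_keys = ON`. This is a minimal sketch under simplified assumptions. A single `shadow` table stands in for all parent shadow tables, and the column layout is illustrative, not Continuum's real schema. If shadow rows are deleted on `transaction_id` alone, the dashboard row closed at tx 2 remains, and the later `version_transaction` delete fails. Adding the `end_transaction_id` predicate lets the delete succeed.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE version_transaction (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE shadow ("
        " entity TEXT,"
        " transaction_id INTEGER NOT NULL REFERENCES version_transaction(id),"
        " end_transaction_id INTEGER REFERENCES version_transaction(id))"
    )
    conn.executemany(
        "INSERT INTO version_transaction (id) VALUES (?)", [(1,), (2,), (3,)]
    )
    conn.executemany(
        "INSERT INTO shadow VALUES (?, ?, ?)",
        [
            ("dashboard", 1, 2),  # closed by the edit at tx 2
            ("dashboard", 2, 3),  # closed by the edit at tx 3
            ("dashboard", 3, None),  # live
            ("slice", 1, None),  # live, so tx 1 is preserved
        ],
    )
    return conn


def prune(conn: sqlite3.Connection, tx_ids: list[int], match_end_tx: bool) -> bool:
    """Delete shadow rows for *tx_ids*, then the transactions; False on FK error."""
    marks = ",".join("?" for _ in tx_ids)
    where = f"transaction_id IN ({marks})"
    params = list(tx_ids)
    if match_end_tx:
        where += f" OR end_transaction_id IN ({marks})"
        params += tx_ids
    try:
        conn.execute(f"DELETE FROM shadow WHERE {where}", params)
        conn.execute(f"DELETE FROM version_transaction WHERE id IN ({marks})", tx_ids)
        return True
    except sqlite3.IntegrityError:
        return False


print(prune(build_db(), [2], match_end_tx=False))  # False: FK violation
print(prune(build_db(), [2], match_end_tx=True))  # True
```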
def _prune_old_versions_impl(retention_days: int) -> dict[str, Any]:
"""Pure-Python implementation of the prune. Split out from the
Celery task wrapper so unit tests can call it directly without the
Celery harness.
Returns a stats dict for logging / test assertions.
"""
if retention_days <= 0:
logger.info(
"version_history_retention: SUPERSET_VERSION_HISTORY_RETENTION_DAYS "
"<= 0; skipping",
)
return {"skipped": 1}
parent_tables, child_tables, m2m_table = _resolve_shadow_tables()
if not parent_tables:
logger.warning(
"version_history_retention: no versioned classes resolved; skipping",
)
return {"skipped": 1}
cutoff = datetime.utcnow() - timedelta(days=retention_days)
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import versioning_manager
tx_table = versioning_manager.transaction_cls.__table__
# ``engine.begin()`` opens its own transaction. The Celery task runs
# outside the request-bound DB session, so we use a fresh connection
# rather than ``db.session`` to avoid stepping on web-request state.
with db.engine.begin() as conn:
tx_ids = _candidate_transaction_ids(conn, cutoff, parent_tables)
if not tx_ids:
return {"pruned_transactions": 0, "cutoff": cutoff.isoformat()}
parent_rows = _delete_for_transactions(conn, parent_tables, tx_ids)
child_rows = _delete_for_transactions(conn, child_tables, tx_ids)
m2m_rows = (
_delete_for_transactions(conn, [m2m_table], tx_ids)
if m2m_table is not None
else 0
)
# Drop the version_transaction rows themselves. ON DELETE
# CASCADE on version_changes.transaction_id removes the
# associated change records automatically.
tx_rows = (
conn.execute(sa.delete(tx_table).where(tx_table.c.id.in_(tx_ids))).rowcount
or 0
)
stats = {
"cutoff": cutoff.isoformat(),
"pruned_transactions": tx_rows,
"pruned_parent_shadows": parent_rows,
"pruned_child_shadows": child_rows,
"pruned_m2m_shadows": m2m_rows,
}
logger.info("version_history_retention: %s", stats)
return stats
@celery_app.task(name="version_history.prune_old_versions")
def prune_old_versions() -> dict[str, Any]:
"""Celery beat task entry point. Wraps the implementation with
config lookup + broad exception handling so a single failed run
doesn't poison the schedule (the next firing retries from a clean
slate).
"""
retention_days: int = current_app.config.get(
"SUPERSET_VERSION_HISTORY_RETENTION_DAYS", 30
)
try:
return _prune_old_versions_impl(retention_days)
except Exception: # pylint: disable=broad-except
logger.exception("version_history.prune_old_versions: task failed")
return {"error": 1}
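The task's contract can be sketched without Celery or a database. A retention window `<= 0` disables the run, and a failed run is reported as `{"error": 1}` instead of raising. In this hypothetical sketch, `read_retention_days` and `run_prune` are illustrative names. Reading `SUPERSET_VERSION_HISTORY_RETENTION_DAYS` from the process environment is an assumption drawn from the module docstring's "env overridable" note; the real code looks the value up through `current_app.config`. The sketch also folds the `<= 0` check into the wrapper, whereas the original performs it in `_prune_old_versions_impl`.

```python
import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger(__name__)


def read_retention_days(environ: Mapping[str, str], default: int = 30) -> int:
    """Hypothetical env override: fall back to *default* if unset or invalid."""
    raw = environ.get("SUPERSET_VERSION_HISTORY_RETENTION_DAYS")
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        logger.warning("invalid retention days %r; using %s", raw, default)
        return default


def run_prune(
    retention_days: int, impl: Callable[[int], dict[str, Any]]
) -> dict[str, Any]:
    """Skip when disabled; never let an exception escape to the scheduler."""
    if retention_days <= 0:
        return {"skipped": 1}
    try:
        return impl(retention_days)
    except Exception:  # broad on purpose, so the next scheduled run starts clean
        logger.exception("prune failed")
        return {"error": 1}


def failing_impl(days: int) -> dict[str, Any]:
    raise RuntimeError("database unavailable")


print(read_retention_days({}))  # 30
disabled = read_retention_days({"SUPERSET_VERSION_HISTORY_RETENTION_DAYS": "0"})
print(run_prune(disabled, failing_impl))  # {'skipped': 1}
print(run_prune(30, failing_impl))  # {'error': 1}
```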


@@ -18398,3 +18398,127 @@ msgstr "zone de zoom"
msgid "© Layer attribution"
msgstr "© Attribution de la couche"
# sc-103156 entity-versioning UI strings
msgid "Added"
msgstr "Ajouté"
msgid "Removed"
msgstr "Supprimé"
msgid "Changed"
msgstr "Modifié"
msgid "Moved"
msgstr "Déplacé"
msgid "Edited"
msgstr "Modifié"
msgid "Baseline"
msgstr "Version initiale"
msgid "Cleared %(field)s"
msgstr "%(field)s effacé"
msgid "title"
msgstr "titre"
msgid "chart name"
msgstr "nom du graphique"
msgid "table name"
msgstr "nom de la table"
msgid "chart"
msgstr "graphique"
msgid "row"
msgstr "ligne"
msgid "column"
msgstr "colonne"
msgid "tab"
msgstr "onglet"
msgid "tabs"
msgstr "onglets"
msgid "header"
msgstr "en-tête"
msgid "markdown"
msgstr "markdown"
msgid "divider"
msgstr "séparateur"
msgid "metric"
msgstr "mesure"
msgid "Added %(kind)s"
msgstr "%(kind)s ajouté(e)"
msgid "Added %(kind)s \"%(name)s\""
msgstr "%(kind)s « %(name)s » ajouté(e)"
msgid "Removed %(kind)s"
msgstr "%(kind)s supprimé(e)"
msgid "Removed %(kind)s \"%(name)s\""
msgstr "%(kind)s « %(name)s » supprimé(e)"
msgid "Moved %(kind)s"
msgstr "%(kind)s déplacé(e)"
msgid "Moved %(kind)s \"%(name)s\""
msgstr "%(kind)s « %(name)s » déplacé(e)"
msgid "Edited %(kind)s"
msgstr "%(kind)s modifié(e)"
msgid "Edited %(kind)s \"%(name)s\""
msgstr "%(kind)s « %(name)s » modifié(e)"
msgid "Changed %(kind)s"
msgstr "%(kind)s modifié(e)"
msgid "Changed %(kind)s \"%(name)s\""
msgstr "%(kind)s « %(name)s » modifié(e)"
msgid "Added %(kind)s %(detail)s"
msgstr "%(kind)s %(detail)s ajouté(e)"
msgid "Removed %(kind)s %(detail)s"
msgstr "%(kind)s %(detail)s supprimé(e)"
msgid "Changed %(kind)s %(detail)s"
msgstr "%(kind)s %(detail)s modifié(e)"
msgid "Added chart %(id)s"
msgstr "Graphique %(id)s ajouté"
msgid "Removed chart %(id)s"
msgstr "Graphique %(id)s supprimé"
msgid "Changed chart %(id)s"
msgstr "Graphique %(id)s modifié"
msgid "Added %(field)s"
msgstr "%(field)s ajouté"
msgid "Removed %(field)s"
msgstr "%(field)s supprimé"
msgid "Changed %(field)s"
msgstr "%(field)s modifié"
msgid "Changed %(field)s to \"%(value)s\""
msgstr "%(field)s changé en « %(value)s »"
msgid "Set %(field)s to \"%(value)s\""
msgstr "%(field)s défini à « %(value)s »"
msgid "%(first)s (+%(more)s more)"
msgstr "%(first)s (+%(more)s autres)"
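The `%(name)s` placeholders are Python named `%`-formatting fields, so each `msgstr` must use exactly the same placeholder names as its `msgid`. The minimal sketch below shows how a few of the French strings above render. As an illustration-only assumption, it uses a hand-built dict in place of a compiled gettext catalog.

```python
import re

# A few entries copied from the catalog above; a real deployment would
# load these from the compiled .mo file through gettext instead.
FR = {
    'Added %(kind)s "%(name)s"': "%(kind)s « %(name)s » ajouté(e)",
    "Changed chart %(id)s": "Graphique %(id)s modifié",
    "%(first)s (+%(more)s more)": "%(first)s (+%(more)s autres)",
}

PLACEHOLDER = re.compile(r"%\((\w+)\)s")


def translate(msgid: str, **params: object) -> str:
    """Look up *msgid* (falling back to English) and fill named fields."""
    return FR.get(msgid, msgid) % params


def placeholders_match(msgid: str, msgstr: str) -> bool:
    """Check that a translation uses the same named fields as its source."""
    return set(PLACEHOLDER.findall(msgid)) == set(PLACEHOLDER.findall(msgstr))


print(translate('Added %(kind)s "%(name)s"', kind="graphique", name="Ventes"))
# graphique « Ventes » ajouté(e)
print(translate("%(first)s (+%(more)s more)", first="titre", more=2))
# titre (+2 autres)
print(all(placeholders_match(k, v) for k, v in FR.items()))  # True
```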


@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.


@@ -0,0 +1,566 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""before_flush listener that captures a baseline version (version 0) for entities
being updated for the first time after the versioning migration.
The module reads top-down in stepdown order: the public entry point
(``register_baseline_listener``) is at the top; helpers descend to leaf
builders at the bottom. Module-level state (``VERSIONED_MODELS``,
``_CHILD_BASELINE_HANDLERS``) sits next to the helpers that consume it.
VERSIONED_MODELS is populated at app startup by the initialisation code after
make_versioned() has run and all versioned model classes have been defined.
**Inline imports.** Several helpers below use ``# pylint: disable=
import-outside-toplevel`` for imports of ``sqlalchemy_continuum`` and
Superset model classes. The reason is uniform: this module is imported
from ``init_versioning()`` in ``superset/initialization/__init__.py``
before all SQLAlchemy mappers are configured and before Continuum's
``make_versioned()`` has finished wiring shadow classes. Top-level
imports of model classes or Continuum helpers would either trip an
unresolved-mapper error or create an init-order cycle. The lazy form
defers resolution until the helper actually runs, by which point app
init is complete. Per-call ``why-`` comments are omitted to avoid
repeating the same explanation at every callsite; unusual cases (if
any are added) should be commented explicitly.
"""
import functools
import logging
from typing import Any, Callable, Optional
import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.exc import InvalidRequestError, OperationalError
from sqlalchemy.orm import attributes, Session
from superset.versioning.utils import read_row_outside_flush
logger = logging.getLogger(__name__)
# Populated at app startup (superset/initialization/__init__.py) before
# register_baseline_listener() is called.
VERSIONED_MODELS: list[type] = []
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
def register_baseline_listener() -> None:
"""Attach the before_flush listener that captures baseline versions.
Call this after VERSIONED_MODELS has been populated and make_versioned() has run.
"""
from superset.extensions import db # pylint: disable=import-outside-toplevel
# insert=True prepends us in the listener chain so we run BEFORE
# Continuum's before_flush. Continuum's pending Transaction object
# (added in its own before_flush) would otherwise get a lower
# auto-increment tx_id than our direct-SQL baseline insert, placing the
# baseline row after the update in version_number order. Prepending
# ensures our baseline's tx_id comes first.
@event.listens_for(db.session, "before_flush", insert=True)
def capture_baseline(session: Session, flush_context: Any, instances: Any) -> None:
if not VERSIONED_MODELS:
return
# Make sure a child-only edit promotes the parent to ``session.dirty``
# before Continuum's before_flush reads the dirty set.
_force_parent_dirty_on_child_change(session)
for obj in _collect_parents_to_baseline(session).values():
if type(obj) not in VERSIONED_MODELS:
continue
version_table = _version_table_for(obj)
if version_table is None:
continue
count = _shadow_row_count(session, obj, version_table)
if count == 0:
_insert_baseline_and_children(session, obj, version_table)
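For each parent, the listener takes one of three paths based on the shadow-row count. A count of `0` means a baseline is needed. A positive count means history already exists. `None` (missing table or failed query) means skip. The minimal, hypothetical sketch below reproduces that decision with the database calls replaced by callables. `count_rows` and `insert_baseline` are illustrative stand-ins for `_shadow_row_count` and `_insert_baseline_and_children`, and strings stand in for versioned entities.

```python
from typing import Any, Callable, Iterable, Optional


def capture_baselines(
    parents: Iterable[Any],
    versioned: set[type],
    count_rows: Callable[[Any], Optional[int]],
    insert_baseline: Callable[[Any], None],
) -> list[Any]:
    """Baseline only versioned parents whose shadow table has zero rows."""
    baselined = []
    for obj in parents:
        if type(obj) not in versioned:
            continue
        # ``None`` (table missing / query failed) is deliberately not ``0``.
        if count_rows(obj) == 0:
            insert_baseline(obj)
            baselined.append(obj)
    return baselined


counts = {"dash-1": 0, "dash-2": 3, "dash-3": None}
inserted: list[str] = []
result = capture_baselines(
    ["dash-1", "dash-2", "dash-3", 42], {str}, counts.get, inserted.append
)
print(result)  # ['dash-1']
```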
# ---------------------------------------------------------------------------
# High-level helpers used by ``capture_baseline``
# ---------------------------------------------------------------------------
def _force_parent_dirty_on_child_change(session: Session) -> None:
"""Mark a versioned parent as dirty whenever one of its versioned
children appears in ``session.dirty``/``new``/``deleted`` but the
parent's own scalars haven't been edited.
Without this hook, edits that only touch ``TableColumn`` or
``SqlMetric`` rows leave the parent ``SqlaTable`` out of
``session.dirty`` — so Continuum's UnitOfWork never creates a
parent UPDATE operation and ``list_versions`` (which queries the
parent shadow ``tables_version``) returns just the baseline. The
user-visible symptom is "I edited a column description but the
dataset's version history dropdown is empty".
We use ``attributes.flag_modified`` against the parent's first
non-excluded versioned column so SQLAlchemy adds the parent to
``session.dirty`` without altering any column values. Continuum
then writes a parent shadow row at this transaction; its scalar
columns mirror the previous version (only the children changed).
``SkipUnmodifiedPlugin._is_no_op_update`` is taught to recognize
the "scalars match but children dirty" case and keep the row.
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum.utils import versioned_column_properties
child_map = _child_to_parent_registry()
for obj in list(session.dirty) + list(session.new) + list(session.deleted):
entry = child_map.get(type(obj))
if entry is None:
continue
parent_attr, parent_cls = entry
parent = getattr(obj, parent_attr, None)
if parent is None or type(parent) is not parent_cls: # noqa: E721
continue
col_keys = [prop.key for prop in versioned_column_properties(parent)]
if not col_keys:
continue
# ``uuid`` is on all three versioned parent classes (Dashboard,
# Slice, SqlaTable) and is in none of their ``__versioned__``
# excludes — pick it deterministically so the flagged attribute
# is stable across SQLAlchemy versions / mapper-configuration
# orders. Falls back to the first available column for forks or
# subclasses that excluded ``uuid``.
flag_col = "uuid" if "uuid" in col_keys else col_keys[0]
try:
attributes.flag_modified(parent, flag_col)
except InvalidRequestError:
# The parent is a freshly-constructed ``session.new`` instance
# whose ``uuid`` default (``default=uuid4``) hasn't fired yet
# — the attribute is unloaded in instance state, so
# ``flag_modified`` rejects it. The parent will INSERT in this
# flush regardless, so the flag was redundant; safely skip.
# Hit by ``test_create_dataset_item`` (POST /api/v1/dataset/).
continue
def _collect_parents_to_baseline(session: Session) -> dict[int, Any]:
"""Return parents-to-baseline as ``{id(obj): obj}`` keyed by Python
object identity to dedupe across ``session.dirty + new + deleted``.
Includes both directly-dirty versioned parents and parents reachable
from dirty/new/deleted children via the child→parent registry.
"""
parents: dict[int, Any] = {}
child_map = _child_to_parent_registry()
for obj in list(session.dirty) + list(session.new) + list(session.deleted):
if type(obj) in VERSIONED_MODELS:
parents[id(obj)] = obj
continue
entry = child_map.get(type(obj))
if entry is None:
continue
parent_attr, parent_cls = entry
parent = getattr(obj, parent_attr, None)
if parent is not None and type(parent) is parent_cls: # noqa: E721
parents[id(parent)] = parent
return parents
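Deduplicating by `id(obj)` matters because one flush can reach the same parent twice: directly (its own scalars changed) and through one or more dirty children. The minimal sketch below performs the same walk over the three session collections using stand-in dataclasses. `Dataset` and `Column` are hypothetical and are not Superset's `SqlaTable` / `TableColumn`.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(eq=False)
class Dataset:
    name: str


@dataclass(eq=False)
class Column:
    name: str
    table: Dataset


VERSIONED = {Dataset}
# child class -> (attribute holding the parent, expected parent class)
CHILD_TO_PARENT: dict[type, tuple[str, type]] = {Column: ("table", Dataset)}


def collect_parents(*collections: Iterable[Any]) -> dict[int, Any]:
    """Return {id(parent): parent} for versioned parents touched in a flush."""
    parents: dict[int, Any] = {}
    for obj in (o for coll in collections for o in coll):
        if type(obj) in VERSIONED:
            parents[id(obj)] = obj
            continue
        entry = CHILD_TO_PARENT.get(type(obj))
        if entry is None:
            continue
        attr, parent_cls = entry
        parent = getattr(obj, attr, None)
        if parent is not None and type(parent) is parent_cls:
            parents[id(parent)] = parent
    return parents


ds = Dataset("orders")
dirty = [ds, Column("amount", ds)]
new = [Column("region", ds)]
deleted: list[Any] = []
print([p.name for p in collect_parents(dirty, new, deleted).values()])  # ['orders']
```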
@functools.cache
def _child_to_parent_registry() -> dict[type, tuple[str, type]]:
"""Map child entity class → (parent-relationship-attr, parent class).
When a dirty child of a known type appears in session.dirty/new/deleted,
we walk to its parent and baseline the parent (+ siblings) under the
SAME flush so pre-edit child values land in the baseline shadow rows.
Without this, edits that only touch child rows produce a "silent" flush
A (just ``TableColumn``) followed by flush B (``SqlaTable.changed_on``);
flush B reads children from DB AFTER flush A already pushed UPDATEs,
capturing post-edit state.
Cached because this is called from ``_force_parent_dirty_on_child_change``
and ``_collect_parents_to_baseline`` on every save flush. The returned
mapping depends only on the (fixed at import time) child model classes,
so an unbounded ``functools.cache`` is the right shape — no invalidation
needed.
"""
# Lazy import: ``baseline`` is imported during ``init_versioning``, which
# runs before all model mappers are configured. Importing model classes
# at module load would either cycle or hit unresolved mappers.
# pylint: disable=import-outside-toplevel
from superset.connectors.sqla.models import SqlaTable, SqlMetric, TableColumn
return {
TableColumn: ("table", SqlaTable),
SqlMetric: ("table", SqlaTable),
}
def _version_table_for(obj: Any) -> Any:
"""Return Continuum's shadow ``Table`` for *obj*'s class, or ``None``
when the class isn't registered (forks / plugins that subclass without
``__versioned__``).
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import version_class
from sqlalchemy_continuum.exc import ClassNotVersioned
try:
return version_class(type(obj)).__table__
except ClassNotVersioned:
return None
def _shadow_row_count(session: Session, obj: Any, version_table: Any) -> Optional[int]:
"""Return number of shadow rows for *obj.id* in *version_table*, or
``None`` when the version table is missing (migration not yet applied)
or the count query raised unexpectedly.
"""
try:
with session.no_autoflush:
return (
session.connection()
.execute(
sa.select(sa.func.count())
.select_from(version_table)
.where(version_table.c.id == obj.id)
)
.scalar()
)
except OperationalError:
return None
except Exception: # pylint: disable=broad-except
logger.exception(
"baseline_listener: count query failed for %s id=%s",
type(obj).__name__,
getattr(obj, "id", None),
)
return None
def _insert_baseline_and_children(
session: Session, obj: Any, version_table: Any
) -> None:
"""Insert the parent baseline row, then baseline the parent's child
collections under the same transaction id.
Wrapped in ``no_autoflush`` so ``session.connection()`` inside
``_insert_baseline_row`` does not trigger a flush of Continuum's
pending Transaction object before our direct-SQL insert claims its
tx_id.
"""
try:
with session.no_autoflush:
tx_id = _insert_baseline_row(session, obj, version_table)
if tx_id is None:
return
_baseline_children_for_parent(session, obj, tx_id)
logger.debug(
"baseline_listener: inserted baseline tx_id=%s for %s id=%s",
tx_id,
type(obj).__name__,
getattr(obj, "id", None),
)
except Exception: # pylint: disable=broad-except
logger.exception(
"baseline_listener: failed to insert baseline for %s id=%s",
type(obj).__name__,
getattr(obj, "id", None),
)
# ---------------------------------------------------------------------------
# Mid-level builders: parent shadow + child dispatch
# ---------------------------------------------------------------------------
def _insert_baseline_row(
session: Session, obj: Any, version_table: sa.Table
) -> Optional[int]:
"""Insert a synthetic baseline row capturing the pre-edit DB state of *obj*.
Creates a version_transaction entry and an operation_type=0 version row.
All writes use the session's existing connection so they share the same
database transaction as the triggering flush.
Returns the allocated ``transaction_id`` so the caller can baseline child
collections under the same tx (see :func:`_insert_child_baseline_rows`),
or ``None`` when the entity has no live row.
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import versioning_manager
main_table = type(obj).__table__
row = read_row_outside_flush(session, main_table, obj.id)
if row is None:
return None
conn = session.connection()
# Insert a version_transaction row for the baseline.
#
# ``issued_at`` and ``user_id`` are sourced from the entity's audit fields
# (``changed_on`` / ``changed_by_fk``, falling back to ``created_on`` /
# ``created_by_fk`` if the row was never edited), so the baseline reads
# in the version-history UI as "this is the state at the time of the
# last pre-versioning edit, by that user." Using ``now()`` and the
# current user would have made the baseline look chronologically newer
# than subsequent edits and attributed historical content to the user
# who happened to trigger the first save under versioning.
baseline_issued_at = row.get("changed_on") or row.get("created_on") or sa.func.now()
baseline_user_id = row.get("changed_by_fk") or row.get("created_by_fk")
tx_table = versioning_manager.transaction_cls.__table__
result = conn.execute(
tx_table.insert().values(
issued_at=baseline_issued_at,
user_id=baseline_user_id,
remote_addr=None,
)
)
tx_id = result.inserted_primary_key[0]
# Build version row using Column objects as keys to avoid name/key mismatches
# (string-based values(**dict) raises "Unconsumed column names" when a Column's
# .key differs from its .name, which can happen with Continuum-generated tables).
meta_col_names = {"transaction_id", "end_transaction_id", "operation_type"}
col_values: dict[Any, Any] = {}
for col in version_table.columns:
if col.name in meta_col_names:
continue
if col.name in row:
col_values[col] = row[col.name]
col_values[version_table.c.transaction_id] = tx_id
col_values[version_table.c.end_transaction_id] = None
col_values[version_table.c.operation_type] = 0
conn.execute(version_table.insert().values(col_values))
return tx_id
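`_insert_baseline_row` contains two pure-data steps that can be separated from the database. The first chooses the baseline's audit attribution. The second copies the live row into a shadow row, overriding the Continuum metadata columns. The minimal sketch below operates on plain dicts and column-name lists rather than SQLAlchemy `Table` objects, and `baseline_audit` and `build_version_row` are hypothetical names. One difference from the original: the "now" fallback here is a Python `datetime`, not `sa.func.now()`.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional

META_COLS = {"transaction_id", "end_transaction_id", "operation_type"}


def baseline_audit(row: Mapping[str, Any]) -> tuple[datetime, Optional[int]]:
    """Prefer the last edit, then creation; fall back to 'now' / no user."""
    issued_at = (
        row.get("changed_on") or row.get("created_on") or datetime.now(timezone.utc)
    )
    user_id = row.get("changed_by_fk") or row.get("created_by_fk")
    return issued_at, user_id


def build_version_row(
    row: Mapping[str, Any], version_cols: list[str], tx_id: int
) -> dict[str, Any]:
    """Copy live values for shadow columns, then stamp baseline metadata."""
    values = {c: row[c] for c in version_cols if c not in META_COLS and c in row}
    values.update(transaction_id=tx_id, end_transaction_id=None, operation_type=0)
    return values


live = {
    "id": 7,
    "dashboard_title": "Sales",
    "created_on": datetime(2024, 1, 2),
    "created_by_fk": 3,
    "changed_on": None,
    "changed_by_fk": None,
}
print(baseline_audit(live))  # (datetime.datetime(2024, 1, 2, 0, 0), 3)
cols = ["id", "dashboard_title", "transaction_id", "end_transaction_id", "operation_type"]
print(build_version_row(live, cols, tx_id=42))
# {'id': 7, 'dashboard_title': 'Sales', 'transaction_id': 42, 'end_transaction_id': None, 'operation_type': 0}
```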
def _baseline_children_for_parent(
session: Session, parent_obj: Any, tx_id: int
) -> None:
"""Baseline a parent's child collections under the parent's baseline tx.
Dispatches via :data:`_CHILD_BASELINE_HANDLERS` to per-entity handlers.
A handler failure is logged but does not block the parent baseline.
"""
parent_name = type(parent_obj).__name__
handler = _CHILD_BASELINE_HANDLERS.get(parent_name)
if handler is None:
return
try:
handler(session, parent_obj, tx_id)
except Exception: # pylint: disable=broad-except
logger.exception(
"baseline_listener: failed to baseline children of %s id=%s",
parent_name,
getattr(parent_obj, "id", None),
)
# ---------------------------------------------------------------------------
# Per-entity child handlers
# ---------------------------------------------------------------------------
def _baseline_dataset_children(session: Session, dataset: Any, tx_id: int) -> None:
"""Baseline a dataset's ``TableColumn`` and ``SqlMetric`` children
under the dataset's baseline tx.
"""
# pylint: disable=import-outside-toplevel
from sqlalchemy_continuum import version_class
from superset.connectors.sqla.models import SqlMetric, TableColumn
for child_cls in (TableColumn, SqlMetric):
_insert_child_baseline_rows(
session,
dataset,
child_cls.__table__,
version_class(child_cls).__table__,
"table_id",
tx_id,
)
def _baseline_dashboard_children(session: Session, dashboard: Any, tx_id: int) -> None:
"""Baseline a dashboard's ``dashboard_slices`` M2M plus synthesize
``operation_type=0`` rows in ``slices_version`` for attached slices
with no prior shadow.
Continuum's M2M version-side relationship for ``Dashboard.slices``
joins through both ``dashboard_slices_version`` AND
``slices_version``: the second exists clause filters slices by
"latest slices_version row with tx <= dashboard.tx". If a slice
has no slices_version rows at all, that join produces no match
and ``version_obj.slices`` returns empty — leaving the dashboard
restore with no slices to append. The synthetic slice baseline at
this dashboard's tx gives the M2M query a slice version it can match.
Doesn't try to be clever about slices shared across dashboards: a
slice is baselined at this dashboard's tx_id only when it has no
shadow rows at all. If a later dashboard baseline references the
same slice, this baseline (now at lower tx) is still found by
that dashboard's restore. The reverse — a dashboard baselined
AFTER the slice was first baselined under another dashboard at
a higher tx — is a residual gap deferred to a future fix.
"""
metadata = type(dashboard).__table__.metadata
live_tbl = metadata.tables.get("dashboard_slices")
shadow_tbl = metadata.tables.get("dashboard_slices_version")
if live_tbl is None or shadow_tbl is None:
return
_insert_child_baseline_rows(
session, dashboard, live_tbl, shadow_tbl, "dashboard_id", tx_id
)
_baseline_attached_slices(session, dashboard, live_tbl, tx_id)
# Dispatch table keyed by parent CLASS NAME rather than class, to avoid
# the import cycle between baseline.py (loaded at app init) and the
# entity modules. Keys must match the parent class's ``__name__``
# exactly: a typo makes the lookup in ``_baseline_children_for_parent``
# return ``None`` and silently skips child baselining for that class.
# Declared after the handlers it references because module-level dict
# literals evaluate at import time and need the names already bound.
_ChildBaselineHandler = Callable[[Session, Any, int], None]
_CHILD_BASELINE_HANDLERS: dict[str, _ChildBaselineHandler] = {
"SqlaTable": _baseline_dataset_children,
"Dashboard": _baseline_dashboard_children,
}
# ---------------------------------------------------------------------------
# Leaf builders: child-row insert and synthetic slice baseline
# ---------------------------------------------------------------------------
def _insert_child_baseline_rows(
    session: Session,
    parent_obj: Any,
    child_table: sa.Table,
    child_version_table: sa.Table,
    fk_column_name: str,
    tx_id: int,
) -> None:
    """Synthesize ``operation_type=0`` shadow rows for every live child of
    *parent_obj* under transaction id *tx_id*.

    Parallels :func:`_insert_baseline_row` but iterates over child rows. Used
    to give Continuum's ``Reverter`` baseline data for children of pre-existing
    parents (children that predate this commit have no shadow rows otherwise,
    so Reverter would treat them as "deleted at the target tx" and try to
    remove them on revert; this is the ADR-004 Failure 1 reproduction scenario).

    :param child_table: the live child SQLAlchemy ``Table`` (e.g.
        ``TableColumn.__table__`` or the bare ``dashboard_slices`` association)
    :param child_version_table: the corresponding Continuum shadow ``Table``
    :param fk_column_name: column on *child_table* that points to the parent
        (e.g. ``"table_id"`` for ``TableColumn``, ``"dashboard_id"`` for
        ``dashboard_slices``)
    """
    conn = session.connection()
    fk_col = getattr(child_table.c, fk_column_name)
    rows = (
        conn.execute(sa.select(child_table).where(fk_col == parent_obj.id))
        .mappings()
        .all()
    )
    if not rows:
        return
    meta_col_names = {"transaction_id", "end_transaction_id", "operation_type"}
    for row in rows:
        col_values: dict[Any, Any] = {}
        for col in child_version_table.columns:
            if col.name in meta_col_names:
                continue
            if col.name in row:
                col_values[col] = row[col.name]
        col_values[child_version_table.c.transaction_id] = tx_id
        col_values[child_version_table.c.end_transaction_id] = None
        col_values[child_version_table.c.operation_type] = 0
        conn.execute(child_version_table.insert().values(col_values))


def _baseline_attached_slices(
    session: Session, dashboard: Any, live_tbl: sa.Table, tx_id: int
) -> None:
    """Insert ``operation_type=0`` rows in ``slices_version`` for each
    slice attached to *dashboard* that has no shadow row yet.

    Batched: one membership SELECT, one existing-shadow SELECT, and one live
    SELECT for the missing slices. Per-slice work happens only in
    ``_insert_synthetic_slice_baseline``. The previous per-slice
    ``COUNT(*)`` + ``SELECT`` pattern cost O(N) round-trips and surfaced
    as a measurable first-save hotspot on dashboards with many charts.
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.models.slice import Slice

    slice_ver_table = version_class(Slice).__table__
    slice_table = Slice.__table__
    conn = session.connection()
    attached_slice_ids = [
        r.slice_id
        for r in conn.execute(
            sa.select(live_tbl.c.slice_id).where(
                live_tbl.c.dashboard_id == dashboard.id
            )
        ).all()
    ]
    if not attached_slice_ids:
        return
    existing_shadow_ids = {
        row[0]
        for row in conn.execute(
            sa.select(slice_ver_table.c.id.distinct()).where(
                slice_ver_table.c.id.in_(attached_slice_ids)
            )
        ).all()
    }
    missing_ids = [sid for sid in attached_slice_ids if sid not in existing_shadow_ids]
    if not missing_ids:
        return
    slice_rows = (
        conn.execute(sa.select(slice_table).where(slice_table.c.id.in_(missing_ids)))
        .mappings()
        .all()
    )
    for slice_row in slice_rows:
        _insert_synthetic_slice_baseline(conn, slice_ver_table, slice_row, tx_id)


def _insert_synthetic_slice_baseline(
    conn: Any, slice_ver_table: sa.Table, slice_row: Any, tx_id: int
) -> None:
    """Insert one ``operation_type=0`` row into ``slices_version``, copied
    from the live *slice_row* and stamped with transaction id *tx_id*."""
    meta_col_names = {"transaction_id", "end_transaction_id", "operation_type"}
    col_values: dict[Any, Any] = {}
    for col in slice_ver_table.columns:
        if col.name in meta_col_names:
            continue
        if col.name in slice_row:
            col_values[col] = slice_row[col.name]
    col_values[slice_ver_table.c.transaction_id] = tx_id
    col_values[slice_ver_table.c.end_transaction_id] = None
    col_values[slice_ver_table.c.operation_type] = 0
    conn.execute(slice_ver_table.insert().values(col_values))
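Both `_insert_child_baseline_rows` and `_insert_synthetic_slice_baseline` apply the same row transformation: copy every live value whose column also exists on the shadow table, skip Continuum's bookkeeping columns, then stamp the baseline markers. A minimal pure-Python sketch on plain dicts follows; the helper name `build_baseline_shadow_row` and the sample columns are hypothetical, and no database is involved.

```python
from collections.abc import Iterable
from typing import Any

# Continuum bookkeeping columns; never copied from the live row.
META_COLUMNS = frozenset({"transaction_id", "end_transaction_id", "operation_type"})


def build_baseline_shadow_row(
    live_row: dict[str, Any], shadow_columns: Iterable[str], tx_id: int
) -> dict[str, Any]:
    """Hypothetical helper: build one baseline shadow row from a live row."""
    shadow_row: dict[str, Any] = {}
    for name in shadow_columns:
        if name in META_COLUMNS:
            continue
        # Shadow columns the live row lacks are simply left out; a later
        # INSERT would fall back to the column default for them.
        if name in live_row:
            shadow_row[name] = live_row[name]
    shadow_row["transaction_id"] = tx_id
    shadow_row["end_transaction_id"] = None  # open-ended: still the current version
    shadow_row["operation_type"] = 0  # 0 = INSERT, as for first-save rows
    return shadow_row


live = {"id": 7, "table_id": 3, "column_name": "ds", "is_dttm": True}
shadow_cols = [
    "id", "table_id", "column_name", "is_dttm",
    "transaction_id", "end_transaction_id", "operation_type",
]
row = build_baseline_shadow_row(live, shadow_cols, tx_id=42)
# row keeps the four live values and adds transaction_id=42,
# end_transaction_id=None and operation_type=0.
```

Filtering by the shadow table's column list (rather than the live row's keys) is what lets the same loop serve both ORM tables and the bare `dashboard_slices` association.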

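`_baseline_attached_slices` replaces per-slice round-trips with three batched queries: membership, existing shadows, then live rows for the slices still missing a shadow. The sketch below reproduces that shape with the standard-library `sqlite3` module; the schema is a stripped-down assumption (only the columns the queries touch), not Superset's real DDL, and `slices_missing_shadow_rows` is a hypothetical name.

```python
import sqlite3


def slices_missing_shadow_rows(conn: sqlite3.Connection, dashboard_id: int) -> list[tuple]:
    # 1) One membership query for the dashboard.
    attached = [
        r[0]
        for r in conn.execute(
            "SELECT slice_id FROM dashboard_slices WHERE dashboard_id = ?",
            (dashboard_id,),
        )
    ]
    if not attached:
        return []
    # 2) One query for attached slices that already have any shadow row.
    marks = ",".join("?" * len(attached))
    existing = {
        r[0]
        for r in conn.execute(
            f"SELECT DISTINCT id FROM slices_version WHERE id IN ({marks})", attached
        )
    }
    missing = [sid for sid in attached if sid not in existing]
    if not missing:
        return []
    # 3) One query for the live rows that still need a baseline.
    marks = ",".join("?" * len(missing))
    return conn.execute(
        f"SELECT id, slice_name FROM slices WHERE id IN ({marks}) ORDER BY id", missing
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE slices (id INTEGER PRIMARY KEY, slice_name TEXT);
    CREATE TABLE slices_version (id INTEGER, transaction_id INTEGER);
    CREATE TABLE dashboard_slices (dashboard_id INTEGER, slice_id INTEGER);
    INSERT INTO slices VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO slices_version VALUES (2, 10);
    INSERT INTO dashboard_slices VALUES (5, 1), (5, 2), (5, 3);
    """
)
print(slices_missing_shadow_rows(conn, 5))  # [(1, 'a'), (3, 'c')]
```

The statement count stays at three regardless of how many charts the dashboard holds. SQLite caps the number of bound parameters per statement, so very large `IN` lists would need chunking in a production version.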

@@ -0,0 +1,809 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Capture listener for ``version_changes`` (T048).

Two session events cooperate:

- ``before_flush``: for each versioned entity in ``session.dirty``,
  reads the pre-save scalar state from the DB via raw SQL inside
  ``session.no_autoflush`` (the same idiom as the baseline listener,
  rather than Continuum's internal ``units_of_work``, which is a private
  API), reads the post-save state from the in-memory ORM object, calls
  the diff engine, and buffers the resulting :class:`ChangeRecord` list
  on ``session.info``. This must run before the flush: afterwards the DB
  already reflects the post-state, so the pre-state can no longer be
  recovered from it.
- ``after_flush``: drains the buffer, resolves the current Continuum
  transaction id via ``versioning_manager.units_of_work``, and
  bulk-inserts one ``version_changes`` row per record under that
  ``transaction_id``. The ``sequence`` column restarts at 0 for each
  ``(entity_kind, entity_id)`` pair and is contiguous within it. Once a
  transaction has had records written (tracked under
  ``_PROCESSED_TXS_KEY``), later flushes in the same transaction write
  nothing.

Scope in this iteration:

- Slice, Dashboard, SqlaTable **scalar fields** (via
  :func:`scalar_fields_for`; new columns are picked up automatically
  without editing this module).
- ``Slice.params`` kind-classification (filter / metric / time_range /
  color_palette / dimension, plus a generic ``field`` fallback).

Child-collection diffs (dataset ``TableColumn`` / ``SqlMetric``,
dashboard ``dashboard_slices``) read the pre- and post-state from
Continuum shadow tables via :func:`_shadow_rows_valid_at` (datasets) and
:func:`_dashboard_slice_uuids_at_tx` (dashboards), executed in
``after_flush`` once Continuum has written its tx-N rows.

``session.new`` entities are not processed in this listener:
operation_type=0 transactions (baseline capture and first-save INSERTs)
produce zero change records per spec §Clarifications 2026-04-24.

**Inline imports.** Several helpers below use
``# pylint: disable=import-outside-toplevel`` for imports of
``sqlalchemy_continuum`` and Superset model classes. The reason is the
same as in ``baseline.py``: this module is imported from
``init_versioning()`` before all SQLAlchemy mappers are configured and
before Continuum's ``make_versioned()`` has finished wiring shadow
classes. Top-level imports would either trip an unresolved-mapper error
or create an init-order cycle. The lazy form defers resolution until
the helper runs. Unusual cases (if any are added) should be commented
explicitly.
"""
from __future__ import annotations
import logging
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Optional
from uuid import UUID
import sqlalchemy as sa
from flask_appbuilder import Model
from sqlalchemy import event
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import Session
from superset.versioning.diff import (
    ChangeRecord,
    diff_dashboard,
    diff_dashboard_slices,
    diff_dataset,
    diff_dataset_columns,
    diff_dataset_metrics,
    diff_slice,
    fold_dashboard_layout_with_chart_changes,
    scalar_fields_for,
)
from superset.versioning.utils import read_row_outside_flush
logger = logging.getLogger(__name__)
# Declared against the shared Model.metadata so integration tests that
# build schema via ``metadata.create_all()`` pick it up without the
# Alembic migration running. Mirrors the shape of the T046 migration
# (``e1f3c5a7b9d0_add_version_changes_table``) byte-for-byte. Typed
# columns (``sa.JSON`` for path / values) are required so the
# connection's bulk-insert path marshals Python lists/dicts into JSON
# — a lightweight ``sa.table(...)`` would not carry the type info and
# SQLite's driver would reject the ``list`` as an unsupported bind.
_metadata = Model.metadata # pylint: disable=no-member
version_changes_table = sa.Table(
"version_changes",
_metadata,
sa.Column("id", sa.BigInteger, primary_key=True, autoincrement=True),
# ``transaction_id`` references ``version_transaction.id`` at the DB
# level only — the FK + ON DELETE CASCADE live in the Alembic
# migration. Declaring the FK here would fail to resolve at Table
# creation time because ``version_transaction`` is built
# dynamically by SQLAlchemy-Continuum at mapper-configuration time;
# integration tests that materialise schema via ``metadata.create_all``
# before Continuum runs would hit ``NoReferencedTableError``. Same
# pattern as the other versioning tables.
sa.Column("transaction_id", sa.BigInteger, nullable=False),
sa.Column("entity_kind", sa.String(32), nullable=False),
sa.Column("entity_id", sa.Integer, nullable=False),
sa.Column("sequence", sa.SmallInteger, nullable=False),
sa.Column("kind", sa.String(32), nullable=False),
sa.Column("path", sa.JSON, nullable=False),
sa.Column("from_value", sa.JSON, nullable=True),
sa.Column("to_value", sa.JSON, nullable=True),
sa.UniqueConstraint(
"transaction_id",
"entity_kind",
"entity_id",
"sequence",
name="uq_version_changes_tx_entity_sequence",
),
sa.Index("ix_version_changes_kind", "kind"),
sa.Index("ix_version_changes_transaction_id", "transaction_id"),
sa.Index("ix_version_changes_entity", "entity_kind", "entity_id"),
extend_existing=True,
)
# Mapping from Python class name to the ``entity_kind`` value written
# to ``version_changes.entity_kind``. The API filters change records
# by this value (``WHERE entity_kind = 'chart'`` for the chart history
# endpoint, etc.) — kept short and user-facing-ish so downstream tools
# consuming the raw table read sensibly.
_ENTITY_KIND_BY_CLASS_NAME: dict[str, str] = {
    "Slice": "chart",
    "Dashboard": "dashboard",
    "SqlaTable": "dataset",
}
# Key under which the pending-records buffer is stored on ``session.info``.
# Using ``session.info`` (SQLAlchemy's user-data dict) avoids the need
# for a module-level WeakKeyDictionary and keeps buffers naturally scoped
# to the session's lifetime.
_BUFFER_KEY = "_version_changes_pending"
# Key for the set of Continuum transaction ids whose change records
# have already been written in this session. ``after_flush`` can fire
# more than once for a single transaction (e.g. autoflush triggered by
# a mid-commit query), and our child-diff path reads snapshot tables
# that don't care about the buffer state — without this marker we'd
# re-insert the same child records on the second flush and hit the
# UNIQUE(transaction_id, entity_kind, entity_id, sequence) constraint.
_PROCESSED_TXS_KEY = "_version_changes_processed_txs"
# Per-model-class cache of the scalar-field set. Populated lazily on
# first save of a model. Reading from ``__table__.columns`` is cheap
# but not free; memoising keeps the save-path overhead budget (FR-021)
# from slowly growing with the set of distinct model classes seen.
_SCALAR_FIELDS_CACHE: dict[type, frozenset[str]] = {}


def _cached_scalar_fields(model_cls: type) -> frozenset[str]:
    """Cached wrapper around :func:`scalar_fields_for`."""
    if model_cls not in _SCALAR_FIELDS_CACHE:
        # ``Slice.params`` is walked by ``diff_slice_params`` for kind
        # promotion; emitting it as one opaque ``field`` change would
        # defeat that and flood the log with meaningless records.
        # ``last_saved_at`` / ``last_saved_by_fk`` are stamped by
        # ``UpdateChartCommand`` on every chart save; they're audit
        # noise (same shape as ``changed_on`` / ``changed_by_fk``) and
        # don't carry user-authored signal.
        # ``Dashboard.json_metadata`` and ``position_json`` are JSON
        # blobs walked structurally by ``diff_json_field`` (one record
        # per changed top-level key); the raw scalar diff would emit
        # one giant multi-KB record per save and swamp the response.
        special: frozenset[str] = frozenset()
        audit: frozenset[str] = frozenset()
        if model_cls.__name__ == "Slice":
            special = frozenset({"params"})
            audit = frozenset({"last_saved_at", "last_saved_by_fk"})
        elif model_cls.__name__ == "Dashboard":
            special = frozenset({"json_metadata", "position_json"})
        _SCALAR_FIELDS_CACHE[model_cls] = scalar_fields_for(
            model_cls, special=special, audit=audit
        )
    return _SCALAR_FIELDS_CACHE[model_cls]


def _jsonable(value: Any) -> Any:
    """Convert a column value into a JSON-serialisable form.

    Slice has ``last_saved_at`` (datetime), datasets have datetime
    columns, and any of these fields can land in ``from_value`` /
    ``to_value`` of a ``version_changes`` row, which is a JSON column.
    Python's default JSON encoder rejects ``datetime`` / ``UUID`` /
    ``bytes`` / ``Decimal``, so the whole bulk insert fails if a single
    record carries one. Convert to ISO / hex / str at record-construction
    time.
    """
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, bytes):
        return value.hex()
    if isinstance(value, Decimal):
        # Stringify rather than ``float()`` to preserve precision; the
        # diff engine compares string equality on ``from_value`` /
        # ``to_value``, so coercing both sides to the same form is what
        # matters.
        return str(value)
    return value


def _orm_to_post_state(obj: Any) -> dict[str, Any]:
    """Serialise an ORM object's column attributes to a plain dict.

    We only read declared column attributes (not relationships or
    hybrid properties) because the diff engine operates on scalar
    values per its documented API. Values are passed through
    :func:`_jsonable` so the dict is JSON-safe end-to-end.
    """
    state = sa.inspect(obj)
    return {
        col.key: _jsonable(getattr(obj, col.key)) for col in state.mapper.column_attrs
    }


def _read_pre_state(
    session: Session, model_cls: type, entity_id: int
) -> dict[str, Any] | None:
    """Read the entity's pre-flush row directly from the DB and convert
    non-JSON-safe types to strings so both sides of the diff compare on
    the same form. Delegates the autoflush-suppressed read itself to
    :func:`superset.versioning.utils.read_row_outside_flush`.

    Returns ``None`` if the row is missing (shouldn't happen for a dirty
    existing object, but defensive against race conditions).
    """
    table = model_cls.__table__  # type: ignore[attr-defined]
    result = read_row_outside_flush(session, table, entity_id)
    if result is None:
        return None
    # Convert non-JSON-safe types (datetime, UUID, bytes, Decimal) to
    # strings so both sides of the diff compare on the same form and
    # any value that ends up in ``from_value`` / ``to_value`` is
    # acceptable to the JSON column on insert.
    return {key: _jsonable(value) for key, value in result.items()}


def _compute_records_for_entity(session: Session, obj: Any) -> list[ChangeRecord]:
    """Diff the pre-state (from DB) against the post-state (in memory).

    Dispatches to :func:`diff_slice` / :func:`diff_dashboard` /
    :func:`diff_dataset` based on the model class name. String-based
    dispatch keeps this module free of hard imports on the three entity
    classes, which in turn avoids import-order coupling at app-init time.
    """
    model_cls = type(obj)
    entity_id = getattr(obj, "id", None)
    if entity_id is None:
        return []
    try:
        pre_state = _read_pre_state(session, model_cls, entity_id)
    except Exception:  # pylint: disable=broad-except
        logger.exception(
            "version_changes: pre-state read failed for %s id=%s",
            model_cls.__name__,
            entity_id,
        )
        return []
    if pre_state is None:
        return []
    post_state = _orm_to_post_state(obj)
    fields = _cached_scalar_fields(model_cls)
    name = model_cls.__name__
    if name == "Slice":
        return diff_slice(pre_state, post_state, fields=fields)
    if name == "Dashboard":
        return diff_dashboard(pre_state, post_state, fields=fields)
    if name == "SqlaTable":
        return diff_dataset(pre_state, post_state, fields=fields)
    return []


def _bulk_insert_records(
    session: Session,
    transaction_id: int,
    buffered: dict[tuple[str, int], list[ChangeRecord]],
) -> None:
    """Insert ``version_changes`` rows for one transaction via raw SQL.

    Uses the module-level :data:`version_changes_table` Table object
    (which carries JSON column types, unlike ``sa.table(...)``) so the
    connection marshals ``path`` / ``from_value`` / ``to_value`` Python
    structures into JSON on insert. Skips the ORM flush round that
    ``session.bulk_insert_mappings`` would cost inside an already-active
    flush.

    ``buffered`` is a dict keyed on ``(entity_kind, entity_id)`` so
    records for one entity (scalars from ``before_flush`` plus children
    collected in ``after_flush``) merge naturally under the same key.
    ``sequence`` resets per entity so each entity's records form a
    self-contained replay sequence.
    """
    if not buffered:
        return
    rows = []
    for (entity_kind, entity_id), records in buffered.items():
        for seq, r in enumerate(records):
            rows.append(
                {
                    "transaction_id": transaction_id,
                    "entity_kind": entity_kind,
                    "entity_id": entity_id,
                    "sequence": seq,
                    "kind": r.kind,
                    "path": r.path,
                    "from_value": r.from_value,
                    "to_value": r.to_value,
                }
            )
    if rows:
        session.connection().execute(version_changes_table.insert(), rows)


def _shadow_rows_valid_at(
    session: Session,
    shadow_table: sa.Table,
    fk_col_name: str,
    fk_value: int,
    tx: int,
) -> list[dict[str, Any]]:
    """Return the live state of *shadow_table* rows whose FK column
    (``fk_col_name``) equals *fk_value*, as of transaction *tx*.

    Uses Continuum's validity-strategy semantics: a row is "valid at tx"
    when ``transaction_id <= tx`` AND (``end_transaction_id`` IS NULL OR
    ``end_transaction_id`` > tx) AND it isn't a DELETE shadow.

    The returned dicts mirror the live row's column set (no Continuum
    bookkeeping columns), so they can be passed straight to the
    natural-key diff helpers (``diff_dataset_columns`` etc.).
    """
    fk_col = getattr(shadow_table.c, fk_col_name)
    rows = (
        session.connection()
        .execute(
            sa.select(shadow_table).where(
                fk_col == fk_value,
                shadow_table.c.transaction_id <= tx,
                sa.or_(
                    shadow_table.c.end_transaction_id.is_(None),
                    shadow_table.c.end_transaction_id > tx,
                ),
                shadow_table.c.operation_type != 2,
            )
        )
        .mappings()
        .all()
    )
    # Coerce values to JSON-safe forms: raw shadow rows can carry
    # ``UUID``, ``datetime``, ``bytes`` etc. that don't survive the
    # ``version_changes.from_value/to_value`` JSON column write.
    meta_cols = {"transaction_id", "end_transaction_id", "operation_type"}
    return [
        {k: _jsonable(v) for k, v in dict(row).items() if k not in meta_cols}
        for row in rows
    ]


def _affected_dataset_ids_at_tx(session: Session, tx: int) -> set[int]:
    """Datasets touched at *tx*, either directly (parent shadow at tx) or
    indirectly (column / metric shadow at tx)."""
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.connectors.sqla.models import SqlaTable, SqlMetric, TableColumn

    dataset_ids: set[int] = set()
    parent_tbl = version_class(SqlaTable).__table__
    for row in session.connection().execute(
        sa.select(parent_tbl.c.id).where(parent_tbl.c.transaction_id == tx)
    ):
        dataset_ids.add(row[0])
    for child_cls in (TableColumn, SqlMetric):
        child_tbl = version_class(child_cls).__table__
        for row in session.connection().execute(
            sa.select(child_tbl.c.table_id).where(child_tbl.c.transaction_id == tx)
        ):
            if row[0] is not None:
                dataset_ids.add(row[0])
    return dataset_ids


def _dataset_child_records_for_tx_from_shadows(
    session: Session, transaction_id: int
) -> dict[int, list[ChangeRecord]]:
    """Compute column + metric diff records for each dataset touched at
    *transaction_id*, reading from Continuum shadow tables.

    For each dataset:

    * Post-state = rows valid at ``transaction_id`` in
      ``table_columns_version`` / ``sql_metrics_version``.
    * Pre-state = rows valid at ``transaction_id - 1`` in the same
      shadow tables.

    With Continuum's validity-strategy semantics, "valid at tx N - 1"
    is the state immediately before this transaction's effects (the
    row that gets superseded at tx=N has ``end_transaction_id=N``, so
    it satisfies ``end > N - 1``). Unrelated transactions between this
    dataset's edits are transparent: they don't change validity for
    this dataset's children.

    First-edit case: when there is no prior tx (the dataset's earliest
    shadow IS at *transaction_id*), pre-state is empty. We skip rather
    than emit "Added X" for every column, matching the "baseline = zero
    records" semantics of the snapshot path.
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.connectors.sqla.models import SqlMetric, TableColumn

    cols_tbl = version_class(TableColumn).__table__
    metrics_tbl = version_class(SqlMetric).__table__
    result: dict[int, list[ChangeRecord]] = {}
    # Evaluate the pre-state at ``transaction_id - 1`` as described above.
    # The latest earlier *column* shadow tx is not a safe substitute: a
    # metric-only edit between that tx and this one would otherwise be
    # re-reported as part of this transaction's diff.
    pre_tx = transaction_id - 1
    for dataset_id in _affected_dataset_ids_at_tx(session, transaction_id):
        # ``prior_tx`` only detects whether the dataset has any earlier
        # shadow; its very first transaction has no pre-state and is skipped.
        prior_tx = (
            session.connection()
            .execute(
                sa.select(sa.func.max(cols_tbl.c.transaction_id)).where(
                    cols_tbl.c.table_id == dataset_id,
                    cols_tbl.c.transaction_id < transaction_id,
                )
            )
            .scalar()
        )
        if prior_tx is None:
            # No prior column shadow; this could still be a metric-only
            # edit, so check the metrics shadow too.
            prior_tx = (
                session.connection()
                .execute(
                    sa.select(sa.func.max(metrics_tbl.c.transaction_id)).where(
                        metrics_tbl.c.table_id == dataset_id,
                        metrics_tbl.c.transaction_id < transaction_id,
                    )
                )
                .scalar()
            )
        if prior_tx is None:
            continue
        post_cols = _shadow_rows_valid_at(
            session, cols_tbl, "table_id", dataset_id, transaction_id
        )
        pre_cols = _shadow_rows_valid_at(
            session, cols_tbl, "table_id", dataset_id, pre_tx
        )
        post_metrics = _shadow_rows_valid_at(
            session, metrics_tbl, "table_id", dataset_id, transaction_id
        )
        pre_metrics = _shadow_rows_valid_at(
            session, metrics_tbl, "table_id", dataset_id, pre_tx
        )
        records: list[ChangeRecord] = []
        records.extend(diff_dataset_columns(pre_cols, post_cols))
        records.extend(diff_dataset_metrics(pre_metrics, post_metrics))
        if records:
            result[dataset_id] = records
    return result


def _affected_dashboard_ids_at_tx(session: Session, tx: int) -> set[int]:
    """Dashboards touched at *tx*, either directly (parent shadow at tx) or
    indirectly (slice-membership shadow at tx)."""
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.models.dashboard import Dashboard

    dashboard_ids: set[int] = set()
    parent_tbl = version_class(Dashboard).__table__
    for row in session.connection().execute(
        sa.select(parent_tbl.c.id).where(parent_tbl.c.transaction_id == tx)
    ):
        dashboard_ids.add(row[0])
    # M2M shadow: ``dashboard_slices_version`` is auto-generated by
    # Continuum and lives in the metadata rather than behind a model
    # class, so look it up from the metadata bag instead of via
    # ``version_class``.
    metadata = parent_tbl.metadata
    if (m2m_tbl := metadata.tables.get("dashboard_slices_version")) is not None:
        for row in session.connection().execute(
            sa.select(m2m_tbl.c.dashboard_id).where(m2m_tbl.c.transaction_id == tx)
        ):
            if row[0] is not None:
                dashboard_ids.add(row[0])
    return dashboard_ids


def _dashboard_slice_uuids_at_tx(
    session: Session, dashboard_id: int, tx: int
) -> list[str]:
    """Slice UUIDs attached to *dashboard_id* as of *tx*, read by joining
    ``dashboard_slices_version`` (M2M membership) against
    ``slices_version`` (slice content).

    Joining through both is necessary, and mirrors the query Continuum's
    M2M ``Reverter`` uses: a slice that is referenced by the M2M table but
    has no slice-version row valid at this tx is treated as "not yet
    versioned" and excluded.

    Returns UUIDs (strings) so the result can be diffed by the existing
    :func:`diff_dashboard_slices` helper, which keys on uuid.
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.models.slice import Slice

    metadata = version_class(Slice).__table__.metadata
    m2m_tbl = metadata.tables.get("dashboard_slices_version")
    slices_tbl = version_class(Slice).__table__
    if m2m_tbl is None:
        return []
    rows = (
        session.connection()
        .execute(
            sa.select(slices_tbl.c.uuid).where(
                slices_tbl.c.id == m2m_tbl.c.slice_id,
                m2m_tbl.c.dashboard_id == dashboard_id,
                m2m_tbl.c.transaction_id <= tx,
                sa.or_(
                    m2m_tbl.c.end_transaction_id.is_(None),
                    m2m_tbl.c.end_transaction_id > tx,
                ),
                m2m_tbl.c.operation_type != 2,
                slices_tbl.c.transaction_id <= tx,
                sa.or_(
                    slices_tbl.c.end_transaction_id.is_(None),
                    slices_tbl.c.end_transaction_id > tx,
                ),
                slices_tbl.c.operation_type != 2,
            )
        )
        .all()
    )
    return [str(r[0]) for r in rows if r[0] is not None]


def _dashboard_child_records_for_tx_from_shadows(
    session: Session, transaction_id: int
) -> dict[int, list[ChangeRecord]]:
    """Compute slice-membership diff records for each dashboard touched
    at *transaction_id*, reading from Continuum shadow tables.

    Same pre/post logic as
    :func:`_dataset_child_records_for_tx_from_shadows`: post-state is
    evaluated at *transaction_id*, pre-state at ``transaction_id - 1``,
    and dashboards with no earlier membership shadow are skipped.
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import version_class

    from superset.models.dashboard import Dashboard

    metadata = version_class(Dashboard).__table__.metadata
    m2m_tbl = metadata.tables.get("dashboard_slices_version")
    result: dict[int, list[ChangeRecord]] = {}
    # Pre-state at ``transaction_id - 1`` (not at the last earlier
    # membership tx), so slice-version rows written between that tx and
    # this one don't surface as spurious additions or removals.
    pre_tx = transaction_id - 1
    for dashboard_id in _affected_dashboard_ids_at_tx(session, transaction_id):
        prior_tx = None
        if m2m_tbl is not None:
            prior_tx = (
                session.connection()
                .execute(
                    sa.select(sa.func.max(m2m_tbl.c.transaction_id)).where(
                        m2m_tbl.c.dashboard_id == dashboard_id,
                        m2m_tbl.c.transaction_id < transaction_id,
                    )
                )
                .scalar()
            )
        if prior_tx is None:
            continue
        post_uuids = _dashboard_slice_uuids_at_tx(session, dashboard_id, transaction_id)
        pre_uuids = _dashboard_slice_uuids_at_tx(session, dashboard_id, pre_tx)
        records = diff_dashboard_slices(pre_uuids, post_uuids)
        if records:
            result[dashboard_id] = records
    return result
# Sentinel attribute set on the session target after first successful
# registration. Subsequent calls become no-ops. Storing the flag on the
# target itself (rather than module-level state) keeps the guard
# naturally scoped — a fresh session proxy gets a fresh registration —
# and avoids the TOCTOU race between ``event.contains`` and
# ``event.listen`` that a module-level ref would have under concurrent
# init. In test fixtures that instantiate multiple Superset apps per
# process, the shared ``db.session`` carries the sentinel and re-entry
# is correctly deduped.
_REGISTERED_SENTINEL = "_versioning_change_listener_registered"


def _process_dirty_entity_into_buffer(
    session: Session,
    obj: Any,
    buffer: dict[tuple[str, int], list[ChangeRecord]],
) -> None:
    """Compute scalar change records for one dirty entity + append to buffer."""
    entity_kind = _ENTITY_KIND_BY_CLASS_NAME.get(type(obj).__name__)
    if entity_kind is None:
        return
    entity_id = getattr(obj, "id", None)
    if entity_id is None:
        return
    try:
        records = _compute_records_for_entity(session, obj)
    except Exception:  # pylint: disable=broad-except
        logger.exception(
            "version_changes: diff failed for %s id=%s",
            type(obj).__name__,
            entity_id,
        )
        return
    if records:
        buffer.setdefault((entity_kind, entity_id), []).extend(records)


def _append_child_records_to_buffer(
    session: Session,
    tx_id: int,
    buffer: dict[tuple[str, int], list[ChangeRecord]],
) -> None:
    """Compute dataset + dashboard child-collection records + append to buffer.

    Runs in ``after_flush`` so the shadow tables already have the
    current-tx rows. Reads from Continuum shadow tables
    (``table_columns_version`` / ``sql_metrics_version`` /
    ``dashboard_slices_version`` / ``slices_version``).
    """
    try:
        for dataset_id, records in _dataset_child_records_for_tx_from_shadows(
            session, tx_id
        ).items():
            buffer.setdefault(("dataset", dataset_id), []).extend(records)
        for dashboard_id, records in (
            _dashboard_child_records_for_tx_from_shadows(session, tx_id)
        ).items():
            buffer.setdefault(("dashboard", dashboard_id), []).extend(records)
        # Post-merge fold: when a dashboard save adds/removes charts,
        # drop the redundant ``position_json.*`` records that mirror
        # the membership change. See
        # ``diff.fold_dashboard_layout_with_chart_changes``.
        for key in list(buffer.keys()):
            if key[0] == "dashboard":
                buffer[key] = fold_dashboard_layout_with_chart_changes(buffer[key])
                if not buffer[key]:
                    del buffer[key]
    except Exception:  # pylint: disable=broad-except
        logger.exception("version_changes: child-diff failed for tx %s", tx_id)


def _current_transaction_id(session: Session) -> Optional[int]:
    """Return the Continuum transaction id for *session*'s current unit of
    work, or ``None`` when Continuum has no active transaction (e.g. raw
    SQL execution outside the ORM's flush flow).
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import versioning_manager

    uow = versioning_manager.units_of_work.get(session.connection())
    if uow is None or uow.current_transaction is None:
        return None
    return uow.current_transaction.id


def _persist_buffered_records(
    session: Session,
    tx_id: int,
    buffer: dict[tuple[str, int], list[ChangeRecord]],
) -> None:
    """Bulk-insert *buffer*'s records under *tx_id*.

    Resetting the buffer is left to the caller. Catches
    ``OperationalError`` to handle the pre-migration startup race
    (``version_changes`` table missing), and ``Exception`` as the
    listener-boundary safety net so a malformed record can't crash the
    user's save.
    """
    try:
        _bulk_insert_records(session, tx_id, buffer)
    except OperationalError:
        # version_changes table missing (migration not yet applied).
        pass
    except Exception:  # pylint: disable=broad-except
        logger.exception(
            "version_changes: bulk insert failed for tx %s (%d entities)",
            tx_id,
            len(buffer),
        )


def register_change_record_listener() -> None:
    """Attach the before_flush, after_flush and after_commit listeners.

    Registered from :class:`superset.initialization.SupersetAppInitializer`
    (``init_versioning``) alongside the baseline, dataset-snapshot,
    and dashboard-snapshot listeners. Must run after Continuum's
    ``make_versioned()`` so the ``versioning_manager`` is available
    and has installed its own before_flush hook.
    """
    # pylint: disable=import-outside-toplevel
    from superset.connectors.sqla.models import SqlaTable
    from superset.extensions import db
    from superset.models.dashboard import Dashboard
    from superset.models.slice import Slice

    if getattr(db.session, _REGISTERED_SENTINEL, False):
        return

    versioned_classes: tuple[type, ...] = (Dashboard, Slice, SqlaTable)

    def compute_change_records(
        session: Session, _flush_context: Any, _instances: Any
    ) -> None:
        # session.info persists across before_flush/after_flush within
        # a single transaction. The buffer is keyed on
        # ``(entity_kind, entity_id)`` so scalar records captured here
        # and child records captured in after_flush (T048b) merge
        # under the same entity without duplication.
        buffer: dict[tuple[str, int], list[ChangeRecord]] = session.info.setdefault(
            _BUFFER_KEY, {}
        )
        for obj in list(session.dirty):
            if isinstance(obj, versioned_classes):
                _process_dirty_entity_into_buffer(session, obj, buffer)

    def flush_change_records(session: Session, _flush_context: Any) -> None:
        buffer: dict[tuple[str, int], list[ChangeRecord]] = session.info.setdefault(
            _BUFFER_KEY, {}
        )
        tx_id = _current_transaction_id(session)
        if tx_id is None:
            session.info[_BUFFER_KEY] = {}
            return
        # Skip if we've already written records for this tx (after_flush
        # can fire more than once per commit, e.g. autoflush from a
        # mid-commit query). Without this guard the child-diff path would
        # re-read the same shadow rows and re-emit the same records,
        # tripping the UNIQUE(transaction_id, entity_kind, entity_id,
        # sequence) constraint on insert.
        processed: set[int] = session.info.setdefault(_PROCESSED_TXS_KEY, set())
        if tx_id in processed:
            return
        _append_child_records_to_buffer(session, tx_id, buffer)
        if not buffer:
            # Don't mark tx as processed when nothing was inserted. A
            # later after_flush firing for the same tx may carry the
            # records, e.g. when an entity's edit lands across two
            # flushes (a child-only flush followed by a parent-dirty
            # flush): the parent shadow only lands in the parent-dirty
            # flush, so the child-diff path can't find a prior tx to
            # compare against until then.
            session.info[_BUFFER_KEY] = {}
            return
        try:
            _persist_buffered_records(session, tx_id, buffer)
        finally:
            session.info[_BUFFER_KEY] = {}
        processed.add(tx_id)
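
    def reset_processed_after_commit(session: Session) -> None:
        # ``_PROCESSED_TXS_KEY`` accumulates Continuum tx ids whose change
        # records have already been written, to dedup against multiple
        # ``after_flush`` firings within one transaction. After commit
        # the tx is closed and its id will never recur on this session,
        # so drop the set to keep a long-lived session (Celery worker,
        # CLI) from growing it without bound.
        session.info.pop(_PROCESSED_TXS_KEY, None)
        # Records buffered by a flush that hit the ``tx_id in processed``
        # early return in ``flush_change_records`` were never persisted;
        # drop them too so they can't be written under the next
        # transaction's id.
        session.info.pop(_BUFFER_KEY, None)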

    event.listen(db.session, "before_flush", compute_change_records)
    event.listen(db.session, "after_flush", flush_change_records)
    event.listen(db.session, "after_commit", reset_processed_after_commit)
    setattr(db.session, _REGISTERED_SENTINEL, True)
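`_shadow_rows_valid_at` and `_dashboard_slice_uuids_at_tx` filter shadow rows with the same validity predicate. Restating it over in-memory dicts makes the pre/post reasoning in `_dataset_child_records_for_tx_from_shadows` easy to check by hand. The sketch is illustrative only: `valid_at`, `shadow` and the sample history are hypothetical, and it relies on the DELETE encoding (`operation_type == 2`) that the SQL filters in the module also use.

```python
from typing import Any, Optional

DELETE = 2  # operation_type of a DELETE shadow row


def valid_at(shadow_rows: list[dict[str, Any]], tx: int) -> list[dict[str, Any]]:
    """Rows visible at transaction ``tx`` (same predicate as the SQL filters)."""
    visible = []
    for row in shadow_rows:
        end: Optional[int] = row["end_transaction_id"]
        if (
            row["transaction_id"] <= tx
            and (end is None or end > tx)
            and row["operation_type"] != DELETE
        ):
            visible.append(row)
    return visible


def shadow(expr: str, start: int, end: Optional[int], op: int) -> dict[str, Any]:
    # Hypothetical shadow row for a single metric named "m".
    return {
        "metric_name": "m",
        "expression": expr,
        "transaction_id": start,
        "end_transaction_id": end,
        "operation_type": op,
    }


# One metric: created at tx 10, edited at tx 20, edited again at tx 30.
history = [
    shadow("SUM(a)", 10, 20, 0),
    shadow("SUM(b)", 20, 30, 1),
    shadow("SUM(c)", 30, None, 1),
]
pre = valid_at(history, 30 - 1)  # state just before tx 30
post = valid_at(history, 30)  # state after tx 30
```

Here `pre` holds the `SUM(b)` row and `post` the `SUM(c)` row, so a diff at tx 30 reports only the tx-30 edit. Evaluating the pre-state at tx 10 instead would fold the tx-20 edit into the tx-30 diff as well.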

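`_bulk_insert_records` numbers records per `(entity_kind, entity_id)` key, which is what keeps the `uq_version_changes_tx_entity_sequence` constraint satisfied within one transaction, and every value must already be JSON-safe. A pure-Python sketch of that row building follows. `Change`, `to_json_safe` and `build_rows` are hypothetical names; `Change` assumes only the four `ChangeRecord` attributes the listener reads, and unlike the listener (which coerces at record-construction time via `_jsonable`) the sketch coerces while building rows, for brevity.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Any


@dataclass
class Change:
    """Hypothetical stand-in for ChangeRecord (only the fields used here)."""

    kind: str
    path: list[Any]
    from_value: Any
    to_value: Any


def to_json_safe(value: Any) -> Any:
    # Same idea as _jsonable, reduced to two of its four cases.
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    return value


def build_rows(
    tx_id: int, buffered: dict[tuple[str, int], list[Change]]
) -> list[dict[str, Any]]:
    rows = []
    for (entity_kind, entity_id), records in buffered.items():
        # sequence restarts at 0 for every (entity_kind, entity_id) key
        for seq, rec in enumerate(records):
            rows.append(
                {
                    "transaction_id": tx_id,
                    "entity_kind": entity_kind,
                    "entity_id": entity_id,
                    "sequence": seq,
                    "kind": rec.kind,
                    "path": rec.path,
                    "from_value": to_json_safe(rec.from_value),
                    "to_value": to_json_safe(rec.to_value),
                }
            )
    return rows


buffered = {
    ("chart", 1): [
        Change("field", ["slice_name"], "Old", "New"),
        Change("metric", ["params", "metrics", 0], None, "count"),
    ],
    ("dataset", 3): [Change("field", ["cache_timeout"], Decimal("1.50"), None)],
}
rows = build_rows(42, buffered)
payload = json.dumps(rows)  # succeeds: no Decimal or datetime left
```

Because the sequence restarts per entity, two entities in the same transaction can both use sequence 0 without colliding on the unique key.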
superset/versioning/diff.py

@@ -0,0 +1,884 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Diff engine for the ``version_changes`` table (FR-016..FR-019).
Hand-rolled because:
- The on-disk ``path`` shape (array of segments) is a direct
representation of our chosen format; external diff libraries
return string paths or JSON-Pointer forms that would need
translation.
- Kind classification (``filter`` vs ``metric`` vs ``field`` etc.)
is co-located with diff walking, avoiding a second classification
pass over the generic diff output.
- Child-collection identity uses natural keys (``column_name``,
``metric_name``, slice ``uuid``) — the same identity model
``DatasetDAO.update_columns`` settled on (ADR-004). External
libraries default to list-index matching, which is wrong for our
data.
See ADR (plan.md §"Key Design Decision: Hand-rolled diff engine") for
the full rationale.
All functions in this module are pure: they take dicts (or lists of
dicts) and return a list of :class:`ChangeRecord`. The ORM->dict
conversion and Continuum transaction lookup happen in the capture
listener (T048), not here. This keeps the engine unit-testable without
an app context or DB.
"""
from __future__ import annotations
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any, Callable, Optional
from superset.utils import json as _json
# Columns that are always excluded from change records, regardless of
# what ``__versioned__`` says. ``id`` / ``uuid`` are stable identifiers
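# (not edited in normal flows). The four audit fields are stamped by
# the ORM rather than the user (``changed_on`` / ``changed_by_fk`` on
# every save, ``created_on`` / ``created_by_fk`` on insert); emitting
# records for them would double every history entry with meaningless
# "timestamp changed, user stamped" rows that the UI would have to
# filter out anyway.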
_AUDIT_FIELDS: frozenset[str] = frozenset(
{
"id",
"uuid",
"created_on",
"changed_on",
"created_by_fk",
"changed_by_fk",
}
)
# Fields stripped from child-collection dict items (TableColumn,
# SqlMetric) before comparison and emission. ``changed_on`` /
# ``created_on`` / ``*_by_fk`` are audit fields that update on every
# save of the parent — without this filter, saving a dataset to add
# one column produces a record per existing column too (because their
# ``changed_on`` timestamps all refreshed). ``id`` and ``table_id``
# are implementation details — ``id`` can change under the
# ``override_columns`` delete-and-reinsert pattern (ADR-004) even
# when the column is semantically unchanged; ``table_id`` is the
# parent FK and never meaningfully differs within one dataset's
# history. ``uuid`` stays stable across normal saves and is kept so
# the renderer can use it for identity if it needs to.
_CHILD_ITEM_OPAQUE_FIELDS: frozenset[str] = frozenset(
{
"id",
"table_id",
"changed_on",
"created_on",
"changed_by_fk",
"created_by_fk",
}
)
def _strip_opaque_fields(item: Any) -> Any:
"""Return *item* with child-item audit/implementation fields removed.
Pass-through for non-dict values (scalars, strings) — the strip
only applies where it matters (dataset column / metric dicts).
"""
if not isinstance(item, dict):
return item
return {k: v for k, v in item.items() if k not in _CHILD_ITEM_OPAQUE_FIELDS}
# Chart ``params`` sub-keys that are promoted to first-class kinds.
# Every other params sub-key falls through to ``kind="field"``.
_CHART_PARAMS_KIND_BY_KEY: dict[str, str] = {
"adhoc_filters": "filter",
"time_range": "time_range",
"color_scheme": "color_palette",
"metrics": "metric",
"groupby": "dimension",
"columns": "dimension",
}
# Chart ``params`` sub-keys that are machine-stamped on save and don't
# carry user-authored signal — same category as ``last_saved_at`` on
# the scalar side. ``slice_id`` is a self-reference to the chart's
# own primary id; Superset's save paths add or refresh it on every
# save, producing a spurious "field" record on the first save after
# a chart's params were stored without it.
_CHART_PARAMS_AUDIT_KEYS: frozenset[str] = frozenset({"slice_id"})
def scalar_fields_for(
model_cls: Any,
*,
special: frozenset[str] = frozenset(),
audit: frozenset[str] = frozenset(),
) -> frozenset[str]:
"""Scalar columns on ``model_cls`` that should produce change records.
Derived from the model itself at call time so contributors (and
downstream derivatives) don't have to maintain a parallel whitelist
in this module. Adding a new column to ``Dashboard``, ``Slice``, or
``SqlaTable`` — whether upstream or in a fork — automatically flows
through to ``version_changes`` on the next save.
Excludes, in order:
1. The model's own ``__versioned__.exclude`` list, so change records
stay consistent with Continuum's shadow tables. If Continuum
isn't tracking a column, the change log shouldn't either.
2. :data:`_AUDIT_FIELDS` — ``id``, ``uuid``, and the audit
timestamps / user-id columns shared across the three entity types.
3. The caller's ``audit`` set — model-specific save-side-effect
columns that aren't user-authored content. ``Slice.last_saved_at``
/ ``last_saved_by_fk`` are stamped on every chart save by
``UpdateChartCommand``, similar to how ``changed_on`` is stamped
by the ORM event listener; emitting "field" records for them
would noise up the change log with one entry per save that
carries no user-meaningful signal.
4. The caller's ``special`` set — columns handled by a dedicated
differ elsewhere. ``Slice.params``, for example, is walked by
:func:`diff_slice_params` to produce first-class ``filter`` /
``time_range`` / ``metric`` / ``dimension`` records; emitting
it as a single opaque ``field`` would defeat that.
"""
try:
table = model_cls.__table__
except AttributeError:
return frozenset()
columns = frozenset(c.name for c in table.columns)
continuum_exclude = frozenset(
getattr(model_cls, "__versioned__", {}).get("exclude", []) or []
)
return columns - continuum_exclude - _AUDIT_FIELDS - audit - special
@dataclass(frozen=True)
class ChangeRecord:
"""One atomic change, as stored in ``version_changes``.
Fields match the ``version_changes`` columns one-to-one so the
capture listener can serialise a list of these to
``session.bulk_insert_mappings`` without translation.
"""
kind: str
path: list[Any]
from_value: Any
to_value: Any
Key = str | int
def _values_equivalent(from_value: Any, to_value: Any) -> bool:
"""True if a transition from ``from_value`` to ``to_value`` should
NOT produce a record.
Beyond plain ``==`` equality, treats ``None`` and ``""`` as equivalent:
Superset's save paths normalize nullable strings to ``""`` on first
write (e.g. ``Dashboard.css``, ``certified_by``,
``certification_details``), so a first-save transition between
null and empty string carries no user-authored signal.
"""
if from_value == to_value:
return True
if from_value in (None, "") and to_value in (None, ""):
return True
return False
def _diff_scalar(
field_name: str,
from_value: Any,
to_value: Any,
) -> ChangeRecord | None:
"""Emit a generic ``kind="field"`` record when a scalar differs."""
if _values_equivalent(from_value, to_value):
return None
return ChangeRecord(
kind="field",
path=[field_name],
from_value=from_value,
to_value=to_value,
)
def _diff_list_by_natural_key(
kind: str,
path_prefix: list[Any],
from_list: list[Any] | None,
to_list: list[Any] | None,
key_fn: Callable[[Any], Key | None],
) -> list[ChangeRecord]:
"""Diff two lists, matching elements by natural key.
Emits one record per add / remove / modify. When ``key_fn`` returns
``None`` for an item (natural key missing or empty), the item falls
back to its position as a synthetic key — so insertions in the
middle of a keyless list still produce sensible records, at the
cost of position-dependent identity.
"""
from_list = from_list or []
to_list = to_list or []
def _effective_key(raw: Key | None, idx: int) -> Key:
if raw is None or raw == "":
return idx
return raw
from_by_key: dict[Key, Any] = {}
for idx, item in enumerate(from_list):
from_by_key[_effective_key(key_fn(item), idx)] = item
to_by_key: dict[Key, Any] = {}
for idx, item in enumerate(to_list):
to_by_key[_effective_key(key_fn(item), idx)] = item
records: list[ChangeRecord] = []
# Preserve ``from`` order, then append ``to``-only keys, so sequence is
# deterministic across runs. For dict items (dataset columns /
# metrics) we strip audit/implementation fields before comparing
# AND before emitting — otherwise a save that only adds a new
# column would also emit "changed" records for every existing
# column, because their ``changed_on`` timestamps all refreshed.
# The stripped from/to are what the renderer sees; the per-column
# audit trail is already aggregated at the transaction level in
# ``version_transaction`` (``user_id`` + ``issued_at``).
for k, from_item in from_by_key.items():
to_item = to_by_key.get(k)
stripped_from = _strip_opaque_fields(from_item)
if to_item is None:
records.append(
ChangeRecord(
kind=kind,
path=[*path_prefix, k],
from_value=stripped_from,
to_value=None,
)
)
continue
stripped_to = _strip_opaque_fields(to_item)
if stripped_from != stripped_to:
records.append(
ChangeRecord(
kind=kind,
path=[*path_prefix, k],
from_value=stripped_from,
to_value=stripped_to,
)
)
for k, to_item in to_by_key.items():
if k not in from_by_key:
records.append(
ChangeRecord(
kind=kind,
path=[*path_prefix, k],
from_value=None,
to_value=_strip_opaque_fields(to_item),
)
)
return records
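To see why natural-key matching matters, the standalone sketch below diffs a reordered column list twice: once keyed by ``column_name``, once with no key so every item falls back to its index. It is a simplified restatement of the matching rule, not the module's function: ``diff_by_key`` and ``Rec`` are hypothetical names, and the opaque-field stripping is omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

Key = Union[str, int]


@dataclass(frozen=True)
class Rec:
    op: str  # "add" | "remove" | "modify"
    key: Key


def diff_by_key(
    old: list[Any], new: list[Any], key_fn: Callable[[Any], Optional[Key]]
) -> list[Rec]:
    """Match items by natural key, falling back to list position when it is missing."""

    def effective(raw: Optional[Key], idx: int) -> Key:
        return idx if raw is None or raw == "" else raw

    old_by_key = {effective(key_fn(item), i): item for i, item in enumerate(old)}
    new_by_key = {effective(key_fn(item), i): item for i, item in enumerate(new)}
    recs: list[Rec] = []
    for key, item in old_by_key.items():  # old order first, for deterministic output
        if key not in new_by_key:
            recs.append(Rec("remove", key))
        elif new_by_key[key] != item:
            recs.append(Rec("modify", key))
    recs.extend(Rec("add", key) for key in new_by_key if key not in old_by_key)
    return recs


def by_column_name(col: Any) -> Optional[Key]:
    return col.get("column_name") if isinstance(col, dict) else None


def no_key(col: Any) -> Optional[Key]:
    return None  # forces positional identity for every item


old = [{"column_name": "a", "type": "INT"}, {"column_name": "b", "type": "TEXT"}]
new = [{"column_name": "b", "type": "TEXT"}, {"column_name": "a", "type": "BIGINT"}]
print(diff_by_key(old, new, by_column_name))  # [Rec(op='modify', key='a')]
print(diff_by_key(old, new, no_key))  # [Rec(op='modify', key=0), Rec(op='modify', key=1)]
```

Keyed by name, the reorder is invisible and only the real type change on ``a`` is reported; index matching reports both slots as modified.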
def _filter_key(f: Any) -> Key | None:
"""Natural key for an adhoc filter — its subject (column name).
Users rarely have two filters on the same column; when they do the
secondary dimensions (operator, comparator) appear in the record's
from/to values so the renderer can disambiguate.
"""
return f.get("subject") if isinstance(f, dict) else None
def _metric_key(m: Any) -> Key | None:
"""Natural key for a metric: prefer ``label``, fall back to column+aggregate."""
if not isinstance(m, dict):
return None
if label := m.get("label"):
return label
column = m.get("column")
col_name = column.get("column_name") if isinstance(column, dict) else None
agg = m.get("aggregate")
if col_name and agg:
return f"{agg}({col_name})"
return None
def _dimension_key(d: Any) -> Key | None:
"""Natural key for a groupby/columns element — usually a bare string."""
if isinstance(d, str):
return d
if isinstance(d, dict):
return d.get("label") or d.get("column_name")
return None
def _coerce_params(p: Any) -> dict[str, Any]:
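"""Decode a JSON-dict TEXT column (``Slice.params``, and also
``Dashboard.json_metadata`` / ``position_json``) into a dict.
Returns ``{}`` for ``None``, undecodable JSON, or a JSON value that
is not an object; an already-parsed dict passes through unchanged.
"""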
if p is None:
return {}
if isinstance(p, str):
try:
decoded = _json.loads(p)
except _json.JSONDecodeError:
return {}
return decoded if isinstance(decoded, dict) else {}
if isinstance(p, dict):
return p
return {}
def diff_slice_params(
from_params: Any,
to_params: Any,
) -> list[ChangeRecord]:
"""Diff the ``Slice.params`` JSON blob, promoting known keys to kinds."""
from_p = _coerce_params(from_params)
to_p = _coerce_params(to_params)
records: list[ChangeRecord] = []
all_keys = (set(from_p) | set(to_p)) - _CHART_PARAMS_AUDIT_KEYS
for key in sorted(all_keys):
from_v = from_p.get(key)
to_v = to_p.get(key)
if _values_equivalent(from_v, to_v):
continue
kind = _CHART_PARAMS_KIND_BY_KEY.get(key)
if kind == "filter" and isinstance(from_v, list) and isinstance(to_v, list):
records.extend(
_diff_list_by_natural_key(
"filter",
["params", "adhoc_filters"],
from_v,
to_v,
_filter_key,
)
)
elif kind == "metric" and isinstance(from_v, list) and isinstance(to_v, list):
records.extend(
_diff_list_by_natural_key(
"metric",
["params", "metrics"],
from_v,
to_v,
_metric_key,
)
)
elif (
kind == "dimension" and isinstance(from_v, list) and isinstance(to_v, list)
):
records.extend(
_diff_list_by_natural_key(
"dimension",
["params", key],
from_v,
to_v,
_dimension_key,
)
)
elif kind:
# scalar first-class kind (time_range, color_palette) —
# single record carrying the whole value
records.append(
ChangeRecord(
kind=kind,
path=["params", key],
from_value=from_v,
to_value=to_v,
)
)
else:
# unknown params sub-key: generic field change
records.append(
ChangeRecord(
kind="field",
path=["params", key],
from_value=from_v,
to_value=to_v,
)
)
return records
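The top-level routing in ``diff_slice_params`` amounts to a key-to-kind lookup. The sketch below restates only that routing decision under simplifying assumptions: it uses plain ``==`` instead of the ``None``/``""`` equivalence, and it does not walk the ``filter`` / ``metric`` / ``dimension`` lists by natural key as the real function does. ``classify_param_changes`` is a hypothetical name.

```python
import json
from typing import Any

# Copied from _CHART_PARAMS_KIND_BY_KEY / _CHART_PARAMS_AUDIT_KEYS above.
KIND_BY_KEY = {
    "adhoc_filters": "filter",
    "time_range": "time_range",
    "color_scheme": "color_palette",
    "metrics": "metric",
    "groupby": "dimension",
    "columns": "dimension",
}
AUDIT_KEYS = frozenset({"slice_id"})


def classify_param_changes(old_json: str, new_json: str) -> list[tuple[str, str]]:
    """Return (kind, params_key) for each changed top-level params key."""
    old: dict[str, Any] = json.loads(old_json)
    new: dict[str, Any] = json.loads(new_json)
    changes: list[tuple[str, str]] = []
    for key in sorted((set(old) | set(new)) - AUDIT_KEYS):
        if old.get(key) == new.get(key):
            continue
        changes.append((KIND_BY_KEY.get(key, "field"), key))  # unknown keys -> "field"
    return changes


before = '{"slice_id": 1, "viz_type": "table", "time_range": "Last week", "row_limit": 100}'
after = '{"slice_id": 2, "viz_type": "table", "time_range": "Last month", "row_limit": 500}'
print(classify_param_changes(before, after))
# [('field', 'row_limit'), ('time_range', 'time_range')]
```

The ``slice_id`` change is dropped as machine-stamped, ``time_range`` is promoted to its own kind, and ``row_limit`` falls through to ``field``.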
def diff_scalar_fields(
pre: dict[str, Any],
post: dict[str, Any],
*,
fields: Iterable[str],
) -> list[ChangeRecord]:
"""Emit one ``kind="field"`` record per differing field in ``fields``.
The ``fields`` iterable is supplied by the caller — typically
:func:`scalar_fields_for` at listener wiring time. Keeping the
field list outside this function means adding a new column to a
model does not require a matching edit here.
"""
records: list[ChangeRecord] = []
for field in sorted(fields):
record = _diff_scalar(field, pre.get(field), post.get(field))
if record is not None:
records.append(record)
return records
def diff_slice(
pre: dict[str, Any],
post: dict[str, Any],
*,
fields: Iterable[str],
) -> list[ChangeRecord]:
"""Full Slice (chart) diff — scalars plus params classification.
Pass ``fields=scalar_fields_for(Slice, special=frozenset({"params"}))``
to get the ``params``-excluded scalar set; ``Slice.params`` is diffed
separately by :func:`diff_slice_params` for kind promotion.
"""
records = diff_scalar_fields(pre, post, fields=fields)
records.extend(diff_slice_params(pre.get("params"), post.get("params")))
return records
def diff_json_field(
field_name: str,
from_value: Any,
to_value: Any,
*,
exclude_keys: frozenset[str] = frozenset(),
) -> list[ChangeRecord]:
"""Diff a TEXT column that stores a JSON dict, emitting one record
per top-level key whose value changed.
Used for ``Dashboard.json_metadata`` (``position_json`` has its
own structural diff via :func:`diff_dashboard_layout`). Saving the
blob verbatim into ``from_value`` / ``to_value`` would swamp the
change log with multi-KB strings on every save; walking the parsed
dict at the top level reduces noise to "what changed".
*exclude_keys* names sub-keys that are frontend-derived /
auto-stamped on save and don't carry user-authored signal. Same
rationale as the ``audit`` parameter on
:func:`scalar_fields_for` for the parent-column level.
Path is ``[field_name, key]``, mirroring ``diff_slice_params``'s
``["params", key]`` shape so renderers can use a single addressing
scheme across the chart and dashboard sides.
"""
from_p = _coerce_params(from_value)
to_p = _coerce_params(to_value)
records: list[ChangeRecord] = []
for key in sorted(set(from_p) | set(to_p)):
if key in exclude_keys:
continue
from_v = from_p.get(key)
to_v = to_p.get(key)
if _values_equivalent(from_v, to_v):
continue
records.append(
ChangeRecord(
kind="field",
path=[field_name, key],
from_value=from_v,
to_value=to_v,
)
)
return records
# json_metadata sub-keys that the frontend auto-stamps / auto-derives
# on save. They mirror dashboard membership and chart inventory, not
# user-authored content, so they noise up the change log without
# carrying intent. The records produced for these keys can be ~50KB
# (full label-colour dict) for a one-chart save.
#
# chart_configuration: per-chart cross-filter scope state,
# re-derived when charts are added/removed.
# global_chart_configuration: dashboard-wide filter scope; the
# ``chartsInScope`` list mirrors live
# dashboard membership.
# map_label_colors: label → colour map, re-stamped on save
# from currently-visible filter values.
# show_chart_timestamps: frontend toggle, defaults applied on
# save when missing.
# color_namespace: scoped colour-scheme namespace, frontend-
# derived from the chart set.
DASHBOARD_JSON_METADATA_AUDIT_KEYS: frozenset[str] = frozenset(
{
"chart_configuration",
"global_chart_configuration",
"map_label_colors",
"show_chart_timestamps",
"color_namespace",
}
)
# Layout component types and how they map to record ``kind`` strings.
# ``HEADER_ID`` is excluded — that's the dashboard's title bar, mirrored
# from ``dashboard_title``. ``ROOT_ID`` and ``GRID_ID`` are structural
# singletons whose only deltas are children lists, which we infer from
# the moves of the children themselves.
_LAYOUT_TYPE_TO_KIND: dict[str, str] = {
"CHART": "chart",
"ROW": "row",
"COLUMN": "column",
"TAB": "tab",
"TABS": "tabs",
"HEADER": "header",
"MARKDOWN": "markdown",
"DIVIDER": "divider",
}
# Layout components we never emit records for: ROOT_ID is the layout
# root (always present, never moves); GRID_ID is the singleton vertical
# stack inside ROOT_ID; HEADER_ID is the dashboard's title bar (already
# covered by the ``dashboard_title`` scalar field).
_LAYOUT_SUPPRESSED_IDS: frozenset[str] = frozenset({"ROOT_ID", "GRID_ID", "HEADER_ID"})
def _layout_component_label(node: dict[str, Any]) -> str | None:
"""Extract a human-readable label from a layout node, when one
exists. Used to build the ``from_value`` / ``to_value`` payload so
the UI can render messages like "Added chart 'Foo'" without
needing to fetch related entities.
"""
meta = node.get("meta") or {}
if not isinstance(meta, dict):
return None
for key in ("sliceName", "label", "text"):
value = meta.get(key)
if isinstance(value, str) and value.strip():
return value
return None
def _layout_node_payload(node: dict[str, Any]) -> dict[str, Any]:
"""Minimal payload describing a layout node — enough for the UI
to render the change without dragging the full layout snippet
(which can be ~1KB per row when CHART nodes carry colour configs).
"""
meta = node.get("meta") or {}
if not isinstance(meta, dict):
meta = {}
payload: dict[str, Any] = {"id": node.get("id"), "type": node.get("type")}
if (label := _layout_component_label(node)) is not None:
payload["name"] = label
if (chart_id := meta.get("chartId")) is not None:
payload["chartId"] = chart_id
# ``uuid`` (slice uuid for CHART nodes) lets the M2M-vs-layout
# dedupe in :func:`fold_dashboard_layout_with_chart_changes`
# match on the same key — :func:`diff_dashboard_slices` keys its
# records by uuid, not chartId.
if (slice_uuid := meta.get("uuid")) is not None:
payload["uuid"] = slice_uuid
return payload
def _layout_parent_id(node: dict[str, Any]) -> Any:
"""The immediate-parent node id for a layout component — the last
entry in ``parents``. Used to detect moves: same id, different
parent."""
parents = node.get("parents") or []
if not isinstance(parents, list) or not parents:
return None
return parents[-1]
def _meta_excluding_position(node: dict[str, Any]) -> dict[str, Any]:
"""Shallow copy of the node's ``meta`` dict, for "edit" (meta change)
detection. Position lives in the node-level ``parents`` list rather
than in ``meta``, so nothing needs stripping: two nodes that differ
ONLY in where they sit already compare equal here. Move detection
reads ``parents`` directly. A missing or non-dict ``meta`` yields
``{}``."""
meta = node.get("meta") or {}
return dict(meta) if isinstance(meta, dict) else {}
def _diff_layout_node(
node_id: str,
pre_node: Optional[dict[str, Any]],
post_node: Optional[dict[str, Any]],
) -> Optional[ChangeRecord]:
"""Diff one component slot in the layout dict and return a record for
the logical action — add, remove, move, edit — or ``None`` when the
slot is unchanged or holds an unknown component type.
"""
node_for_kind = post_node or pre_node or {}
kind = _LAYOUT_TYPE_TO_KIND.get(node_for_kind.get("type") or "")
if kind is None:
return None # unknown component type — skip rather than emit garbage
if pre_node is None and post_node is not None:
return ChangeRecord(
kind=kind,
path=["add", kind, node_id],
from_value=None,
to_value=_layout_node_payload(post_node),
)
if post_node is None and pre_node is not None:
return ChangeRecord(
kind=kind,
path=["remove", kind, node_id],
from_value=_layout_node_payload(pre_node),
to_value=None,
)
# Both present — check move first, then edit.
assert pre_node is not None
assert post_node is not None
pre_parent = _layout_parent_id(pre_node)
if pre_parent != (post_parent := _layout_parent_id(post_node)):
return ChangeRecord(
kind=kind,
path=["move", kind, node_id],
from_value={**_layout_node_payload(pre_node), "parent": pre_parent},
to_value={**_layout_node_payload(post_node), "parent": post_parent},
)
pre_meta = _meta_excluding_position(pre_node)
if pre_meta != (post_meta := _meta_excluding_position(post_node)):
return ChangeRecord(
kind=kind,
path=["edit", kind, node_id],
from_value={**_layout_node_payload(pre_node), "meta": pre_meta},
to_value={**_layout_node_payload(post_node), "meta": post_meta},
)
return None
def diff_dashboard_layout(
pre: Any,
post: Any,
) -> list[ChangeRecord]:
"""Structural diff of a dashboard's ``position_json``, emitting one
record per logical layout action.
Walks both sides keyed on the component ``id`` (e.g.
``"CHART-mkPZLOnWCElgL0Udp1gVK"``):
* id present only in *post* → ``op=add``, ``from_value=None``,
``to_value=<minimal payload>``
* id present only in *pre* → ``op=remove``, payload swapped
* id in both, ``parents`` differs → ``op=move``, payloads carry
old + new parent
* id in both, parents equal, ``meta`` differs → ``op=edit``,
payloads carry old + new meta
* id in both, equal → no record
The ``operation_type``-style verb is encoded in
``path[0]`` as ``["add"|"remove"|"move"|"edit", <component-kind>,
<component-id>]`` so the UI's path-based renderer can read it
without inspecting from/to.
``ROOT_ID`` / ``GRID_ID`` / ``HEADER_ID`` are suppressed (see
:data:`_LAYOUT_SUPPRESSED_IDS`).
"""
pre_nodes = _layout_nodes(pre)
post_nodes = _layout_nodes(post)
records: list[ChangeRecord] = []
for node_id in sorted(set(pre_nodes) | set(post_nodes)):
record = _diff_layout_node(
node_id, pre_nodes.get(node_id), post_nodes.get(node_id)
)
if record is not None:
records.append(record)
return records
def _layout_nodes(raw: Any) -> dict[str, dict[str, Any]]:
"""Coerce *raw* (a ``position_json`` blob or already-parsed dict) into
the ``{node_id: node_dict}`` shape used by the layout diff, filtering
out non-dict values and the always-present root/grid/header singletons.
"""
parsed = _coerce_params(raw)
return {
k: v
for k, v in parsed.items()
if isinstance(v, dict) and k not in _LAYOUT_SUPPRESSED_IDS
}
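The add / remove / move / edit table in the ``diff_dashboard_layout`` docstring can be checked against a small hand-written layout. The sketch below re-implements only that decision (no payload building, no unknown-type filter); ``classify`` and ``layout_ops`` are hypothetical names, and the node shape (a ``parents`` list plus a ``meta`` dict) follows what the helpers above read.

```python
from typing import Any, Optional

SUPPRESSED_IDS = frozenset({"ROOT_ID", "GRID_ID", "HEADER_ID"})


def classify(pre: Optional[dict[str, Any]], post: Optional[dict[str, Any]]) -> Optional[str]:
    """Return the layout verb for one component id, or None when unchanged."""
    if pre is None:
        return "add" if post is not None else None
    if post is None:
        return "remove"
    pre_parent = (pre.get("parents") or [None])[-1]  # immediate parent = last entry
    post_parent = (post.get("parents") or [None])[-1]
    if pre_parent != post_parent:
        return "move"  # move is checked before edit, as in _diff_layout_node
    if (pre.get("meta") or {}) != (post.get("meta") or {}):
        return "edit"
    return None


def layout_ops(pre: dict[str, Any], post: dict[str, Any]) -> list[tuple[str, str]]:
    ops: list[tuple[str, str]] = []
    for node_id in sorted((set(pre) | set(post)) - SUPPRESSED_IDS):
        verb = classify(pre.get(node_id), post.get(node_id))
        if verb is not None:
            ops.append((verb, node_id))
    return ops


def node(type_: str, parents: list[str], **meta: Any) -> dict[str, Any]:
    return {"type": type_, "parents": parents, "meta": meta}


grid = ["ROOT_ID", "GRID_ID"]
pre = {
    "ROOT_ID": {"type": "ROOT", "children": ["GRID_ID"]},
    "ROW-1": node("ROW", grid),
    "CHART-a": node("CHART", grid + ["ROW-1"], width=4),
    "CHART-b": node("CHART", grid + ["ROW-1"], width=4),
    "MARKDOWN-x": node("MARKDOWN", grid + ["ROW-1"], code="hi"),
}
post = {
    "ROOT_ID": {"type": "ROOT", "children": ["GRID_ID", "ROW-2"]},  # suppressed
    "ROW-1": node("ROW", grid),
    "ROW-2": node("ROW", grid),
    "CHART-a": node("CHART", grid + ["ROW-2"], width=4),
    "MARKDOWN-x": node("MARKDOWN", grid + ["ROW-1"], code="hello"),
}
print(layout_ops(pre, post))
# [('move', 'CHART-a'), ('remove', 'CHART-b'), ('edit', 'MARKDOWN-x'), ('add', 'ROW-2')]
```

The change to ``ROOT_ID``'s children list produces no record; it is inferred from the moves and adds of the children themselves.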
def diff_dashboard(
pre: dict[str, Any],
post: dict[str, Any],
*,
fields: Iterable[str],
) -> list[ChangeRecord]:
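"""Dashboard diff: scalar fields plus structural diffs of
``json_metadata`` and ``position_json``.
``json_metadata`` is walked one top-level key at a time by
:func:`diff_json_field` (``kind="field"``, audit sub-keys skipped);
``position_json`` is walked by :func:`diff_dashboard_layout`, which
emits one record per component add / remove / move / edit with
``kind`` set to the component type (``chart``, ``row``, ``tab``, ...).
Collapsing layout changes into a single ``kind="layout"`` and promoting
``json_metadata.native_filter_configuration`` to ``kind="filter"`` are
deferred to Phase 2 alongside the UI that would render them (spec
Clarifications §Session 2026-04-24).
"""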
records = diff_scalar_fields(pre, post, fields=fields)
records.extend(
diff_json_field(
"json_metadata",
pre.get("json_metadata"),
post.get("json_metadata"),
exclude_keys=DASHBOARD_JSON_METADATA_AUDIT_KEYS,
)
)
records.extend(
diff_dashboard_layout(pre.get("position_json"), post.get("position_json"))
)
return records
def _layout_chart_uuids_by_verb(
records: list[ChangeRecord],
) -> tuple[set[Any], set[Any]]:
"""Scan *records* for layout ``add``/``remove`` records on charts and
return ``(added_uuids, removed_uuids)`` sets.
"""
added: set[Any] = set()
removed: set[Any] = set()
for r in records:
if r.kind != "chart" or len(r.path) < 3:
continue
verb = r.path[0]
if verb == "add" and isinstance(r.to_value, dict):
uuid_ = r.to_value.get("uuid")
if uuid_ is not None:
added.add(uuid_)
elif verb == "remove" and isinstance(r.from_value, dict):
uuid_ = r.from_value.get("uuid")
if uuid_ is not None:
removed.add(uuid_)
return added, removed
def _is_redundant_m2m_chart_record(
r: ChangeRecord, added_uuids: set[Any], removed_uuids: set[Any]
) -> bool:
"""Return ``True`` when *r* is an M2M-style slice record that
duplicates an already-captured layout add/remove for the same uuid.
M2M slice records have path ``["slices", uuid]`` (length 2); their
info is strictly less than the corresponding layout record's
(no name, no parent), so the layout side wins on dedup.
"""
if r.kind != "chart" or len(r.path) != 2 or r.path[0] != "slices":
return False
slice_uuid = r.path[1]
if r.from_value is None and r.to_value is not None:
return slice_uuid in added_uuids
if r.to_value is None and r.from_value is not None:
return slice_uuid in removed_uuids
return False
def fold_dashboard_layout_with_chart_changes(
records: list[ChangeRecord],
) -> list[ChangeRecord]:
"""When a dashboard save adds/removes charts, the ``slices`` M2M
diff and the layout diff each emit a record for the same logical
action. Drop the M2M ``kind="chart"`` records — the layout-side
record carries more information (chart name, parent container).
The matching is by slice uuid: ``diff_dashboard_slices`` produces
records with path ``["slices", <slice-uuid>]``; the layout
payloads carry the same uuid (sourced from
``position_json.CHART-x.meta.uuid``). We dedupe on that key.
Called from the change-records listener after the M2M and layout
diffs are both merged into the per-entity buffer.
"""
added_uuids, removed_uuids = _layout_chart_uuids_by_verb(records)
return [
r
for r in records
if not _is_redundant_m2m_chart_record(r, added_uuids, removed_uuids)
]
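A small example makes the dedup rule concrete. The sketch below restates it standalone (``Change`` and ``fold`` are hypothetical names): an M2M ``["slices", uuid]`` record is dropped only when a layout ``add`` / ``remove`` record for the same slice uuid exists in the same direction, while an M2M record with no layout counterpart survives.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Change:
    kind: str
    path: list[Any]
    from_value: Any
    to_value: Any


def fold(records: list[Change]) -> list[Change]:
    """Drop M2M chart records already covered by a layout add/remove for the same uuid."""
    added: set[Any] = set()
    removed: set[Any] = set()
    for r in records:
        if r.kind != "chart" or len(r.path) < 3:
            continue  # only layout records have [verb, kind, component_id] paths
        if r.path[0] == "add" and isinstance(r.to_value, dict):
            added.add(r.to_value.get("uuid"))
        elif r.path[0] == "remove" and isinstance(r.from_value, dict):
            removed.add(r.from_value.get("uuid"))
    added.discard(None)  # layout payloads without a uuid can't be matched
    removed.discard(None)

    def redundant(r: Change) -> bool:
        if r.kind != "chart" or len(r.path) != 2 or r.path[0] != "slices":
            return False
        if r.from_value is None and r.to_value is not None:
            return r.path[1] in added
        if r.to_value is None and r.from_value is not None:
            return r.path[1] in removed
        return False

    return [r for r in records if not redundant(r)]


layout_add = Change(
    "chart", ["add", "chart", "CHART-x"], None,
    {"id": "CHART-x", "type": "CHART", "name": "Sales", "uuid": "u1"},
)
m2m_add = Change("chart", ["slices", "u1"], None, "u1")  # duplicate of layout_add
m2m_remove = Change("chart", ["slices", "u2"], "u2", None)  # no layout counterpart
print(fold([layout_add, m2m_add, m2m_remove]) == [layout_add, m2m_remove])  # True
```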
def diff_dataset(
pre: dict[str, Any],
post: dict[str, Any],
*,
fields: Iterable[str],
) -> list[ChangeRecord]:
"""SqlaTable scalar-field diff. All paths emit ``kind="field"``.
Children (columns, metrics) are diffed separately via
:func:`diff_dataset_columns` / :func:`diff_dataset_metrics`. The
listener reads them from Continuum shadow tables
(``table_columns_version`` / ``sql_metrics_version``) rather than
walking the ORM collection.
"""
return diff_scalar_fields(pre, post, fields=fields)
def diff_dataset_columns(
from_columns: list[dict[str, Any]] | None,
to_columns: list[dict[str, Any]] | None,
) -> list[ChangeRecord]:
"""Child-collection diff on TableColumn rows, keyed by column_name."""
return _diff_list_by_natural_key(
kind="column",
path_prefix=["columns"],
from_list=from_columns,
to_list=to_columns,
key_fn=lambda c: c.get("column_name") if isinstance(c, dict) else None,
)
def diff_dataset_metrics(
from_metrics: list[dict[str, Any]] | None,
to_metrics: list[dict[str, Any]] | None,
) -> list[ChangeRecord]:
"""Child-collection diff on SqlMetric rows, keyed by metric_name."""
return _diff_list_by_natural_key(
kind="metric",
path_prefix=["metrics"],
from_list=from_metrics,
to_list=to_metrics,
key_fn=lambda m: m.get("metric_name") if isinstance(m, dict) else None,
)
def diff_dashboard_slices(
from_slice_uuids: list[str] | None,
to_slice_uuids: list[str] | None,
) -> list[ChangeRecord]:
"""Diff a dashboard's chart membership, keyed by slice uuid.
Pure set-diff: added uuids get ``from_value=None, to_value=uuid``;
removed uuids get the inverse. No "changed" case because chart
associations are identity-only (the list element IS the uuid).
"""
from_set = set(from_slice_uuids or [])
to_set = set(to_slice_uuids or [])
records: list[ChangeRecord] = []
for uuid_ in sorted(from_set - to_set):
records.append(
ChangeRecord(
kind="chart",
path=["slices", uuid_],
from_value=uuid_,
to_value=None,
)
)
for uuid_ in sorted(to_set - from_set):
records.append(
ChangeRecord(
kind="chart",
path=["slices", uuid_],
from_value=None,
to_value=uuid_,
)
)
return records


@@ -0,0 +1,68 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""ETag header emission for the entity-versioning API surface."""
from __future__ import annotations
from typing import Optional, TYPE_CHECKING
from uuid import UUID
import sqlalchemy as sa
from flask_appbuilder import Model
from superset.extensions import db
if TYPE_CHECKING:
from flask import Response
def set_version_etag(response: "Response", version_uuid: Optional[UUID]) -> "Response":
"""Attach ``ETag: "<version_uuid>"`` to *response*.
Uses RFC 7232 strong-validator form (no leading ``W/``); the response
header value is wrapped in double quotes per the spec. No-op when
*version_uuid* is ``None`` (entity has no version rows yet).
"""
if version_uuid is not None:
response.headers["ETag"] = f'"{version_uuid}"'
return response
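The strong ETag emitted here is what a client would echo back in ``If-None-Match``. RFC 7232 §3.2 requires the weak comparison function for that header, so a ``W/`` prefix on the client's copy should still match. The sketch below is hypothetical (``format_etag`` / ``if_none_match_hits`` are not part of this module, and nothing in this file is shown consuming ``If-None-Match``); it splits the header on commas, which is safe only because UUID-based tags never contain one.

```python
from typing import Optional
from uuid import UUID


def format_etag(version_uuid: UUID) -> str:
    """Strong validator: the opaque tag in double quotes, no W/ prefix."""
    return f'"{version_uuid}"'


def if_none_match_hits(header: Optional[str], version_uuid: Optional[UUID]) -> bool:
    """Weak comparison of an If-None-Match header against the current version."""
    if not header or version_uuid is None:
        return False
    if header.strip() == "*":
        return True  # "*" matches any current representation
    current = format_etag(version_uuid)
    for candidate in header.split(","):  # safe: UUID tags contain no commas
        tag = candidate.strip()
        if tag.startswith("W/"):
            tag = tag[2:]  # weak comparison ignores the weakness indicator
        if tag == current:
            return True
    return False


v = UUID("12345678-1234-5678-1234-567812345678")
print(format_etag(v))  # "12345678-1234-5678-1234-567812345678"
print(if_none_match_hits(f'"other", W/"{v}"', v))  # True
```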
def set_version_etag_by_uuid(
response: "Response", model_cls: type[Model], entity_uuid: UUID
) -> "Response":
"""Attach ``ETag`` derived from *entity_uuid*'s current live version.
Looks up ``entity_id`` from *entity_uuid* via the model's ``uuid`` column,
then derives ``version_uuid`` via :class:`VersionDAO`. No-op when the
entity is missing or has no version rows yet.
Prefer :func:`set_version_etag` when the caller already has the entity's
integer id — this helper costs an extra ``SELECT id WHERE uuid = ?``.
"""
# pylint: disable=import-outside-toplevel
from superset.daos.version import VersionDAO
entity_id = db.session.scalar(
sa.select(model_cls.id).where(model_cls.uuid == entity_uuid)
)
if entity_id is None:
return response
return set_version_etag(
response,
VersionDAO.current_live_version_uuid(model_cls, entity_id, entity_uuid),
)


@@ -0,0 +1,288 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import json
import logging
from typing import Any, Callable
import sqlalchemy as sa
import sqlalchemy.orm as sa_orm
from sqlalchemy_continuum import is_modified, version_class
from sqlalchemy_continuum.operation import Operation
from sqlalchemy_continuum.plugins.base import Plugin
from sqlalchemy_continuum.plugins.flask import FlaskPlugin
from sqlalchemy_continuum.transaction import TransactionFactory
from sqlalchemy_continuum.utils import versioned_column_properties
from superset.versioning.diff import DASHBOARD_JSON_METADATA_AUDIT_KEYS
logger = logging.getLogger(__name__)


def _normalize_dashboard_json_metadata(value: Any) -> Any:
    """Parse ``dashboards.json_metadata`` and drop frontend-stamped audit
    sub-keys so a save that only re-stamps ``map_label_colors`` (etc.)
    compares equal to its predecessor.

    ``map_label_colors`` is regenerated client-side from the
    ``LabelsColorMap`` singleton on every save (see
    ``saveDashboardRequest`` in
    ``superset-frontend/src/dashboard/actions/dashboardState.ts``).
    The singleton's contents depend on which charts have rendered in
    the page session, so two saves with no user-authored change produce
    different bytes. The diff engine ignores the same audit sub-keys
    (``DASHBOARD_JSON_METADATA_AUDIT_KEYS`` in
    ``superset/versioning/diff.py``); aligning the skip-plugin's
    comparison with that filter keeps the two paths consistent.
    """
    if value is None or value == "":
        return value
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        return value
    if not isinstance(parsed, dict):
        return parsed
    return {
        k: v for k, v in parsed.items() if k not in DASHBOARD_JSON_METADATA_AUDIT_KEYS
    }


# Per-class column normalizers, keyed on (class_name, column_name). Class
# name is used (rather than class itself) so importing the model classes
# at module load is unnecessary — keeps the plugin importable before
# ``make_versioned()`` has registered the version classes.
_COLUMN_NORMALIZERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("Dashboard", "json_metadata"): _normalize_dashboard_json_metadata,
}


def _normalize_for_compare(target: Any, col_name: str, value: Any) -> Any:
    """Return *value* run through any per-class column normalizer registered
    in ``_COLUMN_NORMALIZERS``, else *value* unchanged.
    """
    normalizer = _COLUMN_NORMALIZERS.get((type(target).__name__, col_name))
    return normalizer(value) if normalizer is not None else value


def _has_dirty_versioned_children(target: Any, uow: Any) -> bool:
    """Return ``True`` when *uow* contains an operation for a versioned
    child of *target* (e.g. a ``TableColumn`` whose ``table`` is *target*).

    Used by :meth:`SkipUnmodifiedPlugin._is_no_op_update` so a parent
    UPDATE that was force-flagged by
    :func:`baseline._force_parent_dirty_on_child_change` is preserved
    even though the parent's own scalars match the previous version.
    """
    # pylint: disable=import-outside-toplevel
    from superset.versioning.baseline import _child_to_parent_registry

    child_map = _child_to_parent_registry()
    target_cls = type(target)
    for _key, op in uow.operations.items():
        entry = child_map.get(type(op.target))
        if entry is None:
            continue
        parent_attr, parent_cls = entry
        if parent_cls is not target_cls:
            continue
        parent = getattr(op.target, parent_attr, None)
        if parent is target:
            return True
    return False


class VersionTransactionFactory(TransactionFactory):
    """TransactionFactory that renames the transaction table and adds a bare
    ``user_id`` integer column so the FlaskPlugin can record the acting user
    without requiring a FK relationship to ``ab_user``.

    Continuum only adds ``user_id`` when ``user_cls`` is set on the manager.
    We add it unconditionally (no FK) so that both the FlaskPlugin's
    ``transaction_args()`` and our ``baseline.py`` direct inserts can record
    which user triggered the version event.
    """

    def create_class(self, manager: Any) -> Any:
        cls = super().create_class(manager)
        cls.__table__.name = "version_transaction"
        # Rename the PostgreSQL sequence for consistent naming.
        for col in cls.__table__.columns:
            if col.name == "id" and col.default is not None:
                col.default.name = "version_transaction_id_seq"
        # Add user_id INTEGER (no FK) for user tracking. The mapper has not
        # been configured yet at this point, so append_column + add_property
        # is safe here.
        user_id_col = sa.Column("user_id", sa.Integer, nullable=True)
        cls.__table__.append_column(user_id_col)
        cls.__mapper__.add_property("user_id", sa_orm.column_property(user_id_col))
        return cls


class VersioningFlaskPlugin(FlaskPlugin):
    """FlaskPlugin subclass that uses Superset's :func:`get_user_id` (which
    reads ``g.user``) instead of Flask-Login's ``current_user``. Superset's
    JWT auth for API routes populates ``g.user`` but leaves
    ``flask_login.current_user`` anonymous, so the upstream plugin would
    record ``user_id=NULL`` on version_transaction rows created by API
    calls. Returns an empty dict (so the transaction row is written
    anyway) when no user is available — e.g. CLI, Celery, import/export.
    """

    def transaction_args(self, uow: Any, session: Any) -> dict[str, Any]:
        # pylint: disable=import-outside-toplevel
        from flask import has_request_context, request

        from superset.utils.core import get_user_id

        user_id = get_user_id()
        if user_id is None:
            return {}
        remote_addr: str | None
        try:
            remote_addr = request.remote_addr if has_request_context() else None
        except RuntimeError:
            remote_addr = None
        return {"user_id": user_id, "remote_addr": remote_addr}


class SkipUnmodifiedPlugin(Plugin):
    """Skip creating version rows for UPDATE operations whose post-flush
    column values equal those stored in the previous live version row
    (after any per-column normalization registered in
    ``_COLUMN_NORMALIZERS``).

    Continuum creates a version row for every entity in ``session.dirty``,
    including saves where the SQLAlchemy ORM marked a column dirty (because
    Superset re-serialised ``json_metadata`` via ``json.dumps`` on the save
    path, or AuditMixin auto-bumped ``changed_on``) but the resulting value
    is unchanged from the previous version. Those rows pollute the version
    history with no-op entries.

    ``is_modified()`` from Continuum is not enough: it consults SQLAlchemy's
    attribute history, which answers "did setattr produce a different
    value?", not "did the final stored value change?". So we compare each
    non-excluded versioned column on ``operation.target`` against the
    previous live version row's value; if all are equal, the operation
    is marked ``processed`` and Continuum skips it (see
    ``UnitOfWork.create_version_objects``).

    The associated transaction is not removed; if every operation is a
    no-op, the transaction becomes an orphan in ``version_transaction``
    and is swept by the retention task at cutoff. Deleting the row
    inline (in this hook) was considered and rejected: it would couple
    this plugin to the change-records listener's buffer state. Both
    would have to agree that the flush produced nothing before we
    could safely DELETE the tx row, since ``version_changes.transaction_id``
    has an ON DELETE CASCADE FK that would silently drop any buffered
    diff records the listener was about to insert. The orphan's storage
    cost (~40 bytes/row) is small enough that the coordination isn't
    worth it; retention handles the cleanup correctly by construction
    (orphans have no parent shadow, so they are never "preserved" by the
    "preserve transactions whose shadow has the live row" rule and
    age out with the rest of the history).
    """

    def before_create_version_objects(self, uow: Any, session: Any) -> None:
        # ``uow.operations`` is a custom Continuum ``Operations`` collection;
        # use its ``.items()`` method (not ``.values()``) to iterate.
        # INSERTs always create a row (no prior to compare against);
        # DELETEs can't be no-ops. Only UPDATE operations are candidates.
        for _key, operation in uow.operations.items():
            if operation.processed or operation.type != Operation.UPDATE:
                continue
            try:
                if self._is_no_op_update(operation.target, session, uow):
                    operation.processed = True
            except Exception:  # pylint: disable=broad-except
                # Defensive — if introspection fails for any reason, fall
                # back to creating the version row.
                logger.exception(
                    "SkipUnmodifiedPlugin: skip-check raised for %s",
                    type(operation.target).__name__,
                )

    @classmethod
    def _is_no_op_update(cls, target: Any, session: Any, uow: Any) -> bool:
        """Return ``True`` when this UPDATE produces no observable change to
        any non-excluded versioned column **and** no versioned children of
        *target* are being modified in this flush.

        Stages:

        1. If any versioned child (e.g. a ``TableColumn`` whose ``table``
           is *target*) has an operation in ``uow.operations``, the parent
           is being force-touched by
           ``baseline._force_parent_dirty_on_child_change`` to anchor the
           child changes against a parent shadow row. Keep the row.
        2. ``is_modified(target)`` — cheap SQLAlchemy attribute-history
           check. Returns ``False`` when only excluded columns/relationships
           (``owners``, ``changed_on``, …) are dirty. This is the common
           case (every save auto-bumps ``changed_on``); short-circuiting
           here saves the DB round-trip in stage 3.
        3. Compare post-flush column values against the previous live
           version row's stored values. Catches the case where SQLAlchemy
           sees a column as dirty (e.g. ``set_dash_metadata`` re-serialised
           ``json_metadata`` to a different byte sequence) but the
           resulting parsed content matches the prior version.
        """
        if _has_dirty_versioned_children(target, uow):
            return False
        if not is_modified(target):
            return True
        return cls._matches_previous_version(target, session)

    @staticmethod
    def _matches_previous_version(target: Any, session: Any) -> bool:
        """Return ``True`` when every non-excluded versioned column on
        *target* matches the value stored in its previous live version row
        (i.e., the row with ``end_transaction_id IS NULL``).

        Returns ``False`` for entities with no prior version row — letting
        Continuum create the first one. In practice this case is rare:
        ``register_baseline_listener`` (in ``superset.versioning.baseline``)
        runs ahead of Continuum's ``before_flush`` and inserts a baseline
        row for any entity being saved for the first time, so the second
        save (and beyond) is what flows through this path.
        """
        cls = type(target)
        try:
            ver_cls = version_class(cls)
        except Exception:  # pylint: disable=broad-except
            return False
        ver_table = ver_cls.__table__
        col_keys = [prop.key for prop in versioned_column_properties(target)]
        if not col_keys:
            return False
        select_stmt = (
            sa.select(*[ver_table.c[c] for c in col_keys])
            .where(ver_table.c.id == target.id)
            .where(ver_table.c.end_transaction_id.is_(None))
            .order_by(ver_table.c.transaction_id.desc())
            .limit(1)
        )
        row = session.connection().execute(select_stmt).first()
        if row is None:
            return False  # no previous version → let Continuum create one
        for col_name, prev_value in zip(col_keys, row, strict=False):
            post = _normalize_for_compare(
                target, col_name, getattr(target, col_name, None)
            )
            pre = _normalize_for_compare(target, col_name, prev_value)
            if post != pre:
                return False
        return True
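At its core, the skip check in `SkipUnmodifiedPlugin._matches_previous_version` compares each versioned column's post-flush value with the previous live shadow row. Before comparing, it passes both values through any registered normalizer. The sketch below reproduces that comparison without a database. It assumes the previous row is already available as a plain dict, and it uses a hypothetical `AUDIT_KEYS` set in place of `DASHBOARD_JSON_METADATA_AUDIT_KEYS`, whose real contents live in `superset/versioning/diff.py`. The helper names (`normalize_json_metadata`, `is_no_op_update`) are illustrative only; this is not the plugin's actual code path.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for DASHBOARD_JSON_METADATA_AUDIT_KEYS; the real set
# lives in superset/versioning/diff.py and may contain other keys.
AUDIT_KEYS = frozenset({"map_label_colors"})


def normalize_json_metadata(value: Any) -> Any:
    """Parse a JSON string and drop audit sub-keys; pass other values through."""
    if value is None or value == "":
        return value
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        return value
    if not isinstance(parsed, dict):
        return parsed
    return {k: v for k, v in parsed.items() if k not in AUDIT_KEYS}


# (class_name, column_name) -> normalizer, mirroring _COLUMN_NORMALIZERS.
NORMALIZERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("Dashboard", "json_metadata"): normalize_json_metadata,
}


def _identity(value: Any) -> Any:
    return value


def is_no_op_update(
    class_name: str, post: dict[str, Any], previous: dict[str, Any]
) -> bool:
    """True when every column in *post* equals *previous* after normalization."""
    for col, new_value in post.items():
        normalize = NORMALIZERS.get((class_name, col), _identity)
        if normalize(new_value) != normalize(previous.get(col)):
            return False
    return True


previous_row = {
    "dashboard_title": "Sales",
    "json_metadata": '{"color_scheme": "x", "map_label_colors": {"a": "#111"}}',
}
# Same user-authored content; only key order and the audit sub-key differ.
resaved = {
    "dashboard_title": "Sales",
    "json_metadata": '{"map_label_colors": {"a": "#222"}, "color_scheme": "x"}',
}
print(is_no_op_update("Dashboard", resaved, previous_row))  # True
print(is_no_op_update("Slice", resaved, previous_row))  # False: no normalizer
```

Comparing parsed dicts rather than raw strings makes key reordering and a re-stamped `map_label_colors` invisible to the check. Columns without a registered normalizer still fall back to plain equality.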

@@ -0,0 +1,487 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Read-side queries for the entity-versioning API.

Pure-read helpers that translate Continuum shadow rows and
``version_changes`` records into the shapes the API endpoints return.
The corresponding write side (restore) lives in
:mod:`superset.versioning.restore`. The backward-compat ``VersionDAO``
façade in :mod:`superset.daos.version` re-exports both.

Also exposes the deterministic version-UUID derivation
(:data:`VERSION_UUID_NAMESPACE` + :func:`derive_version_uuid`) used by
both the read endpoints and the ETag emission path in
:mod:`superset.versioning.etag`.
"""

from __future__ import annotations

import uuid
from typing import Any, Optional
from uuid import UUID

import sqlalchemy as sa
from sqlalchemy_continuum import version_class

from superset.extensions import db

# Fixed UUIDv5 namespace under which per-(entity, transaction) version UUIDs
# are derived. Never change this constant — changing it invalidates every
# version_uuid that clients may have cached, bookmarked, or stored.
VERSION_UUID_NAMESPACE = UUID("7a6f5d9b-4c3b-5d8e-9a1c-0e2b4c6d8f10")

# Continuum's integer ``operation_type`` mapped to the string the API
# returns. Kept short and stable for downstream tooling consuming the
# raw response. Continuum guarantees 0/1/2; anything else is a Continuum
# version mismatch and surfaces as ``str(int)`` rather than crashing.
_OP_TYPE_LABELS: dict[int, str] = {0: "baseline", 1: "update", 2: "delete"}


def derive_version_uuid(entity_uuid: UUID, transaction_id: int) -> UUID:
    """Derive a deterministic UUIDv5 identifying one version row.

    The UUID is a function of the owning entity's UUID and the Continuum
    ``transaction_id`` of the version row, so it is stable across retention
    pruning (which never changes ``transaction_id``) and portable across
    replicas. It is not randomly generated — two Supersets with identical
    ``(entity.uuid, transaction_id)`` will compute the same version_uuid.
    """
    return uuid.uuid5(VERSION_UUID_NAMESPACE, f"{entity_uuid}:{transaction_id}")


def _resolve_version_tables(
    model_cls: type,
) -> tuple[sa.Table, sa.Table, sa.Table]:
    """Return the (version, transaction, user) ``Table`` objects used by the
    listing and snapshot queries.

    All three lookups happen inside this module on every read; centralising
    the trio (a) keeps the imports in one place and (b) makes the join helper
    below take a uniform signature.
    """
    # pylint: disable=import-outside-toplevel
    from sqlalchemy_continuum import versioning_manager

    from superset import security_manager

    ver_tbl = version_class(model_cls).__table__
    tx_tbl = versioning_manager.transaction_cls.__table__
    user_tbl = security_manager.user_model.__table__
    return ver_tbl, tx_tbl, user_tbl


def _version_with_tx_user_join(
    ver_tbl: sa.Table, tx_tbl: sa.Table, user_tbl: sa.Table
) -> Any:
    """Build the version → transaction → user left-join used by both
    :func:`list_versions` and :func:`get_version`. The user-side join is
    a left-outer so saves with no Flask user context (CLI, Celery, import)
    still surface in the result with ``changed_by = None``.
    """
    return ver_tbl.join(tx_tbl, ver_tbl.c.transaction_id == tx_tbl.c.id).outerjoin(
        user_tbl, tx_tbl.c.user_id == user_tbl.c.id
    )


def _baseline_first_ordering(ver_tbl: sa.Table) -> tuple[Any, ...]:
    """Order ``(operation_type != 0).asc(), transaction_id.asc()`` so any
    op=0 row — Continuum's INSERT or our synthetic baseline — sorts to
    position 0 regardless of its transaction_id. A single entity never has
    more than one op=0 row (Continuum tracks one creation per live entity;
    our baseline listener only fires when no prior version rows exist), so
    this gives a stable chronological order with the "original" version
    always first.
    """
    return (
        (ver_tbl.c.operation_type != 0).asc(),
        ver_tbl.c.transaction_id.asc(),
    )


def _user_select_cols(user_tbl: sa.Table) -> list[Any]:
    """Columns to select from ``user_tbl`` to build a ``changed_by`` dict.

    Labels ``user_tbl.c.id`` as ``"user_id"`` so callers can read the row
    by a stable key regardless of whether they also select the version
    table's ``id`` column.
    """
    return [
        user_tbl.c.id.label("user_id"),
        user_tbl.c.username,
        user_tbl.c.first_name,
        user_tbl.c.last_name,
    ]


def _changed_by_from_row(row: Any) -> Optional[dict[str, Any]]:
    """Project the user columns from a query row onto the API's
    ``changed_by`` shape, or ``None`` for saves with no Flask user context
    (CLI / Celery / import / unauthenticated). Expects the user columns to
    have been selected via :func:`_user_select_cols` so the row keys are
    ``user_id`` / ``username`` / ``first_name`` / ``last_name``.
    """
    if row["user_id"] is None:
        return None
    return {
        "id": row["user_id"],
        "username": row["username"],
        "first_name": row["first_name"],
        "last_name": row["last_name"],
    }


def _entity_kind_for(model_cls: type) -> Optional[str]:
    """Return the ``version_changes.entity_kind`` value for *model_cls*, or
    ``None`` when the class isn't in the change-records taxonomy."""
    # pylint: disable=import-outside-toplevel
    from superset.versioning.changes import _ENTITY_KIND_BY_CLASS_NAME

    return _ENTITY_KIND_BY_CLASS_NAME.get(model_cls.__name__)


def find_active_by_uuid(model_cls: type, entity_uuid: UUID) -> Optional[Any]:
    """Return the live entity matching *entity_uuid*, or None if not found.

    Soft-delete filtering (deleted_at IS NOT NULL → return None) will be
    added when sc-103157 is merged (T043).
    """
    return (
        db.session.query(model_cls)
        .filter(model_cls.uuid == entity_uuid)  # type: ignore[attr-defined]
        .one_or_none()
    )


def _get_version_count(model_cls: type, entity_id: int) -> int:
    """Return the number of historical version rows for *entity_id*."""
    ver_cls = version_class(model_cls)
    return (
        db.session.query(sa.func.count())
        .select_from(ver_cls)
        .filter(ver_cls.id == entity_id)
        .scalar()
        or 0
    )


def current_version_number(model_cls: type, entity_id: int) -> Optional[int]:
    """Return the 0-based ``version_number`` of the live row for *entity_id*
    — equivalent to the index of the most recent entry that
    :func:`list_versions` would return, or ``None`` when the entity has no
    version rows yet.

    Note: this index is *unstable under retention pruning*. The scheduled
    :func:`prune_old_versions` task drops shadow rows whose owning
    ``version_transaction`` is older than
    :envvar:`SUPERSET_VERSION_HISTORY_RETENTION_DAYS`, so the same integer
    can refer to different rows before and after a prune cycle. Use
    :func:`current_live_transaction_id` for a stable identifier.
    """
    count = _get_version_count(model_cls, entity_id)
    return count - 1 if count > 0 else None


def current_live_transaction_id(model_cls: type, entity_id: int) -> Optional[int]:
    """Return the Continuum ``transaction_id`` of the live row for
    *entity_id* — stable across retention pruning, unlike the index
    returned by :func:`current_version_number`.
    """
    ver_cls = version_class(model_cls)
    row = (
        db.session.query(ver_cls.transaction_id)
        .filter(ver_cls.id == entity_id)
        .filter(ver_cls.end_transaction_id.is_(None))
        .order_by(ver_cls.transaction_id.desc())
        .limit(1)
        .first()
    )
    return row[0] if row else None


def current_live_version_uuid(
    model_cls: type, entity_id: int, entity_uuid: UUID
) -> Optional[UUID]:
    """Return the deterministic ``version_uuid`` of the live row, or
    ``None`` when the entity has no version rows yet."""
    tx_id = current_live_transaction_id(model_cls, entity_id)
    if tx_id is None:
        return None
    return derive_version_uuid(entity_uuid, tx_id)


def list_change_records_batch(
    entity_kind: str,
    entity_id: int,
    transaction_ids: list[int],
) -> dict[int, list[dict[str, Any]]]:
    """Return ``version_changes`` rows keyed by ``transaction_id``.

    Batches the lookup across multiple transactions with a single
    ``WHERE transaction_id IN (...) AND entity_kind = ? AND entity_id = ?``
    query so the list endpoint avoids N+1 round-trips. Rows are
    distributed into per-tx lists sorted by ``sequence`` ascending
    (matching the replay order the diff engine emits). Missing
    transactions are represented by an empty list in the result so
    callers can use ``result.get(tx_id, [])`` without guarding.

    If the ``version_changes`` table is missing (pre-migration or
    freshly downgraded), returns an empty dict rather than propagating
    the error — consistent with this being a descriptive layer that
    should not break the list endpoint.
    """
    # pylint: disable=import-outside-toplevel
    from superset.versioning.changes import version_changes_table

    if not transaction_ids:
        return {}
    try:
        rows = (
            db.session.connection()
            .execute(
                sa.select(
                    version_changes_table.c.transaction_id,
                    version_changes_table.c.sequence,
                    version_changes_table.c.kind,
                    version_changes_table.c.path,
                    version_changes_table.c.from_value,
                    version_changes_table.c.to_value,
                )
                .where(
                    version_changes_table.c.entity_kind == entity_kind,
                    version_changes_table.c.entity_id == entity_id,
                    version_changes_table.c.transaction_id.in_(transaction_ids),
                )
                .order_by(
                    version_changes_table.c.transaction_id.asc(),
                    version_changes_table.c.sequence.asc(),
                )
            )
            .mappings()
            .all()
        )
    except sa.exc.OperationalError:
        return {}
    grouped: dict[int, list[dict[str, Any]]] = {tx: [] for tx in transaction_ids}
    for row in rows:
        grouped[row["transaction_id"]].append(
            {
                "kind": row["kind"],
                "path": row["path"],
                "from_value": row["from_value"],
                "to_value": row["to_value"],
            }
        )
    return grouped


def list_versions(
    model_cls: type,
    entity_uuid: UUID,
) -> Optional[list[dict[str, Any]]]:
    """Return the version history for the entity identified by *entity_uuid*.

    Returns ``None`` when no active entity matches the UUID — callers should
    translate that into a 404. Returns an empty list when the entity exists
    but has no version rows yet (pre-migration, or never edited).

    The list puts the op=0 (baseline/INSERT) row first and orders the
    remaining rows by ``transaction_id`` ascending (see
    :func:`_baseline_first_ordering`); each entry is assigned a 0-based
    sequential ``version_number``. ``operation_type`` is mapped from
    Continuum's integer constants to a string (``0`` → baseline,
    ``1`` → update, ``2`` → delete). ``changed_by`` is the User row keyed
    off ``version_transaction.user_id``, or ``None`` when the save had no
    Flask user context (CLI, import, etc.).
    """
    entity = find_active_by_uuid(model_cls, entity_uuid)
    if entity is None:
        return None
    ver_tbl, tx_tbl, user_tbl = _resolve_version_tables(model_cls)
    stmt = (
        sa.select(
            ver_tbl.c.transaction_id,
            ver_tbl.c.operation_type,
            tx_tbl.c.issued_at,
            *_user_select_cols(user_tbl),
        )
        .select_from(_version_with_tx_user_join(ver_tbl, tx_tbl, user_tbl))
        .where(ver_tbl.c.id == entity.id)
        .order_by(*_baseline_first_ordering(ver_tbl))
    )
    rows = db.session.execute(stmt).mappings().all()
    # Batch-load change records for every listed transaction in one query
    # (T050). ``entity_kind`` is derived from the model class so the API
    # filter ``WHERE entity_kind = 'chart' AND entity_id = ?`` can be
    # precise when multiple versioned entities share a flush.
    changes_by_tx: dict[int, list[dict[str, Any]]] = {}
    if (entity_kind := _entity_kind_for(model_cls)) is not None:
        tx_ids = [row["transaction_id"] for row in rows]
        changes_by_tx = list_change_records_batch(entity_kind, entity.id, tx_ids)
    return [
        {
            "version_uuid": derive_version_uuid(entity_uuid, row["transaction_id"]),
            "version_number": version_number,
            "transaction_id": row["transaction_id"],
            "operation_type": _OP_TYPE_LABELS.get(
                row["operation_type"], str(row["operation_type"])
            ),
            "issued_at": row["issued_at"],
            "changed_by": _changed_by_from_row(row),
            "changes": changes_by_tx.get(row["transaction_id"], []),
        }
        for version_number, row in enumerate(rows)
    ]


def resolve_version_uuid(
    model_cls: type, entity_uuid: UUID, version_uuid: UUID
) -> Optional[int]:
    """Translate a ``version_uuid`` into the 0-based ``version_number`` that
    :func:`superset.versioning.restore.restore_version` accepts, or ``None``
    when the UUID does not match any version row of the given entity.

    Ordering matches :func:`list_versions` — op=0 rows first, then by
    transaction_id — so the version_number returned here is the same index
    a client would see in the list response.

    Implementation note: the loop re-derives ``version_uuid`` per
    transaction in Python because there's no portable SQL form for a
    UUIDv5 derivation across PostgreSQL / MySQL / SQLite (Postgres has
    ``uuid_generate_v5``; the other two do not). The iteration count is
    bounded by ``SUPERSET_VERSION_HISTORY_RETENTION_DAYS`` worth of
    edits — the retention task ages older shadow rows out — so the
    practical N is at most a few hundred. If retention is ever
    disabled (``= 0``) on a heavily-edited entity, this loop is the
    place to revisit.
    """
    entity = find_active_by_uuid(model_cls, entity_uuid)
    if entity is None:
        return None
    ver_cls = version_class(model_cls)
    tx_ids = (
        db.session.query(ver_cls.transaction_id)
        .filter(ver_cls.id == entity.id)
        .order_by(
            (ver_cls.operation_type != 0).asc(),
            ver_cls.transaction_id.asc(),
        )
        .all()
    )
    for version_number, (tx_id,) in enumerate(tx_ids):
        if derive_version_uuid(entity_uuid, tx_id) == version_uuid:
            return version_number
    return None


def get_version(
    model_cls: type,
    entity_uuid: UUID,
    version_uuid: UUID,
) -> Optional[dict[str, Any]]:
    """Return the entity's state at the specified version as a dict.

    Read-only — nothing in the live database is modified. The returned
    shape is intended to mirror a regular single-entity GET response
    (scalar columns plus restored ``columns`` / ``metrics`` lists for
    ``SqlaTable``), with a ``_version`` key holding the version-level
    metadata (uuid, transaction_id, operation_type, issued_at,
    changed_by) so callers can tell which version they're looking at.

    Returns ``None`` when either *entity_uuid* or *version_uuid* does not
    match — callers should translate to 404.
    """
    # pylint: disable=import-outside-toplevel
    from superset.connectors.sqla.models import SqlaTable

    entity = find_active_by_uuid(model_cls, entity_uuid)
    if entity is None:
        return None
    version_num = resolve_version_uuid(model_cls, entity_uuid, version_uuid)
    if version_num is None:
        return None
    ver_tbl, tx_tbl, user_tbl = _resolve_version_tables(model_cls)
    stmt = (
        sa.select(
            ver_tbl,
            tx_tbl.c.issued_at,
            *_user_select_cols(user_tbl),
        )
        .select_from(_version_with_tx_user_join(ver_tbl, tx_tbl, user_tbl))
        .where(ver_tbl.c.id == entity.id)
        .order_by(*_baseline_first_ordering(ver_tbl))
        .offset(version_num)
        .limit(1)
    )
    row = db.session.execute(stmt).mappings().first()
    if row is None:
        return None
    # Project the entity's own scalar fields, skipping versioning
    # metadata columns.
    result: dict[str, Any] = {}
    for col in ver_tbl.columns:
        if col.name in {"transaction_id", "end_transaction_id", "operation_type"}:
            continue
        value = row[col.name]
        # uuid columns come back as UUID instances; make them JSON-safe.
        if isinstance(value, UUID):
            value = str(value)
        result[col.name] = value
    changes: list[dict[str, Any]] = []
    if (entity_kind := _entity_kind_for(model_cls)) is not None:
        changes = list_change_records_batch(
            entity_kind, entity.id, [row["transaction_id"]]
        ).get(row["transaction_id"], [])
    result["_version"] = {
        "version_uuid": str(version_uuid),
        "version_number": version_num,
        "transaction_id": row["transaction_id"],
        "operation_type": _OP_TYPE_LABELS.get(
            row["operation_type"], str(row["operation_type"])
        ),
        "issued_at": row["issued_at"],
        "changed_by": _changed_by_from_row(row),
        "changes": changes,
    }
    # For datasets, attach the columns/metrics as they were at this
    # transaction by reading from Continuum's child shadow tables
    # (``table_columns_version`` / ``sql_metrics_version``). Empty lists
    # when the dataset had no children at this tx.
    if model_cls is SqlaTable:
        # pylint: disable=import-outside-toplevel
        from superset.connectors.sqla.models import SqlMetric, TableColumn
        from superset.versioning.changes import _shadow_rows_valid_at

        target_tx = row["transaction_id"]
        cols_tbl = version_class(TableColumn).__table__
        metrics_tbl = version_class(SqlMetric).__table__
        result["columns"] = _shadow_rows_valid_at(
            db.session, cols_tbl, "table_id", entity.id, target_tx
        )
        result["metrics"] = _shadow_rows_valid_at(
            db.session, metrics_tbl, "table_id", entity.id, target_tx
        )
    return result
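Three pieces of `queries.py` work together to turn a `version_uuid` into a list position: `derive_version_uuid`, `_baseline_first_ordering`, and the Python-side scan in `resolve_version_uuid`. The sketch below shows that interaction without a database. It assumes the shadow rows have already been loaded as `(operation_type, transaction_id)` pairs. `baseline_first` and `resolve_version_number` are hypothetical helper names, and the sample history is invented. Python's `sorted` with a `(op != 0, tx_id)` key reproduces `ORDER BY (operation_type != 0) ASC, transaction_id ASC`, because `False` sorts before `True`.

```python
import uuid
from typing import Optional
from uuid import UUID

# Copied from VERSION_UUID_NAMESPACE above; changing it changes every UUID.
VERSION_UUID_NAMESPACE = UUID("7a6f5d9b-4c3b-5d8e-9a1c-0e2b4c6d8f10")


def derive_version_uuid(entity_uuid: UUID, transaction_id: int) -> UUID:
    # Same UUIDv5 recipe as derive_version_uuid above: "<entity>:<tx>".
    return uuid.uuid5(VERSION_UUID_NAMESPACE, f"{entity_uuid}:{transaction_id}")


def baseline_first(rows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort (operation_type, transaction_id) pairs op=0 first, then by tx id.

    ``op != 0`` is False for the baseline row and False sorts before True,
    matching ORDER BY (operation_type != 0) ASC, transaction_id ASC.
    """
    return sorted(rows, key=lambda row: (row[0] != 0, row[1]))


def resolve_version_number(
    entity_uuid: UUID, rows: list[tuple[int, int]], version_uuid: UUID
) -> Optional[int]:
    # Linear scan, re-deriving each UUID in Python as resolve_version_uuid does.
    for version_number, (_op, tx_id) in enumerate(baseline_first(rows)):
        if derive_version_uuid(entity_uuid, tx_id) == version_uuid:
            return version_number
    return None


entity = UUID("11111111-2222-3333-4444-555555555555")
# Invented history: the op=0 row carries tx 40 but must still sort first.
rows = [(1, 12), (0, 40), (1, 57)]
target = derive_version_uuid(entity, 57)
print(baseline_first(rows))  # [(0, 40), (1, 12), (1, 57)]
print(resolve_version_number(entity, rows, target))  # 2
```

The UUID depends only on `(entity_uuid, transaction_id)`, so re-running the derivation on another replica, or after retention prunes older rows, yields the same value. Only the positional `version_number` can shift.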

@@ -0,0 +1,138 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Write-side: restore a versioned entity to an earlier state.

Companion to :mod:`superset.versioning.queries`. The
``BaseRestoreVersionCommand`` in :mod:`superset.commands.version_restore`
is the only intended caller; the backward-compat ``VersionDAO`` façade
in :mod:`superset.daos.version` re-exports ``restore_version`` for
existing call sites.
"""

from __future__ import annotations

import logging
from typing import Any, Optional
from uuid import UUID

from sqlalchemy_continuum import version_class

from superset.extensions import db
from superset.versioning.queries import find_active_by_uuid
from superset.versioning.utils import single_flush_scope

logger = logging.getLogger(__name__)

# Per-model relationships that Continuum's Reverter recurses into during a
# restore. Each restore replays the listed relationships from the
# version-side shadow tables onto the live entity. Children versioned through
# Continuum (``TableColumn`` / ``SqlMetric`` on ``SqlaTable``;
# ``dashboard_slices`` M2M on ``Dashboard``) come back automatically;
# ``Slice`` has no child collections to recurse into so its list is empty.
_RESTORE_RELATIONS: dict[str, list[str]] = {
    "SqlaTable": ["columns", "metrics"],
    "Dashboard": ["slices"],
    "Slice": [],
}


def restore_version(
    model_cls: type,
    entity_uuid: UUID,
    version_num: int,
) -> Optional[Any]:
    """Restore the entity identified by *entity_uuid* to the state captured
    by *version_num* (0-based, as returned by
    :func:`superset.versioning.queries.list_versions`).

    Returns the live entity after the restore, or ``None`` when either the
    UUID does not match an active entity or ``version_num`` is out of
    range — callers should translate both to a 404.

    Uses SQLAlchemy-Continuum's native ``version_obj.revert(relations=...)``
    and delegates commit to the caller (expected to be a command decorated
    with ``@transaction()``). The ``relations`` list depends on the model
    type and is looked up in :data:`_RESTORE_RELATIONS`.

    After the revert, ``changed_on`` / ``changed_by_fk`` are re-stamped
    with the current time and the restoring user's id (see
    :func:`_stamp_audit_fields_for_restore`) so the new version row
    produced by the restoring commit reflects who clicked Restore, not
    the original author. ``created_on`` / ``created_by_fk`` are left
    alone.
    """
    entity = find_active_by_uuid(model_cls, entity_uuid)
    if entity is None:
        return None
    ver_cls = version_class(model_cls)
    # version_num is a 0-based positional index, matching what
    # ``list_versions`` emits. Ordering keeps op=0 rows first so position 0
    # is always the baseline/INSERT.
    target_version = (
        db.session.query(ver_cls)
        .filter(ver_cls.id == entity.id)
        .order_by(
            (ver_cls.operation_type != 0).asc(),
            ver_cls.transaction_id.asc(),
        )
        .offset(version_num)
        .limit(1)
        .first()
    )
    if target_version is None:
        return None
    # Run the whole multi-relationship revert inside a single flush scope
    # so SQLAlchemy-Continuum's ``Reverter`` can iterate relations without
    # tripping its autoflush race, and so the change-records listener sees
    # the complete shadow state in one ``after_flush`` pass. See
    # ``single_flush_scope`` for the full rationale.
    relations = _RESTORE_RELATIONS.get(model_cls.__name__, [])
    try:
        with single_flush_scope(db.session):
            target_version.revert(relations=relations)
    except Exception:
        logger.exception(
            "Continuum revert() failed for %s id=%s tx=%s relations=%s",
            model_cls.__name__,
            entity.id,
            target_version.transaction_id,
            relations,
        )
        raise
    _stamp_audit_fields_for_restore(entity)
    return entity


def _stamp_audit_fields_for_restore(entity: Any) -> None:
    """Overwrite ``changed_on`` / ``changed_by_fk`` on *entity* with the
    current time and current user id, so that the restore is attributed
    to the restoring user rather than the version snapshot's original
    author."""
    # pylint: disable=import-outside-toplevel
    from datetime import datetime

    from superset.utils.core import get_user_id

    if hasattr(entity, "changed_on"):
        entity.changed_on = datetime.now()
    if hasattr(entity, "changed_by_fk"):
        entity.changed_by_fk = get_user_id()
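`restore_version` does three things in order. It picks the target shadow row using the same positional index the list endpoint returns, reverts onto the live entity, and then re-stamps the audit fields. The in-memory sketch below models only those three steps. `Snapshot`, `LiveEntity`, and `restore_sketch` are hypothetical names, and copying a dict stands in for Continuum's `revert()`. Child relations and `single_flush_scope` are not modelled. The real code reads `datetime.now()` and `get_user_id()` internally; the sketch passes the time and user id in as parameters so the result is deterministic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Snapshot:
    """Hypothetical stand-in for one Continuum shadow row."""

    transaction_id: int
    operation_type: int  # 0 = baseline/INSERT, 1 = update, 2 = delete
    values: dict[str, Any]


@dataclass
class LiveEntity:
    """Hypothetical stand-in for the live model instance."""

    values: dict[str, Any]
    changed_on: Optional[datetime] = None
    changed_by_fk: Optional[int] = None


def restore_sketch(
    entity: LiveEntity,
    snapshots: list[Snapshot],
    version_num: int,
    user_id: Optional[int],
    now: datetime,
) -> Optional[LiveEntity]:
    # Same positional ordering as restore_version: op=0 first, then tx id.
    ordered = sorted(
        snapshots, key=lambda s: (s.operation_type != 0, s.transaction_id)
    )
    if not 0 <= version_num < len(ordered):
        return None  # out of range; the caller maps this to a 404
    # Stand-in for target_version.revert(): copy the snapshot's scalars.
    entity.values = dict(ordered[version_num].values)
    # Re-stamp audit fields so the restore is attributed to the restorer.
    entity.changed_on = now
    entity.changed_by_fk = user_id
    return entity


history = [
    Snapshot(10, 0, {"slice_name": "Revenue"}),
    Snapshot(11, 1, {"slice_name": "Revenue (EU)"}),
    Snapshot(12, 1, {"slice_name": "Revenue (EU, 2024)"}),
]
live = LiveEntity(values={"slice_name": "Revenue (EU, 2024)"}, changed_by_fk=7)
restored = restore_sketch(live, history, 1, user_id=3, now=datetime(2026, 5, 20))
print(restored.values["slice_name"], restored.changed_by_fk)  # Revenue (EU) 3
```

The sketch mutates the live entity rather than creating a new one, mirroring how the real restore hands the same ORM instance back to the caller. The commit that follows then produces the new version row attributed to the restoring user.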

@@ -0,0 +1,128 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Shared Marshmallow schemas for entity version history endpoints.

Consumed by ChartRestApi, DashboardRestApi, and DatasetRestApi — the response
shape is identical across all three resources, so the schemas live here to
avoid triplicated definitions.
"""
from __future__ import annotations
from marshmallow import fields, Schema
class VersionChangedBySchema(Schema):
"""Subset of the User model included in each version history entry."""
id = fields.Integer()
username = fields.String()
first_name = fields.String()
last_name = fields.String()
class VersionChangeRecordSchema(Schema):
"""One field-level diff hunk from ``version_changes``.
The frontend renders human-readable prose from (``kind``,
``from_value``, ``to_value``) via Flask-Babel. Server-side the
shape is deliberately machine-readable only — see spec FR-019.
"""
kind = fields.String(
metadata={
"description": (
"Semantic category of the change. First-class values in V1: "
"'filter', 'metric', 'dimension', 'column', 'chart', "
"'time_range', 'color_palette'. Falls back to 'field' for "
"generic scalar changes that don't map to a named kind."
)
},
)
path = fields.Raw(
metadata={
"description": (
"Array of segments locating the change in the entity's state. "
"Example: ['params', 'adhoc_filters', 'country']."
)
},
)
from_value = fields.Raw(
allow_none=True,
metadata={
"description": (
"Value at path before the save; null when the field did not exist."
),
},
)
to_value = fields.Raw(
allow_none=True,
metadata={
"description": (
"Value at path after the save; null when the field was removed."
),
},
)
class VersionListItemSchema(Schema):
"""A single version row in the version history response."""
version_number = fields.Integer(
metadata={"description": "0-based position in the history, oldest first"},
)
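transaction_id = fields.Integer(
metadata={"description": "Underlying Continuum transaction id"},
)
version_uuid = fields.String(
metadata={
"description": (
"Identifier for this version, as accepted by the get-version "
"and restore endpoints."
)
},
)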
operation_type = fields.String(
metadata={
"description": (
"One of 'baseline', 'update', 'delete', 'restore'. Derived "
"from the Continuum integer constant."
)
},
)
issued_at = fields.DateTime(
metadata={"description": "UTC timestamp of the commit that produced the row"},
)
changed_by = fields.Nested(
VersionChangedBySchema,
allow_none=True,
metadata={
"description": (
"User who produced the version, or null when the commit had no "
"authenticated Flask user (CLI, Celery, import)."
)
},
)
changes = fields.List(
fields.Nested(VersionChangeRecordSchema),
metadata={
"description": (
"Structured diff records describing the atomic field-level "
"changes at this version, ordered by emission sequence. "
"Empty for baseline (op=0) transactions per spec M4."
)
},
)
class VersionListResponseSchema(Schema):
"""Envelope for version list responses."""
result = fields.List(fields.Nested(VersionListItemSchema))
count = fields.Integer()
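
To make the response shape concrete, here is a stdlib-only sketch that turns shadow rows into the `{"result": [...], "count": n}` envelope. It numbers versions from 0 in ascending `transaction_id` order. The `ShadowRow` tuple and `build_version_list` are hypothetical. The operation-label mapping is an assumption based on two things: Continuum's INSERT/UPDATE/DELETE constants (0/1/2), and the tests treating op=0 as the baseline. How `'restore'` is derived is not shown.

```python
from datetime import datetime
from typing import Any, NamedTuple

# Assumed mapping: Continuum's Operation constants are INSERT=0, UPDATE=1,
# DELETE=2, and these tests treat op=0 as the baseline row.
OPERATION_LABELS = {0: "baseline", 1: "update", 2: "delete"}


class ShadowRow(NamedTuple):
    # Hypothetical minimal view of one Continuum shadow row.
    transaction_id: int
    operation_type: int
    issued_at: datetime


def build_version_list(rows: list[ShadowRow]) -> dict[str, Any]:
    """Shape rows like VersionListResponseSchema: {"result": [...], "count": n}."""
    ordered = sorted(rows, key=lambda r: r.transaction_id)  # oldest first
    result = [
        {
            "version_number": position,  # 0-based position in the history
            "transaction_id": row.transaction_id,
            "operation_type": OPERATION_LABELS[row.operation_type],
            "issued_at": row.issued_at.isoformat(),
            "changed_by": None,  # no authenticated user in this sketch
            "changes": [],  # diff records omitted in this sketch
        }
        for position, row in enumerate(ordered)
    ]
    return {"result": result, "count": len(result)}


rows = [
    ShadowRow(42, 1, datetime(2026, 5, 2)),
    ShadowRow(17, 0, datetime(2026, 5, 1)),
]
payload = build_version_list(rows)
print([(e["version_number"], e["operation_type"]) for e in payload["result"]])
# [(0, 'baseline'), (1, 'update')]
```

Sorting by `transaction_id` rather than `issued_at` matches the tests, which order shadow rows by `transaction_id` and require the baseline to carry the smallest one.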

@@ -0,0 +1,80 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Shared session helpers used by the entity-versioning machinery."""
from __future__ import annotations
from contextlib import contextmanager
from typing import Any, Iterator, Optional
import sqlalchemy as sa
from sqlalchemy.orm import Session
@contextmanager
def single_flush_scope(session: Session) -> Iterator[None]:
"""Suppress autoflushes inside the block, flush once on clean exit.
Intended for operations that (a) make multiple mutations across
relationships and (b) issue intermediate queries which would
otherwise autoflush. Iterating from one relationship to another
inside SQLAlchemy-Continuum's ``Reverter`` is the canonical case:
a mid-iteration autoflush transitions pending DELETEs to
``state.deleted=True``, and the subsequent
``session.add(version_parent)`` cascade walk trips on the
deleted-state instances with ``InvalidRequestError``. Wrapping the
whole revert keeps marked-for-deletion instances in
``state.persistent`` until the trailing flush drains DELETEs +
INSERTs in one atomic step. That single flush is also load-bearing
for the ``after_flush`` change-records listener — splitting the
work across multiple flushes would split it across multiple
Continuum transactions, and the listener's tx-dedup guard would
silently drop the second pass's records.
On exception, the trailing flush is skipped — the session's normal
rollback flow handles cleanup, and flushing a partially-mutated
state would be wrong.
"""
with session.no_autoflush:
yield
session.flush()
def read_row_outside_flush(
session: Session, table: sa.Table, entity_id: int
) -> Optional[dict[str, Any]]:
"""Read the row with ``id == entity_id`` from *table* without triggering
an autoflush. Returns the row as a plain dict, or ``None`` when no row
matches.
The companion read primitive to :func:`single_flush_scope`. Listeners
that need pre-flush state (the row as it existed *before* the in-flight
edit was staged) use this — without ``no_autoflush``, the
``session.connection().execute(...)`` would itself trigger a flush of
the pending edit, leaving "pre" and "post" indistinguishable.
Returns ``dict[str, Any]`` rather than ``RowMapping`` so callers don't
accidentally hold a cursor-bound object past the listener boundary.
"""
with session.no_autoflush:
result = (
session.connection()
.execute(sa.select(table).where(table.c.id == entity_id))
.mappings()
.one_or_none()
)
return dict(result) if result else None
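
The flush-once contract can be checked without a database. The sketch below reuses the body of `single_flush_scope` against a hypothetical `FakeSession`, whose `no_autoflush` property mimics the shape of SQLAlchemy's. It shows three behaviours:

- autoflush is suppressed inside the block;
- a clean exit produces exactly one flush;
- a block that raises produces no flush.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeSession:
    """Hypothetical stand-in that records flushes and the autoflush flag."""

    def __init__(self) -> None:
        self.autoflush = True
        self.flush_count = 0

    @property
    @contextmanager
    def no_autoflush(self) -> Iterator[None]:
        # Same shape as SQLAlchemy's Session.no_autoflush: a property that
        # returns a context manager and restores the previous flag on exit.
        previous = self.autoflush
        self.autoflush = False
        try:
            yield
        finally:
            self.autoflush = previous

    def flush(self) -> None:
        self.flush_count += 1


@contextmanager
def single_flush_scope(session: FakeSession) -> Iterator[None]:
    # Body copied from the helper above.
    with session.no_autoflush:
        yield
    session.flush()


session = FakeSession()
with single_flush_scope(session):
    autoflush_inside = session.autoflush
print(autoflush_inside, session.autoflush, session.flush_count)  # False True 1

try:
    with single_flush_scope(session):
        raise RuntimeError("revert failed")
except RuntimeError:
    pass
print(session.flush_count)  # 1: the failed block did not flush
```

When the block raises, the exception surfaces at the `yield`, so `session.flush()` is never reached. That is what leaves cleanup to the session's normal rollback.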

@@ -0,0 +1,591 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Integration tests for chart (Slice) version history capture.
T014 — chart version capture
T017 — baseline row capture
T018 (partial) — retention pruning (chart side)
T026 — chart version list endpoint
"""
from __future__ import annotations
from typing import Any
import pytest
from sqlalchemy_continuum import version_class
from superset.extensions import db
from superset.models.slice import Slice
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import ( # noqa: F401
load_birth_names_dashboard_with_slices,
load_birth_names_data,
)
def _get_version_rows(chart: Slice) -> list[Any]:
ver_cls = version_class(Slice)
return (
db.session.query(ver_cls)
.filter(ver_cls.id == chart.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
def _persist_fixture_state() -> None:
"""Force fixture's pending INSERTs to commit in their own transaction.
The birth_names fixture stages charts and the dashboard via session.add()
but does not commit. Without this, the test's first commit batches the
INSERTs and UPDATEs into the same Continuum transaction, causing the
existing version row to be updated in place instead of a new one being
created.
"""
db.session.commit()
class TestChartVersionCapture(SupersetTestCase):
"""T014 — version rows are created on save; no spurious extra rows."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def test_single_save_creates_one_version_row(self) -> None:
"""Saving a chart for the first time creates exactly one version row."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
# Trigger a save (update a scalar field)
original_name = chart.slice_name
chart.slice_name = "Girls (edited)"
db.session.commit()
rows = _get_version_rows(chart)
# Two rows: baseline (operation_type=0) + edit (operation_type=1)
assert len(rows) == 2, f"Expected 2 version rows, got {len(rows)}"
assert rows[0].operation_type == 0 # baseline
assert rows[1].operation_type == 1 # update
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_two_saves_create_exactly_two_version_rows_after_baseline(self) -> None:
"""Second save adds exactly one more version row (no duplicate rows)."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Boys").first()
)
assert chart is not None
original_name = chart.slice_name
chart.slice_name = "Boys v1"
db.session.commit()
rows_after_first = _get_version_rows(chart)
# baseline + v1 = 2 rows
assert len(rows_after_first) == 2
chart.slice_name = "Boys v2"
db.session.commit()
rows_after_second = _get_version_rows(chart)
# baseline + v1 + v2 = 3 rows
assert len(rows_after_second) == 3
assert rows_after_second[-1].slice_name == "Boys v2"
# Cleanup
chart.slice_name = original_name
db.session.commit()
class TestChartBaselineCapture(SupersetTestCase):
"""T017 — the baseline listener inserts a pre-edit snapshot row (operation_type=0).""" # noqa: E501
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def test_baseline_row_has_pre_edit_state(self) -> None:
"""The baseline row captures the field value *before* the first edit."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice)
.filter(Slice.slice_name == "Top 10 Girl Name Share")
.first()
)
assert chart is not None
pre_edit_name = chart.slice_name
chart.slice_name = "Top 10 Girl Name Share (baseline test)"
db.session.commit()
rows = _get_version_rows(chart)
assert rows[0].operation_type == 0 # baseline row
assert rows[0].slice_name == pre_edit_name # pre-edit name preserved
# Cleanup
chart.slice_name = pre_edit_name
db.session.commit()
def test_baseline_row_is_at_position_zero_for_preexisting_entity(self) -> None:
"""When an entity has zero Continuum history (e.g. created before
versioning was enabled), our baseline listener must produce a row
that sorts to version_number 0 — i.e. its transaction_id must be
strictly less than the UPDATE row Continuum writes in the same
commit."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Participants").first()
)
assert chart is not None
chart_id = chart.id
original_name = chart.slice_name
# Wipe this chart's Continuum history so our baseline listener has
# count==0 on the next save — simulating a pre-existing entity.
ver_cls = version_class(Slice)
db.session.query(ver_cls).filter(ver_cls.id == chart_id).delete(
synchronize_session=False
)
db.session.commit()
chart.slice_name = "Participants (preexisting baseline test)"
db.session.commit()
rows = _get_version_rows(chart)
pairs = [(r.operation_type, r.transaction_id) for r in rows]
assert len(rows) == 2, f"Expected baseline + update; got {pairs}"
assert rows[0].operation_type == 0, (
f"Position 0 should be the baseline (op=0); got "
f"op={rows[0].operation_type} at tx={rows[0].transaction_id}"
)
assert rows[0].slice_name == original_name, (
"The baseline row must carry the pre-edit slice_name"
)
assert rows[0].transaction_id < rows[1].transaction_id, (
"Baseline's transaction_id must be less than the update's so it "
"sorts to position 0"
)
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_no_duplicate_baseline_on_subsequent_saves(self) -> None:
"""Subsequent saves do NOT add a second baseline row."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice)
.filter(Slice.slice_name == "Top 10 Boy Name Share")
.first()
)
assert chart is not None
original_name = chart.slice_name
chart.slice_name = "Top 10 Boy Name Share v1"
db.session.commit()
chart.slice_name = "Top 10 Boy Name Share v2"
db.session.commit()
baseline_rows = [r for r in _get_version_rows(chart) if r.operation_type == 0]
assert len(baseline_rows) == 1, "Should have exactly one baseline row"
# Cleanup
chart.slice_name = original_name
db.session.commit()
class TestChartVersionListApi(SupersetTestCase):
"""T026 — GET /api/v1/chart/<uuid>/versions/ endpoint."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _list_versions(self, chart_uuid: str) -> Any:
return self.client.get(f"/api/v1/chart/{chart_uuid}/versions/")
def test_list_versions_returns_ordered_sequence(self) -> None:
"""Three saves produce three rows in ascending version_number order."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
original_name = chart.slice_name
chart_uuid = str(chart.uuid)
for i in range(3):
chart.slice_name = f"Girls v{i}"
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._list_versions(chart_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
# Baseline + three updates = 4 rows. Check the count envelope, a lower
# bound on rows, and that version_number runs 0..n-1 in order.
assert body["count"] == len(body["result"])
assert len(body["result"]) >= 3
for idx, entry in enumerate(body["result"]):
assert entry["version_number"] == idx
assert entry["issued_at"] is not None
# Timestamps are monotonically non-decreasing.
timestamps = [e["issued_at"] for e in body["result"]]
assert timestamps == sorted(timestamps)
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_list_versions_empty_for_untouched_entity(self) -> None:
"""A chart with no version rows returns [] (not 404)."""
_persist_fixture_state()
# Create a chart without subsequently editing it.
chart = Slice(
slice_name="Untouched chart for version list test",
datasource_type="table",
viz_type="table",
)
db.session.add(chart)
db.session.commit()
chart_uuid = str(chart.uuid)
# Purge the INSERT version row so the history is genuinely empty.
ver_cls = version_class(Slice)
db.session.query(ver_cls).filter(ver_cls.id == chart.id).delete(
synchronize_session=False
)
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._list_versions(chart_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
assert body["count"] == 0
assert body["result"] == []
# Cleanup
db.session.delete(chart)
db.session.commit()
def test_list_versions_returns_404_for_unknown_uuid(self) -> None:
"""An unknown UUID returns 404."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("00000000-0000-0000-0000-000000000000")
assert rv.status_code == 404
def test_list_versions_returns_400_for_invalid_uuid(self) -> None:
"""A malformed UUID string is rejected with 400."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("not-a-uuid")
assert rv.status_code == 400
@pytest.mark.skip(
reason=(
"Superset's default Gamma role has can_write on Chart — there is "
"no built-in no-write user to exercise the 403 branch for this "
"resource. See dataset tests (T028) for a working 403 check."
)
)
def test_list_versions_denies_without_write_permission(self) -> None:
"""A user without can_write on Chart gets 403."""
def test_list_versions_admin_sees_all_entities(self) -> None:
"""FR-013: workspace admin can list versions for any entity."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Boys").first()
)
assert chart is not None
chart_uuid = str(chart.uuid)
self.login(ADMIN_USERNAME)
rv = self._list_versions(chart_uuid)
assert rv.status_code == 200
class TestChartRestoreApi(SupersetTestCase):
"""T037 — POST /api/v1/chart/<uuid>/versions/<version_uuid>/restore."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _restore(self, chart_uuid: str, version_uuid: str) -> Any:
return self.client.post(
f"/api/v1/chart/{chart_uuid}/versions/{version_uuid}/restore"
)
def _list(self, chart_uuid: str) -> Any:
return self.client.get(f"/api/v1/chart/{chart_uuid}/versions/")
def test_restore_applies_scalar_field_from_target_version(self) -> None:
"""Restoring version 0 puts the slice_name back to its pre-edit value
and appends a new version entry."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_uuid = str(chart.uuid)
original_name = chart.slice_name
# Produce two additional saves so version history is 0/1/2.
chart.slice_name = "Girls v1"
db.session.commit()
chart.slice_name = "Girls v2"
db.session.commit()
self.login(ADMIN_USERNAME)
rv_list = self._list(chart_uuid)
assert rv_list.status_code == 200
listing = _json.loads(rv_list.data.decode("utf-8"))
initial_count = listing["count"]
assert initial_count >= 3
target_uuid = listing["result"][0]["version_uuid"]
# Restore to the first version (the original "Girls" name).
rv = self._restore(chart_uuid, target_uuid)
assert rv.status_code == 200, rv.data
# Live state matches the restored snapshot.
db.session.expire_all()
chart = db.session.query(Slice).filter(Slice.uuid == chart.uuid).one()
assert chart.slice_name == original_name
# A new version row was recorded (non-destructive).
rv_list2 = self._list(chart_uuid)
body = _json.loads(rv_list2.data.decode("utf-8"))
assert body["count"] == initial_count + 1
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_restore_returns_404_for_unknown_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self._restore(
"00000000-0000-0000-0000-000000000000",
"00000000-0000-0000-0000-000000000001",
)
assert rv.status_code == 404
def test_restore_returns_404_for_unknown_version_uuid(self) -> None:
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Boys").first()
)
assert chart is not None
self.login(ADMIN_USERNAME)
rv = self._restore(str(chart.uuid), "00000000-0000-0000-0000-000000000099")
assert rv.status_code == 404
def test_restore_returns_400_for_invalid_entity_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self._restore("not-a-uuid", "00000000-0000-0000-0000-000000000001")
assert rv.status_code == 400
def test_restore_returns_400_for_invalid_version_uuid(self) -> None:
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Boys").first()
)
assert chart is not None
self.login(ADMIN_USERNAME)
rv = self._restore(str(chart.uuid), "not-a-uuid")
assert rv.status_code == 400
def test_get_version_returns_historical_snapshot(self) -> None:
"""GET /versions/<uuid>/ returns the chart's fields at that version
without modifying live state."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_uuid = str(chart.uuid)
original_name = chart.slice_name
chart.slice_name = "Girls (v1)"
db.session.commit()
self.login(ADMIN_USERNAME)
listing = _json.loads(self._list(chart_uuid).data.decode("utf-8"))
assert listing["count"] >= 2
# The earliest entry should still hold the original slice_name.
first_version_uuid = listing["result"][0]["version_uuid"]
rv = self.client.get(
f"/api/v1/chart/{chart_uuid}/versions/{first_version_uuid}/"
)
assert rv.status_code == 200, rv.data
body = _json.loads(rv.data.decode("utf-8"))["result"]
assert body["slice_name"] == original_name
assert body["_version"]["version_uuid"] == first_version_uuid
assert body["_version"]["version_number"] == 0
# Live row unchanged.
db.session.expire_all()
live = db.session.query(Slice).filter(Slice.uuid == chart.uuid).one()
assert live.slice_name == "Girls (v1)"
# Cleanup
live.slice_name = original_name
db.session.commit()
def test_get_version_returns_404_for_unknown_entity(self) -> None:
self.login(ADMIN_USERNAME)
rv = self.client.get(
"/api/v1/chart/00000000-0000-0000-0000-000000000000"
"/versions/00000000-0000-0000-0000-000000000001/"
)
assert rv.status_code == 404
def test_get_version_returns_400_for_invalid_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self.client.get(
"/api/v1/chart/not-a-uuid/versions/00000000-0000-0000-0000-000000000001/"
)
assert rv.status_code == 400
def test_restore_stamps_changed_by_with_restoring_user(self) -> None:
"""After a restore, changed_by_fk on the live entity must point at
the restoring user (not at whoever authored the version being
restored). created_by_fk stays unchanged. The new version row
produced by the restore also carries the restoring user in its
changed_by metadata.
"""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
self.login(ADMIN_USERNAME)
admin_id = self.get_user(ADMIN_USERNAME).id
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_id = chart.id
chart_uuid = str(chart.uuid)
entity_uuid = chart.uuid
original_name = chart.slice_name
original_created_by = chart.created_by_fk
before_changed_on = chart.changed_on
# Produce a second version to restore to.
chart.slice_name = "Girls v1"
db.session.commit()
ver_cls = version_class(Slice)
first_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart_id)
.order_by(ver_cls.transaction_id.asc())
.limit(1)
.scalar()
)
assert first_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, first_tx))
rv = self.client.post(
f"/api/v1/chart/{chart_uuid}/versions/{target_uuid}/restore"
)
assert rv.status_code == 200, rv.data
db.session.expire_all()
chart = db.session.query(Slice).filter(Slice.id == chart_id).one()
# Live entity checks.
assert chart.slice_name == original_name
assert chart.created_by_fk == original_created_by
assert chart.changed_by_fk == admin_id, (
f"Expected changed_by_fk to be restoring user id={admin_id}, "
f"got {chart.changed_by_fk}"
)
if before_changed_on is not None and chart.changed_on is not None:
assert chart.changed_on >= before_changed_on
# The new version row produced by the restore must attribute the
# change to the restoring user.
rv_list = self.client.get(f"/api/v1/chart/{chart_uuid}/versions/")
assert rv_list.status_code == 200
body = _json.loads(rv_list.data.decode("utf-8"))
latest_entry = body["result"][-1]
assert latest_entry["changed_by"] is not None, (
"New version row should have a changed_by"
)
assert latest_entry["changed_by"]["id"] == admin_id
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_put_response_returns_old_and_new_version_numbers(self) -> None:
"""PUT /api/v1/chart/<id> response must include old_version and
new_version matching the list-versions ordering."""
_persist_fixture_state()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_id = chart.id
original_name = chart.slice_name
ver_cls = version_class(Slice)
count_before = db.session.query(ver_cls).filter(ver_cls.id == chart_id).count()
expected_old = count_before - 1 if count_before > 0 else None
self.login(ADMIN_USERNAME)
rv = self.client.put(
f"/api/v1/chart/{chart_id}",
json={"slice_name": "put-response-version-test"},
)
assert rv.status_code == 200, rv.data
body = _json.loads(rv.data.decode("utf-8"))
assert body["id"] == chart_id
assert body["old_version"] == expected_old
assert body["new_version"] is not None
assert "old_transaction_id" in body
assert "new_transaction_id" in body
if body["old_transaction_id"] is not None:
assert body["new_transaction_id"] != body["old_transaction_id"]
# Cleanup
chart = db.session.query(Slice).filter(Slice.id == chart_id).one()
chart.slice_name = original_name
db.session.commit()
@pytest.mark.skip(
reason=(
"Per-entity ownership isn't enforced yet for the restore path — "
"raise_for_ownership is called inside validate(), but Gamma has "
"can_write on Chart so the admin-only assertion needs a custom "
"no-write user setup. See dataset tests (T039) for a working "
"403 check."
)
)
def test_restore_denies_without_write_permission(self) -> None:
"""A user without can_write on Chart gets 403."""

@@ -25,7 +25,6 @@ from superset.connectors.sqla.models import SqlaTable, sqlatable_user
from superset.models.core import Database
from superset.models.dashboard import (
Dashboard,
-dashboard_slices,
dashboard_user,
DashboardRoles,
)
@@ -234,9 +233,15 @@ def delete_dashboard_roles_associations(dashboard: Dashboard) -> None:
def delete_dashboard_slices_associations(dashboard: Dashboard) -> None:
-db.session.execute(
-dashboard_slices.delete().where(dashboard_slices.c.dashboard_id == dashboard.id)
-)
+# Use ORM-level reassignment instead of `db.session.execute(table.delete())`.
+# SQLAlchemy-Continuum's M2M tracker needs row-level visibility to record
+# shadow entries; a bulk DELETE via Core bypasses the ORM and produces a
+# malformed INSERT into `dashboard_slices_version` (missing the composite-PK
+# columns), which fails under MySQL strict mode and produces dead rows on
+# Postgres. Mirrors the precedent set by ``DatasetDAO.update_columns``
+# being rewritten to ORM-level ``session.delete()`` for the same reason.
+dashboard.slices = []
+db.session.flush()
def delete_all_inserted_slices():

@@ -0,0 +1,519 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Integration tests for Dashboard version history capture.
T015 — dashboard version capture (single version per save; no extra rows from
process_tab_diff)
T018 — retention pruning (keep at most SUPERSET_VERSION_HISTORY_MAX_VERSIONS)
T027 — dashboard version list endpoint
"""
from __future__ import annotations
from typing import Any
import pytest
from sqlalchemy_continuum import version_class
from superset.extensions import db
from superset.models.dashboard import Dashboard
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import ( # noqa: F401
load_birth_names_dashboard_with_slices,
load_birth_names_data,
)
def _get_version_rows(dashboard: Dashboard) -> list[Any]:
ver_cls = version_class(Dashboard)
return (
db.session.query(ver_cls)
.filter(ver_cls.id == dashboard.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
def _persist_fixture_state() -> None:
"""Force fixture's pending INSERTs to commit in their own transaction.
The birth_names fixture stages charts and the dashboard via session.add()
but does not commit. Without this, the test's first commit batches the
INSERTs and UPDATEs into the same Continuum transaction, causing the
existing version row to be updated in place instead of a new one being
created.
"""
db.session.commit()
class TestDashboardVersionCapture(SupersetTestCase):
"""T015 — one version row per save; no multiple rows from tab/filter diff processing.""" # noqa: E501
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def test_single_save_creates_one_version_row(self) -> None:
"""Saving a dashboard title creates exactly one update version row."""
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
# Capture tx IDs that exist before this save — we'll verify that
# exactly ONE new tx_id with operation_type=1 appears after the save
# (comparing by tx_id makes the test robust against retention
# pruning of older rows).
tx_ids_before = {r.transaction_id for r in _get_version_rows(dashboard)}
original_title = dashboard.dashboard_title
dashboard.dashboard_title = "USA Births Names (edited)"
db.session.commit()
rows_after = _get_version_rows(dashboard)
new_update_rows = [
r
for r in rows_after
if r.operation_type == 1 and r.transaction_id not in tx_ids_before
]
assert len(new_update_rows) == 1, (
f"Expected 1 new update row from this save, got {len(new_update_rows)}"
" — possible no_autoflush regression"
)
# Cleanup
dashboard.dashboard_title = original_title
db.session.commit()
def test_second_save_adds_one_row(self) -> None:
"""Each subsequent save adds exactly one more version row."""
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
original_title = dashboard.dashboard_title
# Track tx IDs across saves; compare by tx_id to sidestep retention
# pruning of older rows.
tx_before_v1 = {r.transaction_id for r in _get_version_rows(dashboard)}
dashboard.dashboard_title = "USA Births Names v1"
db.session.commit()
tx_after_v1 = {r.transaction_id for r in _get_version_rows(dashboard)}
new_txs_v1 = tx_after_v1 - tx_before_v1
assert len(new_txs_v1) == 1, (
f"Expected 1 new tx from v1 save, got {len(new_txs_v1)}"
)
dashboard.dashboard_title = "USA Births Names v2"
db.session.commit()
tx_after_v2 = {r.transaction_id for r in _get_version_rows(dashboard)}
new_txs_v2 = tx_after_v2 - tx_after_v1
assert len(new_txs_v2) == 1, (
f"Expected 1 new tx from v2 save, got {len(new_txs_v2)}"
)
# Cleanup
dashboard.dashboard_title = original_title
db.session.commit()
class TestDashboardVersionRetention(SupersetTestCase):
"""T018 — retention pruning caps history at SUPERSET_VERSION_HISTORY_MAX_VERSIONS.""" # noqa: E501
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def test_retention_prunes_old_rows(self) -> None:
"""``prune_old_versions`` removes shadow rows whose owning
``version_transaction.issued_at`` is older than the retention
window, while preserving the live row and the baseline."""
from datetime import datetime, timedelta
import sqlalchemy as sa
from superset.extensions import db as _db
from superset.tasks.version_history_retention import (
_prune_old_versions_impl,
)
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
original_title = dashboard.dashboard_title
try:
# Force a few saves so we have ≥ 2 closed shadow rows plus
# a baseline plus the live row.
for i in range(3):
dashboard.dashboard_title = f"USA Births Names retention test {i}"
db.session.commit()
rows_before = _get_version_rows(dashboard)
assert len(rows_before) >= 3, "Expected at least 3 version rows"
# Backdate every version_transaction row by 100 days so the
# prune sees them as old. No rows are exempted here; the prune
# itself must preserve the live row.
from sqlalchemy_continuum import versioning_manager
tx_table = versioning_manager.transaction_cls.__table__
with _db.engine.begin() as conn:
conn.execute(
sa.update(tx_table).values(
issued_at=datetime.utcnow() - timedelta(days=100)
)
)
stats = _prune_old_versions_impl(retention_days=30)
assert stats.get("pruned_transactions", 0) >= 1, stats
rows_after = _get_version_rows(dashboard)
# Live row must still exist (this is the only preservation rule)
live_rows = [r for r in rows_after if r.end_transaction_id is None]
assert len(live_rows) >= 1, "Live row must never be pruned"
# Some rows should have been pruned. Closed historical rows —
# including the synthetic baseline (operation_type=0) — are
# subject to retention like everything else.
assert len(rows_after) < len(rows_before), (
f"Expected fewer rows after prune; before={len(rows_before)} "
f"after={len(rows_after)}"
)
finally:
dashboard.dashboard_title = original_title
db.session.commit()
class TestDashboardVersionListApi(SupersetTestCase):
"""T027 — GET /api/v1/dashboard/<uuid>/versions/ endpoint."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _list_versions(self, dashboard_uuid: str) -> Any:
return self.client.get(f"/api/v1/dashboard/{dashboard_uuid}/versions/")
def test_list_versions_returns_ordered_sequence(self) -> None:
"""After three saves, the version list reports ``count`` equal to its
length and contiguous, zero-based ``version_number`` values."""
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
original_title = dashboard.dashboard_title
dashboard_uuid = str(dashboard.uuid)
self.login(ADMIN_USERNAME)
rv = self._list_versions(dashboard_uuid)
assert rv.status_code == 200
assert "count" in _json.loads(rv.data.decode("utf-8"))
for i in range(3):
dashboard.dashboard_title = f"USA Births Names v{i}"
db.session.commit()
rv = self._list_versions(dashboard_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
# Retention pruning from other tests can lower the absolute count, so
# instead of asserting a fixed total, check that ``count`` matches the
# result length and that version_number values run contiguously from 0.
assert len(body["result"]) == body["count"]
for idx, entry in enumerate(body["result"]):
assert entry["version_number"] == idx
# Cleanup
dashboard.dashboard_title = original_title
db.session.commit()
def test_list_versions_empty_for_untouched_entity(self) -> None:
"""A dashboard with no version rows returns [] (not 404)."""
_persist_fixture_state()
dashboard = Dashboard(dashboard_title="Untouched dashboard", slug="untouched")
db.session.add(dashboard)
db.session.commit()
dashboard_uuid = str(dashboard.uuid)
ver_cls = version_class(Dashboard)
db.session.query(ver_cls).filter(ver_cls.id == dashboard.id).delete(
synchronize_session=False
)
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._list_versions(dashboard_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
assert body["count"] == 0
assert body["result"] == []
# Cleanup
db.session.delete(dashboard)
db.session.commit()
def test_list_versions_returns_404_for_unknown_uuid(self) -> None:
"""An unknown UUID returns 404."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("00000000-0000-0000-0000-000000000000")
assert rv.status_code == 404
def test_list_versions_returns_400_for_invalid_uuid(self) -> None:
"""A malformed UUID string is rejected with 400."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("not-a-uuid")
assert rv.status_code == 400
@pytest.mark.skip(
reason=(
"Superset's default Gamma role has can_write on Dashboard — there "
"is no built-in no-write user to exercise the 403 branch for this "
"resource. See dataset tests (T028) for a working 403 check."
)
)
def test_list_versions_denies_without_write_permission(self) -> None:
"""A user without can_write on Dashboard gets 403."""
def test_list_versions_admin_sees_all_entities(self) -> None:
"""FR-013: workspace admin can list versions for any entity."""
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
dashboard_uuid = str(dashboard.uuid)
self.login(ADMIN_USERNAME)
rv = self._list_versions(dashboard_uuid)
assert rv.status_code == 200
class TestDashboardRestoreApi(SupersetTestCase):
"""T038 — POST /api/v1/dashboard/<uuid>/versions/<version_uuid>/restore."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _restore(self, dashboard_uuid: str, version_uuid: str) -> Any:
return self.client.post(
f"/api/v1/dashboard/{dashboard_uuid}/versions/{version_uuid}/restore"
)
def test_restore_applies_scalar_field(self) -> None:
"""Restore a dashboard title edit."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
dashboard_uuid = str(dashboard.uuid)
original_title = dashboard.dashboard_title
dashboard_id = dashboard.id
entity_uuid = dashboard.uuid
# Make two more edits so we have a known non-trivial history to
# navigate: [initial, v1, v2].
dashboard.dashboard_title = "USA Births Names v1"
db.session.commit()
dashboard.dashboard_title = "USA Births Names v2"
db.session.commit()
ver_cls = version_class(Dashboard)
rows = (
db.session.query(
ver_cls.transaction_id,
ver_cls.operation_type,
ver_cls.dashboard_title,
ver_cls.end_transaction_id,
)
.filter(ver_cls.id == dashboard_id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
# Find the version whose snapshot has the original title.
target_row = next(
(row for row in rows if row.dashboard_title == original_title),
None,
)
assert target_row is not None, (
f"Expected at least one version row with original title; rows={rows}"
)
target_uuid = str(derive_version_uuid(entity_uuid, target_row.transaction_id))
self.login(ADMIN_USERNAME)
rv = self._restore(dashboard_uuid, target_uuid)
assert rv.status_code == 200, rv.data
db.session.expire_all()
dashboard = (
db.session.query(Dashboard).filter(Dashboard.id == dashboard_id).one()
)
assert dashboard.dashboard_title == original_title, (
f"Restore did not revert title; rows={rows}"
)
# Cleanup
dashboard.dashboard_title = original_title
db.session.commit()
def test_restore_reattaches_chart_removed_after_snapshot(self) -> None:
"""After the target snapshot is captured, detaching a chart and saving
must be undone by restore — the chart comes back on dashboard_slices."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
dashboard_uuid = str(dashboard.uuid)
dashboard_id = dashboard.id
entity_uuid = dashboard.uuid
original_slice_ids = sorted(s.id for s in dashboard.slices)
assert len(original_slice_ids) >= 2, (
f"fixture expected to attach >= 2 charts; got {original_slice_ids}"
)
slice_to_drop = dashboard.slices[0]
drop_id = slice_to_drop.id
# Touch the dashboard so a snapshot row is captured at a known tx.
dashboard.dashboard_title = "USA Births Names — snapshot point"
db.session.commit()
ver_cls = version_class(Dashboard)
target_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == dashboard_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
assert target_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, target_tx))
# Detach the chart and commit — moves history forward.
dashboard.slices.remove(slice_to_drop)
db.session.commit()
db.session.expire_all()
dashboard = (
db.session.query(Dashboard).filter(Dashboard.id == dashboard_id).one()
)
live_ids = {s.id for s in dashboard.slices}
assert drop_id not in live_ids, "pre-restore: dropped chart should be detached"
self.login(ADMIN_USERNAME)
rv = self._restore(dashboard_uuid, target_uuid)
assert rv.status_code == 200, rv.data
db.session.expire_all()
dashboard = (
db.session.query(Dashboard).filter(Dashboard.id == dashboard_id).one()
)
restored_ids = sorted(s.id for s in dashboard.slices)
assert restored_ids == original_slice_ids, (
f"restore did not re-attach chart: expected {original_slice_ids}, "
f"got {restored_ids}"
)
def test_restore_returns_404_for_unknown_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self._restore(
"00000000-0000-0000-0000-000000000000",
"00000000-0000-0000-0000-000000000001",
)
assert rv.status_code == 404
def test_restore_returns_404_for_unknown_version_uuid(self) -> None:
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
self.login(ADMIN_USERNAME)
rv = self._restore(str(dashboard.uuid), "00000000-0000-0000-0000-000000000099")
assert rv.status_code == 404
def test_put_response_returns_old_and_new_version_numbers(self) -> None:
"""PUT /api/v1/dashboard/<id> response must include old_version and
new_version matching the list-versions ordering."""
_persist_fixture_state()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
dashboard_id = dashboard.id
original_title = dashboard.dashboard_title
ver_cls = version_class(Dashboard)
count_before = (
db.session.query(ver_cls).filter(ver_cls.id == dashboard_id).count()
)
expected_old = count_before - 1 if count_before > 0 else None
self.login(ADMIN_USERNAME)
rv = self.client.put(
f"/api/v1/dashboard/{dashboard_id}",
json={"dashboard_title": "put-response-version-test"},
)
assert rv.status_code == 200, rv.data
body = _json.loads(rv.data.decode("utf-8"))
assert body["id"] == dashboard_id
assert body["old_version"] == expected_old
assert body["new_version"] is not None
assert "old_transaction_id" in body
assert "new_transaction_id" in body
if body["old_transaction_id"] is not None:
assert body["new_transaction_id"] != body["old_transaction_id"]
# Cleanup
dashboard = (
db.session.query(Dashboard).filter(Dashboard.id == dashboard_id).one()
)
dashboard.dashboard_title = original_title
db.session.commit()
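The retention rule that `test_retention_prunes_old_rows` exercises can be modeled in plain Python. This is a minimal sketch under stated assumptions, not `_prune_old_versions_impl` itself. The `VersionRow` dataclass and the `rows_to_prune` helper are hypothetical. The sketch assumes three things: a row is live when `end_transaction_id` is `None`, only closed rows are eligible for pruning, and a row's age comes from its owning transaction's `issued_at`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VersionRow:
    # Hypothetical stand-in for a Continuum shadow row joined to its
    # owning version_transaction row.
    transaction_id: int
    end_transaction_id: int | None
    issued_at: datetime


def rows_to_prune(
    rows: list[VersionRow], now: datetime, retention_days: int
) -> list[VersionRow]:
    """Return closed rows whose transaction is older than the window.

    Live rows (end_transaction_id is None) are never returned, however
    old their transaction is. The baseline is just another closed row here.
    """
    cutoff = now - timedelta(days=retention_days)
    return [
        row
        for row in rows
        if row.end_transaction_id is not None and row.issued_at < cutoff
    ]


now = datetime(2026, 5, 20)
old = now - timedelta(days=100)
history = [
    VersionRow(1, 2, old),  # baseline, closed by tx 2
    VersionRow(2, 3, old),  # closed historical row
    VersionRow(3, None, old),  # live row, backdated as in the test
]
pruned = rows_to_prune(history, now, retention_days=30)
print([r.transaction_id for r in pruned])  # prints [1, 2]
```

In this model, backdating the live row has no effect on whether it survives. That matches the only preservation rule the test asserts.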


@@ -0,0 +1,619 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Integration tests for Dataset (SqlaTable) version history capture.
T016: dataset column and metric version rows are created via ORM (not bulk) ops
T028: dataset version list endpoint
T039: dataset restore endpoint
"""
from __future__ import annotations
from typing import Any
import pytest
import sqlalchemy as sa
from sqlalchemy_continuum import version_class
from superset.connectors.sqla.models import SqlaTable, SqlMetric, TableColumn
from superset.extensions import db
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME, GAMMA_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import ( # noqa: F401
load_birth_names_dashboard_with_slices,
load_birth_names_data,
)
def _get_table_column_version_rows(column: TableColumn) -> list[Any]:
ver_cls = version_class(TableColumn)
return (
db.session.query(ver_cls)
.filter(ver_cls.id == column.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
def _get_sql_metric_version_rows(metric: SqlMetric) -> list[Any]:
ver_cls = version_class(SqlMetric)
return (
db.session.query(ver_cls)
.filter(ver_cls.id == metric.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
def _get_table_version_rows(table: SqlaTable) -> list[Any]:
ver_cls = version_class(SqlaTable)
return (
db.session.query(ver_cls)
.filter(ver_cls.id == table.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
def _persist_fixture_state() -> None:
"""Force fixture's pending INSERTs to commit in their own transaction.
The birth_names fixture stages charts and the dashboard via session.add()
but does not commit. Without this, the test's first commit batches the
INSERTs and UPDATEs into the same Continuum transaction, causing the
existing version row to be updated in place instead of a new one being
created.
"""
db.session.commit()
class TestDatasetVersionListApi(SupersetTestCase):
"""T028 — GET /api/v1/dataset/<uuid>/versions/ endpoint."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _list_versions(self, dataset_uuid: str) -> Any:
return self.client.get(f"/api/v1/dataset/{dataset_uuid}/versions/")
def test_list_versions_returns_ordered_sequence(self) -> None:
"""Editing a dataset produces ascending version_number entries."""
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
original_description = table.description
table_uuid = str(table.uuid)
for i in range(3):
table.description = f"Test description v{i}"
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._list_versions(table_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
assert body["count"] == len(body["result"])
for idx, entry in enumerate(body["result"]):
assert entry["version_number"] == idx
assert entry["issued_at"] is not None
# issued_at is an RFC-1123 HTTP date ("Wed, 22 Apr 2026 …"); parse
# before checking monotonic order rather than sorting strings,
# which would reorder incorrectly across day-of-week boundaries.
from email.utils import parsedate_to_datetime
parsed = [parsedate_to_datetime(e["issued_at"]) for e in body["result"]]
assert parsed == sorted(parsed)
# Cleanup
table.description = original_description
db.session.commit()
def test_list_versions_empty_for_untouched_entity(self) -> None:
"""A dataset with no version rows returns [] (not 404)."""
_persist_fixture_state()
table = SqlaTable(
table_name="__untouched_table_for_version_list__",
database_id=1,
)
db.session.add(table)
db.session.commit()
table_uuid = str(table.uuid)
ver_cls = version_class(SqlaTable)
db.session.query(ver_cls).filter(ver_cls.id == table.id).delete(
synchronize_session=False
)
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._list_versions(table_uuid)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
assert body["count"] == 0
assert body["result"] == []
# Cleanup
db.session.delete(table)
db.session.commit()
def test_list_versions_returns_404_for_unknown_uuid(self) -> None:
"""An unknown UUID returns 404."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("00000000-0000-0000-0000-000000000000")
assert rv.status_code == 404
def test_list_versions_returns_400_for_invalid_uuid(self) -> None:
"""A malformed UUID string is rejected with 400."""
self.login(ADMIN_USERNAME)
rv = self._list_versions("not-a-uuid")
assert rv.status_code == 400
def test_list_versions_denies_without_write_permission(self) -> None:
"""Gamma is read-only on Dataset — 403 on list_versions."""
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_uuid = str(table.uuid)
self.login(GAMMA_USERNAME)
rv = self._list_versions(table_uuid)
assert rv.status_code == 403
def test_list_versions_admin_sees_all_entities(self) -> None:
"""FR-013: workspace admin can list versions for any entity."""
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_uuid = str(table.uuid)
self.login(ADMIN_USERNAME)
rv = self._list_versions(table_uuid)
assert rv.status_code == 200
class TestDatasetRestoreApi(SupersetTestCase):
"""T039 — POST /api/v1/dataset/<uuid>/versions/<version_uuid>/restore."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def _restore(self, dataset_uuid: str, version_uuid: str) -> Any:
return self.client.post(
f"/api/v1/dataset/{dataset_uuid}/versions/{version_uuid}/restore"
)
def test_restore_applies_scalar_field(self) -> None:
"""Restore a dataset's description edit."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_uuid = str(table.uuid)
entity_uuid = table.uuid
table_id = table.id
original_description = table.description
# Two more edits to produce a non-trivial history.
table.description = "restore-test v1"
db.session.commit()
table.description = "restore-test v2"
db.session.commit()
ver_cls = version_class(SqlaTable)
rows = (
db.session.query(
ver_cls.transaction_id,
ver_cls.operation_type,
ver_cls.description,
)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
target_row = next(
(row for row in rows if row.description == original_description),
None,
)
assert target_row is not None, (
f"No version with original description; rows={rows}"
)
target_uuid = str(derive_version_uuid(entity_uuid, target_row.transaction_id))
self.login(ADMIN_USERNAME)
rv = self._restore(table_uuid, target_uuid)
assert rv.status_code == 200, rv.data
db.session.expire_all()
table = db.session.query(SqlaTable).filter(SqlaTable.id == table_id).one()
assert table.description == original_description
# Cleanup
table.description = original_description
db.session.commit()
def test_restore_with_column_edits_reverts_columns(self) -> None:
"""After editing a column's description, restoring an earlier version
reverts the column."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_uuid = str(table.uuid)
entity_uuid = table.uuid
table_id = table.id
col = table.columns[0]
col_name = col.column_name
original_col_description = col.description
# Snapshot target version before our column edit.
ver_cls = version_class(SqlaTable)
last_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
assert last_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, last_tx))
col.description = "restore-test column edit"
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._restore(table_uuid, target_uuid)
assert rv.status_code == 200, rv.data
# JSON-snapshot restore reassigns child PKs, so look up by natural
# key (column_name) rather than the old id.
db.session.expire_all()
col = (
db.session.query(TableColumn)
.filter(TableColumn.table_id == table_id)
.filter(TableColumn.column_name == col_name)
.one()
)
assert col.description == original_col_description
# Cleanup
col.description = original_col_description
db.session.commit()
def test_restore_adds_back_removed_column_and_drops_added_one(self) -> None:
"""After a snapshot is taken, removing an existing column and adding
a new one, restoring the snapshot must undo both operations."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_id = table.id
table_uuid = str(table.uuid)
entity_uuid = table.uuid
original_col_names = sorted(c.column_name for c in table.columns)
removed_name = table.columns[0].column_name
# Capture a snapshot tx point by touching the dataset.
table.description = "snapshot before column-swap"
db.session.commit()
ver_cls = version_class(SqlaTable)
target_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
assert target_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, target_tx))
# Remove a column, add a new one, commit (moves history forward).
db.session.delete(table.columns[0])
db.session.add(
TableColumn(
table_id=table_id,
column_name="__restore_test_calc__",
expression="1",
)
)
db.session.commit()
assert removed_name not in {c.column_name for c in table.columns}
assert "__restore_test_calc__" in {c.column_name for c in table.columns}
self.login(ADMIN_USERNAME)
rv = self._restore(table_uuid, target_uuid)
assert rv.status_code == 200, rv.data
db.session.expire_all()
table = db.session.query(SqlaTable).filter(SqlaTable.id == table_id).one()
restored_names = sorted(c.column_name for c in table.columns)
assert restored_names == original_col_names
def test_restore_emits_full_child_diff_in_one_transaction(self) -> None:
"""A restore that re-adds one column and drops another MUST write
*both* change records under the same transaction. Under the prior
per-relation flush loop the first flush emitted only the
easier-to-detect change (the modification of a surviving
column), the listener's tx-dedup guard then suppressed the
second pass, and the addition record was silently lost from
``version_changes`` — the dropdown rendered the restore as an
empty "Baseline" entry. Locks in the single-flush restore
behavior in ``VersionDAO.restore_version``.
"""
from superset.daos.version import derive_version_uuid
from superset.versioning.changes import version_changes_table
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_id = table.id
table_uuid = str(table.uuid)
entity_uuid = table.uuid
removed_name = table.columns[0].column_name
added_name = "__restore_full_diff_test__"
# Touch the dataset so the restore target is a snapshot at a known tx.
table.description = "snapshot before full-diff column swap"
db.session.commit()
ver_cls = version_class(SqlaTable)
target_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
assert target_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, target_tx))
db.session.delete(table.columns[0])
db.session.add(
TableColumn(table_id=table_id, column_name=added_name, expression="1")
)
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self._restore(table_uuid, target_uuid)
assert rv.status_code == 200, rv.data
db.session.expire_all()
restore_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
rows = (
db.session.connection()
.execute(
sa.select(
version_changes_table.c.kind,
version_changes_table.c.path,
).where(
version_changes_table.c.transaction_id == restore_tx,
version_changes_table.c.entity_kind == "dataset",
version_changes_table.c.entity_id == table_id,
)
)
.all()
)
paths = {tuple(row.path) for row in rows}
assert ("columns", added_name) in paths, (
f"restore tx {restore_tx} did not emit removal record for "
f"the added-then-restored-away column {added_name!r}; "
f"observed paths={paths}"
)
assert ("columns", removed_name) in paths, (
f"restore tx {restore_tx} did not emit addition record for "
f"the deleted-then-restored column {removed_name!r}; "
f"observed paths={paths}"
)
def test_restore_returns_404_for_unknown_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self._restore(
"00000000-0000-0000-0000-000000000000",
"00000000-0000-0000-0000-000000000001",
)
assert rv.status_code == 404
def test_restore_returns_404_for_unknown_version_uuid(self) -> None:
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
self.login(ADMIN_USERNAME)
rv = self._restore(str(table.uuid), "00000000-0000-0000-0000-000000000099")
assert rv.status_code == 404
def test_restore_returns_400_for_invalid_entity_uuid(self) -> None:
self.login(ADMIN_USERNAME)
rv = self._restore("not-a-uuid", "00000000-0000-0000-0000-000000000001")
assert rv.status_code == 400
def test_restore_returns_400_for_invalid_version_uuid(self) -> None:
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
self.login(ADMIN_USERNAME)
rv = self._restore(str(table.uuid), "not-a-uuid")
assert rv.status_code == 400
def test_get_version_returns_historical_snapshot_with_children(self) -> None:
"""GET /versions/<uuid>/ on a dataset returns scalar fields and
reconstructed columns/metrics, without modifying live state."""
from superset.daos.version import derive_version_uuid
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_id = table.id
table_uuid = str(table.uuid)
entity_uuid = table.uuid
original_description = table.description
original_col_names = sorted(c.column_name for c in table.columns)
# Capture a snapshot point now; make a change after.
ver_cls = version_class(SqlaTable)
target_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == table_id)
.order_by(ver_cls.transaction_id.desc())
.limit(1)
.scalar()
)
assert target_tx is not None
target_uuid = str(derive_version_uuid(entity_uuid, target_tx))
table.description = "edited after snapshot"
db.session.commit()
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/dataset/{table_uuid}/versions/{target_uuid}/")
assert rv.status_code == 200, rv.data
body = _json.loads(rv.data.decode("utf-8"))["result"]
# Scalar fields reflect the snapshot, not the live edit.
assert body["description"] == original_description
assert body["_version"]["version_uuid"] == target_uuid
# Columns list matches original set.
snapshot_col_names = sorted(c["column_name"] for c in body["columns"])
assert snapshot_col_names == original_col_names
# Metrics reconstructed.
assert isinstance(body["metrics"], list)
assert all("metric_name" in m for m in body["metrics"])
# Live row remains in its edited state.
db.session.expire_all()
live = db.session.query(SqlaTable).filter(SqlaTable.id == table_id).one()
assert live.description == "edited after snapshot"
# Cleanup
live.description = original_description
db.session.commit()
def test_put_response_returns_old_and_new_version_numbers(self) -> None:
"""PUT /api/v1/dataset/<id> should include old_version and new_version
fields that match the list-versions endpoint's version_number values."""
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_id = table.id
original_description = table.description
ver_cls = version_class(SqlaTable)
count_before = db.session.query(ver_cls).filter(ver_cls.id == table_id).count()
expected_old = count_before - 1 if count_before > 0 else None
self.login(ADMIN_USERNAME)
rv = self.client.put(
f"/api/v1/dataset/{table_id}",
json={"description": "version-number response test"},
)
assert rv.status_code == 200, rv.data
body = _json.loads(rv.data.decode("utf-8"))
assert body["id"] == table_id
assert "old_version" in body
assert "new_version" in body
assert "old_transaction_id" in body
assert "new_transaction_id" in body
assert body["old_version"] == expected_old
# new_version points to the live row post-commit. It is usually
# old_version + 1, but can equal old_version when retention pruning
# removed an older closed row in the same commit.
assert body["new_version"] is not None
assert body["new_version"] >= 0
# Transaction ids are not renumbered by pruning, and each successful
# update commits in a new transaction, so new_transaction_id must differ
# from the previous one (when old_transaction_id is known).
if body["old_transaction_id"] is not None:
assert body["new_transaction_id"] != body["old_transaction_id"]
# Cleanup
table = db.session.query(SqlaTable).filter(SqlaTable.id == table_id).one()
table.description = original_description
db.session.commit()
def test_restore_denies_without_write_permission(self) -> None:
"""Gamma is read-only on Dataset — 403 on restore."""
_persist_fixture_state()
table: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert table is not None
table_uuid = str(table.uuid)
self.login(GAMMA_USERNAME)
rv = self._restore(table_uuid, "00000000-0000-0000-0000-000000000001")
assert rv.status_code == 403
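`test_restore_emits_full_child_diff_in_one_transaction` expects a restore to record every child change in a single transaction: an addition for the re-created column and a removal for the dropped one. The sketch below shows that diff computed in one pass over both sets of natural keys, so neither direction can be lost by a later step. It is an illustration only. The `diff_children` function, the `"added"`/`"removed"` kind labels, and the `(relation, name)` path shape are assumptions loosely modeled on the `path` tuples the test checks. They are not taken from `superset.versioning.changes`.

```python
from __future__ import annotations


def diff_children(
    relation: str, live: set[str], target: set[str]
) -> set[tuple[str, tuple[str, str]]]:
    """Diff live child natural keys against the restore target's keys.

    Both directions are computed together, so additions and removals are
    emitted as one batch. Returns (kind, path) pairs, where path is
    (relation, natural_key).
    """
    added = {("added", (relation, name)) for name in target - live}
    removed = {("removed", (relation, name)) for name in live - target}
    return added | removed


# The snapshot had "ds" and "gender". Afterwards "ds" was deleted and a
# calculated column was added, so restoring must re-add "ds" and drop the
# calculated column.
snapshot = {"ds", "gender"}
live_now = {"gender", "__restore_full_diff_test__"}
changes = diff_children("columns", live=live_now, target=snapshot)
for kind, path in sorted(changes):
    print(kind, path)
# added ('columns', 'ds')
# removed ('columns', '__restore_full_diff_test__')
```

The diff is keyed on `column_name` rather than on primary keys, which fits the test's note that a JSON-snapshot restore reassigns child PKs.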


@@ -0,0 +1,131 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Schema-shape assertion tests for the composite-PK association-tables
migration (revision 2bee73611e32).
Builds the pre-migration shape against an isolated in-memory SQLite engine,
runs the migration's ``upgrade()``, and asserts the resulting shape matches
the data-model.md "After" specification: no ``id`` column, composite PK on
the two FK columns, and no redundant ``UNIQUE(fk1, fk2)`` on the two tables
that previously carried one.
Continuum-restore verification is OUT OF SCOPE; that work lives in the
versioning epic (sc-103156). Cross-backend verification (PostgreSQL, MySQL)
is handled by the CI matrix (T034a).
"""
from importlib import import_module
import pytest
import sqlalchemy as sa
from alembic.migration import MigrationContext
from alembic.operations import Operations
from sqlalchemy import inspect
# Import the migration module under test.
_migration = import_module(
"superset.migrations.versions."
"2026-05-01_23-36_2bee73611e32_composite_pk_association_tables"
)
AFFECTED_TABLES = _migration.AFFECTED_TABLES
TABLES_WITH_PRE_EXISTING_UNIQUE = _migration.TABLES_WITH_PRE_EXISTING_UNIQUE
@pytest.fixture(scope="module")
def post_upgrade_engine() -> sa.engine.Engine:
"""An isolated in-memory SQLite engine with the migration applied to a
pre-migration-shaped seed schema. Used by the post-upgrade assertions
below. Module-scoped so the upgrade only runs once per test module."""
engine = sa.create_engine("sqlite:///:memory:")
md = sa.MetaData()
for t in AFFECTED_TABLES:
cols: list[sa.SchemaItem] = [
sa.Column("id", sa.Integer, primary_key=True),
sa.Column(t.fk1, sa.Integer, nullable=False),
sa.Column(t.fk2, sa.Integer, nullable=False),
]
constraints: list[sa.SchemaItem] = []
if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
constraints.append(sa.UniqueConstraint(t.fk1, t.fk2))
sa.Table(t.name, md, *cols, *constraints)
md.create_all(engine)
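# Apply the migration's upgrade() against this engine via Alembic's
# MigrationContext, patching the migration module's ``op`` reference.
# ``engine.begin()`` commits on exit, so the altered schema persists for
# the inspector even under SQLAlchemy 2.0's commit-as-you-go semantics.
with engine.begin() as conn: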
ctx = MigrationContext.configure(conn)
ops = Operations(ctx)
original_op = _migration.op
_migration.op = ops # type: ignore[attr-defined]
try:
_migration.upgrade()
finally:
_migration.op = original_op # type: ignore[attr-defined]
return engine
@pytest.mark.parametrize("t", AFFECTED_TABLES, ids=lambda t: t.name)
def test_no_id_column(post_upgrade_engine: sa.engine.Engine, t) -> None:
"""The synthetic ``id`` column is gone from each affected table."""
insp = inspect(post_upgrade_engine)
column_names = {c["name"] for c in insp.get_columns(t.name)}
assert "id" not in column_names, (
f"{t.name} still has an 'id' column after migration; "
f"composite-PK conversion incomplete"
)
@pytest.mark.parametrize("t", AFFECTED_TABLES, ids=lambda t: t.name)
def test_primary_key_is_composite_fks(post_upgrade_engine: sa.engine.Engine, t) -> None:
"""The primary key of each affected table is exactly ``(fk1, fk2)``."""
insp = inspect(post_upgrade_engine)
pk_cols = set(insp.get_pk_constraint(t.name).get("constrained_columns", []))
assert pk_cols == {t.fk1, t.fk2}, (
f"{t.name} primary key is {pk_cols}, expected {{{t.fk1}, {t.fk2}}}"
)
@pytest.mark.parametrize(
"t",
[t for t in AFFECTED_TABLES if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE],
ids=lambda t: t.name,
)
def test_redundant_unique_dropped(post_upgrade_engine: sa.engine.Engine, t) -> None:
"""For the two tables that previously carried a UNIQUE(fk1, fk2), that
constraint is now subsumed by the composite PK and must not appear
separately in the unique-constraint list."""
insp = inspect(post_upgrade_engine)
redundant_pair = {t.fk1, t.fk2}
for uc in insp.get_unique_constraints(t.name):
cols = set(uc.get("column_names", []))
assert cols != redundant_pair, (
f"{t.name} still carries a redundant UniqueConstraint over "
f"{redundant_pair} (name={uc.get('name')!r}); "
f"composite-PK conversion incomplete"
)
@pytest.mark.parametrize("t", AFFECTED_TABLES, ids=lambda t: t.name)
def test_fk_columns_not_null(post_upgrade_engine: sa.engine.Engine, t) -> None:
"""PK promotion implicitly tightens the FK columns to NOT NULL."""
insp = inspect(post_upgrade_engine)
cols_by_name = {c["name"]: c for c in insp.get_columns(t.name)}
for col in (t.fk1, t.fk2):
assert col in cols_by_name, f"{t.name} missing column {col}"
assert cols_by_name[col].get("nullable") is False, (
f"{t.name}.{col} is nullable; expected NOT NULL after PK promotion"
)

@@ -0,0 +1,168 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Schema round-trip tests for the composite-PK association-tables migration
(revision 2bee73611e32). Builds the pre-migration shape against an in-memory
SQLite engine, runs the migration's ``upgrade()``, asserts the post-upgrade
shape, runs ``downgrade()``, asserts the prior shape is restored (modulo the
documented FK NOT NULL asymmetry), and re-runs ``upgrade()`` to verify
idempotency.
This is run against an isolated in-memory engine via Alembic's
``MigrationContext`` so the test does not perturb the project's test DB.
Cross-backend verification of the same migration against PostgreSQL and
MySQL is delegated to the CI matrix (see T034a in tasks.md) and to the
quickstart.md verification (T033). This file covers the SQLite slice.
"""
from importlib import import_module
from typing import Any
import pytest
import sqlalchemy as sa
from alembic.migration import MigrationContext
from alembic.operations import Operations
from sqlalchemy import inspect
# Import the migration module under test.
_migration = import_module(
"superset.migrations.versions."
"2026-05-01_23-36_2bee73611e32_composite_pk_association_tables"
)
AFFECTED_TABLES = _migration.AFFECTED_TABLES
TABLES_WITH_PRE_EXISTING_UNIQUE = _migration.TABLES_WITH_PRE_EXISTING_UNIQUE
def _build_pre_migration_schema(engine: sa.engine.Engine) -> None:
"""Recreate the eight tables in their pre-migration shape (surrogate
``id INTEGER PRIMARY KEY`` plus an optional ``UNIQUE(fk1, fk2)`` on the
two tables that previously carried one). FKs to parent tables are
omitted to keep the test self-contained — we're testing schema
transformations, not FK enforcement."""
md = sa.MetaData()
for t in AFFECTED_TABLES:
cols: list[sa.Column] = [
sa.Column("id", sa.Integer, primary_key=True),
sa.Column(t.fk1, sa.Integer, nullable=False),
sa.Column(t.fk2, sa.Integer, nullable=False),
]
constraints: list[sa.SchemaItem] = []
if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
constraints.append(sa.UniqueConstraint(t.fk1, t.fk2))
sa.Table(t.name, md, *cols, *constraints)
md.create_all(engine)
def _shape(engine: sa.engine.Engine, table: str) -> dict[str, Any]:
"""Return a structural summary for asserting equality across runs."""
insp = inspect(engine)
pk = insp.get_pk_constraint(table).get("constrained_columns", [])
columns = sorted(c["name"] for c in insp.get_columns(table))
uniques = sorted(
tuple(sorted(uc.get("column_names", [])))
for uc in insp.get_unique_constraints(table)
)
return {"columns": columns, "pk": sorted(pk), "uniques": uniques}
def _run_with_alembic_context(engine: sa.engine.Engine, fn) -> None:
"""Run ``fn()`` (the migration's upgrade/downgrade body) inside a fresh
Alembic ``MigrationContext`` bound to ``engine``. Patches the
migration module's ``op`` to point at this context so its
``op.get_bind()`` and ``op.batch_alter_table`` calls execute against
the in-memory engine."""
with engine.connect() as conn:
ctx = MigrationContext.configure(conn)
ops = Operations(ctx)
original_op = _migration.op
_migration.op = ops # type: ignore[attr-defined]
try:
fn()
finally:
_migration.op = original_op # type: ignore[attr-defined]
def test_round_trip_against_in_memory_sqlite() -> None:
"""Round-trip: pre-migration → upgrade → downgrade → upgrade again.
Asserts:
- Post-upgrade shape: no ``id``, composite PK on (fk1, fk2), no
UNIQUE(fk1, fk2) on the two tables that previously carried one.
- Post-downgrade shape: ``id`` restored, PK back on (id), UNIQUE
re-added on the two tables. (FK columns remain NOT NULL — the
documented intentional asymmetry.)
- Post-re-upgrade idempotency: shape matches the first post-upgrade.
"""
engine = sa.create_engine("sqlite:///:memory:")
_build_pre_migration_schema(engine)
pre_shape = {t.name: _shape(engine, t.name) for t in AFFECTED_TABLES}
_run_with_alembic_context(engine, _migration.upgrade)
for t in AFFECTED_TABLES:
s = _shape(engine, t.name)
assert "id" not in s["columns"], f"{t.name}: id still present post-upgrade: {s}"
assert s["pk"] == sorted([t.fk1, t.fk2]), (
f"{t.name}: PK is {s['pk']}, expected {sorted([t.fk1, t.fk2])}"
)
assert tuple(sorted([t.fk1, t.fk2])) not in s["uniques"], (
f"{t.name}: redundant UNIQUE not dropped post-upgrade: {s['uniques']}"
)
post_upgrade_shape = {t.name: _shape(engine, t.name) for t in AFFECTED_TABLES}
_run_with_alembic_context(engine, _migration.downgrade)
for t in AFFECTED_TABLES:
s = _shape(engine, t.name)
assert "id" in s["columns"], f"{t.name}: id not restored post-downgrade: {s}"
assert s["pk"] == ["id"], f"{t.name}: PK is {s['pk']}, expected ['id']"
if t.name in TABLES_WITH_PRE_EXISTING_UNIQUE:
assert tuple(sorted([t.fk1, t.fk2])) in s["uniques"], (
f"{t.name}: UNIQUE not restored post-downgrade: {s['uniques']}"
)
_run_with_alembic_context(engine, _migration.upgrade)
re_upgrade_shape = {t.name: _shape(engine, t.name) for t in AFFECTED_TABLES}
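# Per-table (initial, re-upgrade) pairs that differ. set(dict.items())
# can't be used here: the shape values are dicts, which are unhashable.
diff = {
name: (post_upgrade_shape.get(name), re_upgrade_shape.get(name))
for name in post_upgrade_shape.keys() | re_upgrade_shape.keys()
if post_upgrade_shape.get(name) != re_upgrade_shape.get(name)
}
assert re_upgrade_shape == post_upgrade_shape, (
"Re-upgrade shape differs from initial upgrade shape; "
"migration is not idempotent. "
f"diff (initial, re-upgrade): {diff}"
)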
# Use pre_shape only to demonstrate it was captured (not asserted against
# because the round-trip downgrade intentionally diverges on FK NOT NULL).
_ = pre_shape
def test_migration_module_constants_are_consistent() -> None:
"""Sanity-check the migration module's exported constants. Catches
accidental edits that misalign AFFECTED_TABLES with the auxiliary sets."""
affected_names = {t.name for t in AFFECTED_TABLES}
assert _migration.TABLES_WITH_PRE_EXISTING_UNIQUE.issubset(affected_names)
assert _migration.TABLES_WITH_NULLABLE_FKS.issubset(affected_names)
# Order is alphabetical (deterministic for review/bisection).
assert [t.name for t in AFFECTED_TABLES] == sorted(affected_names)
@pytest.mark.skip(
reason="placeholder; see test_round_trip_against_in_memory_sqlite above"
)
def test_placeholder_for_future_postgres_round_trip() -> None:
"""Reserved slot for a future Postgres-specific round-trip if local
SQLite divergence ever needs to be cross-checked against the real
backend. Today's CI matrix (T034a) handles this implicitly."""
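`_run_with_alembic_context` drives the real migration, whose `op.batch_alter_table` calls rebuild each table on SQLite using Alembic's "move and copy" strategy, because SQLite cannot change a primary key in place. The stdlib-only sketch below reproduces the same pre, upgrade, downgrade, upgrade round trip on a single hypothetical `link(a, b)` table with hand-written rebuild DDL. It illustrates the technique and is not the migration's actual code. The helper names `shape`, `_rebuild`, `upgrade`, and `downgrade` are assumptions made for this example.

```python
import sqlite3


def shape(conn: sqlite3.Connection, table: str) -> dict[str, list[str]]:
    """Columns and PK columns (in key order), read via PRAGMA table_info."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    pk_rows = sorted((r for r in rows if r[5] > 0), key=lambda r: r[5])
    return {"columns": sorted(r[1] for r in rows), "pk": [r[1] for r in pk_rows]}


def _rebuild(conn: sqlite3.Connection, new_ddl: str) -> None:
    # Move and copy: create the new shape under a temp name, copy the key
    # columns across, drop the old table, then rename the new one.
    conn.executescript(
        f"""
        CREATE TABLE link_tmp ({new_ddl});
        INSERT INTO link_tmp (a, b) SELECT a, b FROM link;
        DROP TABLE link;
        ALTER TABLE link_tmp RENAME TO link;
        """
    )


def upgrade(conn: sqlite3.Connection) -> None:
    # Drop the surrogate id; the composite PK subsumes UNIQUE(a, b).
    _rebuild(conn, "a INTEGER NOT NULL, b INTEGER NOT NULL, PRIMARY KEY (a, b)")


def downgrade(conn: sqlite3.Connection) -> None:
    # Restore the surrogate id PK and the UNIQUE(a, b) constraint.
    _rebuild(
        conn,
        "id INTEGER PRIMARY KEY, a INTEGER NOT NULL, b INTEGER NOT NULL, UNIQUE (a, b)",
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (id INTEGER PRIMARY KEY, a INTEGER NOT NULL, "
    "b INTEGER NOT NULL, UNIQUE (a, b))"
)
conn.executemany("INSERT INTO link (a, b) VALUES (?, ?)", [(1, 2), (1, 3)])
conn.commit()

upgrade(conn)
first = shape(conn, "link")
downgrade(conn)
restored = shape(conn, "link")
upgrade(conn)
second = shape(conn, "link")

# The re-upgrade must land on exactly the same shape as the first upgrade.
assert first == second == {"columns": ["a", "b"], "pk": ["a", "b"]}
assert restored == {"columns": ["a", "b", "id"], "pk": ["id"]}
```

Comparing whole shape summaries instead of individual attributes mirrors `_shape` in the test. Any unexpected drift between the two upgrades shows up as a single inequality.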

@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

@@ -0,0 +1,442 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Integration tests for ``version_changes`` capture (T052, partial).
Covers in this file:
(a) saving a chart with three field changes produces three rows
(f) baseline / INSERT transactions produce zero records *for that entity*
plus the unchanged-save, dashboard, and params-classification cases
Deferred:
(b) ``GET /versions/`` response includes ``changes`` array — lands with
T050 (API integration).
(c) FK cascade — exercisable in principle (the migration declares
``ON DELETE CASCADE``) but can't be isolated in a unit-style test
because ``version_transaction`` is referenced by non-cascading FKs
from slices_version / dashboards_version / etc. Covered instead
by (d) below once it lands, and by the structural declaration in
T046's migration.
(d) retention prune drops change records alongside the pruned
version — will land when T049 extends ``VersionDAO.prune_versions``
to include ``version_changes`` alongside the shadow-row delete.
(e) ``kind`` index query plan on Postgres — deferred to T053 perf
validation.
"""
from __future__ import annotations
from datetime import datetime, timedelta
from typing import Any
import pytest
import sqlalchemy as sa
from sqlalchemy_continuum import version_class
from superset.extensions import db
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.fixtures.birth_names_dashboard import ( # noqa: F401
load_birth_names_dashboard_with_slices,
load_birth_names_data,
)
_VERSION_CHANGES = sa.table(
"version_changes",
sa.column("id"),
sa.column("transaction_id"),
sa.column("entity_kind"),
sa.column("entity_id"),
sa.column("sequence"),
sa.column("kind"),
sa.column("path"),
sa.column("from_value"),
sa.column("to_value"),
)
def _change_rows_for(
tx_id: int,
*,
entity_kind: str | None = None,
entity_id: int | None = None,
) -> list[dict[str, Any]]:
"""Raw fetch of ``version_changes`` rows for a tx + optional entity filter."""
query = sa.select(_VERSION_CHANGES).where(
_VERSION_CHANGES.c.transaction_id == tx_id
)
if entity_kind is not None:
query = query.where(_VERSION_CHANGES.c.entity_kind == entity_kind)
if entity_id is not None:
query = query.where(_VERSION_CHANGES.c.entity_id == entity_id)
query = query.order_by(_VERSION_CHANGES.c.sequence.asc())
result = db.session.connection().execute(query)
return [dict(row._mapping) for row in result]
def _persist_fixture_state() -> None:
"""Commit fixture INSERTs so the baseline row exists before the test edits.
Without this, the test's first commit batches the fixture's pending
INSERTs with the test's UPDATE into a single Continuum transaction
and no diff records are emitted (no pre-state).
"""
db.session.commit()
class TestChartChangeRecords(SupersetTestCase):
"""Change-record capture for chart (Slice) saves."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: F811, PT004
pass
def test_single_scalar_edit_produces_one_change_record(self) -> None:
"""(a) — one field changed, one ``version_changes`` row."""
_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
chart.slice_name = f"{chart.slice_name[:64]}_renamed"
db.session.commit()
# The save produces one new version row (the UPDATE). Fetch its tx_id.
ver_cls = version_class(Slice)
update_tx_id = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
.transaction_id
)
rows = _change_rows_for(update_tx_id, entity_kind="chart", entity_id=chart.id)
assert len(rows) == 1
assert rows[0]["kind"] == "field"
path = (
_json.loads(rows[0]["path"])
if isinstance(rows[0]["path"], str)
else rows[0]["path"]
)
assert path == ["slice_name"]
assert rows[0]["sequence"] == 0
def test_last_saved_at_is_excluded_as_audit_noise(self) -> None:
"""``last_saved_at`` / ``last_saved_by_fk`` are save-side-effect
fields stamped by ``UpdateChartCommand`` and must not produce
change records — same category as ``changed_on``.
Saving a chart with ONLY a ``last_saved_at`` bump must produce
zero ``version_changes`` rows for that transaction. (Continuum
still records the shadow row; we just don't want to noise up
the per-edit diff log.)
"""
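_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
ver_cls = version_class(Slice)
# Capture the latest UPDATE tx BEFORE the save so a stale tx left by
# an earlier test can't be mistaken for this save's tx.
pre_save_tx_row = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
)
pre_save_tx_id = pre_save_tx_row.transaction_id if pre_save_tx_row else 0
chart.last_saved_at = datetime.now() + timedelta(seconds=1)
db.session.commit()
latest_tx = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.filter(ver_cls.transaction_id > pre_save_tx_id)
.order_by(ver_cls.transaction_id.desc())
.first()
)
# If the save produced no new version row (no actual model change
# beyond the audit field), there is nothing to assert. If it did,
# that tx must carry no ``last_saved_at`` row in version_changes.
if latest_tx is None:
return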
rows = _change_rows_for(
latest_tx.transaction_id, entity_kind="chart", entity_id=chart.id
)
paths = [
_json.loads(r["path"]) if isinstance(r["path"], str) else r["path"]
for r in rows
]
assert ["last_saved_at"] not in paths
assert ["last_saved_by_fk"] not in paths
def test_three_scalar_edits_produce_three_records_in_sequence(self) -> None:
"""(a) — three fields changed, three rows, ``sequence`` 0..2."""
_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
# Derive from CURRENT values so every run guarantees a real
# change even against a persistent test DB where prior runs
# have already mutated the chart.
chart.slice_name = f"{chart.slice_name[:60]}_x"
chart.description = f"{chart.description or ''}_x"
chart.cache_timeout = (chart.cache_timeout or 0) + 1
db.session.commit()
ver_cls = version_class(Slice)
update_tx_id = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
.transaction_id
)
rows = _change_rows_for(update_tx_id, entity_kind="chart", entity_id=chart.id)
assert len(rows) == 3
assert [r["sequence"] for r in rows] == [0, 1, 2]
# Sorted by field name (diff engine emits in sorted field order)
paths = [
_json.loads(r["path"]) if isinstance(r["path"], str) else r["path"]
for r in rows
]
assert paths == [["cache_timeout"], ["description"], ["slice_name"]]
def test_params_filter_add_produces_filter_kind_record(self) -> None:
"""(a) — params classification still flows through the listener.
Appends an adhoc_filter whose natural key (``subject``) is derived
from the chart id and the last eight characters of the database
name, so it is distinct per chart and per database. Whatever was in
``adhoc_filters`` before stays; the test only confirms that at least
one ``kind='filter'`` record is emitted.
"""
_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
unique_subject = (
f"col_{chart.id}_{db.session.connection().engine.url.database[-8:]}"
)
params = _json.loads(chart.params or "{}")
existing = params.get("adhoc_filters", []) or []
params["adhoc_filters"] = [
*existing,
{
"subject": unique_subject,
"operator": "==",
"comparator": "x",
"expressionType": "SIMPLE",
},
]
chart.params = _json.dumps(params)
db.session.commit()
ver_cls = version_class(Slice)
update_tx_id = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
.transaction_id
)
rows = _change_rows_for(update_tx_id, entity_kind="chart", entity_id=chart.id)
filter_rows = [r for r in rows if r["kind"] == "filter"]
assert len(filter_rows) >= 1, (
f"expected at least one filter record, got rows: {rows}"
)
def test_unchanged_save_produces_zero_change_records(self) -> None:
"""An edit that sets fields to identical values emits nothing."""
_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
ver_cls = version_class(Slice)
# Capture the latest tx_id BEFORE this test's save so we can
# distinguish "the no-op save produced nothing new" (the intent)
# from "prior tests left tx rows with records on them" (noise).
pre_save_tx_row = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
)
pre_save_tx_id = pre_save_tx_row.transaction_id if pre_save_tx_row else 0
# Touch the object (mark dirty) but assign the same value.
current_name = chart.slice_name
chart.slice_name = current_name
db.session.commit()
post_save_tx_row = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == chart.id)
.filter(ver_cls.operation_type == 1)
.filter(ver_cls.transaction_id > pre_save_tx_id)
.order_by(ver_cls.transaction_id.desc())
.first()
)
# Either no new tx at all (nothing dirty, best case), or a new
# tx with zero change records for this chart.
if post_save_tx_row is not None:
assert (
_change_rows_for(
post_save_tx_row.transaction_id,
entity_kind="chart",
entity_id=chart.id,
)
== []
)
class TestDashboardChangeRecords(SupersetTestCase):
"""Same flow for dashboards — all scalar fields land in ``kind='field'``."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: F811, PT004
pass
def test_dashboard_title_edit_produces_field_record(self) -> None:
_persist_fixture_state()
dashboard = db.session.query(Dashboard).first()
assert dashboard is not None
dashboard.dashboard_title = f"{dashboard.dashboard_title}_rev"
db.session.commit()
ver_cls = version_class(Dashboard)
update_tx_id = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == dashboard.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
.transaction_id
)
rows = _change_rows_for(
update_tx_id, entity_kind="dashboard", entity_id=dashboard.id
)
assert len(rows) >= 1
field_rows = [r for r in rows if r["kind"] == "field"]
paths = [
_json.loads(r["path"]) if isinstance(r["path"], str) else r["path"]
for r in field_rows
]
assert ["dashboard_title"] in paths
class TestDatasetChildChangeRecords(SupersetTestCase):
"""T048b — column and metric diff records for dataset saves.
Two snapshots must exist for any child diff to emit: the prior
save's and the current one. The fixture ``load_birth_names_data``
has already created the dataset before these tests run; their
first commit produces snapshot #1. The test's edit produces
snapshot #2, and the listener diffs the two.
"""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: F811, PT004
pass
def test_column_description_change_produces_column_record(self) -> None:
# pylint: disable=import-outside-toplevel
from superset.connectors.sqla.models import SqlaTable
_persist_fixture_state()
dataset = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert dataset is not None
assert dataset.columns, "birth_names fixture should produce columns"
# First save establishes snapshot #1 (the pre-edit state).
# Scalar + child diffs won't emit anything yet because there's
# no prior snapshot to diff against.
dataset.description = f"{dataset.description or ''}_v1"
db.session.commit()
# Second save: edit a column AND touch a dataset scalar so
# the parent SqlaTable ends up in session.dirty. In real
# flows DatasetDAO.update_columns() marks the parent via its
# individual session.add / session.delete calls (T011); the
# direct-ORM test here needs an explicit parent touch.
column = dataset.columns[0]
column.description = f"{column.description or ''}_edited"
dataset.description = f"{dataset.description}_v2"
db.session.commit()
ver_cls = version_class(SqlaTable)
latest_tx_id = (
db.session.query(ver_cls.transaction_id)
.filter(ver_cls.id == dataset.id)
.filter(ver_cls.operation_type == 1)
.order_by(ver_cls.transaction_id.desc())
.first()
.transaction_id
)
rows = _change_rows_for(
latest_tx_id, entity_kind="dataset", entity_id=dataset.id
)
column_rows = [r for r in rows if r["kind"] == "column"]
assert len(column_rows) >= 1, (
f"expected at least one kind='column' record, got {rows}"
)
class TestBaselineProducesZeroChangeRecords(SupersetTestCase):
"""(f) — operation_type=0 (baseline / INSERT) transactions emit no records."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: F811, PT004
pass
def test_baseline_transaction_has_no_change_records_for_this_entity(
self,
) -> None:
"""(f) — baseline tx produces zero records *for that entity*.
A single transaction can touch multiple entities (fixture loads,
import pipelines). A tx that's a baseline for this chart might
still legitimately carry update records for some *other* entity
that shared the flush. The spec's M4 clarification means:
records filtered to this entity's (tx, entity_kind, entity_id)
are empty for its baseline tx.
"""
_persist_fixture_state()
chart = db.session.query(Slice).first()
assert chart is not None
chart.slice_name = f"{chart.slice_name[:64]}_force_baseline"
db.session.commit()
ver_cls = version_class(Slice)
rows_by_tx = (
db.session.query(ver_cls.transaction_id, ver_cls.operation_type)
.filter(ver_cls.id == chart.id)
.order_by(ver_cls.transaction_id.asc())
.all()
)
baseline_tx_ids = [tx for tx, op in rows_by_tx if op == 0]
assert baseline_tx_ids, "expected at least one baseline version row"
for tx_id in baseline_tx_ids:
records_for_this_chart = _change_rows_for(
tx_id, entity_kind="chart", entity_id=chart.id
)
assert records_for_this_chart == [], (
f"baseline tx {tx_id} unexpectedly has change records for "
f"chart id={chart.id}: {records_for_this_chart}"
)
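The chart tests above define a contract for scalar-field change records. Each changed field gets one ``kind='field'`` record, and records are ordered by field name with ``sequence`` counting from 0. Audit side-effect fields such as ``last_saved_at``, ``last_saved_by_fk``, and ``changed_on`` are skipped, and a no-op save emits nothing. Below is a minimal sketch of a diff function that satisfies that contract. `diff_scalar_fields` and `AUDIT_FIELDS` are hypothetical names, and the real diff engine (including ``params`` filter classification and child-collection diffs) is not shown.

```python
from __future__ import annotations

from typing import Any

# Hypothetical audit-noise set: the save side-effect fields named in the tests.
AUDIT_FIELDS = frozenset({"changed_on", "last_saved_at", "last_saved_by_fk"})


def diff_scalar_fields(
    before: dict[str, Any], after: dict[str, Any]
) -> list[dict[str, Any]]:
    """Emit one field-level change record per changed, non-audit field."""
    changed = sorted(
        name
        for name in before.keys() | after.keys()
        if name not in AUDIT_FIELDS and before.get(name) != after.get(name)
    )
    # Sorting by field name makes ``sequence`` deterministic across runs.
    return [
        {
            "sequence": seq,
            "kind": "field",
            "path": [name],
            "from_value": before.get(name),
            "to_value": after.get(name),
        }
        for seq, name in enumerate(changed)
    ]


before = {
    "slice_name": "Girls",
    "description": None,
    "cache_timeout": None,
    "last_saved_at": "t0",
}
after = {
    "slice_name": "Girls_x",
    "description": "_x",
    "cache_timeout": 1,
    "last_saved_at": "t1",
}
records = diff_scalar_fields(before, after)
print([r["path"] for r in records], [r["sequence"] for r in records])
```

The audit-only change to ``last_saved_at`` produces no record, matching what `test_last_saved_at_is_excluded_as_audit_noise` checks.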

@@ -0,0 +1,184 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""T055 — ``ETag`` header emission on entity GETs / PUTs / version endpoints."""
from __future__ import annotations
import pytest
from superset.connectors.sqla.models import SqlaTable
from superset.daos.version import VersionDAO
from superset.extensions import db
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import ( # noqa: F401
load_birth_names_dashboard_with_slices,
load_birth_names_data,
)
def _expected_etag(model_cls: type, entity_id: int, entity_uuid) -> str:
version_uuid = VersionDAO.current_live_version_uuid(
model_cls, entity_id, entity_uuid
)
return f'"{version_uuid}"'
class TestETagEmission(SupersetTestCase):
"""ETag header on entity detail, save response, and version endpoints."""
@pytest.fixture(autouse=True)
def _load_data(self, load_birth_names_dashboard_with_slices): # noqa: PT004, F811
pass
def test_chart_get_emits_etag_matching_current_live_version(self) -> None:
db.session.commit()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
expected = _expected_etag(Slice, chart.id, chart.uuid)
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/chart/{chart.id}")
assert rv.status_code == 200
assert rv.headers.get("ETag") == expected
def test_chart_put_emits_etag_matching_new_live_version(self) -> None:
db.session.commit()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_id = chart.id
original_name = chart.slice_name
self.login(ADMIN_USERNAME)
rv = self.client.put(
f"/api/v1/chart/{chart_id}",
json={"slice_name": "etag-put-test"},
)
assert rv.status_code == 200
body = _json.loads(rv.data.decode("utf-8"))
assert body["new_version_uuid"] is not None
assert rv.headers.get("ETag") == f'"{body["new_version_uuid"]}"'
# Cleanup
chart.slice_name = original_name
db.session.commit()
def test_chart_list_versions_emits_etag(self) -> None:
db.session.commit()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
expected = _expected_etag(Slice, chart.id, chart.uuid)
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/chart/{chart.uuid}/versions/")
assert rv.status_code == 200
assert rv.headers.get("ETag") == expected
def test_chart_get_version_emits_etag(self) -> None:
db.session.commit()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
expected = _expected_etag(Slice, chart.id, chart.uuid)
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/chart/{chart.uuid}/versions/")
body = _json.loads(rv.data.decode("utf-8"))
version_uuid = body["result"][0]["version_uuid"]
rv = self.client.get(f"/api/v1/chart/{chart.uuid}/versions/{version_uuid}/")
assert rv.status_code == 200
# ETag reflects the live version, not the queried version.
assert rv.headers.get("ETag") == expected
def test_dashboard_get_emits_etag_matching_current_live_version(self) -> None:
db.session.commit()
dashboard: Dashboard = (
db.session.query(Dashboard)
.filter(Dashboard.dashboard_title == "USA Births Names")
.first()
)
assert dashboard is not None
expected = _expected_etag(Dashboard, dashboard.id, dashboard.uuid)
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/dashboard/{dashboard.id}")
assert rv.status_code == 200
assert rv.headers.get("ETag") == expected
def test_dataset_get_emits_etag_matching_current_live_version(self) -> None:
db.session.commit()
dataset: SqlaTable = (
db.session.query(SqlaTable)
.filter(SqlaTable.table_name == "birth_names")
.first()
)
assert dataset is not None
expected = _expected_etag(SqlaTable, dataset.id, dataset.uuid)
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/dataset/{dataset.id}")
assert rv.status_code == 200
assert rv.headers.get("ETag") == expected
def test_etag_absent_when_entity_has_no_version_rows(self) -> None:
"""``set_version_etag`` is a no-op when the entity has no version rows."""
from sqlalchemy_continuum import version_class
db.session.commit()
chart: Slice = (
db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
)
assert chart is not None
chart_id = chart.id
chart_uuid = chart.uuid
ver_cls = version_class(Slice)
db.session.query(ver_cls).filter(ver_cls.id == chart_id).delete(
synchronize_session=False
)
db.session.commit()
try:
self.login(ADMIN_USERNAME)
rv = self.client.get(f"/api/v1/chart/{chart_id}")
assert rv.status_code == 200
assert rv.headers.get("ETag") is None
finally:
# Always re-save the chart through the API so it gets version rows
# again and downstream tests in this class don't see corrupted
# fixture state, even if the assertions above fail.
self.client.put(
f"/api/v1/chart/{chart_id}",
json={"slice_name": "Girls"},
)
# Sanity-check that version rows came back.
assert (
VersionDAO.current_live_version_uuid(Slice, chart_id, chart_uuid)
is not None
)
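The tests compare the ``ETag`` header against ``f'"{version_uuid}"'``. This is a strong entity tag whose opaque value is the live version UUID, and RFC 9110 requires the surrounding double quotes. A client that wants the version UUID back, for example to request ``/versions/{version_uuid}/``, could parse it as sketched below. Both helper names are hypothetical and not part of Superset's API. Rejecting weak (``W/"..."``) tags is also an assumption, based on the tests only ever expecting strong tags.

```python
from __future__ import annotations

import uuid


def format_version_etag(version_uuid: uuid.UUID | str) -> str:
    """Strong ETag for a version UUID: the tag wrapped in double quotes."""
    return f'"{version_uuid}"'


def parse_version_etag(header: str | None) -> uuid.UUID | None:
    """Recover the version UUID from an ETag header value, or None."""
    if header is None:
        return None
    tag = header.strip()
    # Only strong, quoted tags are accepted; a weak tag starts with 'W/'
    # rather than '"', so it fails this check too.
    if len(tag) < 2 or tag[0] != '"' or tag[-1] != '"':
        return None
    try:
        return uuid.UUID(tag[1:-1])
    except ValueError:
        return None


live = uuid.UUID("12345678-1234-5678-1234-567812345678")
etag = format_version_etag(live)
print(etag, parse_version_etag(etag) == live)
```

A missing header maps to ``None``, which lines up with `test_etag_absent_when_entity_has_no_version_rows`.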

@@ -0,0 +1,272 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""T044 — Performance validation for entity version history.
Skipped by default. Run on demand:
SUPERSET_PERF_VALIDATION=1 pytest \
tests/integration_tests/versioning/perf_validation_tests.py -v -s
Measures the three success criteria defined in the spec:
* SC-002: version list endpoint responds in under 1 second
* SC-003: restore endpoint completes in under 3 seconds
* SC-004: save path p95 overhead under 50 ms with Continuum tracking
on vs. off (FR-014)
The test prints a summary table suitable for pasting into the PR
description. It also asserts each target so regressions fail loudly
when the harness is re-run.
"""

from __future__ import annotations

import os
import statistics
import time
from typing import Any

import pytest
import sqlalchemy as sa
from sqlalchemy_continuum import version_class, versioning_manager

from superset.extensions import db
from superset.models.slice import Slice
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import (  # noqa: F401
    load_birth_names_dashboard_with_slices,
    load_birth_names_data,
)

SKIP_REASON = "Performance validation is manual. Set SUPERSET_PERF_VALIDATION=1 to run."

# Thresholds from spec.md §Success Criteria.
LIST_ENDPOINT_MAX_MS = 1000  # SC-002
RESTORE_ENDPOINT_MAX_MS = 3000  # SC-003
SAVE_OVERHEAD_P95_MAX_MS = 50  # SC-004


def _save_chart_once(chart: Slice, suffix: str) -> None:
    """One ORM-level save path, mimicking what ChartDAO.update does."""
    chart.slice_name = f"{chart.slice_name[:64]}_{suffix}"
    db.session.commit()


def _timings_ms(seconds: list[float]) -> dict[str, float]:
    """Summarize durations given in seconds as p50 / p95 / max in milliseconds.

    With fewer than 20 samples there is no meaningful p95, so ``max`` is
    reported in its place.
    """
    ms = sorted(s * 1000.0 for s in seconds)
    return {
        "p50": statistics.median(ms),
        "p95": ms[int(len(ms) * 0.95) - 1] if len(ms) >= 20 else max(ms),
        "max": max(ms),
        "n": len(ms),
    }


@pytest.mark.skipif(
    not os.environ.get("SUPERSET_PERF_VALIDATION"),
    reason=SKIP_REASON,
)
class PerfValidationTests(SupersetTestCase):
    """Runs only when SUPERSET_PERF_VALIDATION=1 is set."""

    @pytest.fixture(autouse=True)
    def _load_data(self, load_birth_names_dashboard_with_slices: Any) -> None:  # noqa: F811, PT004
        pass

    def _seed_chart_with_n_versions(self, n: int) -> Slice:
        """Save a chart N times to produce N version rows."""
        chart = db.session.query(Slice).first()
        assert chart is not None, "birth_names fixture should provide charts"
        for i in range(n):
            _save_chart_once(chart, f"v{i}")
        db.session.commit()
        return chart

    def test_sc002_list_endpoint_under_1s(self) -> None:
        """SC-002: list endpoint responds in under 1 second."""
        self.login(ADMIN_USERNAME)
        # Generate enough versions to exercise the retention-capped state.
        chart = self._seed_chart_with_n_versions(24)
        chart_uuid = str(chart.uuid)
        url = f"/api/v1/chart/{chart_uuid}/versions/"

        # Warm up the endpoint once (SQLAlchemy statement cache, mapper
        # configuration, etc.) so the first request doesn't skew the timings.
        self.client.get(url)

        timings: list[float] = []
        for _ in range(10):
            t0 = time.perf_counter()
            response = self.client.get(url)
            timings.append(time.perf_counter() - t0)
            assert response.status_code == 200

        stats = _timings_ms(timings)
        print(
            f"\n[SC-002] GET /versions/ (24 versions) "
            f"p50={stats['p50']:.1f}ms p95={stats['p95']:.1f}ms "
            f"max={stats['max']:.1f}ms n={stats['n']}"
        )
        assert stats["p95"] < LIST_ENDPOINT_MAX_MS, (
            f"SC-002 failed: list endpoint p95 {stats['p95']:.1f}ms "
            f">= {LIST_ENDPOINT_MAX_MS}ms"
        )

    def test_sc003_restore_endpoint_under_3s(self) -> None:
        """SC-003: restore endpoint completes in under 3 seconds."""
        self.login(ADMIN_USERNAME)
        chart = self._seed_chart_with_n_versions(5)
        chart_uuid = str(chart.uuid)

        list_response = self.client.get(f"/api/v1/chart/{chart_uuid}/versions/")
        assert list_response.status_code == 200
        versions = list_response.get_json()["result"]
        assert len(versions) >= 2, "need at least two versions to restore"
        target_version_uuid = versions[-1]["version_uuid"]

        restore_url = (
            f"/api/v1/chart/{chart_uuid}/versions/{target_version_uuid}/restore"
        )

        # Warm up once
        self.client.post(restore_url)

        timings: list[float] = []
        for _ in range(5):
            t0 = time.perf_counter()
            response = self.client.post(restore_url)
            timings.append(time.perf_counter() - t0)
            assert response.status_code == 200

        stats = _timings_ms(timings)
        print(
            f"\n[SC-003] POST /restore chart "
            f"p50={stats['p50']:.1f}ms max={stats['max']:.1f}ms n={stats['n']}"
        )
        assert stats["max"] < RESTORE_ENDPOINT_MAX_MS, (
            f"SC-003 failed: restore max {stats['max']:.1f}ms "
            f">= {RESTORE_ENDPOINT_MAX_MS}ms"
        )

    def test_sc004_save_overhead_under_50ms(self) -> None:
        """SC-004: save path p95 overhead under 50ms (FR-014).

        Toggling Continuum on and off mid-process corrupts its internal
        ``units_of_work`` state and is not a reliable measurement. Instead,
        this test directly measures the wall-clock time spent inside the
        four session-level listeners Continuum attaches to
        ``sa.orm.session.Session``: ``before_flush``, ``after_flush``,
        ``after_commit`` and ``after_rollback``. Only the listeners in
        ``versioning_manager.session_listeners`` are wrapped; Superset's
        own baseline / snapshot / retention-prune listeners (attached to
        ``db.session``) are not instrumented here. The cumulative listener
        time per save is therefore a lower bound on the marginal overhead
        version capture adds over a save with versioning removed entirely.

        The approach:

        1. Wrap each known listener with a timing proxy that adds its
           wall-clock time to a per-save accumulator.
        2. Save the same chart N times, recording each save's
           accumulator value.
        3. Compute p50 / p95 of the per-save overhead.

        This matches the measurement intent of SC-004 (how much does
        versioning cost per save) without the fragility of toggling
        Continuum mid-test.
        """
        self.login(ADMIN_USERNAME)
        chart = db.session.query(Slice).first()
        assert chart is not None

        # Per-save accumulator incremented by the wrapped listeners.
        acc = [0.0]

        def wrap_listener(original: Any) -> Any:
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                t0 = time.perf_counter()
                try:
                    return original(*args, **kwargs)
                finally:
                    acc[0] += time.perf_counter() - t0

            wrapper.__wrapped__ = original  # type: ignore[attr-defined]
            return wrapper

        # Instrument Continuum's four session listeners: detach each bound
        # method, wrap it, and re-attach the wrapper so it can be removed
        # cleanly on teardown.
        session_target = sa.orm.session.Session
        attached: list[tuple[str, Any]] = []
        for event_name, listener in list(versioning_manager.session_listeners.items()):
            sa.event.remove(session_target, event_name, listener)
            wrapped = wrap_listener(listener)
            sa.event.listen(session_target, event_name, wrapped)
            attached.append((event_name, wrapped))

        iterations = 100
        warmup = 5
        try:
            # Warmup (first baseline INSERT, statement-cache warming).
            for i in range(warmup):
                _save_chart_once(chart, f"warm_{i}")
            acc[0] = 0.0

            total_timings: list[float] = []
            overhead_timings: list[float] = []
            for i in range(iterations):
                acc[0] = 0.0
                t0 = time.perf_counter()
                _save_chart_once(chart, f"run_{i}")
                total_timings.append(time.perf_counter() - t0)
                overhead_timings.append(acc[0])
        finally:
            # Swap each wrapper back out for the original Continuum listener.
            for event_name, wrapped in attached:
                sa.event.remove(session_target, event_name, wrapped)
                sa.event.listen(
                    session_target,
                    event_name,
                    wrapped.__wrapped__,
                )

        total = _timings_ms(total_timings)
        overhead = _timings_ms(overhead_timings)
        ver_cls = version_class(Slice)
        version_rows = db.session.query(ver_cls).filter(ver_cls.id == chart.id).count()
        print(
            f"\n[SC-004] save iterations={iterations} chart_id={chart.id} "
            f"version_rows_total={version_rows}"
        )
        print(
            f"[SC-004] full save: "
            f"p50={total['p50']:.2f}ms p95={total['p95']:.2f}ms "
            f"max={total['max']:.2f}ms"
        )
        print(
            f"[SC-004] version-capture overhead: "
            f"p50={overhead['p50']:.2f}ms p95={overhead['p95']:.2f}ms "
            f"max={overhead['max']:.2f}ms"
        )
        assert overhead["p95"] < SAVE_OVERHEAD_P95_MAX_MS, (
            f"SC-004 failed: version-capture p95 overhead "
            f"{overhead['p95']:.2f}ms >= {SAVE_OVERHEAD_P95_MAX_MS}ms"
        )
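The timing-proxy idea from the SC-004 docstring can be sketched in isolation. This is a minimal, self-contained sketch. `fake_listener` is a hypothetical stand-in for a Continuum session listener. `nearest_rank` is an illustrative helper that uses the ceiling-based nearest-rank percentile. `_timings_ms` above floors `0.95 * n` instead. For sample sizes where that product is not a whole number (for example n=21), it therefore reports the value one rank lower.

```python
import math
import time
from typing import Any, Callable


def make_timed(original: Callable[..., Any], acc: list[float]) -> Callable[..., Any]:
    """Wrap ``original`` so every call adds its wall-clock time to ``acc[0]``."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        t0 = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Runs even if ``original`` raises, so no call goes unmeasured.
            acc[0] += time.perf_counter() - t0

    # Keep a handle on the original so it can be re-attached on teardown.
    wrapper.__wrapped__ = original  # type: ignore[attr-defined]
    return wrapper


def nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the ceil(pct * n / 100)-th smallest sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fake_listener(session: object, n: int) -> int:
    # Hypothetical stand-in for a flush listener: do a little work.
    return sum(range(n))


acc = [0.0]
timed = make_timed(fake_listener, acc)
per_call: list[float] = []
for _ in range(100):
    acc[0] = 0.0  # reset the accumulator before each simulated save
    timed(None, 10_000)
    per_call.append(acc[0])

print(f"p95 listener time: {nearest_rank(per_call, 95) * 1000:.3f} ms")
```

Multiplying by the integer percentile before dividing by 100 keeps the rank computation exact for whole-number products, which avoids floating-point surprises from expressions like `0.95 * n`.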


@@ -0,0 +1,330 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""FR-026: ``SkipUnmodifiedPlugin`` integration tests.

Locks in the behavior that owners-only saves and content-equivalent
re-saves do *not* mint version rows. Exercises the plugin's
``_matches_previous_version`` comparator across the Dashboard's three
column-type families (String, Text, MediumText) so a future column-type
change can't silently regress to "always create version rows".
"""

from __future__ import annotations

from typing import Any

import pytest
from sqlalchemy_continuum import version_class

from superset.connectors.sqla.models import SqlaTable, TableColumn
from superset.extensions import db
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice
from superset.utils import json as _json
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.constants import ADMIN_USERNAME
from tests.integration_tests.fixtures.birth_names_dashboard import (  # noqa: F401
    load_birth_names_dashboard_with_slices,
    load_birth_names_data,
)


def _dashboard_version_count(dashboard_id: int) -> int:
    ver_cls = version_class(Dashboard)
    return db.session.query(ver_cls).filter(ver_cls.id == dashboard_id).count()


def _slice_version_count(slice_id: int) -> int:
    ver_cls = version_class(Slice)
    return db.session.query(ver_cls).filter(ver_cls.id == slice_id).count()


def _dataset_version_count(dataset_id: int) -> int:
    ver_cls = version_class(SqlaTable)
    return db.session.query(ver_cls).filter(ver_cls.id == dataset_id).count()


class TestSkipUnmodifiedPlugin(SupersetTestCase):
    """FR-026: version rows are not minted for content-equivalent updates."""

    @pytest.fixture(autouse=True)
    def _load_data(self, load_birth_names_dashboard_with_slices):  # noqa: PT004, F811
        pass

    def _get_dashboard(self) -> Dashboard:
        db.session.commit()
        dash = (
            db.session.query(Dashboard)
            .filter(Dashboard.dashboard_title == "USA Births Names")
            .first()
        )
        assert dash is not None
        return dash

    def _put(self, pk: int, body: dict[str, Any]) -> None:
        rv = self.client.put(f"/api/v1/dashboard/{pk}", json=body)
        assert rv.status_code == 200, rv.data

    def test_owners_only_edit_does_not_create_version(self) -> None:
        """Saving a dashboard with only owner changes is a no-op for
        version-row creation."""
        dash = self._get_dashboard()
        dash_id = dash.id
        title = dash.dashboard_title
        original_owner_ids = [o.id for o in dash.owners]
        self.login(ADMIN_USERNAME)

        # Force a known baseline state with one save.
        self._put(dash_id, {"dashboard_title": title})
        db.session.expire_all()
        before = _dashboard_version_count(dash_id)

        try:
            # Now save with only ``owners`` changed (toggle: drop user id 1
            # if present, then put the original owners back). String / Text /
            # MediumText columns are unchanged, so the plugin should skip
            # both saves.
            new_owners = [oid for oid in original_owner_ids if oid != 1]
            self._put(dash_id, {"dashboard_title": title, "owners": new_owners})
            db.session.expire_all()
            mid = _dashboard_version_count(dash_id)
            assert mid == before, (
                f"owners-only edit minted a version row (before={before}, after={mid})"
            )

            self._put(dash_id, {"dashboard_title": title, "owners": original_owner_ids})
            db.session.expire_all()
            after = _dashboard_version_count(dash_id)
            assert after == before, (
                f"second owners-only edit minted a version row "
                f"(before={before}, after={after})"
            )
        finally:
            # Always restore original ownership.
            self._put(dash_id, {"dashboard_title": title, "owners": original_owner_ids})

    def test_re_save_with_identical_values_does_not_create_version(self) -> None:
        """Submitting the same scalar values back through PUT is a no-op
        for version creation. Exercises the json_metadata re-serialize
        case (``set_dash_metadata`` rewrites the column with a different
        byte sequence; the plugin must compare against the prior shadow
        row and skip)."""
        dash = self._get_dashboard()
        dash_id = dash.id
        title = dash.dashboard_title
        existing_metadata = dash.json_metadata or "{}"
        self.login(ADMIN_USERNAME)

        # Prime: one real save to ensure the json_metadata is in canonical
        # post-set_dash_metadata form.
        self._put(
            dash_id,
            {"dashboard_title": title, "json_metadata": existing_metadata},
        )
        db.session.expire_all()
        before = _dashboard_version_count(dash_id)

        # Re-submit identical content. set_dash_metadata will round-trip
        # the JSON; the resulting byte sequence might differ from the
        # request body but must equal the previous stored value.
        self._put(
            dash_id,
            {"dashboard_title": title, "json_metadata": existing_metadata},
        )
        db.session.expire_all()
        after = _dashboard_version_count(dash_id)
        assert after == before, (
            f"identical re-save minted a version row (before={before}, after={after})"
        )

    def test_actual_change_creates_version(self) -> None:
        """A real scalar change MUST mint a version row: the plugin
        only suppresses no-ops, never legitimate edits."""
        dash = self._get_dashboard()
        dash_id = dash.id
        original_title = dash.dashboard_title
        self.login(ADMIN_USERNAME)
        before = _dashboard_version_count(dash_id)
        try:
            self._put(dash_id, {"dashboard_title": "fr-026-modified-title"})
            db.session.expire_all()
            after = _dashboard_version_count(dash_id)
            assert after == before + 1, (
                f"real edit failed to mint a version row "
                f"(before={before}, after={after})"
            )
        finally:
            self._put(dash_id, {"dashboard_title": original_title})

    def test_chart_slice_name_change_creates_version(self) -> None:
        """Same assertion for ``Slice`` (covers the ``String`` column path
        on a different entity type)."""
        db.session.commit()
        chart = db.session.query(Slice).filter(Slice.slice_name == "Girls").first()
        assert chart is not None
        chart_id = chart.id
        self.login(ADMIN_USERNAME)
        before = _slice_version_count(chart_id)
        try:
            rv = self.client.put(
                f"/api/v1/chart/{chart_id}",
                json={"slice_name": "fr-026-renamed"},
            )
            assert rv.status_code == 200
            db.session.expire_all()
            after = _slice_version_count(chart_id)
            assert after == before + 1
        finally:
            self.client.put(f"/api/v1/chart/{chart_id}", json={"slice_name": "Girls"})

    def test_dashboard_json_metadata_subkey_change_creates_version(self) -> None:
        """Editing a non-audit key inside ``json_metadata`` MUST mint a
        version row. Exercises the MediumText column path past the
        plugin's content-equality check."""
        dash = self._get_dashboard()
        dash_id = dash.id
        title = dash.dashboard_title
        original_metadata = dash.json_metadata or "{}"
        self.login(ADMIN_USERNAME)
        before = _dashboard_version_count(dash_id)
        try:
            md = _json.loads(original_metadata)
            md["color_scheme"] = "fr026TestPalette"
            self._put(
                dash_id,
                {"dashboard_title": title, "json_metadata": _json.dumps(md)},
            )
            db.session.expire_all()
            after = _dashboard_version_count(dash_id)
            assert after == before + 1, (
                f"json_metadata edit failed to mint a version row "
                f"(before={before}, after={after})"
            )
        finally:
            self._put(
                dash_id,
                {"dashboard_title": title, "json_metadata": original_metadata},
            )

    def test_map_label_colors_only_change_does_not_create_version(self) -> None:
        """Re-stamped ``map_label_colors`` (and other frontend-derived
        audit sub-keys) inside ``json_metadata`` MUST NOT mint a version
        row. The frontend regenerates this map from the
        ``LabelsColorMap`` singleton on every save, so two saves with no
        user-authored change emit different bytes for the column. The
        diff engine drops these sub-keys via
        ``DASHBOARD_JSON_METADATA_AUDIT_KEYS``; the skip-plugin's
        comparator must apply the same filter or every save mints an
        empty-changes "Baseline" row in the UI.
        """
        dash = self._get_dashboard()
        dash_id = dash.id
        title = dash.dashboard_title
        original_metadata = dash.json_metadata or "{}"
        self.login(ADMIN_USERNAME)

        # Prime with the existing metadata so the next save's only
        # delta is the re-stamped ``map_label_colors``.
        self._put(
            dash_id,
            {"dashboard_title": title, "json_metadata": original_metadata},
        )
        db.session.expire_all()
        before = _dashboard_version_count(dash_id)

        try:
            md = _json.loads(original_metadata)
            md["map_label_colors"] = {
                "test-label-fr026": "#abcdef",
                "another-label": "#123456",
            }
            self._put(
                dash_id,
                {"dashboard_title": title, "json_metadata": _json.dumps(md)},
            )
            db.session.expire_all()
            after = _dashboard_version_count(dash_id)
            assert after == before, (
                f"map_label_colors-only edit minted a version row "
                f"(before={before}, after={after})"
            )
        finally:
            self._put(
                dash_id,
                {"dashboard_title": title, "json_metadata": original_metadata},
            )

    def test_dataset_column_edit_creates_parent_version(self) -> None:
        """Editing a ``TableColumn`` description MUST mint a parent
        ``tables_version`` row even though the parent's own scalars are
        unchanged. Without the force-touch in
        ``baseline._force_parent_dirty_on_child_change``, child-only
        edits leave the dataset's version-history dropdown empty.
        """
        db.session.commit()
        dataset = (
            db.session.query(SqlaTable)
            .filter(SqlaTable.table_name == "birth_names")
            .first()
        )
        assert dataset is not None
        dataset_id = dataset.id
        column = (
            db.session.query(TableColumn)
            .filter(TableColumn.table_id == dataset_id)
            .order_by(TableColumn.id)
            .first()
        )
        assert column is not None
        original_description = column.description
        self.login(ADMIN_USERNAME)
        before = _dataset_version_count(dataset_id)
        try:
            rv = self.client.put(
                f"/api/v1/dataset/{dataset_id}",
                json={
                    "columns": [
                        {
                            "id": column.id,
                            "column_name": column.column_name,
                            "description": "fr-026 child-edit forces parent shadow",
                        },
                    ],
                },
            )
            assert rv.status_code == 200, rv.data
            db.session.expire_all()
            after = _dataset_version_count(dataset_id)
            assert after == before + 1, (
                f"column edit did not force a parent dataset shadow row "
                f"(before={before}, after={after})"
            )
        finally:
            self.client.put(
                f"/api/v1/dataset/{dataset_id}",
                json={
                    "columns": [
                        {
                            "id": column.id,
                            "column_name": column.column_name,
                            "description": original_description,
                        },
                    ],
                },
            )
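The audit-key filtering that `test_map_label_colors_only_change_does_not_create_version` relies on can be illustrated with a minimal comparator sketch. It is not the plugin's actual `_matches_previous_version`. It assumes `json_metadata` holds a JSON object. The `AUDIT_KEYS` set is a hypothetical stand-in for `DASHBOARD_JSON_METADATA_AUDIT_KEYS` and contains only `map_label_colors`. Comparing parsed, filtered dictionaries rather than raw strings is what lets a re-serialized but equivalent value count as unchanged.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical stand-in for DASHBOARD_JSON_METADATA_AUDIT_KEYS; the real set
# may contain more frontend-derived sub-keys.
AUDIT_KEYS: frozenset[str] = frozenset({"map_label_colors"})


def _normalize(raw: str | None, audit_keys: frozenset[str]) -> dict[str, Any]:
    """Parse json_metadata (None counts as "{}") and drop audit sub-keys."""
    parsed = json.loads(raw or "{}")
    return {key: value for key, value in parsed.items() if key not in audit_keys}


def metadata_matches_previous(
    current: str | None,
    previous: str | None,
    audit_keys: frozenset[str] = AUDIT_KEYS,
) -> bool:
    """True when the two values differ only in byte layout or audit sub-keys.

    Dict equality ignores key order and whitespace, which absorbs a
    re-serialization such as the one ``set_dash_metadata`` performs.
    """
    return _normalize(current, audit_keys) == _normalize(previous, audit_keys)


before = '{"color_scheme": "x", "map_label_colors": {"a": "#abcdef"}}'
after = '{ "map_label_colors": {"a": "#123456"}, "color_scheme": "x" }'
print(metadata_matches_previous(after, before))  # True: only audit keys changed
```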


@@ -107,6 +107,12 @@ def test_import_adds_dashboard_charts(mocker: MockerFixture, session: Session) -
    expected_number_of_charts = len(charts_config_1)
    ImportAssetsCommand._import(base_configs)
    # ``ImportAssetsCommand.run()`` is wrapped in ``@transaction``,
    # so each production invocation gets its own DB (and Continuum)
    # transaction. Calling ``_import`` directly twice in the same
    # session would otherwise emit conflicting M2M shadow rows for
    # ``dashboard_slices`` within a single Continuum tx.
    db.session.commit()
    ImportAssetsCommand._import(new_configs)
    dashboard_ids = db.session.scalars(
        select(dashboard_slices.c.dashboard_id).distinct()
@@ -574,6 +580,12 @@ def test_import_removes_dashboard_charts(
    expected_number_of_charts = len(charts_config_2)
    ImportAssetsCommand._import(base_configs)
    # ``ImportAssetsCommand.run()`` is wrapped in ``@transaction``,
    # so each production invocation gets its own DB (and Continuum)
    # transaction. Calling ``_import`` directly twice in the same
    # session would otherwise emit conflicting M2M shadow rows for
    # ``dashboard_slices`` within a single Continuum tx.
    db.session.commit()
    ImportAssetsCommand._import(new_configs)
    dashboard_ids = db.session.scalars(
        select(dashboard_slices.c.dashboard_id).distinct()


@@ -0,0 +1,97 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Unit tests for ``VersionDAO``.

Exercises the pure helpers (``derive_version_uuid``) and the
``restore_version`` control-flow branches that can be covered with mocks
alone. Full round-trip scalar restore / audit stamping / non-destructive
behaviour is covered by the integration tests in
``tests/integration_tests/{charts,dashboards,datasets}/version_history_tests.py``.
Those need a real Continuum stack and live DB, which unit tests here
deliberately avoid.
"""

from __future__ import annotations

from unittest.mock import MagicMock, patch
from uuid import UUID

from superset.daos.version import (
    derive_version_uuid,
    VERSION_UUID_NAMESPACE,
    VersionDAO,
)


# ---------------------------------------------------------------------------
# derive_version_uuid
# ---------------------------------------------------------------------------


def test_derive_version_uuid_is_deterministic():
    entity = UUID("14f48794-ebfa-4f60-a26a-582c49132f1b")
    assert derive_version_uuid(entity, 42) == derive_version_uuid(entity, 42)


def test_derive_version_uuid_differs_across_tx():
    entity = UUID("14f48794-ebfa-4f60-a26a-582c49132f1b")
    assert derive_version_uuid(entity, 1) != derive_version_uuid(entity, 2)


def test_derive_version_uuid_differs_across_entities():
    tx = 42
    a = UUID("14f48794-ebfa-4f60-a26a-582c49132f1b")
    b = UUID("b388a396-cbca-4299-a443-3e41e870e2c2")
    assert derive_version_uuid(a, tx) != derive_version_uuid(b, tx)


def test_derive_version_uuid_is_v5():
    """UUIDs must be version 5; changing this is a breaking change."""
    entity = UUID("14f48794-ebfa-4f60-a26a-582c49132f1b")
    result = derive_version_uuid(entity, 1)
    assert result.version == 5


def test_derive_version_uuid_uses_fixed_namespace():
    """Asserts the namespace constant hasn't drifted (changing it
    invalidates every cached version_uuid; see the constant's comment)."""
    assert VERSION_UUID_NAMESPACE == UUID("7a6f5d9b-4c3b-5d8e-9a1c-0e2b4c6d8f10")


# ---------------------------------------------------------------------------
# restore_version control-flow: unknown entity / out-of-range version
# ---------------------------------------------------------------------------


@patch("superset.versioning.restore.find_active_by_uuid", return_value=None)
def test_restore_version_returns_none_for_unknown_entity(mock_find):
    """Unknown entity UUID → caller raises 404."""
    result = VersionDAO.restore_version(
        MagicMock(__name__="Dashboard"),
        UUID("00000000-0000-0000-0000-000000000000"),
        0,
    )
    assert result is None


# Out-of-range version_num (the lookup query returns None) is verified
# end-to-end in the integration tests
# (``test_restore_returns_404_for_unknown_version_uuid`` in the three
# {charts,dashboards,datasets}/version_history_tests.py suites). A pure
# unit-level version of that test would require mocking the full
# SQLAlchemy expression tree, including ``ver_cls.operation_type != 0``,
# which is fragile and doesn't add coverage beyond what the
# integration path already provides.
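The `derive_version_uuid` tests pin down four properties: the UUID is deterministic, it differs per entity and per transaction, and it is version 5 under a fixed namespace. These are exactly what the standard library's `uuid.uuid5` provides. A minimal sketch follows. This diff does not show how the real helper encodes `(entity_uuid, transaction_id)` into the name string, so the `f"{entity_uuid}:{transaction_id}"` format and the function name `derive_version_uuid_sketch` are hypothetical. The namespace value is the one asserted by `test_derive_version_uuid_uses_fixed_namespace`.

```python
import uuid
from uuid import UUID

# Namespace value asserted by test_derive_version_uuid_uses_fixed_namespace.
NAMESPACE = UUID("7a6f5d9b-4c3b-5d8e-9a1c-0e2b4c6d8f10")


def derive_version_uuid_sketch(entity_uuid: UUID, transaction_id: int) -> UUID:
    """Hypothetical derivation: name-based (SHA-1) UUIDv5 of entity + tx id."""
    # The name format below is an assumption, not the real helper's encoding.
    return uuid.uuid5(NAMESPACE, f"{entity_uuid}:{transaction_id}")


entity = UUID("14f48794-ebfa-4f60-a26a-582c49132f1b")
first = derive_version_uuid_sketch(entity, 42)
# Same inputs always yield the same UUID.
assert first == derive_version_uuid_sketch(entity, 42)
print(first.version)  # 5
```

Because the output depends on the namespace, changing the namespace constant changes every derived UUID. That is why the namespace test treats drift as breaking.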


@@ -72,6 +72,13 @@ def test_dashboard_import_with_overwrite_replaces_charts(
    initial_chart_ids = db.session.scalars(select(dashboard_slices.c.slice_id)).all()
    assert len(initial_chart_ids) == 2

    # ``ImportDashboardsCommand.run()`` is wrapped in ``@transaction``,
    # so each production invocation gets its own DB (and Continuum)
    # transaction. Calling ``_import`` directly twice in the same
    # session would otherwise emit conflicting M2M shadow rows for
    # ``dashboard_slices`` within a single Continuum tx.
    db.session.commit()

    # Second import: same dashboard with only 1 chart (charts_config_2 has 1 chart)
    updated_configs = {
        **copy.deepcopy(databases_config),


@@ -0,0 +1,144 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Unit tests for the composite-PK association-tables migration (revision
2bee73611e32). Verifies the post-migration constraint enforcement: duplicate
``(fk1, fk2)`` insertions fail with IntegrityError, distinct pairs succeed.

The schema is created with ``metadata.create_all(engine)`` against in-memory
SQLite from minimal ``Table`` definitions (see ``_build_in_memory_schema``)
that mirror the composite-PK shape of the post-T015-T018 ORM models. The
tests therefore run independently of whether the Alembic migration has been
applied to the test DB; the ORM models and the migrated schema should agree
on this shape.
"""

import pytest
import sqlalchemy as sa
from sqlalchemy.exc import IntegrityError

# (table_name, fk1_col, fk2_col, fk1_parent_table, fk2_parent_table)
# Parent-table names are needed to build the FK targets in the in-memory schema.
AFFECTED_TABLES = [
    ("dashboard_roles", "dashboard_id", "role_id", "dashboards", "ab_role"),
    ("dashboard_slices", "dashboard_id", "slice_id", "dashboards", "slices"),
    ("dashboard_user", "user_id", "dashboard_id", "ab_user", "dashboards"),
    (
        "report_schedule_user",
        "user_id",
        "report_schedule_id",
        "ab_user",
        "report_schedule",
    ),
    (
        "rls_filter_roles",
        "role_id",
        "rls_filter_id",
        "ab_role",
        "row_level_security_filters",
    ),
    (
        "rls_filter_tables",
        "table_id",
        "rls_filter_id",
        "tables",
        "row_level_security_filters",
    ),
    ("slice_user", "user_id", "slice_id", "ab_user", "slices"),
    ("sqlatable_user", "user_id", "table_id", "ab_user", "tables"),
]


def _build_in_memory_schema(
    table_name: str, fk1: str, fk2: str, fk1_parent: str, fk2_parent: str
) -> tuple[sa.engine.Engine, sa.Table]:
    """Build an in-memory SQLite schema with two minimal parent tables and
    the junction table under test (composite-PK shape). Returns the engine
    and the junction-table object for inserts."""
    metadata = sa.MetaData()
    sa.Table(
        fk1_parent,
        metadata,
        sa.Column("id", sa.Integer, primary_key=True),
    )
    if fk2_parent != fk1_parent:
        sa.Table(
            fk2_parent,
            metadata,
            sa.Column("id", sa.Integer, primary_key=True),
        )
    junction = sa.Table(
        table_name,
        metadata,
        sa.Column(
            fk1,
            sa.Integer,
            sa.ForeignKey(f"{fk1_parent}.id"),
            primary_key=True,
        ),
        sa.Column(
            fk2,
            sa.Integer,
            sa.ForeignKey(f"{fk2_parent}.id"),
            primary_key=True,
        ),
    )
    engine = sa.create_engine("sqlite:///:memory:")
    metadata.create_all(engine)
    # Seed parent rows so the FK constraints can be satisfied (SQLite only
    # enforces FKs with ``PRAGMA foreign_keys=ON``; seeding keeps the data
    # valid either way). Identifiers come from the AFFECTED_TABLES test
    # parameter list, not user input.
    with engine.begin() as conn:
        conn.execute(
            sa.text(f"INSERT INTO {fk1_parent} (id) VALUES (1), (2)")  # noqa: S608
        )
        if fk2_parent != fk1_parent:
            conn.execute(
                sa.text(f"INSERT INTO {fk2_parent} (id) VALUES (1), (2)")  # noqa: S608
            )
    return engine, junction


@pytest.mark.parametrize("table,fk1,fk2,fk1_parent,fk2_parent", AFFECTED_TABLES)
def test_duplicate_insert_rejected(
    table: str, fk1: str, fk2: str, fk1_parent: str, fk2_parent: str
) -> None:
    """Inserting the same ``(fk1, fk2)`` pair twice raises ``IntegrityError``.

    Verifies SC-004 / FR-007: the composite primary key enforces uniqueness
    at the database level on every affected table.
    """
    engine, junction = _build_in_memory_schema(table, fk1, fk2, fk1_parent, fk2_parent)
    with engine.begin() as conn:
        conn.execute(junction.insert().values({fk1: 1, fk2: 1}))
        with pytest.raises(IntegrityError):
            conn.execute(junction.insert().values({fk1: 1, fk2: 1}))


@pytest.mark.parametrize("table,fk1,fk2,fk1_parent,fk2_parent", AFFECTED_TABLES)
def test_distinct_pairs_accepted(
    table: str, fk1: str, fk2: str, fk1_parent: str, fk2_parent: str
) -> None:
    """Two distinct ``(fk1, fk2)`` pairs both succeed.

    Sanity check that the PK isn't accidentally a single-column constraint
    (which would reject ``(1, 1)`` and ``(1, 2)`` as a duplicate on column 1).
    """
    engine, junction = _build_in_memory_schema(table, fk1, fk2, fk1_parent, fk2_parent)
    with engine.begin() as conn:
        conn.execute(junction.insert().values({fk1: 1, fk2: 1}))
        conn.execute(junction.insert().values({fk1: 1, fk2: 2}))
        result = conn.execute(
            sa.text(f"SELECT COUNT(*) FROM {table}")  # noqa: S608
        ).scalar_one()
    assert result == 2
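The reasoning behind `test_distinct_pairs_accepted` can be checked with the standard-library `sqlite3` module alone. A composite key rejects only exact duplicate pairs. A single-column key would also reject `(1, 2)` after `(1, 1)`. This sketch uses a hypothetical table `t(a, b)` rather than any real Superset junction table.

```python
import sqlite3


def try_insert_pairs(ddl: str, pairs: list[tuple[int, int]]) -> list[bool]:
    """Create table ``t`` from ``ddl``, insert each pair, report which succeeded."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        results = []
        for a, b in pairs:
            try:
                conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", (a, b))
                results.append(True)
            except sqlite3.IntegrityError:
                # PRIMARY KEY uniqueness violation.
                results.append(False)
        return results
    finally:
        conn.close()


COMPOSITE = "CREATE TABLE t (a INTEGER, b INTEGER, PRIMARY KEY (a, b))"
SINGLE = "CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER)"
PAIRS = [(1, 1), (1, 2), (1, 1)]

print(try_insert_pairs(COMPOSITE, PAIRS))  # [True, True, False]
print(try_insert_pairs(SINGLE, PAIRS))  # [True, False, False]
```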


@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

File diff suppressed because it is too large