Compare commits

...

6 Commits

Author SHA1 Message Date
Claude Code
6671c90103 test(encrypt): cover GCM->CBC rollback re-encryption
Adds a regression test for the reverse engine migration (--engine aes): an
AES-GCM value must be re-encrypted back to AES-CBC, not skipped. This guards
the idempotency fast-path, which decrypts in the target form first — since the
rollback target (AES-CBC) is unauthenticated, the test confirms it does not
mis-read GCM ciphertext and wrongly skip the value.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 11:50:36 -07:00
Evan
0fe86ea0c3 test(encrypt): add CLI coverage for re-encrypt-secrets --engine option
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 16:37:26 -07:00
Evan
731a476acc fix(encrypt): avoid logging config-sourced engine value (CodeQL)
CodeQL's clear-text-logging query flags interpolating any app-config
value into a log because the config also holds SECRET_KEY. The
unrecognized-engine warning now lists only the valid engine names,
which is enough to diagnose a typo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:44:14 -07:00
Evan
05d87edf27 fix(encrypt): previous-key decryptor honors column engine; case-insensitive --engine; warn on unknown engine
Address bot review on #40654:
- SecretsMigrator now tries the previous key under both the column's
  configured engine and the historical AES-CBC engine, so SECRET_KEY
  rotation works for AES-GCM deployments (not just CBC) while still
  reading CBC source data after an engine-flipped-first migration.
- re-encrypt-secrets --engine accepts case-insensitive engine names.
- Adapter logs a warning when SQLALCHEMY_ENCRYPTED_FIELD_ENGINE is
  unrecognized before falling back to AES-CBC.
- config.py points operators at the SQLALCHEMY_ENCRYPTED_FIELD_ENGINE
  knob as the preferred way to switch engines.
- Add unit test covering AES-GCM SECRET_KEY rotation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:41:46 -07:00
Claude Code
782c1db8d8 feat(encrypt): add CBC→GCM re-encryption migrator + CLI engine flag
Phase 2 of the authenticated-encryption-at-rest proposal. Teaches the
existing SecretsMigrator an engine-migration mode (decrypt with the source
engine under the current SECRET_KEY, re-encrypt with a target engine) and
exposes it via `superset re-encrypt-secrets --engine aes-gcm`. The run is
transactional and idempotent per column, so existing installs can move from
AES-CBC to authenticated AES-GCM without bricking stored secrets. Default
engine is unchanged ("aes"); behavior is opt-in. Adds UPDATING.md runbook and
updates the SIP to reflect Phases 1–2 shipping here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:41:45 -07:00
Claude Code
f748de3423 feat(encrypt): make encryption engine selectable (AES-GCM support) [DRAFT/SIP]
App-encrypted fields (DB passwords, SSH tunnel creds, OAuth tokens) use
sqlalchemy_utils EncryptedType, which defaults to AES-CBC — unauthenticated
encryption vulnerable to bit-flipping/tamper of ciphertext at rest. AES-GCM is
authenticated and fails closed on tampering.

Phase 1 (this change, backward compatible): add a configurable encryption
engine via SQLALCHEMY_ENCRYPTED_FIELD_ENGINE ("aes" | "aes-gcm"), defaulting to
"aes" so existing deployments are unchanged. The default SQLAlchemyUtilsAdapter
now honors it; an explicit engine kwarg still wins. New installs can opt into
AES-GCM with one line instead of writing a custom adapter.

Includes docs/sip/authenticated-encryption-at-rest.md describing the full
proposal: Phase 2 (a CBC->GCM re-encryption migrator built on SecretsMigrator)
and Phase 3 (flip default for fresh installs). The instance-wide re-encryption
migration is intentionally NOT included here — it is the subject of the SIP.

DRAFT: flipping the engine on a populated DB without the Phase 2 migrator makes
existing secrets undecryptable; this PR only adds the safe opt-in plumbing for
discussion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:41:12 -07:00
7 changed files with 670 additions and 43 deletions

View File

@@ -59,6 +59,29 @@ Both default to empty (no behavior change). They apply to both the `LOCAL_EXTENS
The Dynamic Group By chart customization now orders its display values according to the "Sort display control values" toggle: ascending (AZ), descending (ZA), or the dataset's source order when the toggle is unset. Previously the dropdown always sorted alphabetically. Existing dashboards where the toggle was never set will show options in source order instead of AZ; open the customization and enable the toggle to restore alphabetical ordering.
### Selectable encryption engine for app-encrypted fields (AES-GCM)
App-encrypted fields (database passwords, SSH tunnel credentials, OAuth tokens, etc.) can now use authenticated **AES-GCM** encryption instead of the historical unauthenticated **AES-CBC**. A new config selects the engine for the default adapter:
```python
# "aes" (AES-CBC, historical default) | "aes-gcm" (authenticated, recommended for new installs)
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes"
```
**No action required / no behavior change:** the default remains `"aes"`, so existing installs are unaffected.
**Opting in on an existing install:** flipping the engine on a populated database without re-encrypting first will make stored secrets undecryptable, because the two ciphertext formats are not compatible. A migrator is provided. Recommended runbook:
1. Take a metadata-DB backup.
2. Re-encrypt existing secrets into the new engine (the `SECRET_KEY` is unchanged):
```bash
superset re-encrypt-secrets --engine aes-gcm
```
3. Set `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` in your config.
4. Restart Superset.
The migration is transactional (all-or-nothing) and idempotent — it can be safely re-run or resumed. Note that AES-GCM, unlike AES-CBC, does not support querying directly over encrypted columns; audit any code that filters on an encrypted column before switching. See the SIP at `docs/sip/authenticated-encryption-at-rest.md` for details.
### Granular Export Controls
A new feature flag `GRANULAR_EXPORT_CONTROLS` introduces three fine-grained permissions that replace the legacy `can_csv` permission:

View File

@@ -0,0 +1,133 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# SIP: Authenticated encryption (AES-GCM) for app-encrypted fields
## [DRAFT — proposal for discussion]
This document is a draft proposal accompanying the code in this PR. It is
intended to seed the formal SIP discussion. The code here ships the
backward-compatible engine selection **and** the re-encryption migrator
(Phases 12 below); both are opt-in and change nothing for existing installs by
default. Flipping the default for fresh installs (Phase 3) remains future work.
## Motivation
Superset app-encrypts a number of sensitive fields before persisting them to
the metadata database, including:
- database connection passwords and `encrypted_extra` (`superset/models/core.py`),
- SSH tunnel credentials — password, private key, private-key password
(`superset/databases/ssh_tunnel/models.py`),
- OAuth2 tokens and other secrets stored via `EncryptedType`.
These fields are encrypted with `sqlalchemy_utils.EncryptedType`, which
**defaults to `AesEngine` (AES-CBC)**. AES-CBC provides confidentiality but is
**unauthenticated**: it has no integrity tag. An attacker with write access to
the ciphertext (e.g. direct metadata-DB access, a backup, or a compromised
replica) can perform **bit-flipping / chosen-ciphertext manipulation** to
silently alter the decrypted plaintext of a secret without detection.
`AesGcmEngine` (AES-GCM) is authenticated encryption: tampering causes
decryption to fail loudly rather than yielding attacker-influenced plaintext.
Using authenticated encryption for secrets at rest is an ASVS L1 expectation
(11.3.2 / cryptography best practice).
`config.py` already documents that operators *can* switch to GCM by writing a
custom `AbstractEncryptedFieldAdapter`, but:
1. it is opt-in, undocumented as a security recommendation, and easy to miss;
2. there is **no migration path** — flipping the engine on a populated database
makes every existing secret undecryptable, because GCM ciphertext is not
format-compatible with CBC.
## Proposed change
A three-part change, delivered incrementally so existing deployments are never
broken:
### Phase 1 — engine selection (this PR)
- Add a `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE` config (`"aes"` | `"aes-gcm"`),
**defaulting to `"aes"`** (no behavior change for existing installs).
- Teach the default `SQLAlchemyUtilsAdapter` to honor it (an explicit `engine`
kwarg still wins, so the migrator can pin an engine).
- This lets **new** deployments choose AES-GCM from day one with a one-line
config, instead of writing a custom adapter.
### Phase 2 — CBC→GCM re-encryption migrator (this PR)
The existing `SecretsMigrator` (previously only used for `SECRET_KEY` rotation)
gains an **engine migration** mode that:
1. discovers every `EncryptedType` column (via `discover_encrypted_fields()`),
2. decrypts each value with the **source** engine (AES-CBC) under the current
`SECRET_KEY`,
3. re-encrypts with the **target** engine (AES-GCM),
4. runs transactionally per the existing all-or-nothing semantics, and is
idempotent per column (already-migrated values are skipped), so a run can be
safely repeated or resumed.
Exposed via a new `--engine` option on the existing CLI command:
`superset re-encrypt-secrets --engine aes-gcm`, runnable by operators with a DB
backup in hand. The `SECRET_KEY` is unchanged; an engine change and a key
rotation can also be combined (pass `--previous_secret_key` as well).
### Phase 3 — flip the default for new installs
Once the migrator and docs are in place, change the default to `"aes-gcm"` for
**fresh** installs only (e.g. keyed off an empty metadata DB / documented in
`UPDATING.md`), keeping existing installs on `"aes"` until they run Phase 2.
## New or changed public interfaces
- New config: `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"]`.
- New (Phase 2) CLI: `superset re-encrypt-secrets --engine <name>`.
- No schema changes; ciphertext format changes per migrated column.
## Migration plan and compatibility
- **Backward compatible by default.** Phase 1 changes nothing unless the
operator opts in.
- Switching an existing deployment to `"aes-gcm"` **without** running the Phase
2 migrator will make existing secrets undecryptable — this is called out in
the config comment and must be in `UPDATING.md`.
- Recommended operator runbook: take a metadata-DB backup → run
`re-encrypt-secrets --engine aes-gcm` → set
`SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` → restart.
- `AesEngine` allows queryability over encrypted fields; AES-GCM does not.
Any code that filters/queries on an encrypted column directly must be audited
before Phase 3 (none is expected, but it must be verified).
## Rejected alternatives
- **Flip the default immediately.** Rejected: bricks every existing
deployment's secrets with no migration path.
- **Document-only (custom adapter).** Status quo; high friction and no
migration tooling — most operators will never do it.
## Open questions
- GCM→CBC rollback (for operators who need queryability) already works via the
same command (`re-encrypt-secrets --engine aes`), since the migrator is
engine-symmetric. Should rollback be documented as a supported path or
discouraged?
- The migrator already supports a concurrent `SECRET_KEY` rotation + engine
change in a single pass (pass `--previous_secret_key` alongside `--engine`).
Is that combination worth calling out in the operator docs, or kept advanced?

View File

@@ -31,7 +31,7 @@ from flask_appbuilder.api.manager import resolver
import superset.utils.database as database_utils
from superset.utils.decorators import transaction
from superset.utils.encrypt import SecretsMigrator
from superset.utils.encrypt import ENCRYPTION_ENGINES, SecretsMigrator
logger = logging.getLogger(__name__)
@@ -110,17 +110,37 @@ def update_api_docs() -> None:
help="An optional previous secret key, if PREVIOUS_SECRET_KEY "
"is not set on the config",
)
def re_encrypt_secrets(previous_secret_key: Optional[str] = None) -> None:
@click.option(
"--engine",
"-e",
"target_engine_name",
required=False,
type=click.Choice(sorted(ENCRYPTION_ENGINES), case_sensitive=False),
help="Re-encrypt all app-encrypted fields with this encryption engine "
"(e.g. 'aes-gcm' for authenticated encryption). The SECRET_KEY is "
"unchanged. Take a metadata-DB backup first, then set "
"SQLALCHEMY_ENCRYPTED_FIELD_ENGINE to the same value and restart.",
)
def re_encrypt_secrets(
previous_secret_key: Optional[str] = None,
target_engine_name: Optional[str] = None,
) -> None:
previous_secret_key = previous_secret_key or current_app.config.get(
"PREVIOUS_SECRET_KEY"
)
if previous_secret_key is None:
target_engine = (
ENCRYPTION_ENGINES[target_engine_name] if target_engine_name else None
)
if previous_secret_key is None and target_engine is None:
click.secho(
"No previous secret key provided; nothing to re-encrypt.",
"No previous secret key or target engine provided; nothing to re-encrypt.",
fg="yellow",
)
return
secrets_migrator = SecretsMigrator(previous_secret_key=previous_secret_key)
secrets_migrator = SecretsMigrator(
previous_secret_key=previous_secret_key,
target_engine=target_engine,
)
try:
stats = secrets_migrator.run()
except Exception as exc: # pylint: disable=broad-except

View File

@@ -270,7 +270,10 @@ SQLALCHEMY_CUSTOM_PASSWORD_STORE = None
# as key material. Do note that AesEngine allows for queryability over the
# encrypted fields.
#
# To change the default engine you need to define your own adapter:
# To switch the engine used by the default adapter, prefer the
# ``SQLALCHEMY_ENCRYPTED_FIELD_ENGINE`` knob below (e.g. "aes-gcm"). Defining a
# custom adapter, as shown next, is only needed for behaviour the built-in
# engines do not cover:
#
# e.g.:
#
@@ -295,6 +298,16 @@ SQLALCHEMY_ENCRYPTED_FIELD_TYPE_ADAPTER = ( # pylint: disable=invalid-name
SQLAlchemyUtilsAdapter
)
# Encryption engine used by the default SQLAlchemyUtilsAdapter for app-encrypted
# fields. Options:
# "aes" - AES-CBC (historical default; unauthenticated, queryable)
# "aes-gcm" - AES-GCM (authenticated encryption; recommended for NEW installs)
# WARNING: changing this on a database that already holds encrypted secrets
# (database passwords, SSH tunnel credentials, OAuth tokens, ...) will make
# those values undecryptable unless they are re-encrypted first. See the
# authenticated-encryption SIP/migration before switching an existing install.
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"] = "aes"
# Extends the default SQLGlot dialects with additional dialects
SQLGLOT_DIALECTS_EXTENSIONS: DialectExtensions | Callable[[], DialectExtensions] = {}

View File

@@ -19,17 +19,30 @@ from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional
from flask import Flask
from flask import current_app, Flask
from flask_babel import lazy_gettext as _
from sqlalchemy import Table, text, TypeDecorator
from sqlalchemy.engine import Connection, Dialect, Row
from sqlalchemy_utils import EncryptedType as SqlaEncryptedType
from sqlalchemy_utils.types.encrypted.encrypted_type import AesEngine, AesGcmEngine
class EncryptedType(SqlaEncryptedType):
cache_ok = True
# Named encryption engines selectable via the ``SQLALCHEMY_ENCRYPTED_FIELD_ENGINE``
# config. "aes" (AES-CBC) is the historical default; "aes-gcm" is authenticated
# encryption (recommended for new deployments). NOTE: switching an existing
# deployment from "aes" to "aes-gcm" requires re-encrypting all stored secrets
# first — see the SIP referenced in the docs. Changing this on a populated
# database without that migration will make existing secrets undecryptable.
ENCRYPTION_ENGINES = {
"aes": AesEngine,
"aes-gcm": AesGcmEngine,
}
ENC_ADAPTER_TAG_ATTR_NAME = "__created_by_enc_field_adapter__"
logger = logging.getLogger(__name__)
@@ -65,6 +78,23 @@ class SQLAlchemyUtilsAdapter( # pylint: disable=too-few-public-methods
**kwargs: Optional[dict[str, Any]],
) -> TypeDecorator:
if app_config:
# Select the encryption engine from config, defaulting to the
# historical AES-CBC engine for backward compatibility. An explicit
# ``engine`` kwarg (e.g. from the migrator) always takes precedence.
if "engine" not in kwargs:
engine_name = app_config.get("SQLALCHEMY_ENCRYPTED_FIELD_ENGINE", "aes")
if engine_name not in ENCRYPTION_ENGINES:
# Do not log the configured value itself: it originates from
# the app config (which also holds the SECRET_KEY), and
# static analysis flags interpolating any config-sourced
# value into a log as potential clear-text secret logging.
# The set of valid engines is enough to diagnose the typo.
logger.warning(
"Unrecognized SQLALCHEMY_ENCRYPTED_FIELD_ENGINE;"
" falling back to AES-CBC. Valid engines: %s",
", ".join(sorted(ENCRYPTION_ENGINES)),
)
kwargs["engine"] = ENCRYPTION_ENGINES.get(engine_name, AesEngine)
return EncryptedType(*args, lambda: app_config["SECRET_KEY"], **kwargs)
raise Exception( # pylint: disable=broad-exception-raised
@@ -101,10 +131,40 @@ class EncryptedFieldFactory:
class SecretsMigrator:
def __init__(self, previous_secret_key: str) -> None:
"""Re-encrypts every app-encrypted column in the ORM.
Two modes, which can also be combined:
- **Key rotation** — pass ``previous_secret_key``. Values are decrypted
under the previous key and re-encrypted under the current ``SECRET_KEY``
using the currently-configured engine.
- **Engine migration** — pass ``target_engine`` (e.g. ``AesGcmEngine``).
Values are decrypted under the current ``SECRET_KEY`` with the source
engine and re-encrypted with the target engine. This is how an existing
install moves from AES-CBC to authenticated AES-GCM without bricking
stored secrets; the ``SECRET_KEY`` itself is unchanged.
Both modes share the same all-or-nothing transaction and per-column
idempotency: a value already readable in the target form is left untouched,
so a run can be safely repeated or resumed.
"""
def __init__(
self,
previous_secret_key: Optional[str] = None,
target_engine: Optional[type[Any]] = None,
) -> None:
from superset import db # pylint: disable=import-outside-toplevel
self._db = db
self._secret_key = current_app.config["SECRET_KEY"]
self._target_engine = target_engine
# In engine-migration mode the SECRET_KEY does not change: the source
# ciphertext is decrypted under the current key (with the source
# engine), so default the "previous" key to the current one when the
# caller only asked for an engine change.
if target_engine is not None and not previous_secret_key:
previous_secret_key = self._secret_key
self._previous_secret_key = previous_secret_key
self._dialect: Dialect = db.engine.url.get_dialect()
@@ -177,6 +237,54 @@ class SecretsMigrator:
cols = ",".join(pk_columns + column_names)
return conn.execute(f"SELECT {cols} FROM {table_name}") # noqa: S608
def _target_type(self, encrypted_type: EncryptedType) -> EncryptedType:
"""The EncryptedType to re-encrypt a value *into*.
For a key rotation this is the column's own configured type (current
engine + lazily-resolved current key). For an engine migration it is a
type pinned to the requested target engine under the current key.
"""
if self._target_engine is None:
return encrypted_type
return EncryptedType(
type_in=encrypted_type.underlying_type,
key=self._secret_key,
engine=self._target_engine,
)
def _source_decryptors(self, encrypted_type: EncryptedType) -> list[EncryptedType]:
"""Candidate decryptors, tried in order, to recover a value's plaintext.
1. The column's configured type — current key + currently-configured
engine. During an engine migration this reads the source ciphertext
while the config still points at the source engine.
2. The previous key under each supported engine. Trying the column's
configured engine covers ``SECRET_KEY`` rotation for any engine
(including AES-GCM); also trying the historical AES-CBC engine
covers an engine migration whose config was flipped to the target
engine *before* the migrator ran (the source data is still CBC under
the current key, which the previous key defaults to in
engine-migration mode).
"""
decryptors = [encrypted_type]
if self._previous_secret_key:
# Try the column's own engine first, then any remaining supported
# engines (notably the historical AES-CBC), de-duplicated so we
# never build the same decryptor twice.
engines: list[type[Any]] = [type(encrypted_type.engine)]
for engine in ENCRYPTION_ENGINES.values():
if engine not in engines:
engines.append(engine)
decryptors.extend(
EncryptedType(
type_in=encrypted_type.underlying_type,
key=self._previous_secret_key,
engine=engine,
)
for engine in engines
)
return decryptors
def _re_encrypt_row(
self,
conn: Connection,
@@ -187,24 +295,28 @@ class SecretsMigrator:
stats: ReEncryptStats,
) -> None:
"""
Re encrypts all columns in a Row
Re-encrypts all columns in a Row into the target form.
Re-encryption is idempotent per column: we first ask whether the
current key can already decrypt the value, and skip if so. Only if
the current key fails do we fall back to decrypting with the
previous key and re-encrypting. Checking the current key first
keeps ``run()`` idempotent regardless of what ``previous_secret_key``
the caller supplies — even re-running with the same (unchanged)
``SECRET_KEY`` will not rewrite rows.
The "target form" is current-key + currently-configured engine for a
``SECRET_KEY`` rotation, or current-key + the requested engine for an
engine migration (see ``_target_type``).
Re-encryption is idempotent per column: a value that can already be
read in the target form is left untouched. This keeps ``run()``
idempotent regardless of what ``previous_secret_key`` the caller
supplies, and lets an interrupted engine migration be resumed.
Otherwise the plaintext is recovered from the first candidate source
decryptor that can read it (see ``_source_decryptors``) and re-encrypted
into the target form.
NULL values are never encrypted, so they are reported separately
(neither re-encrypted nor "skipped because already current").
Per-column outcomes are accumulated onto ``stats`` so the caller can
report a summary. Columns whose ciphertext is unreadable under both
keys are counted as failures and logged; the exception is not
propagated, so processing continues. The caller is responsible for
raising once all rows have been scanned.
report a summary. Columns whose ciphertext is unreadable under any
candidate key/engine are counted as failures and logged; the exception
is not propagated, so processing continues. The caller is responsible
for raising once all rows have been scanned.
If no columns need re-encryption, no UPDATE is issued.
@@ -223,38 +335,48 @@ class SecretsMigrator:
stats.null += 1
continue
# Fast path: if the current key can already read the value,
# leave it untouched. A failure here simply means we need to try
# the previous key below — not a condition worth logging.
target_type = self._target_type(encrypted_type)
# Fast path: if the value can already be read in the target form,
# leave it untouched. A failure here simply means we need to try the
# source decryptors below — not a condition worth logging.
try:
encrypted_type.process_result_value(raw_value, self._dialect)
target_type.process_result_value(raw_value, self._dialect)
except Exception: # noqa: BLE001, S110 # pylint: disable=broad-except
pass
else:
stats.skipped += 1
continue
# Current key cannot decrypt — try the previous key.
previous_encrypted_type = EncryptedType(
type_in=encrypted_type.underlying_type, key=self._previous_secret_key
)
try:
unencrypted_value = previous_encrypted_type.process_result_value(
raw_value, self._dialect
)
except Exception as prev_ex: # noqa: BLE001 # pylint: disable=broad-except
# Recover the plaintext from the first source decryptor that can
# read the value (current engine/key, then previous key).
unencrypted_value = None
decrypted = False
last_error: Optional[Exception] = None
for decryptor in self._source_decryptors(encrypted_type):
try:
unencrypted_value = decryptor.process_result_value(
raw_value, self._dialect
)
except Exception as ex: # noqa: BLE001 # pylint: disable=broad-except
last_error = ex
continue
decrypted = True
break
if not decrypted:
logger.error(
"Column [%s.%s] cannot be decrypted under the previous"
" or current secret key (%s: %s)",
"Column [%s.%s] cannot be decrypted under any known"
" key/engine (%s: %s)",
table_name,
column_name,
type(prev_ex).__name__,
prev_ex,
type(last_error).__name__ if last_error else "None",
last_error,
)
stats.failed += 1
continue
re_encrypted_columns[column_name] = encrypted_type.process_bind_param(
re_encrypted_columns[column_name] = target_type.process_bind_param(
unencrypted_value,
self._dialect,
)
@@ -276,12 +398,13 @@ class SecretsMigrator:
def run(self) -> ReEncryptStats:
"""
Re-encrypt every encrypted column in the ORM under the current
``SECRET_KEY``.
Re-encrypt every encrypted column in the ORM into the target form
(current ``SECRET_KEY`` and, for an engine migration, the target
engine).
Returns per-value counts of re-encrypted, skipped (already under the
current key), and failed (undecryptable) outcomes. If any failures
occurred the transaction is rolled back by raising after the
Returns per-value counts of re-encrypted, skipped (already in the
target form), null, and failed (undecryptable) outcomes. If any
failures occurred the transaction is rolled back by raising after the
summary is logged, so partial re-encryption never commits.
"""
encrypted_meta_info = self.discover_encrypted_fields()

View File

@@ -29,6 +29,7 @@ from freezegun import freeze_time
import superset.cli.importexport
import superset.cli.thumbnails
import superset.cli.update
import superset.utils.encrypt
from superset import db
from superset.models.dashboard import Dashboard
from tests.integration_tests.fixtures.birth_names_dashboard import (
@@ -364,3 +365,71 @@ def test_re_encrypt_secrets_failure_exits_nonzero(app_context):
# The failure path must be handled by the CLI, not leaked as an
# uncaught exception.
assert response.exception is None or isinstance(response.exception, SystemExit)
def test_re_encrypt_secrets_engine_option_invokes_migrator(app_context):
"""
When --engine is provided, the CLI must resolve the engine name to the
correct engine class and pass it to SecretsMigrator as target_engine.
"""
from sqlalchemy_utils.types.encrypted.encrypted_type import AesGcmEngine
current_app.config.pop("PREVIOUS_SECRET_KEY", None)
runner = current_app.test_cli_runner()
with mock.patch.object(
superset.cli.update,
"SecretsMigrator",
) as migrator_mock:
migrator_mock.return_value.run.return_value = (
superset.utils.encrypt.ReEncryptStats()
)
response = runner.invoke(
superset.cli.update.re_encrypt_secrets,
["--engine", "aes-gcm"],
)
assert response.exit_code == 0
call_kwargs = migrator_mock.call_args.kwargs
assert call_kwargs.get("target_engine") is AesGcmEngine
assert call_kwargs.get("previous_secret_key") is None
def test_re_encrypt_secrets_engine_option_case_insensitive(app_context):
"""
The --engine option must be case-insensitive per
click.Choice(..., case_sensitive=False).
"""
from sqlalchemy_utils.types.encrypted.encrypted_type import AesGcmEngine
current_app.config.pop("PREVIOUS_SECRET_KEY", None)
runner = current_app.test_cli_runner()
with mock.patch.object(
superset.cli.update,
"SecretsMigrator",
) as migrator_mock:
migrator_mock.return_value.run.return_value = (
superset.utils.encrypt.ReEncryptStats()
)
response = runner.invoke(
superset.cli.update.re_encrypt_secrets,
["--engine", "AES-GCM"],
)
assert response.exit_code == 0
assert migrator_mock.call_args.kwargs.get("target_engine") is AesGcmEngine
def test_re_encrypt_secrets_engine_option_invalid_raises_usage(app_context):
"""
An unrecognized engine name must produce a click usage error, not a
traceback or silent failure.
"""
runner = current_app.test_cli_runner()
response = runner.invoke(
superset.cli.update.re_encrypt_secrets,
["--engine", "nonexistent-engine"],
)
assert response.exit_code != 0
assert "Invalid value" in response.output or "Usage:" in response.output
assert "aes" in response.output or "aes-gcm" in response.output

View File

@@ -0,0 +1,246 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from unittest.mock import MagicMock
from sqlalchemy import String
from sqlalchemy.engine import make_url
from sqlalchemy_utils.types.encrypted.encrypted_type import AesEngine, AesGcmEngine
from superset.utils.encrypt import (
EncryptedType,
ReEncryptStats,
SecretsMigrator,
SQLAlchemyUtilsAdapter,
)
SECRET = {"SECRET_KEY": "x" * 32}
SECRET_KEY = "k" * 32
DIALECT = make_url("sqlite://").get_dialect()
def _encrypted_type(engine: type) -> EncryptedType:
"""A standalone EncryptedType for a given engine under SECRET_KEY."""
return EncryptedType(String(1024), key=lambda: SECRET_KEY, engine=engine)
def _engine_migrator(target_engine: type) -> SecretsMigrator:
"""Build a SecretsMigrator in engine-migration mode without an app context.
``__init__`` reads ``current_app`` and the DB dialect, so — like the
existing row-level tests that override ``_dialect`` — we set the few
attributes ``_re_encrypt_row`` actually uses directly.
"""
migrator = SecretsMigrator.__new__(SecretsMigrator)
migrator._secret_key = SECRET_KEY # noqa: SLF001
migrator._target_engine = target_engine # noqa: SLF001
# Engine migration keeps the SECRET_KEY; previous key defaults to current.
migrator._previous_secret_key = SECRET_KEY # noqa: SLF001
migrator._dialect = DIALECT # noqa: SLF001
return migrator
def test_default_engine_is_aes_cbc() -> None:
"""Without config, the adapter keeps the historical AES-CBC engine."""
field = SQLAlchemyUtilsAdapter().create(SECRET, String(128))
assert isinstance(field.engine, AesEngine)
def test_aes_gcm_engine_selected_by_config() -> None:
"""SQLALCHEMY_ENCRYPTED_FIELD_ENGINE='aes-gcm' selects authenticated AES-GCM."""
field = SQLAlchemyUtilsAdapter().create(
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "aes-gcm"},
String(128),
)
assert isinstance(field.engine, AesGcmEngine)
def test_unknown_engine_falls_back_to_aes_cbc() -> None:
"""An unrecognized engine name falls back to the safe historical default."""
field = SQLAlchemyUtilsAdapter().create(
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "bogus"},
String(128),
)
assert isinstance(field.engine, AesEngine)
def test_explicit_engine_kwarg_takes_precedence() -> None:
"""An explicit engine kwarg overrides the config (used by the migrator)."""
field = SQLAlchemyUtilsAdapter().create(
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "aes-gcm"},
String(128),
engine=AesEngine,
)
assert isinstance(field.engine, AesEngine)
def test_engine_migration_cbc_to_gcm_re_encrypts() -> None:
"""CBC source value is re-encrypted into GCM under the same SECRET_KEY.
Mirrors the recommended runbook: the migrator runs while the config still
points at AES-CBC (the column type), re-encrypting into AES-GCM.
"""
cbc = _encrypted_type(AesEngine)
ciphertext = cbc.process_bind_param("hunter2", DIALECT)
migrator = _engine_migrator(AesGcmEngine)
conn = MagicMock()
row = {"id": 1, "password": ciphertext}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": cbc}, ["id"], stats
)
assert stats == ReEncryptStats(re_encrypted=1)
assert conn.execute.call_count == 1
new_value = conn.execute.call_args.kwargs["password"]
# The stored value changed and now decrypts as GCM back to the plaintext.
assert new_value != ciphertext
gcm = _encrypted_type(AesGcmEngine)
assert gcm.process_result_value(new_value, DIALECT) == "hunter2"
def test_engine_migration_idempotent_for_already_target() -> None:
"""A value already in the target (GCM) form is skipped — runs are resumable.
The column is still configured as CBC (config not yet flipped), but the
value has already been migrated to GCM, so it must be left untouched.
"""
gcm_value = _encrypted_type(AesGcmEngine).process_bind_param("hunter2", DIALECT)
cbc_column = _encrypted_type(AesEngine)
migrator = _engine_migrator(AesGcmEngine)
conn = MagicMock()
row = {"id": 1, "password": gcm_value}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": cbc_column}, ["id"], stats
)
assert stats == ReEncryptStats(skipped=1)
assert conn.execute.call_count == 0
def test_engine_migration_reads_cbc_after_config_already_flipped() -> None:
"""CBC source is still migrated when the config was flipped to GCM first.
If an operator sets the config engine to GCM before running the migrator,
the column type can no longer read the CBC value; the previous-key (== the
current key) AES-CBC decryptor recovers it and it is re-encrypted as GCM.
"""
cbc_value = _encrypted_type(AesEngine).process_bind_param("hunter2", DIALECT)
gcm_column = _encrypted_type(AesGcmEngine)
migrator = _engine_migrator(AesGcmEngine)
conn = MagicMock()
row = {"id": 1, "password": cbc_value}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
)
assert stats == ReEncryptStats(re_encrypted=1)
new_value = conn.execute.call_args.kwargs["password"]
assert gcm_column.process_result_value(new_value, DIALECT) == "hunter2"
def test_engine_migration_gcm_to_cbc_rolls_back() -> None:
"""GCM source value is rolled back to CBC under the same SECRET_KEY.
The reverse of the forward migration (``--engine aes``). The idempotency
fast-path decrypts in the *target* form first; since the target here is
unauthenticated AES-CBC, this guards against it mis-reading the AES-GCM
ciphertext and wrongly skipping the value instead of re-encrypting it.
"""
gcm_value = _encrypted_type(AesGcmEngine).process_bind_param("hunter2", DIALECT)
gcm_column = _encrypted_type(AesGcmEngine)
migrator = _engine_migrator(AesEngine)
conn = MagicMock()
row = {"id": 1, "password": gcm_value}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
)
assert stats == ReEncryptStats(re_encrypted=1)
new_value = conn.execute.call_args.kwargs["password"]
assert new_value != gcm_value
# The rolled-back value now decrypts as AES-CBC back to the plaintext.
assert _encrypted_type(AesEngine).process_result_value(new_value, DIALECT) == (
"hunter2"
)
def _key_rotation_migrator(previous_secret_key: str) -> SecretsMigrator:
"""Build a SecretsMigrator in key-rotation mode without an app context.
Like ``_engine_migrator`` but with no target engine: values are decrypted
under ``previous_secret_key`` and re-encrypted under the current key using
the column's own engine.
"""
migrator = SecretsMigrator.__new__(SecretsMigrator)
migrator._secret_key = SECRET_KEY # noqa: SLF001
migrator._target_engine = None # noqa: SLF001
migrator._previous_secret_key = previous_secret_key # noqa: SLF001
migrator._dialect = DIALECT # noqa: SLF001
return migrator
def test_key_rotation_for_aes_gcm_column() -> None:
"""SECRET_KEY rotation works for an AES-GCM column.
The previous-key fallback must use the column's AES-GCM engine, otherwise
GCM ciphertext written under the old key cannot be decrypted and the
rotation rolls back.
"""
old_key = "o" * 32
gcm_old = EncryptedType(String(1024), key=lambda: old_key, engine=AesGcmEngine)
old_value = gcm_old.process_bind_param("hunter2", DIALECT)
gcm_column = _encrypted_type(AesGcmEngine)
migrator = _key_rotation_migrator(previous_secret_key=old_key)
conn = MagicMock()
row = {"id": 1, "password": old_value}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
)
assert stats == ReEncryptStats(re_encrypted=1)
new_value = conn.execute.call_args.kwargs["password"]
assert gcm_column.process_result_value(new_value, DIALECT) == "hunter2"
def test_engine_migration_unreadable_value_counts_as_failure() -> None:
"""A value no engine/key can read is a failure, not a silent pass-through."""
migrator = _engine_migrator(AesGcmEngine)
conn = MagicMock()
row = {"id": 1, "password": b"not-valid-ciphertext"}
stats = ReEncryptStats()
migrator._re_encrypt_row( # noqa: SLF001
conn, row, "dbs", {"password": _encrypted_type(AesEngine)}, ["id"], stats
)
assert stats == ReEncryptStats(failed=1)
assert conn.execute.call_count == 0