mirror of
https://github.com/apache/superset.git
synced 2026-06-10 01:59:17 +00:00
Compare commits
6 Commits
master
...
feat/aes-g
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
6671c90103 | ||
|
|
0fe86ea0c3 | ||
|
|
731a476acc | ||
|
|
05d87edf27 | ||
|
|
782c1db8d8 | ||
|
|
f748de3423 |
23
UPDATING.md
23
UPDATING.md
@@ -59,6 +59,29 @@ Both default to empty (no behavior change). They apply to both the `LOCAL_EXTENS
|
||||
|
||||
The Dynamic Group By chart customization now orders its display values according to the "Sort display control values" toggle: ascending (A–Z), descending (Z–A), or the dataset's source order when the toggle is unset. Previously the dropdown always sorted alphabetically. Existing dashboards where the toggle was never set will show options in source order instead of A–Z; open the customization and enable the toggle to restore alphabetical ordering.
|
||||
|
||||
### Selectable encryption engine for app-encrypted fields (AES-GCM)
|
||||
|
||||
App-encrypted fields (database passwords, SSH tunnel credentials, OAuth tokens, etc.) can now use authenticated **AES-GCM** encryption instead of the historical unauthenticated **AES-CBC**. A new config selects the engine for the default adapter:
|
||||
|
||||
```python
|
||||
# "aes" (AES-CBC, historical default) | "aes-gcm" (authenticated, recommended for new installs)
|
||||
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes"
|
||||
```
|
||||
|
||||
**No action required / no behavior change:** the default remains `"aes"`, so existing installs are unaffected.
|
||||
|
||||
**Opting in on an existing install:** flipping the engine on a populated database without re-encrypting first will make stored secrets undecryptable, because the two ciphertext formats are not compatible. A migrator is provided. Recommended runbook:
|
||||
|
||||
1. Take a metadata-DB backup.
|
||||
2. Re-encrypt existing secrets into the new engine (the `SECRET_KEY` is unchanged):
|
||||
```bash
|
||||
superset re-encrypt-secrets --engine aes-gcm
|
||||
```
|
||||
3. Set `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` in your config.
|
||||
4. Restart Superset.
|
||||
|
||||
The migration is transactional (all-or-nothing) and idempotent — it can be safely re-run or resumed. Note that AES-GCM, unlike AES-CBC, does not support querying directly over encrypted columns; audit any code that filters on an encrypted column before switching. See the SIP at `docs/sip/authenticated-encryption-at-rest.md` for details.
|
||||
|
||||
### Granular Export Controls
|
||||
|
||||
A new feature flag `GRANULAR_EXPORT_CONTROLS` introduces three fine-grained permissions that replace the legacy `can_csv` permission:
|
||||
|
||||
133
docs/sip/authenticated-encryption-at-rest.md
Normal file
133
docs/sip/authenticated-encryption-at-rest.md
Normal file
@@ -0,0 +1,133 @@
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one
|
||||
or more contributor license agreements. See the NOTICE file
|
||||
distributed with this work for additional information
|
||||
regarding copyright ownership. The ASF licenses this file
|
||||
to you under the Apache License, Version 2.0 (the
|
||||
"License"); you may not use this file except in compliance
|
||||
with the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing,
|
||||
software distributed under the License is distributed on an
|
||||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations
|
||||
under the License.
|
||||
-->
|
||||
|
||||
# SIP: Authenticated encryption (AES-GCM) for app-encrypted fields
|
||||
|
||||
## [DRAFT — proposal for discussion]
|
||||
|
||||
This document is a draft proposal accompanying the code in this PR. It is
|
||||
intended to seed the formal SIP discussion. The code here ships the
|
||||
backward-compatible engine selection **and** the re-encryption migrator
|
||||
(Phases 1–2 below); both are opt-in and change nothing for existing installs by
|
||||
default. Flipping the default for fresh installs (Phase 3) remains future work.
|
||||
|
||||
## Motivation
|
||||
|
||||
Superset app-encrypts a number of sensitive fields before persisting them to
|
||||
the metadata database, including:
|
||||
|
||||
- database connection passwords and `encrypted_extra` (`superset/models/core.py`),
|
||||
- SSH tunnel credentials — password, private key, private-key password
|
||||
(`superset/databases/ssh_tunnel/models.py`),
|
||||
- OAuth2 tokens and other secrets stored via `EncryptedType`.
|
||||
|
||||
These fields are encrypted with `sqlalchemy_utils.EncryptedType`, which
|
||||
**defaults to `AesEngine` (AES-CBC)**. AES-CBC provides confidentiality but is
|
||||
**unauthenticated**: it has no integrity tag. An attacker with write access to
|
||||
the ciphertext (e.g. direct metadata-DB access, a backup, or a compromised
|
||||
replica) can perform **bit-flipping / chosen-ciphertext manipulation** to
|
||||
silently alter the decrypted plaintext of a secret without detection.
|
||||
|
||||
`AesGcmEngine` (AES-GCM) is authenticated encryption: tampering causes
|
||||
decryption to fail loudly rather than yielding attacker-influenced plaintext.
|
||||
Using authenticated encryption for secrets at rest is an ASVS L1 expectation
|
||||
(11.3.2 / cryptography best practice).
|
||||
|
||||
`config.py` already documents that operators *can* switch to GCM by writing a
|
||||
custom `AbstractEncryptedFieldAdapter`, but:
|
||||
|
||||
1. it is opt-in, undocumented as a security recommendation, and easy to miss;
|
||||
2. there is **no migration path** — flipping the engine on a populated database
|
||||
makes every existing secret undecryptable, because GCM ciphertext is not
|
||||
format-compatible with CBC.
|
||||
|
||||
## Proposed change
|
||||
|
||||
A three-part change, delivered incrementally so existing deployments are never
|
||||
broken:
|
||||
|
||||
### Phase 1 — engine selection (this PR)
|
||||
|
||||
- Add a `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE` config (`"aes"` | `"aes-gcm"`),
|
||||
**defaulting to `"aes"`** (no behavior change for existing installs).
|
||||
- Teach the default `SQLAlchemyUtilsAdapter` to honor it (an explicit `engine`
|
||||
kwarg still wins, so the migrator can pin an engine).
|
||||
- This lets **new** deployments choose AES-GCM from day one with a one-line
|
||||
config, instead of writing a custom adapter.
|
||||
|
||||
### Phase 2 — CBC→GCM re-encryption migrator (this PR)
|
||||
|
||||
The existing `SecretsMigrator` (previously only used for `SECRET_KEY` rotation)
|
||||
gains an **engine migration** mode that:
|
||||
|
||||
1. discovers every `EncryptedType` column (via `discover_encrypted_fields()`),
|
||||
2. decrypts each value with the **source** engine (AES-CBC) under the current
|
||||
`SECRET_KEY`,
|
||||
3. re-encrypts with the **target** engine (AES-GCM),
|
||||
4. runs transactionally per the existing all-or-nothing semantics, and is
|
||||
idempotent per column (already-migrated values are skipped), so a run can be
|
||||
safely repeated or resumed.
|
||||
|
||||
Exposed via a new `--engine` option on the existing CLI command:
|
||||
`superset re-encrypt-secrets --engine aes-gcm`, runnable by operators with a DB
|
||||
backup in hand. The `SECRET_KEY` is unchanged; an engine change and a key
|
||||
rotation can also be combined (pass `--previous_secret_key` as well).
|
||||
|
||||
### Phase 3 — flip the default for new installs
|
||||
|
||||
Once the migrator and docs are in place, change the default to `"aes-gcm"` for
|
||||
**fresh** installs only (e.g. keyed off an empty metadata DB / documented in
|
||||
`UPDATING.md`), keeping existing installs on `"aes"` until they run Phase 2.
|
||||
|
||||
## New or changed public interfaces
|
||||
|
||||
- New config: `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"]`.
|
||||
- New (Phase 2) CLI: `superset re-encrypt-secrets --engine <name>`.
|
||||
- No schema changes; ciphertext format changes per migrated column.
|
||||
|
||||
## Migration plan and compatibility
|
||||
|
||||
- **Backward compatible by default.** Phase 1 changes nothing unless the
|
||||
operator opts in.
|
||||
- Switching an existing deployment to `"aes-gcm"` **without** running the Phase
|
||||
2 migrator will make existing secrets undecryptable — this is called out in
|
||||
the config comment and must be in `UPDATING.md`.
|
||||
- Recommended operator runbook: take a metadata-DB backup → run
|
||||
`re-encrypt-secrets --engine aes-gcm` → set
|
||||
`SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` → restart.
|
||||
- `AesEngine` allows queryability over encrypted fields; AES-GCM does not.
|
||||
Any code that filters/queries on an encrypted column directly must be audited
|
||||
before Phase 3 (none is expected, but it must be verified).
|
||||
|
||||
## Rejected alternatives
|
||||
|
||||
- **Flip the default immediately.** Rejected: bricks every existing
|
||||
deployment's secrets with no migration path.
|
||||
- **Document-only (custom adapter).** Status quo; high friction and no
|
||||
migration tooling — most operators will never do it.
|
||||
|
||||
## Open questions
|
||||
|
||||
- GCM→CBC rollback (for operators who need queryability) already works via the
|
||||
same command (`re-encrypt-secrets --engine aes`), since the migrator is
|
||||
engine-symmetric. Should rollback be documented as a supported path or
|
||||
discouraged?
|
||||
- The migrator already supports a concurrent `SECRET_KEY` rotation + engine
|
||||
change in a single pass (pass `--previous_secret_key` alongside `--engine`).
|
||||
Is that combination worth calling out in the operator docs, or kept advanced?
|
||||
@@ -31,7 +31,7 @@ from flask_appbuilder.api.manager import resolver
|
||||
|
||||
import superset.utils.database as database_utils
|
||||
from superset.utils.decorators import transaction
|
||||
from superset.utils.encrypt import SecretsMigrator
|
||||
from superset.utils.encrypt import ENCRYPTION_ENGINES, SecretsMigrator
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -110,17 +110,37 @@ def update_api_docs() -> None:
|
||||
help="An optional previous secret key, if PREVIOUS_SECRET_KEY "
|
||||
"is not set on the config",
|
||||
)
|
||||
def re_encrypt_secrets(previous_secret_key: Optional[str] = None) -> None:
|
||||
@click.option(
|
||||
"--engine",
|
||||
"-e",
|
||||
"target_engine_name",
|
||||
required=False,
|
||||
type=click.Choice(sorted(ENCRYPTION_ENGINES), case_sensitive=False),
|
||||
help="Re-encrypt all app-encrypted fields with this encryption engine "
|
||||
"(e.g. 'aes-gcm' for authenticated encryption). The SECRET_KEY is "
|
||||
"unchanged. Take a metadata-DB backup first, then set "
|
||||
"SQLALCHEMY_ENCRYPTED_FIELD_ENGINE to the same value and restart.",
|
||||
)
|
||||
def re_encrypt_secrets(
|
||||
previous_secret_key: Optional[str] = None,
|
||||
target_engine_name: Optional[str] = None,
|
||||
) -> None:
|
||||
previous_secret_key = previous_secret_key or current_app.config.get(
|
||||
"PREVIOUS_SECRET_KEY"
|
||||
)
|
||||
if previous_secret_key is None:
|
||||
target_engine = (
|
||||
ENCRYPTION_ENGINES[target_engine_name] if target_engine_name else None
|
||||
)
|
||||
if previous_secret_key is None and target_engine is None:
|
||||
click.secho(
|
||||
"No previous secret key provided; nothing to re-encrypt.",
|
||||
"No previous secret key or target engine provided; nothing to re-encrypt.",
|
||||
fg="yellow",
|
||||
)
|
||||
return
|
||||
secrets_migrator = SecretsMigrator(previous_secret_key=previous_secret_key)
|
||||
secrets_migrator = SecretsMigrator(
|
||||
previous_secret_key=previous_secret_key,
|
||||
target_engine=target_engine,
|
||||
)
|
||||
try:
|
||||
stats = secrets_migrator.run()
|
||||
except Exception as exc: # pylint: disable=broad-except
|
||||
|
||||
@@ -270,7 +270,10 @@ SQLALCHEMY_CUSTOM_PASSWORD_STORE = None
|
||||
# as key material. Do note that AesEngine allows for queryability over the
|
||||
# encrypted fields.
|
||||
#
|
||||
# To change the default engine you need to define your own adapter:
|
||||
# To switch the engine used by the default adapter, prefer the
|
||||
# ``SQLALCHEMY_ENCRYPTED_FIELD_ENGINE`` knob below (e.g. "aes-gcm"). Defining a
|
||||
# custom adapter, as shown next, is only needed for behaviour the built-in
|
||||
# engines do not cover:
|
||||
#
|
||||
# e.g.:
|
||||
#
|
||||
@@ -295,6 +298,16 @@ SQLALCHEMY_ENCRYPTED_FIELD_TYPE_ADAPTER = ( # pylint: disable=invalid-name
|
||||
SQLAlchemyUtilsAdapter
|
||||
)
|
||||
|
||||
# Encryption engine used by the default SQLAlchemyUtilsAdapter for app-encrypted
|
||||
# fields. Options:
|
||||
# "aes" - AES-CBC (historical default; unauthenticated, queryable)
|
||||
# "aes-gcm" - AES-GCM (authenticated encryption; recommended for NEW installs)
|
||||
# WARNING: changing this on a database that already holds encrypted secrets
|
||||
# (database passwords, SSH tunnel credentials, OAuth tokens, ...) will make
|
||||
# those values undecryptable unless they are re-encrypted first. See the
|
||||
# authenticated-encryption SIP/migration before switching an existing install.
|
||||
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE: Literal["aes", "aes-gcm"] = "aes"
|
||||
|
||||
# Extends the default SQLGlot dialects with additional dialects
|
||||
SQLGLOT_DIALECTS_EXTENSIONS: DialectExtensions | Callable[[], DialectExtensions] = {}
|
||||
|
||||
|
||||
@@ -19,17 +19,30 @@ from abc import ABC, abstractmethod
|
||||
from dataclasses import dataclass
|
||||
from typing import Any, Optional
|
||||
|
||||
from flask import Flask
|
||||
from flask import current_app, Flask
|
||||
from flask_babel import lazy_gettext as _
|
||||
from sqlalchemy import Table, text, TypeDecorator
|
||||
from sqlalchemy.engine import Connection, Dialect, Row
|
||||
from sqlalchemy_utils import EncryptedType as SqlaEncryptedType
|
||||
from sqlalchemy_utils.types.encrypted.encrypted_type import AesEngine, AesGcmEngine
|
||||
|
||||
|
||||
class EncryptedType(SqlaEncryptedType):
|
||||
cache_ok = True
|
||||
|
||||
|
||||
# Named encryption engines selectable via the ``SQLALCHEMY_ENCRYPTED_FIELD_ENGINE``
|
||||
# config. "aes" (AES-CBC) is the historical default; "aes-gcm" is authenticated
|
||||
# encryption (recommended for new deployments). NOTE: switching an existing
|
||||
# deployment from "aes" to "aes-gcm" requires re-encrypting all stored secrets
|
||||
# first — see the SIP referenced in the docs. Changing this on a populated
|
||||
# database without that migration will make existing secrets undecryptable.
|
||||
ENCRYPTION_ENGINES = {
|
||||
"aes": AesEngine,
|
||||
"aes-gcm": AesGcmEngine,
|
||||
}
|
||||
|
||||
|
||||
ENC_ADAPTER_TAG_ATTR_NAME = "__created_by_enc_field_adapter__"
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -65,6 +78,23 @@ class SQLAlchemyUtilsAdapter( # pylint: disable=too-few-public-methods
|
||||
**kwargs: Optional[dict[str, Any]],
|
||||
) -> TypeDecorator:
|
||||
if app_config:
|
||||
# Select the encryption engine from config, defaulting to the
|
||||
# historical AES-CBC engine for backward compatibility. An explicit
|
||||
# ``engine`` kwarg (e.g. from the migrator) always takes precedence.
|
||||
if "engine" not in kwargs:
|
||||
engine_name = app_config.get("SQLALCHEMY_ENCRYPTED_FIELD_ENGINE", "aes")
|
||||
if engine_name not in ENCRYPTION_ENGINES:
|
||||
# Do not log the configured value itself: it originates from
|
||||
# the app config (which also holds the SECRET_KEY), and
|
||||
# static analysis flags interpolating any config-sourced
|
||||
# value into a log as potential clear-text secret logging.
|
||||
# The set of valid engines is enough to diagnose the typo.
|
||||
logger.warning(
|
||||
"Unrecognized SQLALCHEMY_ENCRYPTED_FIELD_ENGINE;"
|
||||
" falling back to AES-CBC. Valid engines: %s",
|
||||
", ".join(sorted(ENCRYPTION_ENGINES)),
|
||||
)
|
||||
kwargs["engine"] = ENCRYPTION_ENGINES.get(engine_name, AesEngine)
|
||||
return EncryptedType(*args, lambda: app_config["SECRET_KEY"], **kwargs)
|
||||
|
||||
raise Exception( # pylint: disable=broad-exception-raised
|
||||
@@ -101,10 +131,40 @@ class EncryptedFieldFactory:
|
||||
|
||||
|
||||
class SecretsMigrator:
|
||||
def __init__(self, previous_secret_key: str) -> None:
|
||||
"""Re-encrypts every app-encrypted column in the ORM.
|
||||
|
||||
Two modes, which can also be combined:
|
||||
|
||||
- **Key rotation** — pass ``previous_secret_key``. Values are decrypted
|
||||
under the previous key and re-encrypted under the current ``SECRET_KEY``
|
||||
using the currently-configured engine.
|
||||
- **Engine migration** — pass ``target_engine`` (e.g. ``AesGcmEngine``).
|
||||
Values are decrypted under the current ``SECRET_KEY`` with the source
|
||||
engine and re-encrypted with the target engine. This is how an existing
|
||||
install moves from AES-CBC to authenticated AES-GCM without bricking
|
||||
stored secrets; the ``SECRET_KEY`` itself is unchanged.
|
||||
|
||||
Both modes share the same all-or-nothing transaction and per-column
|
||||
idempotency: a value already readable in the target form is left untouched,
|
||||
so a run can be safely repeated or resumed.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
previous_secret_key: Optional[str] = None,
|
||||
target_engine: Optional[type[Any]] = None,
|
||||
) -> None:
|
||||
from superset import db # pylint: disable=import-outside-toplevel
|
||||
|
||||
self._db = db
|
||||
self._secret_key = current_app.config["SECRET_KEY"]
|
||||
self._target_engine = target_engine
|
||||
# In engine-migration mode the SECRET_KEY does not change: the source
|
||||
# ciphertext is decrypted under the current key (with the source
|
||||
# engine), so default the "previous" key to the current one when the
|
||||
# caller only asked for an engine change.
|
||||
if target_engine is not None and not previous_secret_key:
|
||||
previous_secret_key = self._secret_key
|
||||
self._previous_secret_key = previous_secret_key
|
||||
self._dialect: Dialect = db.engine.url.get_dialect()
|
||||
|
||||
@@ -177,6 +237,54 @@ class SecretsMigrator:
|
||||
cols = ",".join(pk_columns + column_names)
|
||||
return conn.execute(f"SELECT {cols} FROM {table_name}") # noqa: S608
|
||||
|
||||
def _target_type(self, encrypted_type: EncryptedType) -> EncryptedType:
|
||||
"""The EncryptedType to re-encrypt a value *into*.
|
||||
|
||||
For a key rotation this is the column's own configured type (current
|
||||
engine + lazily-resolved current key). For an engine migration it is a
|
||||
type pinned to the requested target engine under the current key.
|
||||
"""
|
||||
if self._target_engine is None:
|
||||
return encrypted_type
|
||||
return EncryptedType(
|
||||
type_in=encrypted_type.underlying_type,
|
||||
key=self._secret_key,
|
||||
engine=self._target_engine,
|
||||
)
|
||||
|
||||
def _source_decryptors(self, encrypted_type: EncryptedType) -> list[EncryptedType]:
|
||||
"""Candidate decryptors, tried in order, to recover a value's plaintext.
|
||||
|
||||
1. The column's configured type — current key + currently-configured
|
||||
engine. During an engine migration this reads the source ciphertext
|
||||
while the config still points at the source engine.
|
||||
2. The previous key under each supported engine. Trying the column's
|
||||
configured engine covers ``SECRET_KEY`` rotation for any engine
|
||||
(including AES-GCM); also trying the historical AES-CBC engine
|
||||
covers an engine migration whose config was flipped to the target
|
||||
engine *before* the migrator ran (the source data is still CBC under
|
||||
the current key, which the previous key defaults to in
|
||||
engine-migration mode).
|
||||
"""
|
||||
decryptors = [encrypted_type]
|
||||
if self._previous_secret_key:
|
||||
# Try the column's own engine first, then any remaining supported
|
||||
# engines (notably the historical AES-CBC), de-duplicated so we
|
||||
# never build the same decryptor twice.
|
||||
engines: list[type[Any]] = [type(encrypted_type.engine)]
|
||||
for engine in ENCRYPTION_ENGINES.values():
|
||||
if engine not in engines:
|
||||
engines.append(engine)
|
||||
decryptors.extend(
|
||||
EncryptedType(
|
||||
type_in=encrypted_type.underlying_type,
|
||||
key=self._previous_secret_key,
|
||||
engine=engine,
|
||||
)
|
||||
for engine in engines
|
||||
)
|
||||
return decryptors
|
||||
|
||||
def _re_encrypt_row(
|
||||
self,
|
||||
conn: Connection,
|
||||
@@ -187,24 +295,28 @@ class SecretsMigrator:
|
||||
stats: ReEncryptStats,
|
||||
) -> None:
|
||||
"""
|
||||
Re encrypts all columns in a Row
|
||||
Re-encrypts all columns in a Row into the target form.
|
||||
|
||||
Re-encryption is idempotent per column: we first ask whether the
|
||||
current key can already decrypt the value, and skip if so. Only if
|
||||
the current key fails do we fall back to decrypting with the
|
||||
previous key and re-encrypting. Checking the current key first
|
||||
keeps ``run()`` idempotent regardless of what ``previous_secret_key``
|
||||
the caller supplies — even re-running with the same (unchanged)
|
||||
``SECRET_KEY`` will not rewrite rows.
|
||||
The "target form" is current-key + currently-configured engine for a
|
||||
``SECRET_KEY`` rotation, or current-key + the requested engine for an
|
||||
engine migration (see ``_target_type``).
|
||||
|
||||
Re-encryption is idempotent per column: a value that can already be
|
||||
read in the target form is left untouched. This keeps ``run()``
|
||||
idempotent regardless of what ``previous_secret_key`` the caller
|
||||
supplies, and lets an interrupted engine migration be resumed.
|
||||
Otherwise the plaintext is recovered from the first candidate source
|
||||
decryptor that can read it (see ``_source_decryptors``) and re-encrypted
|
||||
into the target form.
|
||||
|
||||
NULL values are never encrypted, so they are reported separately
|
||||
(neither re-encrypted nor "skipped because already current").
|
||||
|
||||
Per-column outcomes are accumulated onto ``stats`` so the caller can
|
||||
report a summary. Columns whose ciphertext is unreadable under both
|
||||
keys are counted as failures and logged; the exception is not
|
||||
propagated, so processing continues. The caller is responsible for
|
||||
raising once all rows have been scanned.
|
||||
report a summary. Columns whose ciphertext is unreadable under any
|
||||
candidate key/engine are counted as failures and logged; the exception
|
||||
is not propagated, so processing continues. The caller is responsible
|
||||
for raising once all rows have been scanned.
|
||||
|
||||
If no columns need re-encryption, no UPDATE is issued.
|
||||
|
||||
@@ -223,38 +335,48 @@ class SecretsMigrator:
|
||||
stats.null += 1
|
||||
continue
|
||||
|
||||
# Fast path: if the current key can already read the value,
|
||||
# leave it untouched. A failure here simply means we need to try
|
||||
# the previous key below — not a condition worth logging.
|
||||
target_type = self._target_type(encrypted_type)
|
||||
|
||||
# Fast path: if the value can already be read in the target form,
|
||||
# leave it untouched. A failure here simply means we need to try the
|
||||
# source decryptors below — not a condition worth logging.
|
||||
try:
|
||||
encrypted_type.process_result_value(raw_value, self._dialect)
|
||||
target_type.process_result_value(raw_value, self._dialect)
|
||||
except Exception: # noqa: BLE001, S110 # pylint: disable=broad-except
|
||||
pass
|
||||
else:
|
||||
stats.skipped += 1
|
||||
continue
|
||||
|
||||
# Current key cannot decrypt — try the previous key.
|
||||
previous_encrypted_type = EncryptedType(
|
||||
type_in=encrypted_type.underlying_type, key=self._previous_secret_key
|
||||
)
|
||||
try:
|
||||
unencrypted_value = previous_encrypted_type.process_result_value(
|
||||
raw_value, self._dialect
|
||||
)
|
||||
except Exception as prev_ex: # noqa: BLE001 # pylint: disable=broad-except
|
||||
# Recover the plaintext from the first source decryptor that can
|
||||
# read the value (current engine/key, then previous key).
|
||||
unencrypted_value = None
|
||||
decrypted = False
|
||||
last_error: Optional[Exception] = None
|
||||
for decryptor in self._source_decryptors(encrypted_type):
|
||||
try:
|
||||
unencrypted_value = decryptor.process_result_value(
|
||||
raw_value, self._dialect
|
||||
)
|
||||
except Exception as ex: # noqa: BLE001 # pylint: disable=broad-except
|
||||
last_error = ex
|
||||
continue
|
||||
decrypted = True
|
||||
break
|
||||
|
||||
if not decrypted:
|
||||
logger.error(
|
||||
"Column [%s.%s] cannot be decrypted under the previous"
|
||||
" or current secret key (%s: %s)",
|
||||
"Column [%s.%s] cannot be decrypted under any known"
|
||||
" key/engine (%s: %s)",
|
||||
table_name,
|
||||
column_name,
|
||||
type(prev_ex).__name__,
|
||||
prev_ex,
|
||||
type(last_error).__name__ if last_error else "None",
|
||||
last_error,
|
||||
)
|
||||
stats.failed += 1
|
||||
continue
|
||||
|
||||
re_encrypted_columns[column_name] = encrypted_type.process_bind_param(
|
||||
re_encrypted_columns[column_name] = target_type.process_bind_param(
|
||||
unencrypted_value,
|
||||
self._dialect,
|
||||
)
|
||||
@@ -276,12 +398,13 @@ class SecretsMigrator:
|
||||
|
||||
def run(self) -> ReEncryptStats:
|
||||
"""
|
||||
Re-encrypt every encrypted column in the ORM under the current
|
||||
``SECRET_KEY``.
|
||||
Re-encrypt every encrypted column in the ORM into the target form
|
||||
(current ``SECRET_KEY`` and, for an engine migration, the target
|
||||
engine).
|
||||
|
||||
Returns per-value counts of re-encrypted, skipped (already under the
|
||||
current key), and failed (undecryptable) outcomes. If any failures
|
||||
occurred the transaction is rolled back by raising after the
|
||||
Returns per-value counts of re-encrypted, skipped (already in the
|
||||
target form), null, and failed (undecryptable) outcomes. If any
|
||||
failures occurred the transaction is rolled back by raising after the
|
||||
summary is logged, so partial re-encryption never commits.
|
||||
"""
|
||||
encrypted_meta_info = self.discover_encrypted_fields()
|
||||
|
||||
@@ -29,6 +29,7 @@ from freezegun import freeze_time
|
||||
import superset.cli.importexport
|
||||
import superset.cli.thumbnails
|
||||
import superset.cli.update
|
||||
import superset.utils.encrypt
|
||||
from superset import db
|
||||
from superset.models.dashboard import Dashboard
|
||||
from tests.integration_tests.fixtures.birth_names_dashboard import (
|
||||
@@ -364,3 +365,71 @@ def test_re_encrypt_secrets_failure_exits_nonzero(app_context):
|
||||
# The failure path must be handled by the CLI, not leaked as an
|
||||
# uncaught exception.
|
||||
assert response.exception is None or isinstance(response.exception, SystemExit)
|
||||
|
||||
|
||||
def test_re_encrypt_secrets_engine_option_invokes_migrator(app_context):
|
||||
"""
|
||||
When --engine is provided, the CLI must resolve the engine name to the
|
||||
correct engine class and pass it to SecretsMigrator as target_engine.
|
||||
"""
|
||||
from sqlalchemy_utils.types.encrypted.encrypted_type import AesGcmEngine
|
||||
|
||||
current_app.config.pop("PREVIOUS_SECRET_KEY", None)
|
||||
runner = current_app.test_cli_runner()
|
||||
with mock.patch.object(
|
||||
superset.cli.update,
|
||||
"SecretsMigrator",
|
||||
) as migrator_mock:
|
||||
migrator_mock.return_value.run.return_value = (
|
||||
superset.utils.encrypt.ReEncryptStats()
|
||||
)
|
||||
response = runner.invoke(
|
||||
superset.cli.update.re_encrypt_secrets,
|
||||
["--engine", "aes-gcm"],
|
||||
)
|
||||
|
||||
assert response.exit_code == 0
|
||||
call_kwargs = migrator_mock.call_args.kwargs
|
||||
assert call_kwargs.get("target_engine") is AesGcmEngine
|
||||
assert call_kwargs.get("previous_secret_key") is None
|
||||
|
||||
|
||||
def test_re_encrypt_secrets_engine_option_case_insensitive(app_context):
|
||||
"""
|
||||
The --engine option must be case-insensitive per
|
||||
click.Choice(..., case_sensitive=False).
|
||||
"""
|
||||
from sqlalchemy_utils.types.encrypted.encrypted_type import AesGcmEngine
|
||||
|
||||
current_app.config.pop("PREVIOUS_SECRET_KEY", None)
|
||||
runner = current_app.test_cli_runner()
|
||||
with mock.patch.object(
|
||||
superset.cli.update,
|
||||
"SecretsMigrator",
|
||||
) as migrator_mock:
|
||||
migrator_mock.return_value.run.return_value = (
|
||||
superset.utils.encrypt.ReEncryptStats()
|
||||
)
|
||||
response = runner.invoke(
|
||||
superset.cli.update.re_encrypt_secrets,
|
||||
["--engine", "AES-GCM"],
|
||||
)
|
||||
|
||||
assert response.exit_code == 0
|
||||
assert migrator_mock.call_args.kwargs.get("target_engine") is AesGcmEngine
|
||||
|
||||
|
||||
def test_re_encrypt_secrets_engine_option_invalid_raises_usage(app_context):
|
||||
"""
|
||||
An unrecognized engine name must produce a click usage error, not a
|
||||
traceback or silent failure.
|
||||
"""
|
||||
runner = current_app.test_cli_runner()
|
||||
response = runner.invoke(
|
||||
superset.cli.update.re_encrypt_secrets,
|
||||
["--engine", "nonexistent-engine"],
|
||||
)
|
||||
|
||||
assert response.exit_code != 0
|
||||
assert "Invalid value" in response.output or "Usage:" in response.output
|
||||
assert "aes" in response.output or "aes-gcm" in response.output
|
||||
|
||||
246
tests/unit_tests/utils/encrypt_test.py
Normal file
246
tests/unit_tests/utils/encrypt_test.py
Normal file
@@ -0,0 +1,246 @@
|
||||
# Licensed to the Apache Software Foundation (ASF) under one
|
||||
# or more contributor license agreements. See the NOTICE file
|
||||
# distributed with this work for additional information
|
||||
# regarding copyright ownership. The ASF licenses this file
|
||||
# to you under the Apache License, Version 2.0 (the
|
||||
# "License"); you may not use this file except in compliance
|
||||
# with the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing,
|
||||
# software distributed under the License is distributed on an
|
||||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, either express or implied. See the License for the
|
||||
# specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
from sqlalchemy import String
|
||||
from sqlalchemy.engine import make_url
|
||||
from sqlalchemy_utils.types.encrypted.encrypted_type import AesEngine, AesGcmEngine
|
||||
|
||||
from superset.utils.encrypt import (
|
||||
EncryptedType,
|
||||
ReEncryptStats,
|
||||
SecretsMigrator,
|
||||
SQLAlchemyUtilsAdapter,
|
||||
)
|
||||
|
||||
SECRET = {"SECRET_KEY": "x" * 32}
|
||||
SECRET_KEY = "k" * 32
|
||||
DIALECT = make_url("sqlite://").get_dialect()
|
||||
|
||||
|
||||
def _encrypted_type(engine: type) -> EncryptedType:
|
||||
"""A standalone EncryptedType for a given engine under SECRET_KEY."""
|
||||
return EncryptedType(String(1024), key=lambda: SECRET_KEY, engine=engine)
|
||||
|
||||
|
||||
def _engine_migrator(target_engine: type) -> SecretsMigrator:
|
||||
"""Build a SecretsMigrator in engine-migration mode without an app context.
|
||||
|
||||
``__init__`` reads ``current_app`` and the DB dialect, so — like the
|
||||
existing row-level tests that override ``_dialect`` — we set the few
|
||||
attributes ``_re_encrypt_row`` actually uses directly.
|
||||
"""
|
||||
migrator = SecretsMigrator.__new__(SecretsMigrator)
|
||||
migrator._secret_key = SECRET_KEY # noqa: SLF001
|
||||
migrator._target_engine = target_engine # noqa: SLF001
|
||||
# Engine migration keeps the SECRET_KEY; previous key defaults to current.
|
||||
migrator._previous_secret_key = SECRET_KEY # noqa: SLF001
|
||||
migrator._dialect = DIALECT # noqa: SLF001
|
||||
return migrator
|
||||
|
||||
|
||||
def test_default_engine_is_aes_cbc() -> None:
|
||||
"""Without config, the adapter keeps the historical AES-CBC engine."""
|
||||
field = SQLAlchemyUtilsAdapter().create(SECRET, String(128))
|
||||
assert isinstance(field.engine, AesEngine)
|
||||
|
||||
|
||||
def test_aes_gcm_engine_selected_by_config() -> None:
|
||||
"""SQLALCHEMY_ENCRYPTED_FIELD_ENGINE='aes-gcm' selects authenticated AES-GCM."""
|
||||
field = SQLAlchemyUtilsAdapter().create(
|
||||
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "aes-gcm"},
|
||||
String(128),
|
||||
)
|
||||
assert isinstance(field.engine, AesGcmEngine)
|
||||
|
||||
|
||||
def test_unknown_engine_falls_back_to_aes_cbc() -> None:
|
||||
"""An unrecognized engine name falls back to the safe historical default."""
|
||||
field = SQLAlchemyUtilsAdapter().create(
|
||||
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "bogus"},
|
||||
String(128),
|
||||
)
|
||||
assert isinstance(field.engine, AesEngine)
|
||||
|
||||
|
||||
def test_explicit_engine_kwarg_takes_precedence() -> None:
|
||||
"""An explicit engine kwarg overrides the config (used by the migrator)."""
|
||||
field = SQLAlchemyUtilsAdapter().create(
|
||||
{**SECRET, "SQLALCHEMY_ENCRYPTED_FIELD_ENGINE": "aes-gcm"},
|
||||
String(128),
|
||||
engine=AesEngine,
|
||||
)
|
||||
assert isinstance(field.engine, AesEngine)
|
||||
|
||||
|
||||
def test_engine_migration_cbc_to_gcm_re_encrypts() -> None:
|
||||
"""CBC source value is re-encrypted into GCM under the same SECRET_KEY.
|
||||
|
||||
Mirrors the recommended runbook: the migrator runs while the config still
|
||||
points at AES-CBC (the column type), re-encrypting into AES-GCM.
|
||||
"""
|
||||
cbc = _encrypted_type(AesEngine)
|
||||
ciphertext = cbc.process_bind_param("hunter2", DIALECT)
|
||||
|
||||
migrator = _engine_migrator(AesGcmEngine)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": ciphertext}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": cbc}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(re_encrypted=1)
|
||||
assert conn.execute.call_count == 1
|
||||
new_value = conn.execute.call_args.kwargs["password"]
|
||||
# The stored value changed and now decrypts as GCM back to the plaintext.
|
||||
assert new_value != ciphertext
|
||||
gcm = _encrypted_type(AesGcmEngine)
|
||||
assert gcm.process_result_value(new_value, DIALECT) == "hunter2"
|
||||
|
||||
|
||||
def test_engine_migration_idempotent_for_already_target() -> None:
|
||||
"""A value already in the target (GCM) form is skipped — runs are resumable.
|
||||
|
||||
The column is still configured as CBC (config not yet flipped), but the
|
||||
value has already been migrated to GCM, so it must be left untouched.
|
||||
"""
|
||||
gcm_value = _encrypted_type(AesGcmEngine).process_bind_param("hunter2", DIALECT)
|
||||
cbc_column = _encrypted_type(AesEngine)
|
||||
|
||||
migrator = _engine_migrator(AesGcmEngine)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": gcm_value}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": cbc_column}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(skipped=1)
|
||||
assert conn.execute.call_count == 0
|
||||
|
||||
|
||||
def test_engine_migration_reads_cbc_after_config_already_flipped() -> None:
|
||||
"""CBC source is still migrated when the config was flipped to GCM first.
|
||||
|
||||
If an operator sets the config engine to GCM before running the migrator,
|
||||
the column type can no longer read the CBC value; the previous-key (== the
|
||||
current key) AES-CBC decryptor recovers it and it is re-encrypted as GCM.
|
||||
"""
|
||||
cbc_value = _encrypted_type(AesEngine).process_bind_param("hunter2", DIALECT)
|
||||
gcm_column = _encrypted_type(AesGcmEngine)
|
||||
|
||||
migrator = _engine_migrator(AesGcmEngine)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": cbc_value}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(re_encrypted=1)
|
||||
new_value = conn.execute.call_args.kwargs["password"]
|
||||
assert gcm_column.process_result_value(new_value, DIALECT) == "hunter2"
|
||||
|
||||
|
||||
def test_engine_migration_gcm_to_cbc_rolls_back() -> None:
|
||||
"""GCM source value is rolled back to CBC under the same SECRET_KEY.
|
||||
|
||||
The reverse of the forward migration (``--engine aes``). The idempotency
|
||||
fast-path decrypts in the *target* form first; since the target here is
|
||||
unauthenticated AES-CBC, this guards against it mis-reading the AES-GCM
|
||||
ciphertext and wrongly skipping the value instead of re-encrypting it.
|
||||
"""
|
||||
gcm_value = _encrypted_type(AesGcmEngine).process_bind_param("hunter2", DIALECT)
|
||||
gcm_column = _encrypted_type(AesGcmEngine)
|
||||
|
||||
migrator = _engine_migrator(AesEngine)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": gcm_value}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(re_encrypted=1)
|
||||
new_value = conn.execute.call_args.kwargs["password"]
|
||||
assert new_value != gcm_value
|
||||
# The rolled-back value now decrypts as AES-CBC back to the plaintext.
|
||||
assert _encrypted_type(AesEngine).process_result_value(new_value, DIALECT) == (
|
||||
"hunter2"
|
||||
)
|
||||
|
||||
|
||||
def _key_rotation_migrator(previous_secret_key: str) -> SecretsMigrator:
|
||||
"""Build a SecretsMigrator in key-rotation mode without an app context.
|
||||
|
||||
Like ``_engine_migrator`` but with no target engine: values are decrypted
|
||||
under ``previous_secret_key`` and re-encrypted under the current key using
|
||||
the column's own engine.
|
||||
"""
|
||||
migrator = SecretsMigrator.__new__(SecretsMigrator)
|
||||
migrator._secret_key = SECRET_KEY # noqa: SLF001
|
||||
migrator._target_engine = None # noqa: SLF001
|
||||
migrator._previous_secret_key = previous_secret_key # noqa: SLF001
|
||||
migrator._dialect = DIALECT # noqa: SLF001
|
||||
return migrator
|
||||
|
||||
|
||||
def test_key_rotation_for_aes_gcm_column() -> None:
|
||||
"""SECRET_KEY rotation works for an AES-GCM column.
|
||||
|
||||
The previous-key fallback must use the column's AES-GCM engine, otherwise
|
||||
GCM ciphertext written under the old key cannot be decrypted and the
|
||||
rotation rolls back.
|
||||
"""
|
||||
old_key = "o" * 32
|
||||
gcm_old = EncryptedType(String(1024), key=lambda: old_key, engine=AesGcmEngine)
|
||||
old_value = gcm_old.process_bind_param("hunter2", DIALECT)
|
||||
gcm_column = _encrypted_type(AesGcmEngine)
|
||||
|
||||
migrator = _key_rotation_migrator(previous_secret_key=old_key)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": old_value}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": gcm_column}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(re_encrypted=1)
|
||||
new_value = conn.execute.call_args.kwargs["password"]
|
||||
assert gcm_column.process_result_value(new_value, DIALECT) == "hunter2"
|
||||
|
||||
|
||||
def test_engine_migration_unreadable_value_counts_as_failure() -> None:
|
||||
"""A value no engine/key can read is a failure, not a silent pass-through."""
|
||||
migrator = _engine_migrator(AesGcmEngine)
|
||||
conn = MagicMock()
|
||||
row = {"id": 1, "password": b"not-valid-ciphertext"}
|
||||
stats = ReEncryptStats()
|
||||
|
||||
migrator._re_encrypt_row( # noqa: SLF001
|
||||
conn, row, "dbs", {"password": _encrypted_type(AesEngine)}, ["id"], stats
|
||||
)
|
||||
|
||||
assert stats == ReEncryptStats(failed=1)
|
||||
assert conn.execute.call_count == 0
|
||||
Reference in New Issue
Block a user