Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
BigQuery rejects filter values containing apostrophes (e.g. O'Brien): the
sqlalchemy-bigquery dialect renders string literals via repr(), which switches
to double-quote delimiters that BigQuery parses as identifiers, causing a
syntax error.
Monkey-patch the dialect's colspecs with a TypeDecorator whose literal_processor
emits single-quoted literals using backslash escaping ('O\'Brien'). Doubled
single quotes ('O''Brien') are NOT valid in BigQuery (parsed as concatenated
literals). Control characters are emitted as named escapes with a \xhh fallback,
since BigQuery forbids literal control chars in quoted strings. Follows the
existing Databricks dialect pattern.
Fixes#35857
--title "Scheduled Docker image refresh failed for ${LATEST_RELEASE}" \
--label "infra:container" \
--label "bug" \
--body "The weekly Docker base-image refresh failed for release \`${LATEST_RELEASE}\`. Published images may be missing upstream base-layer security patches until this is resolved.
description:'Use Cypress Dashboard (true/false) [paid service - trigger manually when needed]. You MUST provide a branch and/or PR number below for this to work.'
description:"Use Cypress Dashboard (true/false) [paid service - trigger manually when needed]. You MUST provide a branch and/or PR number below for this to work."
and please read our [Slack Community Guidelines](https://github.com/apache/superset/blob/master/CODE_OF_CONDUCT.md#slack-community-guidelines)
- [Join our dev@superset.apache.org Mailing list](https://lists.apache.org/list.html?dev@superset.apache.org). To join, simply send an email to [dev-subscribe@superset.apache.org](mailto:dev-subscribe@superset.apache.org)
- If you want to help troubleshoot GitHub Issues involving the numerous database drivers that Superset supports, please consider adding your name and the databases you have access to on the [Superset Database Familiarity Rolodex](https://docs.google.com/spreadsheets/d/1U1qxiLvOX0kBTUGME1AHHi6Ywel6ECF8xk_Qy-V9R8c/edit#gid=0)
- Join Superset's Town Hall and [Operational Model](https://preset.io/blog/the-superset-operational-model-wants-you/) recurring meetings. Meeting info is available on the [Superset Community Calendar](https://superset.apache.org/community)
@@ -242,16 +247,13 @@ Understanding the Superset Points of View
@@ -109,7 +109,7 @@ If yes, it is in scope. If no, it is out of scope. The lists below apply that te
- Any action an Admin role can perform through documented configuration, API, or UI. The Admin role is a trusted operational principal by policy. Per MITRE CNA Operational Rules 4.1, a qualifying vulnerability must violate a security policy; behavior within a documented trust boundary does not.
- Deployment or operator decisions: the values of secrets and tokens, whether internal networks are reachable from the server, which database connectors or cache backends are enabled, which feature flags are set, where notifications are delivered, and which third-party plugins are loaded.
- Compromise, modification, or malicious control of trusted backend infrastructure. Apache Superset assumes the integrity of its metastore, cache backends (for example Redis or Memcached), message brokers, secret stores, and other operator-managed infrastructure. Findings that require an attacker to read from, write to, or otherwise tamper with these systems, including injecting malicious state, serialized objects, cache entries, task metadata, configuration, or database records, are post-compromise scenarios and do not constitute vulnerabilities in Apache Superset itself. A finding remains in scope only if an unprivileged user can cause such modification through a vulnerability in Apache Superset.
-Code paths whose intended purpose is example data, demos, fixtures, local development, or documentation, rather than the production runtime.
-The continued presence of expired key-value or metastore-cache entries that have not yet beendeleted from the metadata database. Such entries are excluded from reads once expired, are purged opportunistically on write, and are removed in bulk by the scheduled `prune_key_value` maintenance task; their lingering until purged is an eventual-cleanup property, not a security boundary, and does not constitute a vulnerability.
- How a downstream application (spreadsheet program, email client, browser handling user-downloaded files) interprets output Apache Superset produced for it.
- Findings without a reproducible proof of concept against a supported release. The burden of demonstrating exploitability rests with the reporter; findings closed for lack of a proof of concept may be refiled if one is later produced.
- Brute force, rate limiting, denial of service, or resource exhaustion that does not bypass a documented control.
@@ -24,6 +24,210 @@ assists people when migrating to a new version.
## Next
### Guest-token RLS rules reject unknown fields
The `rls` rules passed to `POST /api/v1/security/guest_token/` are now validated strictly: a rule may only contain `dataset` and `clause`. Previously unknown fields were silently dropped, so a mistyped or legacy scope key (most commonly `datasource` instead of `dataset`) produced a rule with no `dataset`, which is treated as a *global* rule applied to every dataset the embedded resource can reach. Such a request now returns HTTP 400 identifying the offending field instead of issuing a token with an unintended global rule. Integrators that were sending extra fields in RLS rules must remove them; valid dataset-scoped (`{"dataset": 41, "clause": "..."}`) and global (`{"clause": "..."}`) rules are unaffected.
### MCP service requires `MCP_JWT_AUDIENCE` when JWT auth is enabled
When the MCP service has JWT auth enabled (`MCP_AUTH_ENABLED = True`), an audience must be configured via `MCP_JWT_AUDIENCE` so issued tokens are bound to this service. The service now fails to start with a clear configuration error when the audience is unset, instead of starting with audience validation skipped. Deployments that enable MCP JWT auth must set `MCP_JWT_AUDIENCE` to the audience value their identity provider issues for the MCP service. API-key-only MCP deployments (JWT auth disabled) are unaffected.
### Swagger UI is opt-in (off by default)
`FAB_API_SWAGGER_UI` now defaults to `False` and is driven by the `SUPERSET_ENABLE_SWAGGER_UI` environment variable. The interactive Swagger UI / OpenAPI documentation endpoints (e.g. `/swagger/v1`) are therefore no longer exposed by default. To enable them, set `SUPERSET_ENABLE_SWAGGER_UI=true` (the bundled Docker development environment sets this) or override `FAB_API_SWAGGER_UI = True` in `superset_config.py`.
### Build details (git SHA / build number) are admin-only by default
The git SHA and build number surfaced in the "About" section, the bootstrap payload, and the public `/version` endpoint are now only included for admin users by default; the release version string is still shown to everyone. To expose the build details to all users (the previous behavior), set the `SUPERSET_EXPOSE_BUILD_DETAILS` environment variable (or `EXPOSE_BUILD_DETAILS_TO_USERS = True` in `superset_config.py`).
### Pivot table First/Last aggregations follow data order
The pivot table chart's `First` and `Last` aggregations now return the first and last value in data (query result) order, instead of effectively returning the minimum and maximum. Existing pivot tables that use these aggregations for totals/subtotals may show different values after upgrading. For deterministic results, ensure the underlying query has a stable sort order.
### `thumbnail_url` removed from dashboard list API response
The `thumbnail_url` field has been removed from `GET /api/v1/dashboard/` list responses. External consumers relying on this field must now construct the thumbnail URL client-side using `id` and `changed_on_utc`:
The thumbnail endpoint redirects to the current digest URL regardless of whether the supplied digest is exact. If the image is not yet cached, that digest URL may return `202` and trigger async generation. Using `changed_on_utc` as the digest is sufficient for cache-busting purposes.
### Tagging fix for `create_all`-bootstrapped schemas
Only affects deployments whose metadata schema was created with SQLAlchemy's `create_all` (rather than `superset db upgrade`) on a foreign-key-enforcing backend — PostgreSQL, or MySQL with `FOREIGN_KEY_CHECKS=1`. Such schemas carry three invalid foreign keys on `tagged_object.object_id` that break tagging (`TAGGING_SYSTEM = True`) with a `ForeignKeyViolation`. Schemas built via `superset db upgrade` are unaffected.
This release stops the ORM from emitting these constraints, but it cannot drop ones already present in your schema. If affected, drop them manually (names vary by backend, so look them up first):
```sql
-- PostgreSQL: names are typically tagged_object_object_id_fkey, _fkey1, _fkey2
### Webhook alerts/reports block private/internal hosts by default
Webhook alert/report dispatch (`WebhookNotification.send`) now validates the target URL's host against the same private/internal-IP block applied to dataset import URLs. If the resolved host is in a loopback, link-local, private (RFC-1918), shared-CGNAT, or multicast range, the webhook is rejected with `NotificationParamException`.
Deployments that intentionally point webhooks at internal targets (chatops bridges, internal automation servers, on-premises Mattermost/Rocket.Chat, etc.) can opt out by setting `ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS = True` in `superset_config.py`. This mirrors the existing `DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS` opt-out for dataset imports.
### Impala cancel_query blocks private/internal hosts by default
The Impala engine spec's `cancel_query` issues an HTTP request from the Superset backend to the host configured on the Impala database connection. That host is now validated before the request: if it resolves to a private/internal IP range, the cancel call is refused and a warning is logged. Operators whose Impala cluster runs on an internal network can opt out by setting `IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS = True` in `superset_config.py`. This mirrors the dataset-import and webhook opt-out flags.
### Map chart renderer and OpenStreetMap migration behavior
The MapLibre migration for deck.gl charts preserves saved non-Mapbox styles on
the MapLibre-compatible path. Saved styles such as OpenStreetMap, `tile://`
tile templates, generic HTTPS style URLs, and charts without a saved style are
not reclassified as Mapbox during migration and do not require
`MAPBOX_API_KEY` only because of the migration.
Saved true Mapbox styles whose value starts with `mapbox://` remain
Mapbox-backed. If a Superset deployment does not configure `MAPBOX_API_KEY`,
those saved Mapbox charts keep the existing missing-key message instead of
silently falling back to MapLibre or another provider. In Explore, deck.gl and
point-cluster renderer controls preserve saved Mapbox state, but the Mapbox
choice is not available as a new working renderer without a configured key.
The MapLibre style choices include `Streets (OSM)`, backed by
`https://tile.openstreetmap.org/{z}/{x}/{y}.png`. This OpenStreetMap tile
be used through normal browser map tile requests and caching; it is not intended
for bulk prefetch or offline tile downloads.
### Password complexity policy enabled by default
Superset now ships a default password-complexity policy, enforced (via Flask-AppBuilder) across self-registration, the user create/edit/reset forms, and the User REST API. The policy requires a minimum password length of 8 characters and rejects a built-in blocklist of common/guessable passwords.
This is enabled by default (`FAB_PASSWORD_COMPLEXITY_ENABLED = True`), so new or reset passwords that are too short or appear in the blocklist will be rejected where they were previously accepted. Existing stored passwords are unaffected until they are next changed.
Operators can tune or disable the policy via config:
### Data uploads bounded by UPLOAD_MAX_FILE_SIZE_BYTES
Single data-file uploads (CSV, Excel, columnar) are now bounded by the `UPLOAD_MAX_FILE_SIZE_BYTES` config option, which defaults to `100 * 1024 * 1024` (100 MB). Files larger than this are rejected with a `413` before their contents are buffered into memory. Set `UPLOAD_MAX_FILE_SIZE_BYTES = None` to disable the check and restore unbounded uploads.
### Duration formatter precision
The `DURATION` number formatter now uses `Intl.DurationFormat` for locale-aware output. By default, sub-second fields are omitted, so values that previously displayed fractional seconds with `pretty-ms`, such as `10500` milliseconds rendering as `10.5s`, now render as `10s`.
To preserve sub-second precision in custom duration formatters, enable `formatSubMilliseconds`.
### Cache warmup authenticates via SUPERSET_CACHE_WARMUP_USER
The `cache-warmup` Celery task now drives a real WebDriver session for reliable authentication and reads the user to authenticate as from the new `SUPERSET_CACHE_WARMUP_USER` config option. It no longer consults `CACHE_WARMUP_EXECUTORS` for the warmup path. `SUPERSET_CACHE_WARMUP_USER` defaults to `None`, so the task fails fast with a clear message until you set it. Operators who previously relied on `CACHE_WARMUP_EXECUTORS` for cache warmup must set `SUPERSET_CACHE_WARMUP_USER` to a dedicated least-privilege user with access to the dashboards they want warmed up before the next warmup run.
### YDB now uses a native sqlglot dialect
YDB SQL parsing now relies on the dedicated [`ydb-sqlglot-plugin`](https://pypi.org/project/ydb-sqlglot-plugin/) dialect, which registers itself with sqlglot automatically. YDB users must install this plugin (e.g., via `pip install "apache-superset[ydb]"`) to avoid a `ValueError` when Superset parses YDB queries.
### Embedded dashboards enforce configured Allowed Domains for postMessage
The embedded dashboard page now validates the origin of incoming `postMessage` events against the dashboard's configured **Allowed Domains**. The server-rendered embedded page exposes the configured domains in its bootstrap payload, and the frontend rejects message events whose origin is not in that list.
Enforcement only applies when the Allowed Domains list is non-empty. If the list is empty (the default), any origin is accepted, so there is no behavior change for embeds that did not configure Allowed Domains.
### Default guest/async JWT secrets are rejected at startup
Superset already refuses to start in production (non-debug, non-testing) when `SECRET_KEY` is left at its built-in default, and when `GUEST_TOKEN_JWT_SECRET` is left at its default while `EMBEDDED_SUPERSET` is enabled. This behavior is extended to `GLOBAL_ASYNC_QUERIES_JWT_SECRET`: if the `GLOBAL_ASYNC_QUERIES` feature flag is enabled and the secret is still the publicly known default (`test-secret-change-me`), Superset logs a clear error and refuses to start.
As with the existing `SECRET_KEY` check, this only fails in production. In debug mode, testing mode, or under the test runner, a warning is logged instead of exiting, so local development is unaffected.
To resolve the error, set a strong random value in `superset_config.py`:
The check is only active when the relevant feature is enabled, so deployments that do not use global async queries (or embedding) are not affected.
### Guest token revocation (opt-in)
Embedded guest tokens can be coarsely revoked at runtime via a new opt-in mechanism. A new config flag `GUEST_TOKEN_REVOCATION_ENABLED` (default `False`) gates the feature. When enabled, every minted guest token carries a revocation version, and tokens whose version is below the current expected version (stored in the metadata database) are rejected at validation time.
Bump the expected version with the new CLI command to invalidate all outstanding guest tokens:
```bash
superset revoke-guest-tokens
```
This change is backward compatible. The feature is off by default, and even when enabled nothing is revoked until an admin explicitly bumps the version: the expected version starts at `0`, and tokens minted before this change (which carry no version claim) are treated as version `0`. No database migration is required.
### Sessions are terminated when an account is disabled
Disabling a user account (setting `active` to `False`, via the admin UI, REST API, or CLI) now terminates that user's outstanding sessions on their next request, instead of relying on a passive check. This works for both client-side cookie sessions and server-side session stores via a per-user invalidation epoch (`user_attribute.sessions_invalidated_at`, added by a migration). The mechanism is inert for users that were never disabled (NULL epoch), so there is no behavior change for active users. Re-enabling an account and logging in again starts a fresh, valid session. The migration backfills the epoch for accounts that are already disabled at upgrade time, so re-enabling such an account does not revive a session that predates this feature.
### Opt-in SSH tunnel server host key verification
SSH tunnels can now optionally pin the expected SSH server host key as a defense-in-depth measure against man-in-the-middle attacks. paramiko's transport performs no known-hosts checking by default, so previously the SSH server's identity was not verified. This feature is opt-in and off by default; existing tunnels are unaffected.
- A new nullable `server_host_key` column on the `ssh_tunnels` table stores the expected host key in authorized-key form (e.g. `ssh-ed25519 AAAA...`). It is a public key and is stored in plaintext. It can be set via the SSH tunnel POST/PUT payloads (`ssh_tunnel.server_host_key`).
- When a tunnel has `server_host_key` set, Superset connects to the SSH server, reads the host key it presents, and rejects the tunnel if it does not match.
- A new config flag `SSH_TUNNEL_STRICT_HOST_KEY_CHECKING` (default `False`) controls fail-closed behavior. When `True`, every tunnel must declare a `server_host_key`; a tunnel without one is rejected.
Runbook to adopt:
1. Capture the SSH server's host key, e.g. `ssh-keyscan -t ed25519 ssh.example.com` (verify it out-of-band).
2. Set that value on the tunnel's `server_host_key` (via the database/SSH tunnel API or UI payload).
3. Optionally set `SSH_TUNNEL_STRICT_HOST_KEY_CHECKING = True` in `superset_config.py` to require host-key verification on all tunnels.
### Dataset import validates catalog against the target connection
Importing a dataset now validates the `catalog` field against the target database connection. When the connection has multi-catalog disabled (`allow_multi_catalog` off) and the dataset's catalog is not the connection's default catalog, the import fails instead of silently persisting the non-default catalog. This matches the validation already enforced on the dataset update path and prevents imported datasets from querying an unintended database.
If you relied on importing datasets with a non-default catalog, enable "Allow changing catalogs" on the target connection, or set the dataset's catalog to the connection's default before importing.
### Extension supply-chain controls (denylist + version policy)
Two opt-in static gates control which extensions are allowed to load:
-`EXTENSION_DENYLIST` refuses extensions matching an id (every version) or `id@version` (a single version), e.g. `["compromised-extension", "other-ext@1.2.3"]`.
-`EXTENSION_VERSION_POLICY` enforces a minimum version per extension id, e.g. `{"acme.widget": "1.2.0"}` (PEP 440 comparison); a release below the minimum is refused.
Both default to empty (no behavior change). They apply to both the `LOCAL_EXTENSIONS` and `EXTENSIONS_PATH` load paths.
### Dynamic Group By respects the sort toggle for display values
The Dynamic Group By chart customization now orders its display values according to the "Sort display control values" toggle: ascending (A–Z), descending (Z–A), or the dataset's source order when the toggle is unset. Previously the dropdown always sorted alphabetically. Existing dashboards where the toggle was never set will show options in source order instead of A–Z; open the customization and enable the toggle to restore alphabetical ordering.
### Selectable encryption engine for app-encrypted fields (AES-GCM)
App-encrypted fields (database passwords, SSH tunnel credentials, OAuth tokens, etc.) can now use authenticated **AES-GCM** encryption instead of the historical unauthenticated **AES-CBC**. A new config selects the engine for the default adapter:
```python
# "aes" (AES-CBC, historical default) | "aes-gcm" (authenticated, recommended for new installs)
SQLALCHEMY_ENCRYPTED_FIELD_ENGINE="aes"
```
**No action required / no behavior change:** the default remains `"aes"`, so existing installs are unaffected.
**Opting in on an existing install:** flipping the engine on a populated database without re-encrypting first will make stored secrets undecryptable, because the two ciphertext formats are not compatible. A migrator is provided. Recommended runbook:
1. Take a metadata-DB backup.
2. Re-encrypt existing secrets into the new engine (the `SECRET_KEY` is unchanged):
```bash
superset re-encrypt-secrets --engine aes-gcm
```
3. Set `SQLALCHEMY_ENCRYPTED_FIELD_ENGINE = "aes-gcm"` in your config.
4. Restart Superset.
5. Re-run the migrator once more after the restart:
```bash
superset re-encrypt-secrets --engine aes-gcm
```
A live instance keeps writing *new* secrets as AES-CBC during the window between step 2 and the restart in step 4; this second pass sweeps those up (it is idempotent, so already-migrated values are skipped).
Schedule the cutover in a quiet window. Runtime reads use only the single configured engine, so in a multi-worker deployment there is an unavoidable brief decrypt-outage between the migration commit and the last worker restarting with the new config — each migrator run is transactional, but the fleet-wide cutover is not zero-downtime.
The migration is transactional (all-or-nothing) and idempotent — it can be safely re-run or resumed. Note that AES-GCM, unlike AES-CBC, does not support querying directly over encrypted columns; audit any code that filters on an encrypted column before switching. See the SIP at `docs/sip/authenticated-encryption-at-rest.md` for details.
### Granular Export Controls
A new feature flag `GRANULAR_EXPORT_CONTROLS` introduces three fine-grained permissions that replace the legacy `can_csv` permission:
@@ -53,6 +257,9 @@ Added a new combined datasource list endpoint at `GET /api/v1/datasource/` to se
- The endpoint is available to users with at least one of `can_read` on `Dataset` or `SemanticView`.
- Semantic views are included only when the `SEMANTIC_LAYERS` feature flag is enabled.
- The endpoint enforces strict `order_column` validation and returns `400` for invalid sort columns.
## 6.1.0
### ClickHouse minimum driver version bump
The minimum required version of `clickhouse-connect` has been raised to `>=0.13.0`. If you are using the ClickHouse connector, please upgrade your `clickhouse-connect` package. The `_mutate_label` workaround that appended hash suffixes to column aliases has also been removed, as it is no longer needed with modern versions of the driver.
max_attempts=300 # ~10 minutes at 2s intervals; first build can be slow
echo "Waiting for webpack dev server at $$url..."
attempt=0
until curl -sf --max-time 5 -H "Host: localhost" -o /dev/null "$$url"; do
attempt=$$((attempt + 1))
if [ "$$attempt" -ge "$$max_attempts" ]; then
echo "ERROR: webpack dev server did not serve $$url after $$max_attempts attempts." >&2
echo "Is the dev server running? With BUILD_SUPERSET_FRONTEND_IN_DOCKER=false you must start it on the host (e.g. 'npm run dev' in superset-frontend)." >&2
exit 1
fi
if [ $$((attempt % 15)) -eq 0 ]; then
echo "Still waiting for webpack dev server... ($$attempt/$$max_attempts)"
fi
sleep 2
done
echo "Webpack dev server is ready; starting nginx."
On a Slack Enterprise Grid org, an org-scoped token spans multiple workspaces, so
workspace-scoped methods such as `conversations.list` require a `team_id` to
indicate which workspace to target. Set `SLACK_TEAM_ID` to your workspace (team)
ID so Superset can list channels and deliver reports:
```python
# The workspace (team) ID to target, e.g. "T01234567"
SLACK_TEAM_ID = "T01234567"
```
This defaults to `None` and only needs to be set when using an org-scoped token;
it is accepted but ignored for standard workspace-level tokens.
### Webhook integration
Superset can send alert and report notifications to any HTTP endpoint — useful for chat platforms, incident management tools, or custom automation.
@@ -160,7 +175,7 @@ When enabled, Superset rejects webhook configurations that use `http://` URLs.
#### Retry Behavior
Superset automatically retries webhook deliveries on `429 Too Many Requests` and `5xx` server errors using exponential backoff.
Superset automatically retries webhook deliveries on `429 Too Many Requests` and `5xx` server errors using exponential backoff. Retries are bounded to roughly 120 seconds of cumulative wall-clock time (worst case ~210 seconds, because the bound is checked against the time elapsed before each attempt, so the final request can begin just under the limit and still run its full request timeout), after which the delivery is abandoned.
# Extend the default CeleryConfig to add cache warmup schedule
class CustomCeleryConfig(CeleryConfig):
beat_schedule = {
**CeleryConfig.beat_schedule,
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute=0, hour='*'), # hourly
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 5,
'since': '7 days ago',
},
},
}
CELERY_CONFIG = CustomCeleryConfig
```
This will cache the top 5 most popular dashboards every hour. For other
strategies, check the `superset/tasks/cache.py` file.
## Caching Thumbnails
This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) on config:
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data. Knowing the installation counts for different Superset versions informs the project's decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.
To opt-out of this data collection in your Helm-based installation, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub.
There are two independent telemetry channels:
- **Image pulls** (Scarf Gateway): to opt out, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub.
- **The analytics pixel** rendered in the UI: to opt out, set the `SCARF_ANALYTICS` environment variable to `false` on the Superset containers via `extraEnv` in your `values.yaml`:
```yaml
extraEnv:
SCARF_ANALYTICS: "false"
```
This is read at runtime, so it takes effect on the pre-built images without rebuilding the frontend.
Ubuntu **24.04** uses python 3.12 per default, which currently is not supported by Superset. You need to add a second python installation of 3.11 and install the required additional dependencies.
In more recent versions of CentOS and Fedora, you may need to install a slightly different set of packages using `dnf`:
@@ -157,8 +150,15 @@ superset load_examples
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
superset run -p 8088 --with-threads --reload
# For debugging with interactive console (⚠️ localhost only)
# superset run -p 8088 --with-threads --reload --debugger
```
:::warning Security Note
The `--debugger` flag enables Werkzeug's interactive console at `/console`. Only use this for local development and never bind to `0.0.0.0` or expose the server to networks when debugging is enabled.
:::
If everything worked, you should be able to navigate to `hostname:port` in your browser (e.g.
locally by default at `localhost:8088`) and login using the username and password you created.
@@ -161,6 +161,7 @@ Here's the documentation section how how to set up Talisman: https://superset.ap
- [ ] Regularly update to the latest major or minor versions of Superset. Those versions receive up-to-date security patches.
- [ ] Rotate the `SUPERSET_SECRET_KEY` periodically (e.g., quarterly) and after any potential security incident.
- [ ] Rotate the other security-critical secrets (guest-token and async-query JWT secrets, SMTP and database credentials) on the cadence in Appendix C, and after any potential security incident.
- [ ] Conduct quarterly access reviews for all users.
- [ ] Assuming logging and monitoring is in place, review security monitoring alerts weekly.
@@ -173,6 +174,24 @@ Rotating the `SUPERSET_SECRET_KEY` is a critical security procedure. It is manda
The procedure for safely rotating the SECRET_KEY must be followed precisely to avoid locking yourself out of your instance. The official Apache Superset documentation maintains the correct, up-to-date procedure. Please follow the official guide here:
### **Appendix C: Secrets Register and Rotation Schedule**
`SUPERSET_SECRET_KEY` is not the only security-critical secret in a Superset deployment. Maintain an inventory of all such secrets, store each in a secrets manager (not in `superset_config.py` or version control), assign an owner, and rotate them on a defined cadence as well as after any suspected compromise.
| Database connection passwords | Access to analytical databases and the metadata DB | Direct database access | Per organizational policy + post-incident |
Notes:
- Rotating `GUEST_TOKEN_JWT_SECRET` or `GLOBAL_ASYNC_QUERIES_JWT_SECRET` invalidates outstanding tokens of that type; schedule rotations accordingly.
- After a suspected compromise, rotate **all** of the above, not only `SUPERSET_SECRET_KEY`.
- Keep the register under change control so new secrets introduced by future features are added to the rotation schedule.
:::resources
- [Blog: Running Apache Superset on the Open Internet](https://preset.io/blog/running-apache-superset-on-the-open-internet-a-report-from-the-fireline/)
- [Blog: How Security Vulnerabilities are Reported & Handled in Apache Superset](https://preset.io/blog/how-security-vulnerabilities-are-reported-and-handled-in-apache-superset/)
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
superset run -p 8088 --with-threads --reload
# For debugging with interactive console (⚠️ localhost only)
# superset run -p 8088 --with-threads --reload --debugger
```
:::warning Security Note
The `--debugger` flag enables Werkzeug's interactive console at `/console`. Only use this for local development and never bind to `0.0.0.0` or expose the server to networks when debugging is enabled.
:::
If everything worked, you should be able to navigate to `hostname:port` in your browser (e.g.
locally by default at `localhost:8088`) and login using the username and password you created.
@@ -34,15 +34,14 @@ Frontend contribution types allow extensions to extend Superset's user interface
Extensions can add new views or panels to the host application, such as custom SQL Lab panels, dashboards, or other UI components. Contribution areas are uniquely identified (e.g., `sqllab.panels` for SQL Lab panels), enabling seamless integration into specific parts of the application.
```tsx
importReactfrom'react';
```typescript
import{views}from'@apache-superset/core';
importMyPanelfrom'./MyPanel';
views.registerView(
{id:'my-extension.main',name:'My Panel Name'},
'sqllab.panels',
()=><MyPanel/>,
MyPanel,
);
```
@@ -112,6 +111,24 @@ editors.registerEditor(
See [Editors Extension Point](./extension-points/editors.md) for implementation details.
### Chat
Extensions can add a chat interface to Superset by registering a trigger component and a panel component. The host owns the layout, open/close state, and display mode — the extension only provides the UI. The panel can be displayed as a floating overlay or docked as a resizable sidebar beside the page content, and the user's preference is persisted across reloads.
```tsx
import{chat}from'@apache-superset/core';
importChatTriggerfrom'./ChatTrigger';
importChatPanelfrom'./ChatPanel';
chat.registerChat(
{id:'my-org.my-chat',name:'My Chat'},
ChatTrigger,
ChatPanel,
);
```
See [Chat](./extension-points/chat.md) for implementation details.
## Backend
Backend contribution types allow extensions to extend Superset's server-side capabilities. Backend contributions are registered at startup via classes and functions imported from the auto-discovered `entrypoint.py` file.
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Chat Contributions
Extensions can add a chat interface to Superset by registering a trigger and a panel. The host owns the layout, open/close state, and display mode — the extension only needs to provide the UI components.
## Overview
A chat registration consists of two React components:
| Component | Role |
|-----------|------|
| **Trigger** | Always-visible entry point (e.g., a floating button). Rendered in the bottom-right corner in floating mode, or as a fixed overlay in panel mode. |
| **Panel** | The chat UI itself (message list, input, etc.). Mounted by the host in the active display mode. |
## Display Modes
The host supports two display modes, switchable by the user or the extension at runtime:
| Mode | Behavior |
|------|----------|
| `floating` | Panel floats above page content, anchored to the bottom-right corner. |
| `panel` | Panel is docked to the right side of the application as a resizable sidebar, sitting beside the page content. |
The user's last selected mode and open/closed state are persisted across page reloads.
## Registering a Chat
Call `chat.registerChat` from your extension's entry point with a descriptor, a trigger factory, and a panel factory:
```tsx
import{chat}from'@apache-superset/core';
importChatTriggerfrom'./ChatTrigger';
importChatPanelfrom'./ChatPanel';
chat.registerChat(
{id:'my-org.my-chat',name:'My Chat'},
ChatTrigger,
ChatPanel,
);
```
Only one chat registration is active at a time. If a second extension calls `registerChat`, it replaces the first and a warning is logged.
## Opening and Closing the Chat
The trigger component is responsible for toggling the panel. Use `chat.isOpen()`, `chat.open()`, and `chat.close()` to control visibility:
Call `chat.setDisplayMode` to switch between `'floating'` and `'panel'` modes. In your panel component, subscribe to `onDidChangeDisplayMode` to react to changes (including those triggered by the user):
All methods are available on the `chat` namespace from `@apache-superset/core`:
| Method / Event | Description |
|----------------|-------------|
| `registerChat(descriptor, trigger, panel)` | Register a chat extension. Returns a `Disposable` to unregister. |
| `open()` | Open the chat panel. No-op if already open or no registration. |
| `close()` | Close the chat panel. |
| `isOpen()` | Returns `true` if the panel is currently open. |
| `getDisplayMode()` | Returns the current display mode (`'floating'` or `'panel'`). |
| `setDisplayMode(mode)` | Switch between `'floating'` and `'panel'` mode. |
| `onDidOpen(listener)` | Subscribe to panel open events. Returns a `Disposable`. |
| `onDidClose(listener)` | Subscribe to panel close events. Returns a `Disposable`. |
| `onDidChangeDisplayMode(listener)` | Subscribe to display mode changes. Returns a `Disposable`. |
| `onDidRegisterChat(listener)` | Subscribe to registration events. |
| `onDidUnregisterChat(listener)` | Subscribe to unregistration events. |
| `onDidResizePanel(listener)` | Subscribe to panel resize events (panel mode only). Not all hosts provide a resizer — do not rely on this firing. Returns a `Disposable`. |
## Next Steps
- **[Contribution Types](../contribution-types.md)** — Explore other contribution types
- **[Development](../development.md)** — Set up your development environment
@@ -321,8 +321,8 @@ This can be used, for example, to convert UTC time to local time.
Superset uses [Scarf](https://about.scarf.sh/) by default to collect basic telemetry data upon installing and/or running Superset. This data helps the maintainers of Superset better understand which versions of Superset are being used, in order to prioritize patch/minor releases and security fixes.
We use the [Scarf Gateway](https://docs.scarf.sh/gateway/) to sit in front of container registries, the [scarf-js](https://about.scarf.sh/package-sdks) package to track `npm` installations, and a Scarf pixel to gather anonymous analytics on Superset page views.
Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented [here](https://docs.scarf.sh/gateway/#do-not-track) and [here](https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics).
Superset maintainers can also opt out of telemetry data collection by setting the `SCARF_ANALYTICS` environment variable to `false` in the Superset container (or anywhere Superset/webpack are run).
Additional opt-out instructions for Docker users are available on the [Docker Installation](/admin-docs/installation/docker-compose) page.
You can also opt out of the analytics pixel by setting the `SCARF_ANALYTICS` environment variable to `false`. This is read at runtime, so setting it on the Superset container (for example via `extraEnv` in the Helm chart, or `docker/.env` for Docker Compose) disables the pixel on the pre-built images without rebuilding the frontend. Note that this only disables the page-view pixel; the Scarf Gateway (container registry) and `scarf-js` (`npm`) channels are opted out separately, as described above.
Additional opt-out instructions are available on the [Docker Compose](/admin-docs/installation/docker-compose) and [Kubernetes](/admin-docs/installation/kubernetes) installation pages.
## Does Superset have an archive panel or trash bin from which a user can recover deleted assets?
<a href="https://x.com/apachesuperset" target="_blank" rel="noopener noreferrer" title="Follow us on X" aria-label="X">
<img src="/img/community/x-symbol.svg" alt="X" />
</a>
<a href="https://www.linkedin.com/company/apache-superset" target="_blank" rel="noopener noreferrer" title="Follow us on LinkedIn" aria-label="LinkedIn">
<a href="https://bsky.app/profile/apachesuperset.bsky.social" target="_blank" rel="noopener noreferrer" title="Follow us on Bluesky" aria-label="Bluesky">
# Additional domains required for docs site functionality:
# - widget.kapa.ai: AI chatbot widget (uses Google reCAPTCHA). Approval here: https://privacy.apache.org/faq/committers.html
# - *.googleapis.com, *.google.com, *.gstatic.com: Google Calendar embed, kapa.ai reCAPTCHA - all of these loaded with user consent, following policy laid out in https://privacy.apache.org/faq/committers.html
# - github.com, *.github.com, *.githubusercontent.com: GitHub user-attachment images in docs (apex github.com serves user-attachments/* assets). Discussed/resolved in this thread: https://issues.apache.org/jira/browse/INFRA-25701?filter=-2 (DPA in place with GitHub)
@@ -1808,6 +1808,10 @@ If you enable DML in the meta database users will be able to run DML queries on
Second, you might want to change the value of `SUPERSET_META_DB_LIMIT`. The default value is 1000, and defines how many are read from each database before any aggregations and joins are executed. You can also set this value `None` if you only have small tables.
:::warning
`SUPERSET_META_DB_LIMIT` is applied to **each** underlying table *before* the in-memory join runs, not to the final result. If any table involved in a join has more rows than the limit, the meta database will read only the first `SUPERSET_META_DB_LIMIT` rows of that table, which means matching rows can be silently dropped and the join can return **incomplete or even empty** results with no error. If you join tables larger than the limit, raise `SUPERSET_META_DB_LIMIT` to comfortably exceed your largest joined table, or set it to `None` when working only with small tables, to get correct results.
:::
Additionally, you might want to restrict the databases to with the meta database has access to. This can be done in the database configuration, under "Advanced" -> "Other" -> "ENGINE PARAMETERS" and adding:
| supersetCeleryFlower.deploymentAdditionalPodSpec | object | `{}` | Custom pod spec to be added to supersetCeleryFlower deployment |
| supersetCeleryFlower.deploymentAnnotations | object | `{}` | Annotations to be added to supersetCeleryFlower deployment |
| supersetCeleryFlower.enabled | bool | `false` | Enables a Celery flower deployment (management UI to monitor celery jobs) WARNING: on superset 1.x, this requires a Superset image that has `flower<1.0.0` installed (which is NOT the case of the default images) flower>=1.0.0 requires Celery 5+ which Superset 1.5 does not support |
| supersetCeleryFlower.extraContainers | list | `[]` | Launch additional containers into supersetCeleryFlower pods |
@@ -207,18 +207,21 @@ On helm this can be set on `extraSecretEnv.SUPERSET_SECRET_KEY` or `configOverri
| supersetNode.connections.redis_host | string | `"{{ .Release.Name }}-redis-headless"` | Change in case of bringing your own redis and then also set redis.enabled:false |
| supersetNode.deploymentAdditionalPodSpec | object | `{}` | Custom pod spec to be added to supersetNode deployment |
| supersetNode.deploymentAnnotations | object | `{}` | Annotations to be added to supersetNode deployment |
| supersetNode.deploymentLabels | object | `{}` | Labels to be added to supersetNode deployment |
| supersetNode.env | object | `{}` | |
| supersetNode.extraContainers | list | `[]` | Launch additional containers into supersetNode pod |
| supersetNode.forceReload | bool | `false` | If true, forces deployment to reload on each upgrade |
| supersetNode.initContainers | list | a container waiting for postgres | Init containers |
| supersetNode.lifecycle | object | `{}` | Container lifecycle hooks, e.g. a preStop sleep so the Service/Ingress stops routing to the pod before gunicorn receives SIGTERM |
| supersetNode.livenessProbe.failureThreshold | int | `3` | |
@@ -251,11 +254,13 @@ On helm this can be set on `extraSecretEnv.SUPERSET_SECRET_KEY` or `configOverri
| supersetNode.startupProbe.successThreshold | int | `1` | |
| supersetNode.startupProbe.timeoutSeconds | int | `1` | |
| supersetNode.strategy | object | `{}` | |
| supersetNode.terminationGracePeriodSeconds | string | `nil` | Pod termination grace period (seconds). Set greater than GUNICORN_TIMEOUT so in-flight requests can drain before SIGKILL |
| supersetNode.topologySpreadConstraints | list | `[]` | TopologySpreadConstrains to be added to supersetNode deployments |
| supersetWebsockets.affinity | object | `{}` | Affinity to be added to supersetWebsockets deployment |
| supersetWebsockets.command | list | `[]` | |
| supersetWebsockets.config | object | see `values.yaml` | The config.json to pass to the server, see https://github.com/apache/superset/tree/master/superset-websocket Note that the configuration can also read from environment variables (which will have priority), see https://github.com/apache/superset/blob/master/superset-websocket/src/config.ts for a list of supported variables |
| supersetWebsockets.enabled | bool | `false` | This is only required if you intend to use `GLOBAL_ASYNC_QUERIES` in `ws` mode see https://superset.apache.org/docs/contributing/misc#async-chart-queries |
| supersetWebsockets.extraContainers | list | `[]` | Launch additional containers into supersetWebsockets pods |
@@ -309,11 +314,13 @@ On helm this can be set on `extraSecretEnv.SUPERSET_SECRET_KEY` or `configOverri
| supersetWorker.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
| supersetWorker.command | list | a `celery worker` command | Worker startup command |
| supersetWorker.deploymentAdditionalPodSpec | object | `{}` | Custom pod spec to be added to supersetWorker deployment |
| supersetWorker.deploymentAnnotations | object | `{}` | Annotations to be added to supersetWorker deployment |
| supersetWorker.deploymentLabels | object | `{}` | Labels to be added to supersetWorker deployment |
| supersetWorker.extraContainers | list | `[]` | Launch additional containers into supersetWorker pod |
| supersetWorker.forceReload | bool | `false` | If true, forces deployment to reload on each upgrade |
| supersetWorker.initContainers | list | a container waiting for postgres and redis | Init container |
| supersetWorker.lifecycle | object | `{}` | Container lifecycle hooks for the worker pod |
| supersetWorker.livenessProbe.exec.command | list | a `celery inspect ping` command | Liveness probe command |
| supersetWorker.livenessProbe.failureThreshold | int | `3` | |
| supersetWorker.livenessProbe.initialDelaySeconds | int | `120` | |
@@ -334,6 +341,7 @@ On helm this can be set on `extraSecretEnv.SUPERSET_SECRET_KEY` or `configOverri
| supersetWorker.resources | object | `{}` | Resource settings for the supersetWorker pods - these settings overwrite might existing values from the global resources object defined above. |
| supersetWorker.startupProbe | object | `{}` | No startup/readiness probes by default since we don't really care about its startup time (it doesn't serve traffic) |
| supersetWorker.strategy | object | `{}` | |
| supersetWorker.terminationGracePeriodSeconds | string | `nil` | Pod termination grace period (seconds) for the worker pod so in-flight tasks can drain before SIGKILL |
| supersetWorker.topologySpreadConstraints | list | `[]` | TopologySpreadConstrains to be added to supersetWorker deployments |
| tolerations | list | `[]` | |
| topologySpreadConstraints | list | `[]` | TopologySpreadConstrains to be added to all deployments |
# See supersetNode.initContainers for the rationale.
SECONDS=0
until (exec 3<>/dev/tcp/"$DB_HOST"/"$DB_PORT") 2>/dev/null; do
if [ "$SECONDS" -ge 120 ]; then
echo "timeout waiting for postgres at $DB_HOST:$DB_PORT after 120s" >&2
exit 1
fi
echo "waiting for postgres at $DB_HOST:$DB_PORT (elapsed ${SECONDS}s)"
sleep 2
done
echo "postgres at $DB_HOST:$DB_PORT is up"
resources:
limits:
memory:"256Mi"
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.