Adds `update_layer_annotation` mutation tool alongside the existing
`create_layer_annotation`. Runs `UpdateAnnotationCommand` with only the
fields provided; handles AnnotationNotFoundError, AnnotationLayerNotFoundError,
AnnotationInvalidError, and AnnotationUpdateFailedError with structured
error responses. Also adds request/response schemas and unit tests.
CI ruff (≥0.9.7) enforces @pytest.fixture and @pytest.mark.asyncio
without parentheses (PT001/PT023). Local ruff 0.4.0 had the opposite
default, causing the file to be modified on CI after checkout.
Adds a new `create_layer_annotation` mutation tool that lets MCP clients
add annotations to existing annotation layers via `CreateAnnotationCommand`.
Registers the tool in app.py alongside other MCP tools.
Adds a new `create_layer_annotation` mutation tool that lets MCP clients
add annotations to existing annotation layers via `CreateAnnotationCommand`.
Registers the tool in app.py alongside other MCP tools.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Joe Li <joe@preset.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
Co-authored-by: Diego Pucci <diegopucci.me@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
body: '⚠️ **DEPRECATED WORKFLOW** - Ephemeral environment shutdown and build artifacts deleted. Please migrate to the new Superset Showtime system for future PRs.'
body: `@${{ github.actor }} Ephemeral environment spinning up at http://${{ steps.get-ip.outputs.ip }}:8080. Credentials are 'admin'/'admin'. Please allow several minutes for bootstrapping and startup.`
We hope to see you in our [Slack](https://apache-superset.slack.com/) community too! Not signed up? Use our [Slack App](http://bit.ly/join-superset-slack) to self-register.
Before evaluating any code path for security issues, read [`SECURITY.md`](SECURITY.md). It is the canonical, authoritative source for Apache Superset's security model and is referenced by both human reporters and automated scanners.
In short, the test for whether a finding is in scope is one question:
> *Does it let a principal perform an action the role and capability matrix in `SECURITY.md` does not entitle them to?*
If yes, it is in scope. If no, it is not.
The three trust boundaries are:
1.**The Admin role** is a fully trusted operational principal. Anything an Admin can do through documented configuration, API, or UI is an intended capability, not a vulnerability.
2.**The operator** owns deployment-time decisions (secrets, network exposure, feature-flag selection, connector and codec choices, notification destinations, third-party plugins). Misconfiguration at this layer is a deployment defect, not a Superset vulnerability.
3.**The codebase** is responsible for enforcing the role and capability matrix wherever it exposes functionality to a principal: API routes, command and DAO layers, UI handlers, background jobs, and any other entry point. A missing or incorrect enforcement check is in scope no matter where it lives.
The security model assumes that operator-controlled infrastructure, including the metadata database, cache backends, message brokers, secret stores, and deployment environment, remains within the operator's trust boundary. Vulnerabilities must demonstrate a security boundary violation by an attacker who does not already control those systems.
Route-level authorization in this codebase uses one of three Flask-AppBuilder decorators depending on the route type:
-`@protect()` for REST API routes (`ModelRestApi` / `BaseApi`)
-`@has_access_api` for legacy view routes
-`@has_access` for legacy HTML view routes
Object-level authorization via `security_manager.raise_for_access(...)` applies to data-bearing resources: dashboards, charts, datasets and datasources, queries, database and table access, and query contexts. Other resources (annotations, tags, CSS templates, reports, RLS rules, and similar) rely on the route-level decorator plus DAO `base_filters` for ownership scoping; the absence of `raise_for_access` on these resources is by design, not a finding. Code that omits the per-object gate on a route that returns or mutates a specific data-bearing object is in scope; code that follows the correct pattern for its resource class can still contain injection, SSRF, XSS, or other classes of finding unrelated to authorization, which are evaluated separately.
The full role and capability matrix, in-scope and out-of-scope class lists, and CVE aggregation rules are in [`SECURITY.md`](SECURITY.md). Defer to that document for any specifics.
**Requirements for findings filed by automated tooling**
Automated scanners (LLM-based code scanners, static analyzers, dependency tools) that file findings against this codebase must, in each finding, name:
1. The specific role and capability matrix row in [`SECURITY.md`](SECURITY.md) the finding believes is violated.
2. The principal the finding assumes the attacker holds (Public, Gamma, sql_lab, Alpha, Admin, Embedded guest token, or a custom role with explicit capability grants).
Findings that cannot identify both should be filed as questions, not vulnerabilities. This requirement exists to ensure every reported issue is testable against the published security model and to keep speculative or pattern-match-only reports out of the triage queue.
## Key Directories
```
@@ -128,6 +165,7 @@ The Developer Portal auto-generates MDX documentation from Storybook stories. **
## Architecture Patterns
### Security & Features
- **Security model**: see the top-level [Security and Threat Model](#security-and-threat-model) section and [`SECURITY.md`](SECURITY.md)
- **RBAC**: Role-based access via Flask-AppBuilder
- **Feature flags**: Control feature rollouts
- **Row-level security**: SQL-based data access control
A modern, enterprise-ready business intelligence web application.
### Documentation
- **[User Guide](https://superset.apache.org/user-docs/)** — For analysts and business users. Explore data, build charts, create dashboards, and connect databases.
- **[Administrator Guide](https://superset.apache.org/admin-docs/)** — Install, configure, and operate Superset. Covers security, scaling, and database drivers.
- **[Developer Guide](https://superset.apache.org/developer-docs/)** — Contribute to Superset or build on its REST API and extension framework.
[**Why Superset?**](#why-superset) |
[**Supported Databases**](#supported-databases) |
[**Installation and Configuration**](#installation-and-configuration) |
# Part 2: Verify RSA key - this is the same as running `gpg --verify {release}.asc {release}` and comparing the RSA key and email address against the KEYS file # noqa: E501
To ensure engineering focus remains on verified risks and to manage high reporting volumes, all reports must meet the following criteria:
- Plain Text Format: In accordance with Apache guidelines, please provide all details in plain text within the email body. Avoid sending PDFs, Word documents, or password-protected archives.
- Mandatory AI Disclosure: If you utilized Large Language Models (LLMs) or AI tools to identify a flaw or assist in writing a report, you must disclose this in your submission so our triage team can contextualize the findings.
- Human-Verified PoC: All submissions must include a manual, step-by-step Proof of Concept (PoC) performed on a supported release. Raw AI outputs, hypothetical chat transcripts, or unverified scanner logs will be closed as Invalid.
We kindly ask you to include the following information in your report to assist our developers in triaging and remediating issues efficiently:
- Version/Commit: The specific version of Apache Superset or the Git commit hash you are using.
- Configuration: A sanitized copy of your `superset_config.py` file or any config overrides.
- Environment: Your deployment method (e.g., Docker Compose, Helm, or source) and relevant OS/Browser details.
- Impacted Component: Identification of the affected area (e.g., Python backend, React frontend, or a specific database connector).
- Expected vs. Actual Behavior: A clear description of the intended system behavior versus the observed vulnerability.
- Detailed Reproduction Steps: Clear, manual steps to reproduce the vulnerability.
## Security Model
This section defines what Apache Superset considers a security issue and what it does not. It is the canonical reference for reporters, the Apache Superset Security Team, and any automated tool (LLM-based scanner, static analyzer, dependency tool) that needs to constrain its hypotheses to behaviors that genuinely violate the project's security policy.
The model is intentionally written in terms of principals, trust boundaries, and capability surface rather than in terms of specific files, functions, or libraries. New code paths inherit the model automatically.
### Trust Boundaries
Apache Superset's threat model assumes three trust boundaries.
1.*The Admin role* is a fully trusted operational principal. Anything an Admin can do through the documented user interface, REST API, or configuration system is an intended capability, not a vulnerability, even if individually powerful or destructive. The Admin role is, by policy, equivalent to operating-system-level trust over the Apache Superset application. This is unavoidable rather than aspirational: an Admin can, for example, register new database connections of arbitrary type, execute arbitrary SQL through SQL Lab, render Jinja templates that resolve to SQL or rendered HTML, and override application configuration. Granting Admin is functionally equivalent to granting shell access on the host, which is the reasoning behind treating it as a trust boundary in the sense of MITRE CNA Operational Rules 4.1.
2.*The operator* is whoever deploys, configures, and runs Apache Superset. Behaviors that depend on deployment-time decisions are the operator's responsibility, not Apache Superset's. This includes the values of secrets, the network reachability of the application and its data sources, the choice of database connectors and cache backends, the selection of feature flags, the destinations of notifications, and the trust placed in third-party plugins. Defaults that fail closed are the responsibility of the Apache Superset codebase. Defaults that fail open must be accompanied by a documented hardening requirement; applying that hardening is the operator's responsibility, while shipping an undocumented or unflagged fail-open default is a codebase issue.
3.*The Apache Superset codebase* is responsible for enforcing the role and capability matrix below across its product surface. A failure to enforce, anywhere in that surface, is in scope. The codebase's commitments are limited to the role and capability matrix and to controls Apache Superset's own documentation (this file and the linked Security documentation) explicitly positions as security boundaries; configurable hardening that operators can layer on top is treated separately under *Vulnerability Scope* below.
### Roles and Capabilities
Apache Superset ships with the following first-class principals. Detailed permission definitions live in the [Security documentation](https://superset.apache.org/docs/security).
| Principal | Read data | Write objects | Execute SQL | Manage databases | Manage users, roles, RLS |
|---|---|---|---|---|---|
| Public (anonymous) | none by default | no | no | no | no |
| Gamma | only granted datasets | own charts and dashboards on granted datasets | no by default (requires the `sql_lab` role) | no | no |
| Alpha | all data sources | own charts, dashboards, and datasets | no by default (requires the `sql_lab` role) | data upload to existing databases only | no |
| Admin | all | all | yes | yes | yes |
| Embedded guest token | data sources reachable through the embedded dashboards the token authorizes | no | no | no | no |
The `sql_lab` role is *additive*: it grants the SQL Lab permission set on top of the base role above, and is the only path by which Gamma or Alpha gain SQL execution capability. Database access is still scoped per the base role's grants. Admin includes SQL Lab access by default.
Deployments may grant or revoke individual view-menu permissions, which shifts the boundary for that deployment but does not redefine the model. Any custom role created by an operator inherits the same principle: its capabilities are whatever the operator has explicitly granted it. The Public principal follows the same rule: operators may grant the Public role read access to specific datasets or dashboards (typically for anonymous reporting use cases), which shifts the boundary for that deployment without redefining the model.
### Vulnerability Scope
The test for whether a finding is in scope is a single question:
> *Does this finding let a principal perform an action the role and capability matrix above does not entitle them to?*
If yes, it is in scope. If no, it is out of scope. The lists below apply that test to the classes Apache Superset most commonly receives reports about; they are illustrative, not exhaustive.
*In Scope*
- A user, embedded guest, or anonymous visitor reads, modifies, or deletes data outside their granted set. Includes object-level access bypass on charts, dashboards, datasets, saved queries, tags, annotations, and similar per-object endpoints, and row-level-security rule bypass.
- A user supplies input that the codebase should sanitize or parameterize but does not, causing arbitrary SQL, template code, or scripts to execute. Includes injection through Jinja templates, SQL-construction paths, and any field the codebase passes to a query engine or template engine.
- A user bypasses authentication, fixates or reuses another user's session, or reaches an authenticated endpoint without logging in.
- An embedded guest token authorizes actions outside the dashboard it was issued for, or can be forged, replayed, or escalated to a higher principal.
- Apache Superset, acting on behalf of an unprivileged user, fetches an outbound URL the user controls in a feature where Apache Superset itself, not the operator, controls the outbound destination set (server-side request forgery).
- An Apache Superset default fails open without an accompanying documented hardening requirement. The codebase is responsible for shipping fail-closed defaults or for documenting the hardening required when a default fails open; failures of that responsibility are in scope (see *Trust Boundaries*).
- A user bypasses a control Apache Superset documents specifically as a security boundary. This includes row-level security, the access checks tied to the role and capability matrix above, and any feature whose documentation positions it as security-relevant. The codebase commits to enforcing those controls; bypasses are in scope regardless of which principal triggers them.
- A user causes a script to execute in another user's browser through a field the codebase renders to that other user (cross-site scripting), or causes cross-origin leakage of authenticated session state or data.
- A user reaches a route, page, or API endpoint that requires a role they do not have.
*Out of Scope*
- Any action an Admin role can perform through documented configuration, API, or UI. The Admin role is a trusted operational principal by policy. Per MITRE CNA Operational Rules 4.1, a qualifying vulnerability must violate a security policy; behavior within a documented trust boundary does not.
- Deployment or operator decisions: the values of secrets and tokens, whether internal networks are reachable from the server, which database connectors or cache backends are enabled, which feature flags are set, where notifications are delivered, and which third-party plugins are loaded.
- Compromise, modification, or malicious control of trusted backend infrastructure. Apache Superset assumes the integrity of its metastore, cache backends (for example Redis or Memcached), message brokers, secret stores, and other operator-managed infrastructure. Findings that require an attacker to read from, write to, or otherwise tamper with these systems, including injecting malicious state, serialized objects, cache entries, task metadata, configuration, or database records, are post-compromise scenarios and do not constitute vulnerabilities in Apache Superset itself. A finding remains in scope only if an unprivileged user can cause such modification through a vulnerability in Apache Superset.
- Code paths whose intended purpose is example data, demos, fixtures, local development, or documentation, rather than the production runtime.
- How a downstream application (spreadsheet program, email client, browser handling user-downloaded files) interprets output Apache Superset produced for it.
- Findings without a reproducible proof of concept against a supported release. The burden of demonstrating exploitability rests with the reporter; findings closed for lack of a proof of concept may be refiled if one is later produced.
- Brute force, rate limiting, denial of service, or resource exhaustion that does not bypass a documented control.
- Missing security headers, banner or version disclosure, user or object enumeration through error messages or timing, and similar low-impact information disclosure that does not enable a further concrete exploit.
- Bypasses of configurable defense-in-depth hardening that Apache Superset does not document as a security boundary. Apache Superset is not a SQL or database firewall: operator-deployable filters such as SQL function or table denylists, URI restrictions on already-authorized database connectors, and similar belt-and-braces controls are provided to let operators layer hardening on top of the role and capability matrix, not as firewall-grade guarantees the codebase commits to. Findings against such hardening are improvements, not vulnerabilities, unless the documentation positions the specific control as security-relevant.
- Hardening suggestions that improve defense in depth but do not violate the security model.
Findings in third-party dependencies fall into two cases. A finding in a transitive dependency, or in an operator-selected dependency that Apache Superset does not ship, is out of scope and should be reported to the dependency's maintainers. A finding caused by Apache Superset pinning a known-vulnerable version of a direct dependency it ships, or using a dependency in a way that creates a vulnerability the dependency itself does not have, remains in scope. Dependency findings in the official Apache Superset Docker image that fall into the first case can be remediated by extending the image at release time.
When uncertain whether a finding falls in scope, please file it through the reporting process above. The triage team will classify it and explain the reasoning if it is closed as out of scope.
**Outcome of Reports**
Reports that are deemed out of scope for a CVE but represent valid security best practices or hardening opportunities are typically converted into public GitHub issues, where the community can contribute fixes alongside the maintainers. The triage decision and reasoning are communicated back to the reporter in either case.
**Vulnerability Aggregation & CVE Attribution**
In accordance with MITRE CNA Operational Rules (4.1.10, 4.1.11, and 4.2.13), Apache Superset issues CVEs based on the underlying architectural root cause rather than the number of affected endpoints or exploit payloads.
- Aggregation: If multiple exploit vectors stem from the same programmatic failure or shared vulnerable code, they must be aggregated into a single, comprehensive report.
- Independent Fixes: Separate CVEs will only be assigned if the vulnerabilities reside in decoupled architectural modules and can be fixed independently of one another.
Reports that fail to aggregate related findings will be merged during triage to ensure an accurate and defensible CVE record.
**Your responsible disclosure and collaboration are invaluable.**
- [Common Vulnerabilities and Exposures by release](https://superset.apache.org/docs/security/cves)
- [How Security Vulnerabilities are Reported & Handled in Apache Superset (Blog)](https://preset.io/blog/how-security-vulnerabilities-are-reported-and-handled-in-apache-superset/)
**For LLM agents and automated scanners**
This file is the canonical source of truth for what Apache Superset considers a security vulnerability. Automated tooling (LLM-based code scanners, static analyzers, dependency tools) should treat the Security Model section as authoritative when classifying findings. The repository's [AGENTS.md](AGENTS.md) file contains a short pointer to this document for agents that read AGENTS.md as their entry point.
When the feature flag is enabled, these permissions are enforced on both the frontend (disabled buttons with tooltips) and backend (403 responses from API endpoints). When disabled, legacy `can_csv` behavior is preserved.
**Migration behavior:** All three new permissions are granted to every role that currently has `can_csv`, preserving existing access. Admins can then selectively revoke individual export permissions from specific roles as needed.
### Deck.gl MapBox viewport and opacity controls are functional
The Deck.gl MapBox chart's **Opacity**, **Default longitude**, **Default latitude**, and **Zoom** controls were previously non-functional — changing them had no effect on the rendered map. These controls are now wired up correctly.
**Behavior change for existing charts:** Previously, the viewport controls had hard-coded default values (`-122.405293`, `37.772123`, zoom `11` — San Francisco) that were stored in each chart's `form_data` but never applied. The map always used `fitBounds` to center on the data. With this fix, those stored values are now respected, which means existing MapBox charts may open centered on the old default coordinates instead of fitting to data bounds.
**To restore fit-to-data behavior:** Open the chart in Explore, clear the **Default longitude**, **Default latitude**, and **Zoom** fields in the Viewport section, and re-save the chart.
### Combined datasource list endpoint
Added a new combined datasource list endpoint at `GET /api/v1/datasource/` to serve datasets and semantic views in one response.
- The endpoint is available to users with at least one of `can_read` on `Dataset` or `SemanticView`.
- Semantic views are included only when the `SEMANTIC_LAYERS` feature flag is enabled.
- The endpoint enforces strict `order_column` validation and returns `400` for invalid sort columns.
### ClickHouse minimum driver version bump
The minimum required version of `clickhouse-connect` has been raised to `>=0.13.0`. If you are using the ClickHouse connector, please upgrade your `clickhouse-connect` package. The `_mutate_label` workaround that appended hash suffixes to column aliases has also been removed, as it is no longer needed with modern versions of the driver.
### MCP Tool Observability
MCP (Model Context Protocol) tools now include enhanced observability instrumentation for monitoring and debugging:
@@ -60,19 +93,19 @@ ORDER BY total_calls DESC;
**Security note:** Sensitive parameters (passwords, API keys, tokens) are automatically redacted in logs as `[REDACTED]`.
### Signal Cache Backend
### Distributed Coordination Backend
A new `SIGNAL_CACHE_CONFIG` configuration provides a unified Redis-based backend for real-time coordination features in Superset. This backend enables:
A new `DISTRIBUTED_COORDINATION_CONFIG` configuration provides a unified Redis-based backend for real-time coordination features in Superset. This backend enables:
- **Pub/sub messaging** for real-time event notifications between workers
- **Atomic distributed locking** using Redis SET NX EX (more performant than database-backed locks)
- **Event-based coordination** for background task management
The signal cache is used by the Global Task Framework (GTF) for abort notifications and task completion signaling, and will eventually replace `GLOBAL_ASYNC_QUERIES_CACHE_BACKEND` as the standard signaling backend. Configuring this is recommended for Redis enabled production deployments.
The distributed coordination is used by the Global Task Framework (GTF) for abort notifications and task completion signaling, and will eventually replace `GLOBAL_ASYNC_QUERIES_CACHE_BACKEND` as the standard signaling backend. Configuring this is recommended for Redis enabled production deployments.
Example configuration in `superset_config.py`:
```python
SIGNAL_CACHE_CONFIG={
DISTRIBUTED_COORDINATION_CONFIG={
"CACHE_TYPE":"RedisCache",
"CACHE_KEY_PREFIX":"signal_",
"CACHE_REDIS_URL":"redis://localhost:6379/1",
@@ -295,14 +328,14 @@ Note: Pillow is now a required dependency (previously optional) to support image
- [33116](https://github.com/apache/superset/pull/33116) In Echarts Series charts (e.g. Line, Area, Bar, etc.) charts, the `x_axis_sort_series` and `x_axis_sort_series_ascending` form data items have been renamed with `x_axis_sort` and `x_axis_sort_asc`.
There's a migration added that can potentially affect a significant number of existing charts.
- [32317](https://github.com/apache/superset/pull/32317) The horizontal filter bar feature is now out of testing/beta development and its feature flag `HORIZONTAL_FILTER_BAR` has been removed.
- [31590](https://github.com/apache/superset/pull/31590) Marks the begining of intricate work around supporting dynamic Theming, and breaks support for [THEME_OVERRIDES](https://github.com/apache/superset/blob/732de4ac7fae88e29b7f123b6cbb2d7cd411b0e4/superset/config.py#L671) in favor of a new theming system based on AntD V5. Likely this will be in disrepair until settling over the 5.x lifecycle.
- [32432](https://github.com/apache/superset/pull/31260) Moves the List Roles FAB view to the frontend and requires `FAB_ADD_SECURITY_API` to be enabled in the configuration and `superset init` to be executed.
- [31590](https://github.com/apache/superset/pull/31590) Marks the beginning of intricate work around supporting dynamic Theming, and breaks support for [THEME_OVERRIDES](https://github.com/apache/superset/blob/732de4ac7fae88e29b7f123b6cbb2d7cd411b0e4/superset/config.py#L671) in favor of a new theming system based on AntD V5. Likely this will be in disrepair until settling over the 5.x lifecycle.
- [32432](https://github.com/apache/superset/pull/32432) Moves the List Roles FAB view to the frontend and requires `FAB_ADD_SECURITY_API` to be enabled in the configuration and `superset init` to be executed.
- [34319](https://github.com/apache/superset/pull/34319) Drill to Detail and Drill By is now supported in Embedded mode, and also with the `DASHBOARD_RBAC` FF. If you don't want to expose these features in Embedded / `DASHBOARD_RBAC`, make sure the roles used for Embedded / `DASHBOARD_RBAC`don't have the required permissions to perform D2D actions.
## 5.0.0
- [31976](https://github.com/apache/superset/pull/31976) Removed the `DISABLE_LEGACY_DATASOURCE_EDITOR` feature flag. The previous value of the feature flag was `True` and now the feature is permanently removed.
- [31959](https://github.com/apache/superset/pull/32000) Removes CSV_UPLOAD_MAX_SIZE config, use your web server to control file upload size.
- [32000](https://github.com/apache/superset/pull/32000) Removes CSV_UPLOAD_MAX_SIZE config, use your web server to control file upload size.
- [31959](https://github.com/apache/superset/pull/31959) Removes the following endpoints from data uploads: `/api/v1/database/<id>/<file type>_upload` and `/api/v1/database/<file type>_metadata`, in favour of new one (Details on the PR). And simplifies permissions.
- [31844](https://github.com/apache/superset/pull/31844) The `ALERT_REPORTS_EXECUTE_AS` and `THUMBNAILS_EXECUTE_AS` config parameters have been renamed to `ALERT_REPORTS_EXECUTORS` and `THUMBNAILS_EXECUTORS` respectively. A new config flag `CACHE_WARMUP_EXECUTORS` has also been introduced to be able to control which user is used to execute cache warmup tasks. Finally, the config flag `THUMBNAILS_SELENIUM_USER` has been removed. To use a fixed executor for async tasks, use the new `FixedExecutor` class. See the config and docs for more info on setting up different executor profiles.
- [31894](https://github.com/apache/superset/pull/31894) Domain sharding is deprecated in favor of HTTP2. The `SUPERSET_WEBSERVER_DOMAINS` configuration will be removed in the next major version (6.0)
@@ -310,7 +343,7 @@ Note: Pillow is now a required dependency (previously optional) to support image
- [31774](https://github.com/apache/superset/pull/31774): Fixes the spelling of the `USE-ANALAGOUS-COLORS` feature flag. Please update any scripts/configuration item to use the new/corrected `USE-ANALOGOUS-COLORS` flag spelling.
- [31582](https://github.com/apache/superset/pull/31582) Removed the legacy Area, Bar, Event Flow, Heatmap, Histogram, Line, Sankey, and Sankey Loop charts. They were all automatically migrated to their ECharts counterparts with the exception of the Event Flow and Sankey Loop charts which were removed as they were not actively maintained and not widely used. If you were using the Event Flow or Sankey Loop charts, you will need to find an alternative solution.
- [31198](https://github.com/apache/superset/pull/31198) Disallows by default the use of the following ClickHouse functions: "version", "currentDatabase", "hostName".
- [29798](https://github.com/apache/superset/pull/29798) Since 3.1.0, the intial schedule for an alert or report was mistakenly offset by the specified timezone's relation to UTC. The initial schedule should now begin at the correct time.
- [29798](https://github.com/apache/superset/pull/29798) Since 3.1.0, the initial schedule for an alert or report was mistakenly offset by the specified timezone's relation to UTC. The initial schedule should now begin at the correct time.
- [30021](https://github.com/apache/superset/pull/30021) The `dev` layer in our Dockerfile no long includes firefox binaries, only Chromium to reduce bloat/docker-build-time.
- [30099](https://github.com/apache/superset/pull/30099) Translations are no longer included in the default docker image builds. If your environment requires translations, you'll want to set the docker build arg `BUILD_TRANSLATIONS=true`.
- [31262](https://github.com/apache/superset/pull/31262) NOTE: deprecated `pylint` in favor of `ruff` as our only python linter. Only affect development workflows positively (not the release itself). It should cover most important rules, be much faster, but some things linting rules that were enforced before may not be enforce in the exact same way as before.
@@ -323,7 +356,7 @@ Note: Pillow is now a required dependency (previously optional) to support image
- [25166](https://github.com/apache/superset/pull/25166) Changed the default configuration of `UPLOAD_FOLDER` from `/app/static/uploads/` to `/static/uploads/`. It also removed the unused `IMG_UPLOAD_FOLDER` and `IMG_UPLOAD_URL` configuration options.
- [30284](https://github.com/apache/superset/pull/30284) Deprecated GLOBAL_ASYNC_QUERIES_REDIS_CONFIG in favor of the new GLOBAL_ASYNC_QUERIES_CACHE_BACKEND configuration. To leverage Redis Sentinel, set CACHE_TYPE to RedisSentinelCache, or use RedisCache for standalone Redis
- [31961](https://github.com/apache/superset/pull/31961) Upgraded React from version 16.13.1 to 17.0.2. If you are using custom frontend extensions or plugins, you may need to update them to be compatible with React 17.
- [31260](https://github.com/apache/superset/pull/31260) Docker images now use `uv pip install` instead of `pip install` to manage the python envrionment. Most docker-based deployments will be affected, whether you derive one of the published images, or have custom bootstrap script that install python libraries (drivers)
- [31260](https://github.com/apache/superset/pull/31260) Docker images now use `uv pip install` instead of `pip install` to manage the python environment. Most docker-based deployments will be affected, whether you derive one of the published images, or have custom bootstrap script that install python libraries (drivers)
### Potential Downtime
@@ -400,7 +433,7 @@ Note: Pillow is now a required dependency (previously optional) to support image
- [26462](https://github.com/apache/superset/issues/26462): Removes the Profile feature given that it's not actively maintained and not widely used.
- [26377](https://github.com/apache/superset/pull/26377): Removes the deprecated Redirect API that supported short URLs used before the permalink feature.
- [26329](https://github.com/apache/superset/issues/26329): Removes the deprecated `DASHBOARD_NATIVE_FILTERS` feature flag. The previous value of the feature flag was `True` and now the feature is permanently enabled.
- [25510](https://github.com/apache/superset/pull/25510): Reenforces that any newly defined Python data format (other than epoch) must adhere to the ISO 8601 standard (enforced by way of validation at the API and database level) after a previous relaxation to include slashes in addition to dashes. From now on when specifying new columns, dataset owners will need to use a SQL expression instead to convert their string columns of the form %Y/%m/%d etc. to a `DATE`, `DATETIME`, etc. type.
- [25510](https://github.com/apache/superset/pull/25510): Reinforces that any newly defined Python data format (other than epoch) must adhere to the ISO 8601 standard (enforced by way of validation at the API and database level) after a previous relaxation to include slashes in addition to dashes. From now on when specifying new columns, dataset owners will need to use a SQL expression instead to convert their string columns of the form %Y/%m/%d etc. to a `DATE`, `DATETIME`, etc. type.
- [26372](https://github.com/apache/superset/issues/26372): Removes the deprecated `GENERIC_CHART_AXES` feature flag. The previous value of the feature flag was `True` and now the feature is permanently enabled.
WEBDRIVER_BASEURL=f"http://superset_app{os.environ.get('SUPERSET_APP_ROOT','/')}/"# When using docker compose baseurl should be http://superset_nginx{ENV{BASEPATH}}/ # noqa: E501
Each section maintains its own version history and can be versioned independently.
@@ -36,23 +37,45 @@ Each section maintains its own version history and can be versioned independentl
To create a new version for any section, use the Docusaurus version command with the appropriate plugin ID or use our automated scripts:
#### Before You Cut
The cut snapshots whatever's on disk into a frozen historical version, including auto-generated content (database pages from `superset/db_engine_specs/`, API reference from `static/resources/openapi.json`, component pages from Storybook stories). The cut script refreshes these via `generate:smart` before snapshotting, but the **`databases.json` diagnostics file** needs special care to capture full detail:
1.**Canonical release cut**: download the `database-diagnostics` artifact from a green `Python-Integration` run on master, place it at `docs/src/data/databases.json`, then run the cut script with `--skip-generate` to preserve it. This is what the production deploy uses and includes full Flask-context diagnostics (driver versions, feature support matrix, etc.).
2.**Local dev cut**: just run the script normally. `generate:smart` will regenerate `databases.json` using your local Flask environment — accurate to whatever drivers/extras you have installed, but typically less complete than the CI artifact.
3.**No Flask available**: also fine — the database generator falls back to AST parsing of engine spec files. The MDX pages are still correct; only the diagnostics JSON is leaner.
Also: confirm `master` CI is green, and that your local checkout matches the SHA you intend to cut from.
#### Using Automated Scripts (Required)
**⚠️ Important:** Always use these custom commands instead of the native Docusaurus commands. These scripts ensure that both the Docusaurus versioning system AND the `versions-config.json` file are updated correctly.
**⚠️ Important:** Always use these custom commands instead of the native Docusaurus commands. These scripts ensure that both the Docusaurus versioning system AND the `versions-config.json` file are updated correctly, AND that auto-generated content is refreshed before snapshotting.
```bash
# Main Documentation
yarn version:add:docs 1.2.0
yarn version:add:user_docs 1.2.0
# Developer Portal
yarn version:add:developer_portal 1.2.0
# Admin Docs
yarn version:add:admin_docs 1.2.0
# Component Playground (when enabled)
# Developer Docs
yarn version:add:developer_docs 1.2.0
# Component Playground
yarn version:add:components 1.2.0
```
What the script does:
1. Refreshes auto-generated content via `generate:smart` (database pages, API reference, component pages).
2. Calls `yarn docusaurus docs:version` (or the per-section equivalent) to snapshot the section.
3. Freezes any data-file imports (`@site/static/*.json`, `../../data/*.json`) into a snapshot-local `_versioned_data/` dir so the historical version doesn't silently mutate when the source files change.
4. Adjusts relative import paths (`../../src/...` → `../../../src/...`) for files now one directory deeper.
5. Updates `versions-config.json` and `<section>_versions.json`.
**Do NOT use** the native Docusaurus commands directly (`yarn docusaurus docs:version`), as they will:
- ❌ Create version files but NOT update `versions-config.json`
- ❌ Skip auto-gen refresh, freezing whatever was on disk
- ❌ Skip data-import freezing, leaving the snapshot pointed at live data
- ❌ Cause versions to not appear in dropdown menus
- ❌ Require manual fixes to synchronize the configuration
@@ -75,7 +98,7 @@ If creating versions manually, you'll need to:
Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel.
- *Alerts* are sent when a SQL condition is reached
- *Reports* are sent on a schedule
Alerts and reports are disabled by default. To turn them on, you'll need to change configuration settings and install a suitable headless browser in your environment.
## Requirements
### Commons
#### In your `superset_config.py` or `superset_config_docker.py`
- `"ALERT_REPORTS"` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) must be turned to True.
- `beat_schedule` in CeleryConfig must contain schedule for `reports.scheduler`.
- At least one of those must be configured, depending on what you want to use:
- emails: `SMTP_*` settings
- Slack messages: `SLACK_API_TOKEN`
- Users can customize the email subject by including date code placeholders, which will automatically be replaced with the corresponding UTC date when the email is sent. To enable this functionality, activate the `"DATE_FORMAT_IN_EMAIL_SUBJECT"` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags). This enables date formatting in email subjects, preventing all reporting emails from being grouped into the same thread (optional for the reporting feature).
- Use date codes from [strftime.org](https://strftime.org/) to create the email subject.
- If no date code is provided, the original string will be used as the email subject.
##### Disable dry-run mode
Screenshots will be taken but no messages actually sent as long as `ALERT_REPORTS_NOTIFICATION_DRY_RUN = True`, its default value in `docker/pythonpath_dev/superset_config.py`. To disable dry-run mode and start receiving email/Slack notifications, set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py).
#### In your `Dockerfile`
You'll need to extend the Superset image to include a headless browser. Your options include:
- Use Playwright with Chromium: this is the recommended approach as of version 4.1.x or greater. Playwright always uses Chromium — the `WEBDRIVER_TYPE` config setting has no effect when Playwright is active. A working example of a Dockerfile that installs these tools is provided under "Building your own production Docker image" on the [Docker Builds](/admin-docs/installation/docker-builds#building-your-own-production-docker-image) page. Enable the `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` feature flag in your config to activate it.
- Use Firefox (Selenium): you'll need to install geckodriver and Firefox. Set `WEBDRIVER_TYPE` to `"firefox"` in your `superset_config.py`.
- Use Chrome (Selenium): you'll need to install Chrome. Set `WEBDRIVER_TYPE` to `"chrome"` in your `superset_config.py`.
In Superset versions <=4.0x, users installed Firefox or Chrome and that was documented here.
Only the worker container needs the browser.
### Slack integration
To send alerts and reports to Slack channels, you need to create a new Slack Application on your workspace.
1. Connect to your Slack workspace, then head to [https://api.slack.com/apps].
2. Create a new app.
3. Go to "OAuth & Permissions" section, and give the following scopes to your app:
- `incoming-webhook`
- `files:write`
- `chat:write`
- `channels:read`
- `groups:read`
4. At the top of the "OAuth and Permissions" section, click "install to workspace".
5. Select a default channel for your app and continue.
(You can post to any channel by inviting your Superset app into that channel).
6. The app should now be installed in your workspace, and a "Bot User OAuth Access Token" should have been created. Copy that token in the `SLACK_API_TOKEN` variable of your `superset_config.py`.
7. Ensure the feature flag `ALERT_REPORT_SLACK_V2` is set to True in `superset_config.py`
8. Restart the service (or run `superset init`) to pull in the new configuration.
Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading '#' e.g. use `alerts` instead of `#alerts`.
#### Large Slack Workspaces (10k+ channels)
For workspaces with many channels, fetching the complete channel list can take several minutes and may encounter Slack API rate limits. Add the following to your `superset_config.py`:
When a report includes file attachments (CSV, PDF, or PNG screenshots), the request is sent as `multipart/form-data` instead. In that case, each top-level payload field (`name`, `text`, `description`, `url`) becomes its own form field, and nested structures like `header` are serialized as a JSON-encoded string in their own field. Every attachment is added as a repeated form field named `files`:
Webhook consumers should branch on `Content-Type`: parse the body as JSON when `application/json`, or read the individual form fields (decoding `header` as JSON) when `multipart/form-data`.
#### HTTPS Enforcement
To require HTTPS webhook URLs (recommended for production), set:
```python
ALERT_REPORTS_WEBHOOK_HTTPS_ONLY = True
```
When enabled, Superset rejects webhook configurations that use `http://` URLs.
#### Retry Behavior
Superset automatically retries webhook deliveries on `429 Too Many Requests` and `5xx` server errors using exponential backoff.
### Kubernetes-specific
- You must have a `celery beat` pod running. If you're using the chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset), you need to put `supersetCeleryBeat.enabled = true` in your values override.
- You can see the dedicated docs about [Kubernetes installation](/admin-docs/installation/kubernetes) for more details.
### Docker Compose specific
#### You must have in your `docker-compose.yml`
- A Redis message broker
- PostgreSQL DB instead of SQLlite
- One or more `celery worker`
- A single `celery beat`
This process also works in a Docker swarm environment, you would just need to add `Deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm.
### Detailed config
The following configurations need to be added to the `superset_config.py` file. This file is loaded when the image runs, and any configurations in it will override the default configurations found in the `config.py`.
You can find documentation about each field in the default `config.py` in the GitHub repository under [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py).
You need to replace default values with your custom Redis, Slack and/or SMTP config.
Superset uses Celery beat and Celery worker(s) to send alerts and reports.
- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
- The worker will process the tasks that need to be performed when an alert or report is fired.
In the `CeleryConfig`, only the `beat_schedule` is relevant to this feature, the rest of the `CeleryConfig` can be changed for your needs.
Alternatively, you can assign a function to `ALERT_MINIMUM_INTERVAL` and/or `REPORT_MINIMUM_INTERVAL`. This is useful to dynamically retrieve a value as needed:
For security, Superset rewrites external links in alert/report email HTML so
they go through a warning page before the user is navigated to the external
site. Internal links (matching your configured base URL) are not affected.
```python
# Disable external link redirection entirely (default: True)
ALERT_REPORTS_ENABLE_LINK_REDIRECT = False
```
The feature uses `WEBDRIVER_BASEURL_USER_FRIENDLY` (or `WEBDRIVER_BASEURL`)
to determine which hosts are internal.
## Troubleshooting
There are many reasons that reports might not be working. Try these steps to check for specific issues.
### Confirm feature flag is enabled and you have sufficient permissions
If you don't see "Alerts & Reports" under the *Manage* section of the Settings dropdown in the Superset UI, you need to enable the `ALERT_REPORTS` feature flag (see above). Enable another feature flag and check to see that it took effect, to verify that your config file is getting loaded.
Log in as an admin user to ensure you have adequate permissions.
### Check the logs of your Celery worker
This is the best source of information about the problem. In a docker compose deployment, you can do this with a command like `docker logs superset_worker --since 1h`.
### Check web browser and webdriver installation
To take a screenshot, the worker visits the dashboard or chart using a headless browser, then takes a screenshot. If you are able to send a chart as CSV or text but can't send as PNG, your problem may lie with the browser.
If you are handling the installation of the headless browser on your own, do your own verification to ensure that the headless browser opens successfully in the worker environment.
### Send a test email
One symptom of an invalid connection to an email server is receiving an error of `[Errno 110] Connection timed out` in your logs when the report tries to send.
Confirm via testing that your outbound email configuration is correct. Here is the simplest test, for an un-authenticated email SMTP email service running on port 25. If you are sending over SSL, for instance, study how [Superset's codebase sends emails](https://github.com/apache/superset/blob/master/superset/utils/core.py#L818) and then test with those commands and arguments.
Start Python in your worker environment, replace all example values, and run:
- Some cloud hosts disable outgoing unauthenticated SMTP email to prevent spam. For instance, [Azure blocks port 25 by default on some machines](https://learn.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity). Enable that port or use another sending method.
- Use another set of SMTP credentials that you verify works in this setup.
### Browse to your report from the worker
The worker may be unable to reach the report. It will use the value of `WEBDRIVER_BASEURL` to browse to the report. If that route is invalid, or presents an authentication challenge that the worker can't pass, the report screenshot will fail.
Check this by attempting to `curl` the URL of a report that you see in the error logs of your worker. For instance, from the worker environment, run `curl http://superset_app:8088/superset/dashboard/1/`. You may get different responses depending on whether the dashboard exists - for example, you may need to change the `1` in that URL. If there's a URL in your logs from a failed report screenshot, that's a good place to start. The goal is to determine a valid value for `WEBDRIVER_BASEURL` and determine if an issue like HTTPS or authentication is redirecting your worker.
In a deployment with authentication measures enabled like HTTPS and Single Sign-On, it may make sense to have the worker navigate directly to the Superset application running in the same location, avoiding the need to sign in. For instance, you could use `WEBDRIVER_BASEURL="http://superset_app:8088"` for a docker compose deployment, and set `"force_https": False,` in your `TALISMAN_CONFIG`.
### Duplicate report deliveries
In some deployment configurations a scheduled report can be delivered more than once around its planned time. This typically happens when more than one process is responsible for running the alerts & reports schedule (for example, multiple schedulers or Celery beat instances). To avoid duplicate emails or notifications:
- Ensure that only a **single scheduler/beat process** is configured to trigger alerts and reports for a given environment.
- If you run **multiple Celery workers**, verify that there is still only one component responsible for scheduling the report tasks (workers should execute tasks, not schedule them independently).
- Review your deployment/orchestration setup (for example systemd, Docker, or Kubernetes) to make sure the alerts & reports scheduler is **not started from multiple places by accident**.
## Scheduling Queries as Reports
You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding
extra metadata to saved queries, which are then picked up by an external scheduled (like
[Apache Airflow](https://airflow.apache.org/)).
To allow scheduled queries, add the following to `SCHEDULED_QUERIES` in your configuration file:
```python
SCHEDULED_QUERIES = {
# This information is collected when the user clicks "Schedule query",
# and saved into the `extra` field of saved queries.
[react-jsonschema-form](https://github.com/mozilla-services/react-jsonschema-form) and will add a
menu item called “Schedule” to SQL Lab. When the menu item is clicked, a modal will show up where
the user can add the metadata required for scheduling the query.
This information can then be retrieved from the endpoint `/api/v1/saved_query/` and used to
schedule the queries that have `schedule_info` in their JSON metadata. For schedulers other than
Airflow, additional fields can be easily added to the configuration file above.
:::resources
- [Tutorial: Automated Alerts and Reporting via Slack/Email in Superset](https://dev.to/ngtduc693/apache-superset-topic-5-automated-alerts-and-reporting-via-slackemail-in-superset-2gbe)
- [Blog: Integrating Slack alerts and Apache Superset for better data observability](https://medium.com/affinityanswers-tech/integrating-slack-alerts-and-apache-superset-for-better-data-observability-fd2f9a12c350)
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
*/}
---
title: AWS IAM Authentication
sidebar_label: AWS IAM Authentication
sidebar_position: 15
---
# AWS IAM Authentication for AWS Databases
Superset supports IAM-based authentication for **Amazon Aurora** (PostgreSQL and MySQL) and **Amazon Redshift**. IAM auth eliminates the need for database passwords — Superset generates a short-lived auth token using temporary AWS credentials instead.
Cross-account IAM role assumption via STS `AssumeRole` is supported, allowing a Superset deployment in one AWS account to connect to databases in a different account.
## Prerequisites
- Enable the `AWS_DATABASE_IAM_AUTH` feature flag in `superset_config.py`. IAM authentication is gated behind this flag; if it is disabled, connections using `aws_iam` fail with *"AWS IAM database authentication is not enabled."*
```python
FEATURE_FLAGS = {
"AWS_DATABASE_IAM_AUTH": True,
}
```
- `boto3` must be installed in your Superset environment:
```bash
pip install boto3
```
- The Superset server's IAM role (or static credentials) must have permission to call `sts:AssumeRole` (for cross-account) or the same-account permissions for the target service:
- **Redshift Serverless**: `redshift-serverless:GetCredentials` and `redshift-serverless:GetWorkgroup`
- SSL must be enabled on the Aurora / Redshift endpoint (required for IAM token auth).
## Configuration
IAM authentication is configured via the **encrypted_extra** field of the database connection. Access this field in the **Advanced** → **Security** section of the database connection form, under **Secure Extra**.
**3. Configure the database connection in Superset** using the `role_arn` and `external_id` from the trust policy (as shown in the configuration example above).
## Credential Caching
STS credentials are cached in memory keyed by `(role_arn, region, external_id)` with a 10-minute TTL. This reduces the number of STS API calls when multiple queries are executed with the same connection. Tokens are refreshed automatically before expiry.
@@ -84,11 +84,11 @@ Caching for SQL Lab query results is used when async queries are enabled and is
Note that this configuration does not use a flask-caching dictionary for its configuration, but
instead requires a cachelib object.
See [Async Queries via Celery](/docs/configuration/async-queries-celery) for details.
See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details.
## Caching Thumbnails
This is an optional feature that can be turned on by activating its [feature flag](/docs/configuration/configuring-superset#feature-flags) on config:
This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) on config:
Additional selenium web drive configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate selenium. The default function uses the `flask-login`
To control which user account is used for rendering thumbnails and warming up caches, configure
`THUMBNAIL_EXECUTORS` and `CACHE_WARMUP_EXECUTORS`. Each accepts a list of executor types (which
resolve to an owner, creator, modifier, or the currently-logged-in user) and/or a `FixedExecutor`
pinned to a specific username. By default, thumbnails render as the current user
(`ExecutorType.CURRENT_USER`) and cache warmup runs as the chart/dashboard owner
(`ExecutorType.OWNER`).
To force both to run as a dedicated service account (`admin` in this example):
```python
from superset.tasks.types import ExecutorType, FixedExecutor
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
CACHE_WARMUP_EXECUTORS = [FixedExecutor("admin")]
```
Use a dedicated read-only service account here rather than a personal admin account, so that
thumbnail rendering and cache warmup tasks don't fail if a specific user's credentials change.
Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate Selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:
```python
@@ -159,9 +178,23 @@ Then on configuration:
WEBDRIVER_AUTH_FUNC = auth_driver
```
## Signal Cache Backend
## ETag Support for Thumbnails
Superset supports an optional signal cache (`SIGNAL_CACHE_CONFIG`) for
Thumbnail and screenshot endpoints return `ETag` response headers based on the cached content digest. Clients can use conditional requests to avoid downloading unchanged images:
```
GET /api/v1/chart/42/thumbnail/
If-None-Match: "abc123..."
→ 304 Not Modified (if unchanged)
→ 200 OK (with new image if changed)
```
This is particularly useful for embedded dashboards and external integrations that periodically poll for updated screenshots — unchanged thumbnails return immediately with no payload.
## Distributed Coordination Backend
Superset supports an optional distributed coordination (`DISTRIBUTED_COORDINATION_CONFIG`) for
high-performance distributed operations. This configuration enables:
- **Distributed locking**: Moves lock operations from the metadata database to Redis, improving
@@ -176,11 +209,11 @@ that are not available in general Flask-Caching backends.
### Configuration
The signal cache uses Flask-Caching style configuration for consistency with other cache
backends. Configure `SIGNAL_CACHE_CONFIG` in `superset_config.py`:
The distributed coordination uses Flask-Caching style configuration for consistency with other cache
backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`:
For public OAuth2 clients that cannot securely store a client secret, enable Proof Key for Code Exchange (PKCE) by adding `code_challenge_method` to the `remote_app` configuration:
```python
OAUTH_PROVIDERS = [
{
'name': 'myProvider',
'remote_app': {
'client_id': 'myClientId',
'client_secret': 'mySecret', # may be empty for pure public clients
PKCE (`S256`) is recommended for all OAuth2 flows, even when a client secret is present, as it protects against authorization code interception attacks.
## LDAP Authentication
FAB supports authenticating user credentials against an LDAP server.
@@ -441,7 +470,39 @@ FEATURE_FLAGS = {
}
```
A current list of feature flags can be found in the [Feature Flags](/docs/configuration/feature-flags) documentation.
A current list of feature flags can be found in the [Feature Flags](/admin-docs/configuration/feature-flags) documentation.
## Security Configuration
### HASH_ALGORITHM
Controls the hashing algorithm used for internal checksums and cache keys (thumbnails, cache keys, etc.). The default is `sha256`, which satisfies environments with stricter compliance requirements (e.g., FedRAMP). Set it to `md5` to retain the legacy behavior from older Superset deployments:
```python
HASH_ALGORITHM = "sha256" # default; set to "md5" for legacy behavior
```
A companion `HASH_ALGORITHM_FALLBACKS` list (default: `["md5"]`) lets UUID lookups fall back to older algorithms, which enables gradual migration without breaking existing entries. Set it to `[]` for strict mode (use only `HASH_ALGORITHM`).
:::note
This setting affects internal Superset operations only, not user passwords or authentication tokens. Changing it in an existing deployment may invalidate cached values but does not require a database migration.
:::
## SQL Lab Query History Pruning
SQL Lab query history is stored in the metadata database and is **not** pruned by default. To trim older rows, enable the `prune_query` Celery beat task by uncommenting it in `CELERY_BEAT_SCHEDULE` and choosing a retention window:
Adjust `retention_period_days` to control how long query rows are kept. Companion opt-in tasks (`prune_logs`, `prune_tasks`) exist for pruning the logs and tasks tables; see the commented-out examples in `superset/config.py`. Without enabling these tasks, the metadata database will grow unbounded over time.
:::resources
- [Blog: Feature Flags in Apache Superset](https://preset.io/blog/feature-flags-in-apache-superset-and-preset/)
The superset cli allows you to import and export datasources from and to YAML. Datasources include
databases. The data is expected to be organized in the following hierarchy:
:::info
Superset's ZIP-based import/export also covers **dashboards**, **charts**, and **saved queries**, exercised through the UI and REST API. The [Dashboard Import Overwrite Behavior](#dashboard-import-overwrite-behavior) and [UUIDs in API Responses](#uuids-in-api-responses) sections below document the behavior shared across all asset types.
:::
```text
├──databases
| ├──database_1
@@ -26,6 +30,10 @@ databases. The data is expected to be organized in the following hierarchy:
| └── ... (more databases)
```
:::note
When you export a database connection, the `masked_encrypted_extra` field (used for sensitive connection parameters such as service account JSON, OAuth tokens, and other encrypted credentials) is included in the export. When importing on another instance, these values are decrypted and re-encrypted using the destination instance's `SECRET_KEY`. Ensure the receiving instance has a valid `SECRET_KEY` configured before importing.
:::
## Exporting Datasources to YAML
You can print your current datasources to stdout by running:
@@ -75,6 +83,29 @@ The optional username flag **-u** sets the user used for the datasource import.
When importing a dashboard ZIP with the **overwrite** option enabled, any existing charts that are part of the dashboard are **replaced** rather than duplicated. This applies to:
- Charts whose UUID matches a chart already present in the target instance
- The full chart configuration (query, visualization type, columns, metrics) is replaced by the imported version
If you import without the overwrite flag, existing charts with conflicting UUIDs are left unchanged and the import skips those objects. Use overwrite when you want to push a fully updated dashboard (including chart definitions) from a development or staging environment to production.
## UUIDs in API Responses
The REST API POST endpoints for **datasets**, **charts**, and **dashboards** include the auto-generated `uuid` field in the response body:
```json
{
"id": 42,
"uuid": "b8a8d5c3-1234-4abc-8def-0123456789ab",
...
}
```
UUIDs remain stable across import/export cycles and can be used for cross-environment workflows — for example, recording a UUID when creating a chart in development and using it to identify the matching chart after importing into production.
## Legacy Importing Datasources
### From older versions of Superset to current version
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# MCP Server Deployment & Authentication
Superset includes a built-in [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that lets AI assistants -- Claude, ChatGPT, and other MCP-compatible clients -- interact with your Superset instance. Through MCP, clients can list dashboards, query datasets, execute SQL, create charts, and more.
This guide covers how to run, secure, and deploy the MCP server.
:::tip Looking for user docs?
See **[Using AI with Superset](/user-docs/using-superset/using-ai-with-superset)** for a guide on what AI can do with Superset and how to connect your AI client.
Both containers share the same `superset_config.py`, so authentication settings, database connections, and feature flags stay in sync.
### Multi-Pod (Kubernetes)
For high-availability deployments, configure Redis so that replicas share session state:
```mermaid
flowchart TD
LB["Load Balancer"] --> M1["MCP Pod 1"]
LB --> M2["MCP Pod 2"]
LB --> M3["MCP Pod 3"]
M1 --> R[("Redis<br/>(session store)")]
M2 --> R
M3 --> R
M1 --> DB[("Postgres")]
M2 --> DB
M3 --> DB
```
**superset_config.py:**
```python
MCP_STORE_CONFIG = {
"enabled": True,
"CACHE_REDIS_URL": "redis://redis-host:6379/0",
"event_store_max_events": 100,
"event_store_ttl": 3600,
}
```
When `CACHE_REDIS_URL` is set, the MCP server uses a Redis-backed EventStore for session management, allowing replicas to share state. Without Redis, each pod manages its own in-memory sessions and stateful MCP interactions may fail when requests hit different replicas.
---
## Configuration Reference
All MCP settings go in `superset_config.py`. Defaults are defined in `superset/mcp_service/mcp_config.py`.
### Core
| Setting | Default | Description |
|---------|---------|-------------|
| `MCP_SERVICE_HOST` | `"localhost"` | Host the MCP server binds to |
| `MCP_SERVICE_PORT` | `5008` | Port the MCP server binds to |
| `MCP_SERVICE_URL` | `None` | Public base URL for MCP-generated links (set this when behind a reverse proxy) |
| `MCP_DEBUG` | `False` | Enable debug logging |
| `MCP_DEV_USERNAME` | -- | Superset username for development mode (no auth) |
| `MCP_RBAC_ENABLED` | `True` | Enforce Superset's role-based access control on MCP tool calls. When `True`, each tool checks that the authenticated user has the required FAB permission before executing. Disable only for testing or trusted-network deployments. |
| `MCP_DISABLED_TOOLS` | `set()` | Set of tool names to remove from the MCP server at startup. Disabled tools are never advertised to AI clients during tool discovery. Useful when a custom extension tool should replace a built-in Superset tool. See [Disabling built-in tools](#disabling-built-in-tools). |
| `MCP_USER_RESOLVER` | `None` | Custom function `(app, access_token) -> username` to extract a Superset username from a validated JWT token. When `None`, the default resolver checks `preferred_username`, `username`, `email`, and `sub` claims in that order. |
### Response Size Guard
Limits response sizes to prevent exceeding LLM context windows:
| `event_store_max_events` | `100` | Maximum events retained per session |
| `event_store_ttl` | `3600` | Event TTL in seconds |
### Tool Search
By default the MCP server exposes a lightweight tool-search interface instead of advertising every tool at once. This reduces the initial context sent to the LLM by ~70%, which lowers cost and latency. The AI client discovers tools on demand by calling `search_tools` and then invokes them via `call_tool`.
```python
MCP_TOOL_SEARCH_CONFIG = {
"enabled": True,
"strategy": "bm25", # "bm25" (natural language) or "regex"
| `max_results` | `5` | Maximum tools returned per search query |
| `always_visible` | See above | Tools that always appear in `list_tools`, regardless of search |
| `include_schemas` | `False` | When `False` (default, "summary mode"), search results omit `inputSchema` entirely and include a lightweight `parameters_hint` listing top-level parameter names. Set to `True` to include the full `inputSchema` in search results. Full schemas are always used when a tool is actually invoked via `call_tool`. |
| `compact_schemas` | `True` | Strip `$defs` / `$ref` and replace with `{"type": "object"}` in search results to reduce token cost. Only takes effect when `include_schemas=True` — ignored in summary mode. |
| `max_description_length` | `300` | Truncate tool descriptions in search results (0 = no truncation). Applies in both summary and full-schema modes. |
:::tip
Set `enabled: False` to revert to the traditional "show all tools at once" behavior, which some clients or workflows may prefer.
:::
Tool search reduces the initial token cost from ~15–20K tokens (full catalog) down to ~4–5K tokens (pinned tools + search interface) — roughly 85% savings at the start of each conversation.
### Session & CSRF
These values are flat-merged into the Flask app config used by the MCP server process:
```python
MCP_SESSION_CONFIG = {
"SESSION_COOKIE_HTTPONLY": True,
"SESSION_COOKIE_SECURE": False,
"SESSION_COOKIE_SAMESITE": "Lax",
"SESSION_COOKIE_NAME": "superset_session",
"PERMANENT_SESSION_LIFETIME": 86400,
}
MCP_CSRF_CONFIG = {
"WTF_CSRF_ENABLED": True,
"WTF_CSRF_TIME_LIMIT": None,
}
```
---
## Access Control
### RBAC Enforcement
The MCP server respects Superset's full role-based access control (RBAC). Every authenticated user can only access the data and operations their Superset roles permit — the same rules that apply in the Superset UI apply through MCP.
Each tool declares one or more required FAB permissions. The table below maps tool groups to their permission requirements:
| Tool group | Required FAB permission |
|------------|------------------------|
| `list_charts`, `get_chart_info`, `get_chart_data`, `get_chart_preview`, `generate_chart`, `update_chart` | `can_read` on `Chart` (read), `can_write` on `Chart` (mutate) |
| `list_dashboards`, `get_dashboard_info`, `generate_dashboard`, `add_chart_to_existing_dashboard` | `can_read` on `Dashboard` (read), `can_write` on `Dashboard` (mutate) |
| `list_datasets`, `get_dataset_info`, `create_virtual_dataset` | `can_read` on `Dataset` (read), `can_write` on `Dataset` (mutate) |
| `list_databases`, `get_database_info` | `can_read` on `Database` |
| `execute_sql` | `can_execute_sql_query` on `SQLLab` |
| `open_sql_lab_with_context` | `can_read` on `SQLLab` |
| `save_sql_query` | `can_write` on `SavedQuery` |
| `health_check` | None (public) |
To disable RBAC checking globally (for trusted-network deployments or testing), set:
```python
# superset_config.py
MCP_RBAC_ENABLED = False
```
:::warning
Disabling RBAC removes all permission checks from MCP tool calls. Only do this on isolated, internal deployments where all MCP users are trusted admins.
:::
### Audit Log
All MCP tool calls are recorded in Superset's action log. You can view them at **Settings → Action Log** (admin only). Each log entry records:
- The tool name (e.g., `mcp.generate_chart.db_write`)
- The authenticated user
- A timestamp
This makes MCP activity fully auditable alongside regular Superset activity. The action log uses the same event logger as the rest of Superset, so existing log ingestion pipelines (e.g., sending logs to Elasticsearch or a SIEM) capture MCP events automatically.
### Middleware Pipeline
Every MCP request passes through a middleware stack before reaching the tool function. The default stack (assembled in `build_middleware_list()` in `server.py`) is:
| Middleware | Purpose | Default |
|------------|---------|---------|
| `StructuredContentStripperMiddleware` | Strips `structuredContent` from responses for Claude.ai bridge compatibility | Enabled |
| `LoggingMiddleware` | Logs each tool call with user, parameters, and duration | Enabled |
| `GlobalErrorHandlerMiddleware` | Catches unhandled exceptions and sanitizes sensitive data before it reaches the client | Enabled |
| `ResponseSizeGuardMiddleware` | Estimates token count, warns at 80% of limit, blocks at limit | Enabled (configurable via `MCP_RESPONSE_SIZE_CONFIG`) |
| `ResponseCachingMiddleware` | Caches read-heavy tool responses (in-memory or Redis) | Disabled (enable via `MCP_CACHE_CONFIG`) |
Additional middleware classes (`RateLimitMiddleware`, `FieldPermissionsMiddleware`, `PrivateToolMiddleware`) are implemented in `superset/mcp_service/middleware.py` but are not added to the default pipeline. They are available for operators who want to layer them in via a custom startup path.
### Error Sanitization
The `GlobalErrorHandlerMiddleware` automatically redacts sensitive information from all error messages before they reach the LLM client. The following are replaced with generic messages:
- **Database connection strings** — replaced with a generic connection error message
- **API keys and tokens** — redacted from error traces
- **File system paths** — stripped to prevent information disclosure
- **IP addresses** — removed from error context
This ensures that a misconfigured database connection or an unexpected exception never leaks credentials or internal topology to the LLM or its users. All regex patterns used for redaction are bounded to prevent ReDoS attacks.
---
## Performance
### Connection Pooling
Each MCP server process maintains its own SQLAlchemy connection pool to the database. For multi-worker deployments, total open connections = **workers × pool size**.
```python
# superset_config.py
SQLALCHEMY_POOL_SIZE = 5
SQLALCHEMY_MAX_OVERFLOW = 10
SQLALCHEMY_POOL_TIMEOUT = 30
SQLALCHEMY_POOL_RECYCLE = 3600 # Recycle connections after 1 hour
```
For a 3-pod Kubernetes deployment with the defaults above, expect up to 3 × (5 + 10) = 45 connections. Size your database's `max_connections` accordingly.
### Response Caching
Enable response caching for read-heavy workloads (dashboards/datasets that don't change frequently). With the in-memory backend (default when `MCP_STORE_CONFIG` is disabled), caching is per-process. Use Redis-backed caching for consistent cache hits across multiple pods:
Mutating tools (`generate_chart`, `update_chart`, `execute_sql`, `generate_dashboard`) are always excluded from caching regardless of this setting.
---
## Troubleshooting
### Server won't start
- Verify `fastmcp` is installed: `pip install fastmcp`
- Check that `MCP_DEV_USERNAME` is set if auth is disabled -- the server requires a user identity
- Confirm the port is not already in use: `lsof -i :5008`
### 401 Unauthorized
- Verify your JWT token has not expired (`exp` claim)
- Check that `MCP_JWT_ISSUER` and `MCP_JWT_AUDIENCE` match the token's `iss` and `aud` claims exactly
- For RS256 with JWKS: confirm the JWKS URI is reachable from the MCP server
- For RS256 with static key: confirm the public key string includes the `BEGIN`/`END` markers
- For HS256: confirm the secret matches between the token issuer and `MCP_JWT_SECRET`
- Enable `MCP_JWT_DEBUG_ERRORS = True` for detailed server-side logging (errors are never leaked to the client)
### Tool not found
- Ensure the MCP server and Superset share the same `superset_config.py`
- Check server logs at startup -- tool registration errors are logged with the tool name and reason
### Client can't connect
- Verify the MCP server URL is reachable from the client machine
- For Claude Desktop: fully quit the app (not just close the window) and restart after config changes
- For remote access: ensure your firewall and reverse proxy allow traffic to the MCP port
- Confirm the URL path ends with `/mcp` (e.g., `http://localhost:5008/mcp`)
### Permission errors on tool calls
- The MCP server enforces Superset's RBAC permissions -- the authenticated user must have the required roles
- In development mode, ensure `MCP_DEV_USERNAME` maps to a user with appropriate roles (e.g., Admin)
- Check `superset/security/manager.py` for the specific permission tuples required by each tool domain (e.g., `("can_execute_sql_query", "SQLLab")`)
### Response too large
- If a tool call returns an error about exceeding token limits, the response size guard is blocking an oversized result
- Reduce `page_size` or `limit` parameters, use `select_columns` to exclude large fields, or add filters to narrow results
- To adjust the threshold, change `token_limit` in `MCP_RESPONSE_SIZE_CONFIG`
- To disable the guard entirely, set `MCP_RESPONSE_SIZE_CONFIG = {"enabled": False}`
---
## Audit Events
All MCP tool calls are logged to Superset's event logger, the same system used by the web UI (viewable at **Settings → Action Log**). Each event captures:
- **User**: the resolved Superset username from the JWT or dev config
- **Timestamp**: when the operation ran
This means MCP activity is auditable alongside normal user activity. No additional configuration is required — logging is on by default whenever the event logger is enabled in your Superset deployment.
## Tool Pagination
MCP list tools (`list_datasets`, `list_charts`, `list_dashboards`, `list_databases`) use **offset pagination** via `page` (1-based) and `page_size` parameters. Responses include `page`, `page_size`, `total_count`, `total_pages`, `has_previous`, and `has_next`. To iterate through all results:
```python
# Example: fetch all charts across pages
all_charts = []
page = 1
while True:
result = mcp.list_charts(page=page, page_size=50)
all_charts.extend(result["charts"])
if not result.get("has_next"):
break
page += 1
```
## Disabling built-in tools
If you have deployed a custom tool via a Superset extension that supersedes one of the built-in Superset tools, you can suppress the built-in version so AI clients only discover your replacement. Disabled tools are removed from the server at startup and are never advertised during tool discovery.
Set `MCP_DISABLED_TOOLS` in your `superset_config.py` to a set of tool names:
Tool names match the function name used in the `@tool` decorator (e.g., `execute_sql`, `list_charts`, `health_check`). Extension-prefixed tools can also be disabled using their full prefixed name:
Specifying a tool name that does not exist logs a warning at startup and is otherwise ignored — it will not prevent the server from starting.
:::
## Security Best Practices
- **Use TLS** for all production MCP endpoints -- place the server behind a reverse proxy with HTTPS
- **Enable JWT authentication** for any internet-facing deployment
- **RBAC enforcement** -- The MCP server respects Superset's role-based access control. Users can only access data their roles permit
- **Secrets management** -- Store `MCP_JWT_SECRET`, database credentials, and API keys in environment variables or a secrets manager, never in config files committed to version control
- **Scoped tokens** -- Use `MCP_REQUIRED_SCOPES` to limit what operations a token can perform
- **Network isolation** -- In Kubernetes, restrict MCP pod network policies to only allow traffic from your AI client endpoints
- Review the **[Security documentation](/developer-docs/extensions/security)** for additional extension security guidance
---
## Next Steps
- **[Using AI with Superset](/user-docs/using-superset/using-ai-with-superset)** -- What AI can do with Superset and how to get started
- **[MCP Integration](/developer-docs/extensions/mcp)** -- Build custom MCP tools and prompts via Superset extensions
- **[Security](/developer-docs/extensions/security)** -- Security best practices for extensions
- **[Deployment](/developer-docs/extensions/deployment)** -- Package and deploy Superset extensions
1. Set `PUBLIC_ROLE_LIKE = "Public"` in `superset_config.py`
2. Add the `'DASHBOARD_RBAC': True` [Feature Flag](/docs/configuration/feature-flags)
2. Add the `'DASHBOARD_RBAC': True` [Feature Flag](/admin-docs/configuration/feature-flags)
3. Edit each dashboard's properties and add the "Public" role
4. Only dashboards with the Public role explicitly assigned are visible to anonymous users
See the [Public role documentation](/docs/security/security#public) for more details.
See the [Public role documentation](/admin-docs/security/#public) for more details.
#### Embedding a Public Dashboard
@@ -96,6 +96,24 @@ To enable this entry, add the following line to the `.env` file:
SUPERSET_FEATURE_EMBEDDED_SUPERSET=true
```
### Hiding the Logout Button in Embedded Contexts
When Superset is embedded in an application that manages authentication via SSO (OAuth2, SAML, or JWT), the logout button should be hidden since session management is handled by the parent application.
To hide the logout button in embedded contexts, add to `superset_config.py`:
```python
FEATURE_FLAGS = {
"DISABLE_EMBEDDED_SUPERSET_LOGOUT": True,
}
```
This flag only hides the logout button when Superset detects it is running inside an iframe. Users accessing Superset directly (not embedded) will still see the logout button regardless of this setting.
:::note
When embedding with SSO, also set `SESSION_COOKIE_SAMESITE = 'None'` and `SESSION_COOKIE_SECURE = True`. See [Security documentation](/admin-docs/security/securing_superset) for details.
:::
## CSRF settings
Similarly, [flask-wtf](https://flask-wtf.readthedocs.io/en/0.15.x/config/) is used to manage
For a user-focused guide on writing Jinja templates in SQL Lab and virtual datasets, see the [SQL Templating User Guide](/user-docs/using-superset/sql-templating). This page covers administrator configuration options.
:::
## Jinja Templates
SQL Lab and Explore supports [Jinja templating](https://jinja.palletsprojects.com/en/2.11.x/) in queries.
To enable templating, the `ENABLE_TEMPLATE_PROCESSING` [feature flag](/docs/configuration/configuring-superset#feature-flags) needs to be enabled in `superset_config.py`.
To enable templating, the `ENABLE_TEMPLATE_PROCESSING` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) needs to be enabled in `superset_config.py`.
:::warning[Security Warning]
@@ -18,6 +22,15 @@ While powerful, this feature executes template code on the server. Within the Su
If you grant these permissions to untrusted users, this feature can be exploited as a **Server-Side Template Injection (SSTI)** vulnerability. Do not enable `ENABLE_TEMPLATE_PROCESSING` unless you fully understand and accept the associated security risks.
Additionally:
- The `url_param()` macro allows URL parameters to influence the rendered SQL. Always validate or restrict `url_param()` values in your templates rather than interpolating them directly.
- `filter.get('val')` returns raw filter values without escaping. Use the safe helpers described below (`|where_in`, `| replace("'", "''")`) rather than concatenating values directly into SQL strings.
:::
:::tip
`ENABLE_TEMPLATE_PROCESSING` defaults to `False`. Only enable it if your deployment requires Jinja templates and all users with dataset/chart edit access are administrators or fully trusted internal users.
:::
When templating is enabled, python code can be embedded in virtual datasets and
@@ -320,6 +333,16 @@ cache hit in the future and Superset can retrieve cached data.
The `{{ url_param('custom_variable') }}` macro lets you define arbitrary URL
parameters and reference them in your SQL code.
:::warning
Always treat `url_param()` values as untrusted input. Escaping behaviour varies by context and configuration, so do not rely on it. Restrict values to an explicit allowlist before using them in SQL:
```sql
{% set cc = url_param('countrycode') %}
{% if cc not in ('US', 'ES', 'FR') %}{% set cc = 'US' %}{% endif %}
WHERE country_code = '{{ cc }}'
```
:::
Here's a concrete example:
- You write the following query in SQL Lab:
@@ -394,6 +417,16 @@ This is useful if:
- You want to handle generating custom SQL conditions for a filter
- You want to have the ability to filter inside the main query for speed purposes
:::warning
`filter.get('val')` returns the raw filter value without escaping. For multi-value filters, use the `|where_in` Jinja filter, which handles quoting safely. For single-value operators like `LIKE`, escape single quotes before interpolating:
```sql
{%- if filter.get('op') == 'LIKE' -%}
AND full_name LIKE '{{ filter.get('val') | replace("'", "''") }}'
{%- endif -%}
```
:::
Here's a concrete example:
```sql
@@ -420,7 +453,7 @@ Here's a concrete example:
{%- if filter.get('op') == 'LIKE' -%}
AND
full_name LIKE {{ "'" + filter.get('val') + "'" }}
full_name LIKE '{{ filter.get('val') | replace("'", "''") }}'
# - OS preference detection is automatically enabled
```
### App Branding
The application name shown in the browser title bar and navigation can be
set via the `brandAppName` theme token:
```python
THEME_DEFAULT = {
"token": {
"brandAppName": "Acme Analytics",
# ... other tokens
}
}
```
Or in the theme CRUD UI JSON editor:
```json
{
"token": {
"brandAppName": "Acme Analytics"
}
}
```
The existing `APP_NAME` Python config key continues to work for backward compatibility.
`brandAppName` takes precedence when both are set, and allows different themes to carry different brand names.
Email and alert/report notification subjects are driven by backend settings such as
`EMAIL_REPORTS_SUBJECT_PREFIX` and `APP_NAME`, not by this theme token.
### Migration from Configuration to UI
When `ENABLE_UI_THEME_ADMINISTRATION = True`:
@@ -93,6 +122,17 @@ When `ENABLE_UI_THEME_ADMINISTRATION = True`:
3. Administrators can change system themes without restarting Superset
4. Configuration file themes serve as fallbacks when no UI themes are set
### Theme Validation and Fallback
Superset validates theme JSON when it is saved, either through the UI or via configuration. If a theme contains invalid tokens or an unrecognized structure, Superset logs a warning and falls back to the built-in default theme rather than applying a broken configuration. This prevents a bad theme from rendering the application unusable.
The fallback order is:
1. **UI-configured system theme** (highest priority, if `ENABLE_UI_THEME_ADMINISTRATION = True`)
2. **`THEME_DEFAULT` / `THEME_DARK`** from `superset_config.py`
3. **Built-in Superset default theme** (always present as a safety net)
If you see unexpected styling after a config change, check the Superset server logs for theme validation warnings.
### Copying Themes Between Systems
To export a theme for use in configuration files or another instance:
@@ -114,7 +154,11 @@ Superset supports custom fonts through the theme configuration, allowing you to
### Default Fonts
By default, Superset uses Inter and Fira Code fonts which are bundled with the application via `@fontsource` packages. These fonts work offline and require no external network calls.
By default, Superset uses **Inter** for UI text and **IBM Plex Mono** for code (SQL editors, JSON fields, and other monospace contexts). Both fonts are bundled with the application via `@fontsource` packages and work offline without any external network calls.
:::note
IBM Plex Mono replaced Fira Code as the default code font in Superset 6.1. If you have an existing theme that explicitly sets `fontFamilyCode: "Fira Code, ..."`, you may want to update it.
:::
### Configuring Custom Fonts
@@ -312,11 +356,25 @@ Available chart types for `echartsOptionsOverridesByChartType`:
- `echarts_heatmap` - Heatmaps
- `echarts_mixed_timeseries` - Mixed time series
### Array Property Overrides
Array properties (such as color palettes) are fully supported in overrides. Arrays are **replaced entirely** rather than merged, so specify the complete array:
```python
THEME_DEFAULT = {
"token": { ... },
"echartsOptionsOverrides": {
# Replace the default color palette for all ECharts visualizations
@@ -20,7 +20,7 @@ To help make the problem somewhat tractable—given that Apache Superset has no
To strive for data consistency (regardless of the timezone of the client) the Apache Superset backend tries to ensure that any timestamp sent to the client has an explicit (or semi-explicit as in the case with [Epoch time](https://en.wikipedia.org/wiki/Unix_time) which is always in reference to UTC) timezone encoded within.
The challenge however lies with the slew of [database engines](/docs/databases#installing-drivers-in-docker) which Apache Superset supports and various inconsistencies between their [Python Database API (DB-API)](https://www.python.org/dev/peps/pep-0249/) implementations combined with the fact that we use [Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to serializing to JSON. Regrettably Pandas ignores the DB-API [type_code](https://www.python.org/dev/peps/pep-0249/#type-objects) relying by default on the underlying Python type returned by the DB-API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., ensuring timestamps without an explicit timestamp are serializd to JSON with the server timezone, thus guaranteeing the client will display timestamps in a consistent manner irrespective of the client's timezone.
The challenge however lies with the slew of [database engines](/user-docs/databases#installing-drivers-in-docker) which Apache Superset supports and various inconsistencies between their [Python Database API (DB-API)](https://www.python.org/dev/peps/pep-0249/) implementations combined with the fact that we use [Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to serializing to JSON. Regrettably Pandas ignores the DB-API [type_code](https://www.python.org/dev/peps/pep-0249/#type-objects) relying by default on the underlying Python type returned by the DB-API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., ensuring timestamps without an explicit timestamp are serialized to JSON with the server timezone, thus guaranteeing the client will display timestamps in a consistent manner irrespective of the client's timezone.
For example the following is a comparison of MySQL and Presto,
If you install with Kubernetes or Docker Compose, all of these components will be created.
@@ -59,7 +59,7 @@ The caching layer serves two main functions:
- Store the results of queries to your data warehouse so that when a chart is loaded twice, it pulls from the cache the second time, speeding up the application and reducing load on your data warehouse.
- Act as a message broker for the worker, enabling the Alerts & Reports, async queries, and thumbnail caching features.
Most people use Redis for their cache, but Superset supports other options too. See the [cache docs](/docs/configuration/cache/) for more.
Most people use Redis for their cache, but Superset supports other options too. See the [cache docs](/admin-docs/configuration/cache/) for more.
### Worker and Beat
@@ -67,6 +67,6 @@ This is one or more workers who execute tasks like run async queries or take sna
## Other components
Other components can be incorporated into Superset. The best place to learn about additional configurations is the [Configuration page](/docs/configuration/configuring-superset). For instance, you could set up a load balancer or reverse proxy to implement HTTPS in front of your Superset application, or specify a Mapbox URL to enable geospatial charts, etc.
Other components can be incorporated into Superset. The best place to learn about additional configurations is the [Configuration page](/admin-docs/configuration/configuring-superset). For instance, you could set up a load balancer or reverse proxy to implement HTTPS in front of your Superset application, or specify a Mapbox URL to enable geospatial charts, etc.
Superset won't even start without certain configuration settings established, so it's essential to review that page.
@@ -9,11 +9,11 @@ import useBaseUrl from "@docusaurus/useBaseUrl";
# Installation Methods
How should you install Superset? Here's a comparison of the different options. It will help if you've first read the [Architecture](/docs/installation/architecture.mdx) page to understand Superset's different components.
How should you install Superset? Here's a comparison of the different options. It will help if you've first read the [Architecture](/admin-docs/installation/architecture) page to understand Superset's different components.
The fundamental trade-off is between you needing to do more of the detail work yourself vs. using a more complex deployment route that handles those details.
**Summary:** This takes advantage of containerization while remaining simpler than Kubernetes. This is the best way to try out Superset; it's also useful for developing & contributing back to Superset.
@@ -27,9 +27,9 @@ You will need to back up your metadata DB. That could mean backing up the servic
You will also need to extend the Superset docker image. The default `lean` images do not contain drivers needed to access your metadata database (Postgres or MySQL), nor to access your data warehouse, nor the headless browser needed for Alerts & Reports. You could run a `-dev` image while demoing Superset, which has some of this, but you'll still need to install the driver for your data warehouse. The `-dev` images run as root, which is not recommended for production.
Ideally you will build your own image of Superset that extends `lean`, adding what your deployment needs. See [Building your own production Docker image](/docs/installation/docker-builds/#building-your-own-production-docker-image).
Ideally you will build your own image of Superset that extends `lean`, adding what your deployment needs. See [Building your own production Docker image](/admin-docs/installation/docker-builds/#building-your-own-production-docker-image).
**Summary:** This is the best-practice way to deploy a production instance of Superset, but has the steepest skill requirement - someone who knows Kubernetes.
@@ -41,7 +41,7 @@ A K8s deployment can scale up and down based on usage and deploy rolling updates
You will need to build your own Docker image, and back up your metadata DB, both as described in Docker Compose above. You'll also need to customize your Helm chart values and deploy and maintain your Kubernetes cluster.
## [PyPI (Python)](/docs/installation/pypi.mdx)
## [PyPI (Python)](/admin-docs/installation/pypi)
**Summary:** This is the only method that requires no knowledge of containers. It requires the most hands-on work to deploy, connect, and maintain each component.
Then, define mandatory configurations, SECRET_KEY and FLASK_APP:
```bash
export SUPERSET_SECRET_KEY=YOUR-SECRET-KEY # For production use, make sure this is a strong key, for example generated using `openssl rand -base64 42`. See https://superset.apache.org/docs/configuration/configuring-superset#specifying-a-secret_key
export SUPERSET_SECRET_KEY=YOUR-SECRET-KEY # For production use, make sure this is a strong key, for example generated using `openssl rand -base64 42`. See https://superset.apache.org/admin-docs/configuration/configuring-superset#specifying-a-secret_key
@@ -114,7 +114,7 @@ Superset can use Flask-Talisman to set security headers. However, it must be exp
>
> In Superset 4.0 and later, Talisman is disabled by default (`TALISMAN_ENABLED = False`). You **must** explicitly enable it in your `superset_config.py` for the security headers defined in `TALISMAN_CONFIG` to take effect.
Here's the documentation section how how to set up Talisman: https://superset.apache.org/docs/security/#content-security-policy-csp
Here's the documentation section how how to set up Talisman: https://superset.apache.org/admin-docs/security/#content-security-policy-csp
### **Database Security**
@@ -171,7 +171,7 @@ Rotating the `SUPERSET_SECRET_KEY` is a critical security procedure. It is manda
**Procedure for Rotating the Key**
The procedure for safely rotating the SECRET_KEY must be followed precisely to avoid locking yourself out of your instance. The official Apache Superset documentation maintains the correct, up-to-date procedure. Please follow the official guide here:
- [Blog: Running Apache Superset on the Open Internet](https://preset.io/blog/running-apache-superset-on-the-open-internet-a-report-from-the-fireline/)
@@ -24,6 +24,14 @@ A table with the permissions for these roles can be found at [/RESOURCES/STANDAR
Admins have all possible rights, including granting or revoking rights from other
users and altering other people’s slices and dashboards.
>#### Threat Model and Privilege Boundaries: The Admin Role
>
>Apache Superset is built with a granular permission model where users assigned the Admin role are considered fully trusted. Admins possess complete control over the application's configuration, UI rendering, and access controls.
>
>Consequently, actions performed by an Admin that alter the application's behavior or presentation—such as injecting custom CSS, modifying Jinja templates, or altering security flags—are intended administrative capabilities by design.
>
>In accordance with MITRE CNA Rule 4.1, a vulnerability must represent a violation of an explicit security policy. Because the Admin role is defined as a trusted operational boundary, actions executed with Admin privileges do not cross a security perimeter. Therefore, exploit vectors that strictly require Admin access are not classified as security vulnerabilities and are ineligible for CVE assignment.
### Alpha
Alpha users have access to all data sources, but they cannot grant or revoke access
@@ -44,6 +52,15 @@ only see the objects that they have access to.
The **sql_lab** role grants access to SQL Lab. Note that while **Admin** users have access
to all databases by default, both **Alpha** and **Gamma** users need to be given access on a per database basis.
Beyond the base `sql_lab` role, two additional SQL Lab permissions must be explicitly granted for users who need these capabilities:
| Permission | Feature |
|------------|---------|
| `can_estimate_query_cost` on `SQLLab` | Estimate query cost before running |
| `can_format_sql` on `SQLLab` | Format SQL using the database's dialect |
Grant these in **Security → List Roles** by adding the permissions to the relevant role.
### Public
The **Public** role is the most restrictive built-in role, designed specifically for anonymous/unauthenticated
@@ -174,6 +191,8 @@ However, it is crucial to understand the following:
By combining Superset's configurable safeguards with strong database-level security practices, you can achieve a more robust and layered security posture.
**Dataset Sample Access**: The `get_samples()` endpoint now enforces datasource-level access control. Users can only fetch sample rows from datasets they have been explicitly granted access to — the same permission check applied when running chart queries. This closes a prior gap where unauthenticated or under-privileged access could retrieve sample data.
### REST API for user & role management
Flask-AppBuilder supports a REST API for user CRUD,
@@ -186,6 +205,57 @@ FAB_ADD_SECURITY_API = True
Once configured, the documentation for additional "Security" endpoints will be visible in Swagger for you to explore.
### API Key Authentication
Superset supports long-lived API keys for service accounts, CI/CD pipelines, and programmatic integrations (including MCP clients).
#### Enabling API Key Authentication
API key authentication is **disabled by default**. To turn it on, set the Flask-AppBuilder config value in `superset_config.py` and also enable the matching feature flag so the management UI is exposed:
```python
FAB_API_KEY_ENABLED = True
FEATURE_FLAGS = {
"FAB_API_KEY_ENABLED": True,
}
```
The config value registers the `ApiKeyApi` blueprint on the backend; the feature flag controls whether the UI for managing keys appears for the user. See the [Feature Flags](/admin-docs/configuration/feature-flags) documentation for more on feature flag configuration.
#### Creating an API Key
Once enabled, each user manages their own keys from their profile page:
1. Open the user menu (top-right) and click **Info** to navigate to the User Info page
2. Expand the **API Keys** section
3. Click **+ API Key**
4. Enter a name and (optionally) an expiration date
5. Copy the generated token — it is shown only once
Only users with the `can_read` and `can_write` permissions on `ApiKey` (granted by default to Admins) can manage API keys.
#### Using an API Key
Pass the key as a Bearer token in the `Authorization` header:
```
Authorization: Bearer <your-api-key>
```
This works for all REST API endpoints and the MCP server. The request is executed with the permissions of the user who created the key.
#### Use Cases
- **CI/CD pipelines** — automated chart/dashboard exports and imports
- **MCP integrations** — connect AI assistants without interactive login
- **External services** — dashboards embedded in other applications
- **Service accounts** — long-lived credentials that don't expire with session cookies
:::caution
Store API keys securely. Anyone with a valid key can make requests on behalf of the creating user. Revoke keys promptly if they are compromised by deleting them from the **API Keys** section of your User Info page.
:::
### Customizing Permissions
The permissions exposed by FAB are very granular and allow for a great level of
@@ -231,26 +301,143 @@ based on the roles and permissions that were attributed.
### Row Level Security
Using Row Level Security filters (under the **Security** menu) you can create filters
that are assigned to a particular table, as well as a set of roles.
that are assigned to a particular dataset, as well as a set of roles.
If you want members of the Finance team to only have access to
rows where `department = "finance"`, you could:
- Create a Row Level Security filter with that clause (`department = "finance"`)
- Then assign the clause to the **Finance** role and the table it applies to
- Then assign the clause to the **Finance** role and the dataset it applies to
The **clause** field, which can contain arbitrary text, is then added to the generated
SQL statement’s WHERE clause. So you could even do something like create a filter
SQL statement's WHERE clause. So you could even do something like create a filter
for the last 30 days and apply it to a specific role, with a clause
like `date_field > DATE_SUB(NOW(), INTERVAL 30 DAY)`. It can also support
multiple conditions: `client_id = 6` AND `advertiser="foo"`, etc.
All relevant Row level security filters will be combined together (under the hood,
the different SQL clauses are combined using AND statements). This means it's
possible to create a situation where two roles conflict in such a way as to limit a table subset to empty.
RLS clauses also support **Jinja templating** when `ENABLE_TEMPLATE_PROCESSING` is enabled, so you can write dynamic filters such as
`user_id = '{{ current_username() }}'` to restrict rows based on the logged-in user.
For example, the filters `client_id=4` and `client_id=5`, applied to a role,
will result in users of that role having `client_id=4` AND `client_id=5`
added to their query, which can never be true.
#### Filter Types
There are two types of RLS filters:
- **Regular** — The filter clause is applied when the querying user belongs to one of the
roles assigned to the filter. Use this to restrict what specific roles can see.
- **Base** — The filter clause is applied to **all** users _except_ those in the assigned
roles. Use this to define a default restriction that privileged roles (e.g. Admin) are
exempt from. For example, a Base filter with clause `1 = 0` and the Admin role would
hide all rows from everyone except Admin — useful as a deny-by-default baseline.
#### Group Keys and Filter Combination
All applicable RLS filters are combined before being added to the query. The combination
rules are:
- Filters that share the **same group key** are combined with **OR** (any match within
the group is sufficient).
- Different filter groups (different group keys, or no group key) are combined with
**AND** (all groups must match).
- Filters with **no group key** are each treated as their own group and are always AND'd.
"description":"Enables filter functionality in Alerts and Reports"
},
{
"name":"ALERT_REPORT_SLACK_V2",
"default":false,
"lifecycle":"testing",
"description":"Enables Slack V2 integration for Alerts and Reports"
},
{
"name":"ALERT_REPORT_WEBHOOK",
"default":false,
"lifecycle":"testing",
"description":"Enables webhook integration for Alerts and Reports"
},
{
"name":"ALLOW_FULL_CSV_EXPORT",
"default":false,
"lifecycle":"testing",
"description":"Allow users to export full CSV of table viz type. Warning: Could cause server memory/compute issues with large datasets."
},
{
"name":"AWS_DATABASE_IAM_AUTH",
"default":false,
"lifecycle":"testing",
"description":"Enable AWS IAM authentication for database connections (Aurora, Redshift). Allows cross-account role assumption via STS AssumeRole. Security note: When enabled, ensure Superset's IAM role has restricted sts:AssumeRole permissions to prevent unauthorized access."
},
{
"name":"CACHE_IMPERSONATION",
"default":false,
"lifecycle":"testing",
"description":"Enable caching per impersonation key in datasources with user impersonation"
},
{
"name":"DATE_FORMAT_IN_EMAIL_SUBJECT",
"default":false,
"lifecycle":"testing",
"description":"Allow users to optionally specify date formats in email subjects",
"description":"Generate screenshots (PDF/JPG) of dashboards using web driver. Depends on ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS."
},
{
"name":"ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS",
"default":false,
"lifecycle":"testing",
"description":"Enables endpoints to cache and retrieve dashboard screenshots via webdriver. Requires Celery and THUMBNAIL_CACHE_CONFIG."
},
{
"name":"ENABLE_SUPERSET_META_DB",
"default":false,
"lifecycle":"testing",
"description":"Allows users to add a superset:// DB that can query across databases. Experimental with potential security/performance risks. See SUPERSET_META_DB_LIMIT.",
"description":"Avoid color collisions in charts by using distinct colors"
},
{
"name":"DRILL_TO_DETAIL",
"default":true,
"lifecycle":"deprecated",
"description":"Enable drill-to-detail functionality in charts"
},
{
"name":"ENABLE_JAVASCRIPT_CONTROLS",
"default":false,
"lifecycle":"deprecated",
"description":"Allow JavaScript in chart controls. WARNING: XSS security vulnerability!"
}
]
}
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.