mirror of
https://github.com/apache/superset.git
synced 2026-05-08 01:15:46 +00:00
docs: cut 6.1.0 versions for docs, admin_docs, developer_docs, components
- Snapshot all four versioned docs sections at v6.1.0; master continues to serve as "Next" (lastVersion: current, banner: unreleased) so editing master keeps updating the canonical URLs - Enable the previously-disabled components plugin and version it - Rename stale "developer_portal" references to "developer_docs" across package.json scripts, manage-versions.mjs, theme files (DocVersionBadge, DocVersionBanner), DOCS_CLAUDE.md, and README.md (URL backward-compat redirect /developer_portal/* preserved) - Add admin_docs version scripts; drop dead "tutorials" plugin id from the version badge - Generalize fixVersionedImports in manage-versions.mjs to walk every section's snapshot and rewrite ../../src/ and ../../data/ imports, catching admin_docs and components files that previous version cuts would have broken - Remove orphan files: developer_portal_versions.json, tutorials_versions.json, and stray empty versions.json files inside components/ and developer_docs/ content directories
This commit is contained in:
@@ -0,0 +1,419 @@
|
||||
---
|
||||
title: Alerts and Reports
|
||||
hide_title: true
|
||||
sidebar_position: 2
|
||||
version: 2
|
||||
---
|
||||
|
||||
# Alerts and Reports
|
||||
|
||||
Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel.
|
||||
|
||||
- *Alerts* are sent when a SQL condition is reached
|
||||
- *Reports* are sent on a schedule
|
||||
|
||||
Alerts and reports are disabled by default. To turn them on, you'll need to change configuration settings and install a suitable headless browser in your environment.
|
||||
|
||||
## Requirements
|
||||
|
||||
### Commons
|
||||
|
||||
#### In your `superset_config.py` or `superset_config_docker.py`
|
||||
|
||||
- `"ALERT_REPORTS"` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) must be turned to True.
|
||||
- `beat_schedule` in CeleryConfig must contain schedule for `reports.scheduler`.
|
||||
- At least one of those must be configured, depending on what you want to use:
|
||||
- emails: `SMTP_*` settings
|
||||
- Slack messages: `SLACK_API_TOKEN`
|
||||
- Users can customize the email subject by including date code placeholders, which will automatically be replaced with the corresponding UTC date when the email is sent. To enable this functionality, activate the `"DATE_FORMAT_IN_EMAIL_SUBJECT"` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags). This enables date formatting in email subjects, preventing all reporting emails from being grouped into the same thread (optional for the reporting feature).
|
||||
- Use date codes from [strftime.org](https://strftime.org/) to create the email subject.
|
||||
- If no date code is provided, the original string will be used as the email subject.
|
||||
|
||||
##### Disable dry-run mode
|
||||
|
||||
Screenshots will be taken but no messages actually sent as long as `ALERT_REPORTS_NOTIFICATION_DRY_RUN = True`, its default value in `docker/pythonpath_dev/superset_config.py`. To disable dry-run mode and start receiving email/Slack notifications, set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py).
|
||||
|
||||
#### In your `Dockerfile`
|
||||
|
||||
You'll need to extend the Superset image to include a headless browser. Your options include:
|
||||
- Use Playwright with Chrome: this is the recommended approach as of version 4.1.x or greater. A working example of a Dockerfile that installs these tools is provided under "Building your own production Docker image" on the [Docker Builds](/admin-docs/installation/docker-builds#building-your-own-production-docker-image) page. Read the code comments there as you'll also need to change a feature flag in your config.
|
||||
- Use Firefox: you'll need to install geckodriver and Firefox.
|
||||
- Use Chrome without Playwright: you'll need to install Chrome and set the value of `WEBDRIVER_TYPE` to `"chrome"` in your `superset_config.py`.
|
||||
|
||||
In Superset versions <=4.0x, users installed Firefox or Chrome and that was documented here.
|
||||
|
||||
Only the worker container needs the browser.
|
||||
|
||||
### Slack integration
|
||||
|
||||
To send alerts and reports to Slack channels, you need to create a new Slack Application on your workspace.
|
||||
|
||||
1. Connect to your Slack workspace, then head to [https://api.slack.com/apps].
|
||||
2. Create a new app.
|
||||
3. Go to "OAuth & Permissions" section, and give the following scopes to your app:
|
||||
- `incoming-webhook`
|
||||
- `files:write`
|
||||
- `chat:write`
|
||||
- `channels:read`
|
||||
- `groups:read`
|
||||
4. At the top of the "OAuth and Permissions" section, click "install to workspace".
|
||||
5. Select a default channel for your app and continue.
|
||||
(You can post to any channel by inviting your Superset app into that channel).
|
||||
6. The app should now be installed in your workspace, and a "Bot User OAuth Access Token" should have been created. Copy that token in the `SLACK_API_TOKEN` variable of your `superset_config.py`.
|
||||
7. Ensure the feature flag `ALERT_REPORT_SLACK_V2` is set to True in `superset_config.py`
|
||||
8. Restart the service (or run `superset init`) to pull in the new configuration.
|
||||
|
||||
Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading '#' e.g. use `alerts` instead of `#alerts`.
|
||||
|
||||
#### Large Slack Workspaces (10k+ channels)
|
||||
|
||||
For workspaces with many channels, fetching the complete channel list can take several minutes and may encounter Slack API rate limits. Add the following to your `superset_config.py`:
|
||||
|
||||
```python
|
||||
from datetime import timedelta
|
||||
|
||||
# Increase cache timeout to reduce API calls
|
||||
# Default: 1 day (86400 seconds)
|
||||
SLACK_CACHE_TIMEOUT = int(timedelta(days=2).total_seconds())
|
||||
|
||||
# Increase retry count for rate limit errors
|
||||
# Default: 2
|
||||
SLACK_API_RATE_LIMIT_RETRY_COUNT = 5
|
||||
```
|
||||
|
||||
### Kubernetes-specific
|
||||
|
||||
- You must have a `celery beat` pod running. If you're using the chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset), you need to put `supersetCeleryBeat.enabled = true` in your values override.
|
||||
- You can see the dedicated docs about [Kubernetes installation](/admin-docs/installation/kubernetes) for more details.
|
||||
|
||||
### Docker Compose specific
|
||||
|
||||
#### You must have in your `docker-compose.yml`
|
||||
|
||||
- A Redis message broker
|
||||
- PostgreSQL DB instead of SQLlite
|
||||
- One or more `celery worker`
|
||||
- A single `celery beat`
|
||||
|
||||
This process also works in a Docker swarm environment, you would just need to add `Deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm.
|
||||
|
||||
### Detailed config
|
||||
|
||||
The following configurations need to be added to the `superset_config.py` file. This file is loaded when the image runs, and any configurations in it will override the default configurations found in the `config.py`.
|
||||
|
||||
You can find documentation about each field in the default `config.py` in the GitHub repository under [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py).
|
||||
|
||||
You need to replace default values with your custom Redis, Slack and/or SMTP config.
|
||||
|
||||
Superset uses Celery beat and Celery worker(s) to send alerts and reports.
|
||||
|
||||
- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
|
||||
- The worker will process the tasks that need to be performed when an alert or report is fired.
|
||||
|
||||
In the `CeleryConfig`, only the `beat_schedule` is relevant to this feature, the rest of the `CeleryConfig` can be changed for your needs.
|
||||
|
||||
```python
|
||||
from celery.schedules import crontab
|
||||
|
||||
FEATURE_FLAGS = {
|
||||
"ALERT_REPORTS": True
|
||||
}
|
||||
|
||||
REDIS_HOST = "superset_cache"
|
||||
REDIS_PORT = "6379"
|
||||
|
||||
class CeleryConfig:
|
||||
broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
beat_schedule = {
|
||||
"reports.scheduler": {
|
||||
"task": "reports.scheduler",
|
||||
"schedule": crontab(minute="*", hour="*"),
|
||||
},
|
||||
"reports.prune_log": {
|
||||
"task": "reports.prune_log",
|
||||
"schedule": crontab(minute=0, hour=0),
|
||||
},
|
||||
}
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
|
||||
SCREENSHOT_LOCATE_WAIT = 100
|
||||
SCREENSHOT_LOAD_WAIT = 600
|
||||
|
||||
# Slack configuration
|
||||
SLACK_API_TOKEN = "xoxb-"
|
||||
|
||||
# Email configuration
|
||||
SMTP_HOST = "smtp.sendgrid.net" # change to your host
|
||||
SMTP_PORT = 2525 # your port, e.g. 587
|
||||
SMTP_STARTTLS = True
|
||||
SMTP_SSL_SERVER_AUTH = True # If you're using an SMTP server with a valid certificate
|
||||
SMTP_SSL = False
|
||||
SMTP_USER = "your_user" # use the empty string "" if using an unauthenticated SMTP server
|
||||
SMTP_PASSWORD = "your_password" # use the empty string "" if using an unauthenticated SMTP server
|
||||
SMTP_MAIL_FROM = "noreply@youremail.com"
|
||||
EMAIL_REPORTS_SUBJECT_PREFIX = "[Superset] " # optional - overwrites default value in config.py of "[Report] "
|
||||
|
||||
# WebDriver configuration
|
||||
# If you use Firefox or Playwright with Chrome, you can stick with default values
|
||||
# If you use Chrome and are *not* using Playwright, then add the following WEBDRIVER_TYPE and WEBDRIVER_OPTION_ARGS
|
||||
WEBDRIVER_TYPE = "chrome"
|
||||
WEBDRIVER_OPTION_ARGS = [
|
||||
"--force-device-scale-factor=2.0",
|
||||
"--high-dpi-support=2.0",
|
||||
"--headless",
|
||||
"--disable-gpu",
|
||||
"--disable-dev-shm-usage",
|
||||
"--no-sandbox",
|
||||
"--disable-setuid-sandbox",
|
||||
"--disable-extensions",
|
||||
]
|
||||
|
||||
# This is for internal use, you can keep http
|
||||
WEBDRIVER_BASEURL = "http://superset:8088" # When running using docker compose use "http://superset_app:8088'
|
||||
# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com
|
||||
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"
|
||||
```
|
||||
|
||||
You also need
|
||||
to specify on behalf of which username to render the dashboards. In general, dashboards and charts
|
||||
are not accessible to unauthorized requests, that is why the worker needs to take over credentials
|
||||
of an existing user to take a snapshot.
|
||||
|
||||
By default, Alerts and Reports are executed as the owner of the alert/report object. To use a fixed user account,
|
||||
just change the config as follows (`admin` in this example):
|
||||
|
||||
```python
|
||||
from superset.tasks.types import FixedExecutor
|
||||
|
||||
ALERT_REPORTS_EXECUTORS = [FixedExecutor("admin")]
|
||||
```
|
||||
|
||||
Please refer to `ExecutorType` in the codebase for other executor types.
|
||||
|
||||
**Important notes**
|
||||
|
||||
- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can
|
||||
consume a lot of CPU / memory on your servers.
|
||||
- In some cases, if you notice a lot of leaked geckodriver processes, try running your celery
|
||||
processes with `celery worker --pool=prefork --max-tasks-per-child=128 ...`
|
||||
- It is recommended to run separate workers for the `sql_lab` and `email_reports` tasks. This can be
|
||||
done using the `queue` field in `task_annotations`.
|
||||
- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can’t access Superset via
|
||||
its default value of `http://0.0.0.0:8080/`.
|
||||
|
||||
It's also possible to specify a minimum interval between each report's execution through the config file:
|
||||
|
||||
``` python
|
||||
# Set a minimum interval threshold between executions (for each Alert/Report)
|
||||
# Value should be an integer
|
||||
ALERT_MINIMUM_INTERVAL = int(timedelta(minutes=10).total_seconds())
|
||||
REPORT_MINIMUM_INTERVAL = int(timedelta(minutes=5).total_seconds())
|
||||
```
|
||||
|
||||
Alternatively, you can assign a function to `ALERT_MINIMUM_INTERVAL` and/or `REPORT_MINIMUM_INTERVAL`. This is useful to dynamically retrieve a value as needed:
|
||||
|
||||
``` python
|
||||
def alert_dynamic_minimal_interval(**kwargs) -> int:
|
||||
"""
|
||||
Define logic here to retrieve the value dynamically
|
||||
"""
|
||||
|
||||
ALERT_MINIMUM_INTERVAL = alert_dynamic_minimal_interval
|
||||
```
|
||||
|
||||
## External Link Redirection
|
||||
|
||||
For security, Superset rewrites external links in alert/report email HTML so
|
||||
they go through a warning page before the user is navigated to the external
|
||||
site. Internal links (matching your configured base URL) are not affected.
|
||||
|
||||
```python
|
||||
# Disable external link redirection entirely (default: True)
|
||||
ALERT_REPORTS_ENABLE_LINK_REDIRECT = False
|
||||
```
|
||||
|
||||
The feature uses `WEBDRIVER_BASEURL_USER_FRIENDLY` (or `WEBDRIVER_BASEURL`)
|
||||
to determine which hosts are internal.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
There are many reasons that reports might not be working. Try these steps to check for specific issues.
|
||||
|
||||
### Confirm feature flag is enabled and you have sufficient permissions
|
||||
|
||||
If you don't see "Alerts & Reports" under the *Manage* section of the Settings dropdown in the Superset UI, you need to enable the `ALERT_REPORTS` feature flag (see above). Enable another feature flag and check to see that it took effect, to verify that your config file is getting loaded.
|
||||
|
||||
Log in as an admin user to ensure you have adequate permissions.
|
||||
|
||||
### Check the logs of your Celery worker
|
||||
|
||||
This is the best source of information about the problem. In a docker compose deployment, you can do this with a command like `docker logs superset_worker --since 1h`.
|
||||
|
||||
### Check web browser and webdriver installation
|
||||
|
||||
To take a screenshot, the worker visits the dashboard or chart using a headless browser, then takes a screenshot. If you are able to send a chart as CSV or text but can't send as PNG, your problem may lie with the browser.
|
||||
|
||||
If you are handling the installation of the headless browser on your own, do your own verification to ensure that the headless browser opens successfully in the worker environment.
|
||||
|
||||
### Send a test email
|
||||
|
||||
One symptom of an invalid connection to an email server is receiving an error of `[Errno 110] Connection timed out` in your logs when the report tries to send.
|
||||
|
||||
Confirm via testing that your outbound email configuration is correct. Here is the simplest test, for an un-authenticated email SMTP email service running on port 25. If you are sending over SSL, for instance, study how [Superset's codebase sends emails](https://github.com/apache/superset/blob/master/superset/utils/core.py#L818) and then test with those commands and arguments.
|
||||
|
||||
Start Python in your worker environment, replace all example values, and run:
|
||||
|
||||
```python
|
||||
import smtplib
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
from email.mime.text import MIMEText
|
||||
|
||||
from_email = 'superset_emails@example.com'
|
||||
to_email = 'your_email@example.com'
|
||||
msg = MIMEMultipart()
|
||||
msg['From'] = from_email
|
||||
msg['To'] = to_email
|
||||
msg['Subject'] = 'Superset SMTP config test'
|
||||
message = 'It worked'
|
||||
msg.attach(MIMEText(message))
|
||||
mailserver = smtplib.SMTP('smtpmail.example.com', 25)
|
||||
mailserver.sendmail(from_email, to_email, msg.as_string())
|
||||
mailserver.quit()
|
||||
```
|
||||
|
||||
This should send an email.
|
||||
|
||||
Possible fixes:
|
||||
|
||||
- Some cloud hosts disable outgoing unauthenticated SMTP email to prevent spam. For instance, [Azure blocks port 25 by default on some machines](https://learn.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity). Enable that port or use another sending method.
|
||||
- Use another set of SMTP credentials that you verify works in this setup.
|
||||
|
||||
### Browse to your report from the worker
|
||||
|
||||
The worker may be unable to reach the report. It will use the value of `WEBDRIVER_BASEURL` to browse to the report. If that route is invalid, or presents an authentication challenge that the worker can't pass, the report screenshot will fail.
|
||||
|
||||
Check this by attempting to `curl` the URL of a report that you see in the error logs of your worker. For instance, from the worker environment, run `curl http://superset_app:8088/superset/dashboard/1/`. You may get different responses depending on whether the dashboard exists - for example, you may need to change the `1` in that URL. If there's a URL in your logs from a failed report screenshot, that's a good place to start. The goal is to determine a valid value for `WEBDRIVER_BASEURL` and determine if an issue like HTTPS or authentication is redirecting your worker.
|
||||
|
||||
In a deployment with authentication measures enabled like HTTPS and Single Sign-On, it may make sense to have the worker navigate directly to the Superset application running in the same location, avoiding the need to sign in. For instance, you could use `WEBDRIVER_BASEURL="http://superset_app:8088"` for a docker compose deployment, and set `"force_https": False,` in your `TALISMAN_CONFIG`.
|
||||
|
||||
### Duplicate report deliveries
|
||||
|
||||
In some deployment configurations a scheduled report can be delivered more than once around its planned time. This typically happens when more than one process is responsible for running the alerts & reports schedule (for example, multiple schedulers or Celery beat instances). To avoid duplicate emails or notifications:
|
||||
|
||||
- Ensure that only a **single scheduler/beat process** is configured to trigger alerts and reports for a given environment.
|
||||
- If you run **multiple Celery workers**, verify that there is still only one component responsible for scheduling the report tasks (workers should execute tasks, not schedule them independently).
|
||||
- Review your deployment/orchestration setup (for example systemd, Docker, or Kubernetes) to make sure the alerts & reports scheduler is **not started from multiple places by accident**.
|
||||
|
||||
## Scheduling Queries as Reports
|
||||
|
||||
You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding
|
||||
extra metadata to saved queries, which are then picked up by an external scheduled (like
|
||||
[Apache Airflow](https://airflow.apache.org/)).
|
||||
|
||||
To allow scheduled queries, add the following to `SCHEDULED_QUERIES` in your configuration file:
|
||||
|
||||
```python
|
||||
SCHEDULED_QUERIES = {
|
||||
# This information is collected when the user clicks "Schedule query",
|
||||
# and saved into the `extra` field of saved queries.
|
||||
# See: https://github.com/mozilla-services/react-jsonschema-form
|
||||
'JSONSCHEMA': {
|
||||
'title': 'Schedule',
|
||||
'description': (
|
||||
'In order to schedule a query, you need to specify when it '
|
||||
'should start running, when it should stop running, and how '
|
||||
'often it should run. You can also optionally specify '
|
||||
'dependencies that should be met before the query is '
|
||||
'executed. Please read the documentation for best practices '
|
||||
'and more information on how to specify dependencies.'
|
||||
),
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'output_table': {
|
||||
'type': 'string',
|
||||
'title': 'Output table name',
|
||||
},
|
||||
'start_date': {
|
||||
'type': 'string',
|
||||
'title': 'Start date',
|
||||
# date-time is parsed using the chrono library, see
|
||||
# https://www.npmjs.com/package/chrono-node#usage
|
||||
'format': 'date-time',
|
||||
'default': 'tomorrow at 9am',
|
||||
},
|
||||
'end_date': {
|
||||
'type': 'string',
|
||||
'title': 'End date',
|
||||
# date-time is parsed using the chrono library, see
|
||||
# https://www.npmjs.com/package/chrono-node#usage
|
||||
'format': 'date-time',
|
||||
'default': '9am in 30 days',
|
||||
},
|
||||
'schedule_interval': {
|
||||
'type': 'string',
|
||||
'title': 'Schedule interval',
|
||||
},
|
||||
'dependencies': {
|
||||
'type': 'array',
|
||||
'title': 'Dependencies',
|
||||
'items': {
|
||||
'type': 'string',
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
'UISCHEMA': {
|
||||
'schedule_interval': {
|
||||
'ui:placeholder': '@daily, @weekly, etc.',
|
||||
},
|
||||
'dependencies': {
|
||||
'ui:help': (
|
||||
'Check the documentation for the correct format when '
|
||||
'defining dependencies.'
|
||||
),
|
||||
},
|
||||
},
|
||||
'VALIDATION': [
|
||||
# ensure that start_date <= end_date
|
||||
{
|
||||
'name': 'less_equal',
|
||||
'arguments': ['start_date', 'end_date'],
|
||||
'message': 'End date cannot be before start date',
|
||||
# this is where the error message is shown
|
||||
'container': 'end_date',
|
||||
},
|
||||
],
|
||||
# link to the scheduler; this example links to an Airflow pipeline
|
||||
# that uses the query id and the output table as its name
|
||||
'linkback': (
|
||||
'https://airflow.example.com/admin/airflow/tree?'
|
||||
'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
|
||||
),
|
||||
}
|
||||
```
|
||||
|
||||
This configuration is based on
|
||||
[react-jsonschema-form](https://github.com/mozilla-services/react-jsonschema-form) and will add a
|
||||
menu item called “Schedule” to SQL Lab. When the menu item is clicked, a modal will show up where
|
||||
the user can add the metadata required for scheduling the query.
|
||||
|
||||
This information can then be retrieved from the endpoint `/api/v1/saved_query/` and used to
|
||||
schedule the queries that have `schedule_info` in their JSON metadata. For schedulers other than
|
||||
Airflow, additional fields can be easily added to the configuration file above.
|
||||
|
||||
:::resources
|
||||
- [Tutorial: Automated Alerts and Reporting via Slack/Email in Superset](https://dev.to/ngtduc693/apache-superset-topic-5-automated-alerts-and-reporting-via-slackemail-in-superset-2gbe)
|
||||
- [Blog: Integrating Slack alerts and Apache Superset for better data observability](https://medium.com/affinityanswers-tech/integrating-slack-alerts-and-apache-superset-for-better-data-observability-fd2f9a12c350)
|
||||
:::
|
||||
@@ -0,0 +1,108 @@
|
||||
---
|
||||
title: Async Queries via Celery
|
||||
hide_title: true
|
||||
sidebar_position: 4
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Async Queries via Celery
|
||||
|
||||
## Celery
|
||||
|
||||
On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable
|
||||
support for long running queries that execute beyond the typical web request’s timeout (30-60
|
||||
seconds), it is necessary to configure an asynchronous backend for Superset which consists of:
|
||||
|
||||
- one or many Superset workers (which is implemented as a Celery worker), and can be started with
|
||||
the `celery worker` command, run `celery worker --help` to view the related options.
|
||||
- a celery broker (message queue) for which we recommend using Redis or RabbitMQ
|
||||
- a results backend that defines where the worker will persist the query results
|
||||
|
||||
Configuring Celery requires defining a `CELERY_CONFIG` in your `superset_config.py`. Both the worker
|
||||
and web server processes should have the same configuration.
|
||||
|
||||
```python
|
||||
class CeleryConfig(object):
|
||||
broker_url = "redis://localhost:6379/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = "redis://localhost:6379/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
```
|
||||
|
||||
To start a Celery worker to leverage the configuration, run the following command:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
|
||||
```
|
||||
|
||||
To start a job which schedules periodic background jobs, run the following command:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app beat
|
||||
```
|
||||
|
||||
To setup a result backend, you need to pass an instance of a derivative of `BaseCache` (`from
|
||||
flask_caching.backends.base import BaseCache`) to the RESULTS_BACKEND configuration key in your
|
||||
superset_config.py. You can use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache),
|
||||
memory or the file system (in a single server-type setup or for testing), or to write your own
|
||||
caching interface. Your `superset_config.py` may look something like:
|
||||
|
||||
```python
|
||||
# On S3
|
||||
from s3cache.s3cache import S3Cache
|
||||
S3_CACHE_BUCKET = 'foobar-superset'
|
||||
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
|
||||
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)
|
||||
|
||||
# On Redis
|
||||
from flask_caching.backends.rediscache import RedisCache
|
||||
RESULTS_BACKEND = RedisCache(
|
||||
host='localhost', port=6379, key_prefix='superset_results')
|
||||
```
|
||||
|
||||
For performance gains, [MessagePack](https://github.com/msgpack/msgpack-python) and
|
||||
[PyArrow](https://arrow.apache.org/docs/python/) are now used for results serialization. This can be
|
||||
disabled by setting `RESULTS_BACKEND_USE_MSGPACK = False` in your `superset_config.py`, should any
|
||||
issues arise. Please clear your existing results cache store when upgrading an existing environment.
|
||||
|
||||
**Important Notes**
|
||||
|
||||
- It is important that all the worker nodes and web servers in the Superset cluster _share a common
|
||||
metadata database_. This means that SQLite will not work in this context since it has limited
|
||||
support for concurrency and typically lives on the local file system.
|
||||
|
||||
- There should _only be one instance of celery beat running_ in your entire setup. If not,
|
||||
background jobs can get scheduled multiple times resulting in weird behaviors like duplicate
|
||||
delivery of reports, higher than expected load / traffic etc.
|
||||
|
||||
- SQL Lab will _only run your queries asynchronously if_ you enable **Asynchronous Query Execution**
|
||||
in your database settings (Sources > Databases > Edit record).
|
||||
|
||||
## Celery Flower
|
||||
|
||||
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:
|
||||
|
||||
```bash
|
||||
pip install flower
|
||||
```
|
||||
|
||||
You can run flower using:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app flower
|
||||
```
|
||||
|
||||
:::resources
|
||||
- [Blog: How to Set Up Global Async Queries (GAQ) in Apache Superset](https://medium.com/@ngigilevis/how-to-set-up-global-async-queries-gaq-in-apache-superset-a-complete-guide-9d2f4a047559)
|
||||
:::
|
||||
@@ -0,0 +1,162 @@
|
||||
{/*
|
||||
Licensed to the Apache Software Foundation (ASF) under one
|
||||
or more contributor license agreements. See the NOTICE file
|
||||
distributed with this work for additional information
|
||||
regarding copyright ownership. The ASF licenses this file
|
||||
to you under the Apache License, Version 2.0 (the
|
||||
"License"); you may not use this file except in compliance
|
||||
with the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing,
|
||||
software distributed under the License is distributed on an
|
||||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations
|
||||
under the License.
|
||||
*/}
|
||||
|
||||
---
|
||||
title: AWS IAM Authentication
|
||||
sidebar_label: AWS IAM Authentication
|
||||
sidebar_position: 15
|
||||
---
|
||||
|
||||
# AWS IAM Authentication for AWS Databases
|
||||
|
||||
Superset supports IAM-based authentication for **Amazon Aurora** (PostgreSQL and MySQL) and **Amazon Redshift**. IAM auth eliminates the need for database passwords — Superset generates a short-lived auth token using temporary AWS credentials instead.
|
||||
|
||||
Cross-account IAM role assumption via STS `AssumeRole` is supported, allowing a Superset deployment in one AWS account to connect to databases in a different account.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Enable the `AWS_DATABASE_IAM_AUTH` feature flag in `superset_config.py`. IAM authentication is gated behind this flag; if it is disabled, connections using `aws_iam` fail with *"AWS IAM database authentication is not enabled."*
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
"AWS_DATABASE_IAM_AUTH": True,
|
||||
}
|
||||
```
|
||||
- `boto3` must be installed in your Superset environment:
|
||||
```bash
|
||||
pip install boto3
|
||||
```
|
||||
- The Superset server's IAM role (or static credentials) must have permission to call `sts:AssumeRole` (for cross-account) or the same-account permissions for the target service:
|
||||
- **Aurora (RDS)**: `rds-db:connect`
|
||||
- **Redshift provisioned**: `redshift:GetClusterCredentials`
|
||||
- **Redshift Serverless**: `redshift-serverless:GetCredentials` and `redshift-serverless:GetWorkgroup`
|
||||
- SSL must be enabled on the Aurora / Redshift endpoint (required for IAM token auth).
|
||||
|
||||
## Configuration
|
||||
|
||||
IAM authentication is configured via the **encrypted_extra** field of the database connection. Access this field in the **Advanced** → **Security** section of the database connection form, under **Secure Extra**.
|
||||
|
||||
### Aurora PostgreSQL or Aurora MySQL
|
||||
|
||||
```json
|
||||
{
|
||||
"aws_iam": {
|
||||
"enabled": true,
|
||||
"role_arn": "arn:aws:iam::222222222222:role/SupersetDatabaseAccess",
|
||||
"external_id": "superset-prod-12345",
|
||||
"region": "us-east-1",
|
||||
"db_username": "superset_iam_user",
|
||||
"session_duration": 3600
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Required | Description |
|
||||
|-------|----------|-------------|
|
||||
| `enabled` | Yes | Set to `true` to activate IAM auth |
|
||||
| `role_arn` | No | ARN of the cross-account IAM role to assume via STS. Omit for same-account auth |
|
||||
| `external_id` | No | External ID for the STS `AssumeRole` call, if required by the target role's trust policy |
|
||||
| `region` | Yes | AWS region of the database cluster |
|
||||
| `db_username` | Yes | The database username associated with the IAM identity |
|
||||
| `session_duration` | No | STS session duration in seconds (default: `3600`) |
|
||||
|
||||
### Redshift (Serverless)
|
||||
|
||||
```json
|
||||
{
|
||||
"aws_iam": {
|
||||
"enabled": true,
|
||||
"role_arn": "arn:aws:iam::222222222222:role/SupersetRedshiftAccess",
|
||||
"region": "us-east-1",
|
||||
"workgroup_name": "my-workgroup",
|
||||
"db_name": "dev"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Redshift (Provisioned Cluster)
|
||||
|
||||
```json
|
||||
{
|
||||
"aws_iam": {
|
||||
"enabled": true,
|
||||
"role_arn": "arn:aws:iam::222222222222:role/SupersetRedshiftAccess",
|
||||
"region": "us-east-1",
|
||||
"cluster_identifier": "my-cluster",
|
||||
"db_username": "superset_iam_user",
|
||||
"db_name": "dev"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Cross-Account IAM Setup
|
||||
|
||||
To connect to a database in Account B from a Superset deployment in Account A:
|
||||
|
||||
**1. In Account B — create a database-access role:**
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": ["rds-db:connect"],
|
||||
"Resource": "arn:aws:rds-db:us-east-1:222222222222:dbuser/db-XXXXXXXXXXXX/superset_iam_user"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Trust policy** (allows Account A's Superset role to assume it):
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Principal": {
|
||||
"AWS": "arn:aws:iam::111111111111:role/SupersetInstanceRole"
|
||||
},
|
||||
"Action": "sts:AssumeRole",
|
||||
"Condition": {
|
||||
"StringEquals": {
|
||||
"sts:ExternalId": "superset-prod-12345"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**2. In Account A — grant Superset's role permission to assume the Account B role:**
|
||||
|
||||
```json
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": "sts:AssumeRole",
|
||||
"Resource": "arn:aws:iam::222222222222:role/SupersetDatabaseAccess"
|
||||
}
|
||||
```
|
||||
|
||||
**3. Configure the database connection in Superset** using the `role_arn` and `external_id` from the trust policy (as shown in the configuration example above).
|
||||
|
||||
## Credential Caching
|
||||
|
||||
STS credentials are cached in memory keyed by `(role_arn, region, external_id)` with a 10-minute TTL. This reduces the number of STS API calls when multiple queries are executed with the same connection. Tokens are refreshed automatically before expiry.
|
||||
@@ -0,0 +1,276 @@
|
||||
---
|
||||
title: Caching
|
||||
hide_title: true
|
||||
sidebar_position: 3
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Caching
|
||||
|
||||
:::note
|
||||
When a cache backend is configured, Superset expects it to remain available. Operations will
|
||||
fail if the configured backend becomes unavailable rather than silently degrading. This
|
||||
fail-fast behavior ensures operators are immediately aware of infrastructure issues.
|
||||
:::
|
||||
|
||||
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes.
|
||||
Flask-Caching supports various caching backends, including Redis (recommended), Memcached,
|
||||
SimpleCache (in-memory), or the local filesystem.
|
||||
[Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends)
|
||||
are also supported.
|
||||
|
||||
Caching can be configured by providing dictionaries in
|
||||
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
|
||||
|
||||
The following cache configurations can be customized in this way:
|
||||
|
||||
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
|
||||
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
|
||||
- Metadata cache (optional): `CACHE_CONFIG`
|
||||
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
|
||||
|
||||
For example, to configure the filter state cache using Redis:
|
||||
|
||||
```python
|
||||
FILTER_STATE_CACHE_CONFIG = {
|
||||
'CACHE_TYPE': 'RedisCache',
|
||||
'CACHE_DEFAULT_TIMEOUT': 86400,
|
||||
'CACHE_KEY_PREFIX': 'superset_filter_cache',
|
||||
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
|
||||
}
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
|
||||
In order to use dedicated cache stores, additional python libraries must be installed
|
||||
|
||||
- For Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
|
||||
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
|
||||
`python-memcached` does not handle storing binary data correctly.
|
||||
|
||||
These libraries can be installed using pip.
|
||||
|
||||
## Fallback Metastore Cache
|
||||
|
||||
Note, that some form of Filter State and Explore caching are required. If either of these caches
|
||||
are undefined, Superset falls back to using a built-in cache that stores data in the metadata
|
||||
database. While it is recommended to use a dedicated cache, the built-in cache can also be used
|
||||
to cache other data.
|
||||
|
||||
For example, to use the built-in cache to store chart data, use the following config:
|
||||
|
||||
```python
|
||||
DATA_CACHE_CONFIG = {
|
||||
"CACHE_TYPE": "SupersetMetastoreCache",
|
||||
"CACHE_KEY_PREFIX": "superset_results", # make sure this string is unique to avoid collisions
|
||||
"CACHE_DEFAULT_TIMEOUT": 86400, # 60 seconds * 60 minutes * 24 hours
|
||||
}
|
||||
```
|
||||
|
||||
## Chart Cache Timeout
|
||||
|
||||
The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or
|
||||
database. Each of these configurations will be checked in order before falling back to the default
|
||||
value defined in `DATA_CACHE_CONFIG`.
|
||||
|
||||
Note, that by setting the cache timeout to `-1`, caching for charting data can be disabled, either
|
||||
per chart, dataset or database, or by default if set in `DATA_CACHE_CONFIG`.
|
||||
|
||||
## SQL Lab Query Results
|
||||
|
||||
Caching for SQL Lab query results is used when async queries are enabled and is configured using
|
||||
`RESULTS_BACKEND`.
|
||||
|
||||
Note that this configuration does not use a flask-caching dictionary for its configuration, but
|
||||
instead requires a cachelib object.
|
||||
|
||||
See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details.
|
||||
|
||||
## Caching Thumbnails
|
||||
|
||||
This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) on config:
|
||||
|
||||
```
|
||||
FEATURE_FLAGS = {
|
||||
"THUMBNAILS": True,
|
||||
"THUMBNAILS_SQLA_LISTENERS": True,
|
||||
}
|
||||
```
|
||||
|
||||
By default thumbnails are rendered per user, and will fall back to the Selenium user for anonymous users.
|
||||
To always render thumbnails as a fixed user (`admin` in this example), use the following configuration:
|
||||
|
||||
```python
|
||||
from superset.tasks.types import FixedExecutor
|
||||
|
||||
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
|
||||
```
|
||||
|
||||
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
|
||||
and are processed asynchronously by the workers.
|
||||
|
||||
An example config where images are stored on S3 could be:
|
||||
|
||||
```python
|
||||
from flask import Flask
|
||||
from s3cache.s3cache import S3Cache
|
||||
|
||||
...
|
||||
|
||||
class CeleryConfig(object):
|
||||
broker_url = "redis://localhost:6379/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.thumbnails",
|
||||
)
|
||||
result_backend = "redis://localhost:6379/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
|
||||
def init_thumbnail_cache(app: Flask) -> S3Cache:
|
||||
return S3Cache("bucket_name", 'thumbs_cache/')
|
||||
|
||||
|
||||
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
|
||||
```
|
||||
|
||||
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
|
||||
override the base URL for Selenium using:
|
||||
|
||||
```
|
||||
WEBDRIVER_BASEURL = "https://superset.company.com"
|
||||
```
|
||||
|
||||
To control which user account is used for rendering thumbnails and warming up caches, configure
|
||||
`THUMBNAIL_EXECUTORS` and `CACHE_WARMUP_EXECUTORS`. Each accepts a list of executor types (which
|
||||
resolve to an owner, creator, modifier, or the currently-logged-in user) and/or a `FixedExecutor`
|
||||
pinned to a specific username. By default, thumbnails render as the current user
|
||||
(`ExecutorType.CURRENT_USER`) and cache warmup runs as the chart/dashboard owner
|
||||
(`ExecutorType.OWNER`).
|
||||
|
||||
To force both to run as a dedicated service account (`admin` in this example):
|
||||
|
||||
```python
|
||||
from superset.tasks.types import ExecutorType, FixedExecutor
|
||||
|
||||
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
|
||||
CACHE_WARMUP_EXECUTORS = [FixedExecutor("admin")]
|
||||
```
|
||||
|
||||
Use a dedicated read-only service account here rather than a personal admin account, so that
|
||||
thumbnail rendering and cache warmup tasks don't fail if a specific user's credentials change.
|
||||
|
||||
Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
|
||||
implement a custom function to authenticate Selenium. The default function uses the `flask-login`
|
||||
session cookie. Here's an example of a custom function signature:
|
||||
|
||||
```python
|
||||
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
|
||||
pass
|
||||
```
|
||||
|
||||
Then on configuration:
|
||||
|
||||
```
|
||||
WEBDRIVER_AUTH_FUNC = auth_driver
|
||||
```
|
||||
|
||||
## ETag Support for Thumbnails
|
||||
|
||||
Thumbnail and screenshot endpoints return `ETag` response headers based on the cached content digest. Clients can use conditional requests to avoid downloading unchanged images:
|
||||
|
||||
```
|
||||
GET /api/v1/chart/42/thumbnail/
|
||||
If-None-Match: "abc123..."
|
||||
|
||||
→ 304 Not Modified (if unchanged)
|
||||
→ 200 OK (with new image if changed)
|
||||
```
|
||||
|
||||
This is particularly useful for embedded dashboards and external integrations that periodically poll for updated screenshots — unchanged thumbnails return immediately with no payload.
|
||||
|
||||
## Distributed Coordination Backend
|
||||
|
||||
Superset supports an optional distributed coordination (`DISTRIBUTED_COORDINATION_CONFIG`) for
|
||||
high-performance distributed operations. This configuration enables:
|
||||
|
||||
- **Distributed locking**: Moves lock operations from the metadata database to Redis, improving
|
||||
performance and reducing metastore load
|
||||
- **Real-time event notifications**: Enables instant pub/sub messaging for task abort signals and
|
||||
completion notifications instead of polling-based approaches
|
||||
|
||||
:::note
|
||||
This requires Redis or Valkey specifically—it uses Redis-specific features (pub/sub, `SET NX EX`)
|
||||
that are not available in general Flask-Caching backends.
|
||||
:::
|
||||
|
||||
### Configuration
|
||||
|
||||
The distributed coordination uses Flask-Caching style configuration for consistency with other cache
|
||||
backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`:
|
||||
|
||||
```python
|
||||
DISTRIBUTED_COORDINATION_CONFIG = {
|
||||
"CACHE_TYPE": "RedisCache",
|
||||
"CACHE_REDIS_HOST": "localhost",
|
||||
"CACHE_REDIS_PORT": 6379,
|
||||
"CACHE_REDIS_DB": 0,
|
||||
"CACHE_REDIS_PASSWORD": "", # Optional
|
||||
}
|
||||
```
|
||||
|
||||
For Redis Sentinel deployments:
|
||||
|
||||
```python
|
||||
DISTRIBUTED_COORDINATION_CONFIG = {
|
||||
"CACHE_TYPE": "RedisSentinelCache",
|
||||
"CACHE_REDIS_SENTINELS": [("sentinel1", 26379), ("sentinel2", 26379)],
|
||||
"CACHE_REDIS_SENTINEL_MASTER": "mymaster",
|
||||
"CACHE_REDIS_SENTINEL_PASSWORD": None, # Sentinel password (if different)
|
||||
"CACHE_REDIS_PASSWORD": "", # Redis password
|
||||
"CACHE_REDIS_DB": 0,
|
||||
}
|
||||
```
|
||||
|
||||
For SSL/TLS connections:
|
||||
|
||||
```python
|
||||
DISTRIBUTED_COORDINATION_CONFIG = {
|
||||
"CACHE_TYPE": "RedisCache",
|
||||
"CACHE_REDIS_HOST": "redis.example.com",
|
||||
"CACHE_REDIS_PORT": 6380,
|
||||
"CACHE_REDIS_SSL": True,
|
||||
"CACHE_REDIS_SSL_CERTFILE": "/path/to/client.crt",
|
||||
"CACHE_REDIS_SSL_KEYFILE": "/path/to/client.key",
|
||||
"CACHE_REDIS_SSL_CA_CERTS": "/path/to/ca.crt",
|
||||
}
|
||||
```
|
||||
|
||||
### Distributed Lock TTL
|
||||
|
||||
You can configure the default lock TTL (time-to-live) in seconds. Locks automatically expire after
|
||||
this duration to prevent deadlocks from crashed processes:
|
||||
|
||||
```python
|
||||
DISTRIBUTED_LOCK_DEFAULT_TTL = 30 # Default: 30 seconds
|
||||
```
|
||||
|
||||
Individual lock acquisitions can override this value when needed.
|
||||
|
||||
### Database-Only Mode
|
||||
|
||||
When `DISTRIBUTED_COORDINATION_CONFIG` is not configured, Superset uses database-backed operations:
|
||||
|
||||
- **Locking**: Uses the KeyValue table with periodic cleanup of expired entries
|
||||
- **Event notifications**: Uses database polling instead of pub/sub
|
||||
|
||||
While database-backed operations work reliably, the Redis backend is recommended for production
|
||||
deployments where low latency and reduced database load are important.
|
||||
|
||||
:::resources
|
||||
- [Blog: The Data Engineer's Guide to Lightning-Fast Superset Dashboards](https://preset.io/blog/the-data-engineers-guide-to-lightning-fast-apache-superset-dashboards/)
|
||||
- [Blog: Accelerating Dashboards with Materialized Views](https://preset.io/blog/accelerating-apache-superset-dashboards-with-materialized-views/)
|
||||
:::
|
||||
@@ -0,0 +1,477 @@
|
||||
---
|
||||
title: Configuring Superset
|
||||
hide_title: true
|
||||
sidebar_position: 1
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Configuring Superset
|
||||
|
||||
## superset_config.py
|
||||
|
||||
Superset exposes hundreds of configurable parameters through its
|
||||
[config.py module](https://github.com/apache/superset/blob/master/superset/config.py). The
|
||||
variables and objects exposed act as a public interface of the bulk of what you may want
|
||||
to configure, alter and interface with. In this python module, you'll find all these
|
||||
parameters, sensible defaults, as well as rich documentation in the form of comments
|
||||
|
||||
To configure your application, you need to create your own configuration module, which
|
||||
will allow you to override few or many of these parameters. Instead of altering the core module,
|
||||
you'll want to define your own module (typically a file named `superset_config.py`).
|
||||
Add this file to your `PYTHONPATH` or create an environment variable
|
||||
`SUPERSET_CONFIG_PATH` specifying the full path of the `superset_config.py`.
|
||||
|
||||
For example, if deploying on Superset directly on a Linux-based system where your
|
||||
`superset_config.py` is under `/app` directory, you can run:
|
||||
|
||||
```bash
|
||||
export SUPERSET_CONFIG_PATH=/app/superset_config.py
|
||||
```
|
||||
|
||||
If you are using your own custom Dockerfile with the official Superset image as base image,
|
||||
then you can add your overrides as shown below:
|
||||
|
||||
```bash
|
||||
COPY --chown=superset superset_config.py /app/
|
||||
ENV SUPERSET_CONFIG_PATH /app/superset_config.py
|
||||
```
|
||||
|
||||
Docker compose deployments handle application configuration differently using specific conventions.
|
||||
Refer to the [docker compose tips & configuration](/admin-docs/installation/docker-compose#docker-compose-tips--configuration)
|
||||
for details.
|
||||
|
||||
The following is an example of just a few of the parameters you can set in your `superset_config.py` file:
|
||||
|
||||
```
|
||||
# Superset specific config
|
||||
ROW_LIMIT = 5000
|
||||
|
||||
# Flask App Builder configuration
|
||||
# Your App secret key will be used for securely signing the session cookie
|
||||
# and encrypting sensitive information on the database
|
||||
# Make sure you are changing this key for your deployment with a strong key.
|
||||
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
|
||||
# You MUST set this for production environments or the server will refuse
|
||||
# to start and you will see an error in the logs accordingly.
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
|
||||
# The SQLAlchemy connection string to your database backend
|
||||
# This connection defines the path to the database that stores your
|
||||
# superset metadata (slices, connections, tables, dashboards, ...).
|
||||
# Note that the connection information to connect to the datasources
|
||||
# you want to explore are managed directly in the web UI
|
||||
# The check_same_thread=false property ensures the sqlite client does not attempt
|
||||
# to enforce single-threaded access, which may be problematic in some edge cases
|
||||
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db?check_same_thread=false'
|
||||
|
||||
# Flask-WTF flag for CSRF
|
||||
WTF_CSRF_ENABLED = True
|
||||
# Add endpoints that need to be exempt from CSRF protection
|
||||
WTF_CSRF_EXEMPT_LIST = []
|
||||
# A CSRF token that expires in 1 year
|
||||
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365
|
||||
|
||||
# Set this API key to enable Mapbox visualizations
|
||||
MAPBOX_API_KEY = ''
|
||||
```
|
||||
|
||||
:::tip
|
||||
Note that it is typical to copy and paste [only] the portions of the
|
||||
core [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py) that
|
||||
you want to alter, along with the related comments into your own `superset_config.py` file.
|
||||
:::
|
||||
|
||||
All the parameters and default values defined
|
||||
in [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py)
|
||||
can be altered in your local `superset_config.py`. Administrators will want to read through the file
|
||||
to understand what can be configured locally as well as the default values in place.
|
||||
|
||||
Since `superset_config.py` acts as a Flask configuration module, it can be used to alter the
|
||||
settings of Flask itself, as well as Flask extensions that Superset bundles like
|
||||
`flask-wtf`, `flask-caching`, `flask-migrate`,
|
||||
and `flask-appbuilder`. Each one of these extensions offers intricate configurability.
|
||||
Flask App Builder, the web framework used by Superset, also offers many
|
||||
configuration settings. Please consult the
|
||||
[Flask App Builder Documentation](https://flask-appbuilder.readthedocs.org/en/latest/config.html)
|
||||
for more information on how to configure it.
|
||||
|
||||
At the very least, you'll want to change `SECRET_KEY` and `SQLALCHEMY_DATABASE_URI`. Continue reading for more about each of these.
|
||||
|
||||
## Specifying a SECRET_KEY
|
||||
|
||||
### Adding an initial SECRET_KEY
|
||||
|
||||
Superset requires a user-specified SECRET_KEY to start up. This requirement was [added in version 2.1.0 to force secure configurations](https://preset.io/blog/superset-security-update-default-secret_key-vulnerability/). Add a strong SECRET_KEY to your `superset_config.py` file like:
|
||||
|
||||
```python
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
```
|
||||
|
||||
You can generate a strong secure key with `openssl rand -base64 42`.
|
||||
|
||||
Alternatively, you can set the secret key using `SUPERSET_SECRET_KEY` environment variable:
|
||||
|
||||
On a Unix-based system, such as Linux or macOS, you can do so by running the following command in your terminal:
|
||||
|
||||
```bash
|
||||
export SUPERSET_SECRET_KEY=$(openssl rand -base64 42)
|
||||
```
|
||||
|
||||
:::caution Use a strong secret key
|
||||
This key will be used for securely signing session cookies and encrypting sensitive information stored in Superset's application metadata database.
|
||||
Your deployment must use a complex, unique key.
|
||||
:::
|
||||
|
||||
### Rotating to a newer SECRET_KEY
|
||||
|
||||
If you wish to change your existing SECRET_KEY, add the existing SECRET_KEY to your `superset_config.py` file as
|
||||
`PREVIOUS_SECRET_KEY =`and provide your new key as `SECRET_KEY =`. You can find your current SECRET_KEY with these
|
||||
commands - if running Superset with Docker, execute from within the Superset application container:
|
||||
|
||||
```python
|
||||
superset shell
|
||||
from flask import current_app; print(current_app.config["SECRET_KEY"])
|
||||
```
|
||||
|
||||
Save your `superset_config.py` with these values and then run `superset re-encrypt-secrets`.
|
||||
|
||||
## Setting up a production metadata database
|
||||
|
||||
Superset needs a database to store the information it manages, like the definitions of
|
||||
charts, dashboards, and many other things.
|
||||
|
||||
By default, Superset is configured to use [SQLite](https://www.sqlite.org/),
|
||||
a self-contained, single-file database that offers a simple and fast way to get started
|
||||
(without requiring any installation). However, for production environments,
|
||||
using SQLite is highly discouraged due to security, scalability, and data integrity reasons.
|
||||
It's important to use only the supported database engines and consider using a different
|
||||
database engine on a separate host or container.
|
||||
|
||||
Superset supports the following database engines/versions:
|
||||
|
||||
| Database Engine | Supported Versions |
|
||||
| ----------------------------------------- | ---------------------------------------- |
|
||||
| [PostgreSQL](https://www.postgresql.org/) | 10.X, 11.X, 12.X, 13.X, 14.X, 15.X, 16.X |
|
||||
| [MySQL](https://www.mysql.com/) | 5.7, 8.X |
|
||||
|
||||
Use the following database drivers and connection strings:
|
||||
|
||||
| Database | PyPI package | Connection String |
|
||||
| ----------------------------------------- | ------------------------- | ---------------------------------------------------------------------- |
|
||||
| [PostgreSQL](https://www.postgresql.org/) | `pip install psycopg2` | `postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
|
||||
| [MySQL](https://www.mysql.com/) | `pip install mysqlclient` | `mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
|
||||
|
||||
:::tip
|
||||
Properly setting up metadata store is beyond the scope of this documentation. We recommend
|
||||
using a hosted managed service such as [Amazon RDS](https://aws.amazon.com/rds/) or
|
||||
[Google Cloud Databases](https://cloud.google.com/products/databases?hl=en) to handle
|
||||
service and supporting infrastructure and backup strategy.
|
||||
:::
|
||||
|
||||
To configure Superset metastore set `SQLALCHEMY_DATABASE_URI` config key on `superset_config`
|
||||
to the appropriate connection string.
|
||||
|
||||
## Running on a WSGI HTTP Server
|
||||
|
||||
While you can run Superset on NGINX or Apache, we recommend using Gunicorn in async mode. This
|
||||
enables impressive concurrency even and is fairly easy to install and configure. Please refer to the
|
||||
documentation of your preferred technology to set up this Flask WSGI application in a way that works
|
||||
well in your environment. Here’s an async setup known to work well in production:
|
||||
|
||||
```
|
||||
-w 10 \
|
||||
-k gevent \
|
||||
--worker-connections 1000 \
|
||||
--timeout 120 \
|
||||
-b 0.0.0.0:6666 \
|
||||
--limit-request-line 0 \
|
||||
--limit-request-field_size 0 \
|
||||
--statsd-host localhost:8125 \
|
||||
"superset.app:create_app()"
|
||||
```
|
||||
|
||||
Refer to the [Gunicorn documentation](https://docs.gunicorn.org/en/stable/design.html) for more
|
||||
information. _Note that the development web server (`superset run` or `flask run`) is not intended
|
||||
for production use._
|
||||
|
||||
If you're not using Gunicorn, you may want to disable the use of `flask-compress` by setting
|
||||
`COMPRESS_REGISTER = False` in your `superset_config.py`.
|
||||
|
||||
Currently, the Google BigQuery Python SDK is not compatible with `gevent`, due to some dynamic monkeypatching on python core library by `gevent`.
|
||||
So, when you use `BigQuery` datasource on Superset, you have to use `gunicorn` worker type except `gevent`.
|
||||
|
||||
## HTTPS Configuration
|
||||
|
||||
You can configure HTTPS upstream via a load balancer or a reverse proxy (such as nginx) and do SSL/TLS Offloading before traffic reaches the Superset application. In this setup, local traffic from a Celery worker taking a snapshot of a chart for Alerts & Reports can access Superset at a `http://` URL, from behind the ingress point.
|
||||
You can also configure [SSL in Gunicorn](https://docs.gunicorn.org/en/stable/settings.html#ssl) (the Python webserver) if you are using an official Superset Docker image.
|
||||
|
||||
## Configuration Behind a Load Balancer
|
||||
|
||||
If you are running superset behind a load balancer or reverse proxy (e.g. NGINX or ELB on AWS), you
|
||||
may need to utilize a healthcheck endpoint so that your load balancer knows if your superset
|
||||
instance is running. This is provided at `/health` which will return a 200 response containing “OK”
|
||||
if the webserver is running.
|
||||
|
||||
If the load balancer is inserting `X-Forwarded-For/X-Forwarded-Proto` headers, you should set
|
||||
`ENABLE_PROXY_FIX = True` in the superset config file (`superset_config.py`) to extract and use the
|
||||
headers.
|
||||
|
||||
In case the reverse proxy is used for providing SSL encryption, an explicit definition of the
|
||||
`X-Forwarded-Proto` may be required. For the Apache webserver this can be set as follows:
|
||||
|
||||
```
|
||||
RequestHeader set X-Forwarded-Proto "https"
|
||||
```
|
||||
|
||||
## Configuring the application root
|
||||
|
||||
*Please be advised that this feature is in BETA.*
|
||||
|
||||
Superset supports running the application under a non-root path. The root path
|
||||
prefix can be specified in one of three ways:
|
||||
|
||||
- Customizing the [Flask entrypoint](https://github.com/apache/superset/blob/master/superset/app.py#L29)
|
||||
by passing the `superset_app_root` variable; or
|
||||
- Setting the `SUPERSET_APP_ROOT` environment variable to the desired prefix; or
|
||||
- Setting the `APPLICATION_ROOT` config in your `superset_config.py` file.
|
||||
|
||||
Note, the prefix should start with a `/`.
|
||||
|
||||
### Customizing the Flask entrypoint
|
||||
|
||||
To configure a prefix, e.g `/analytics`, pass the `superset_app_root` argument to
|
||||
`create_app` when calling flask run either through the `FLASK_APP`
|
||||
environment variable:
|
||||
|
||||
```sh
|
||||
FLASK_APP="superset:create_app(superset_app_root='/analytics')"
|
||||
```
|
||||
|
||||
or as part of the `--app` argument to `flask run`:
|
||||
|
||||
```sh
|
||||
flask --app "superset.app:create_app(superset_app_root='/analytics')"
|
||||
```
|
||||
|
||||
### Docker builds
|
||||
|
||||
The [docker compose](/admin-docs/installation/docker-compose#configuring-further) developer
|
||||
configuration includes an additional environmental variable,
|
||||
[`SUPERSET_APP_ROOT`](https://github.com/apache/superset/blob/master/docker/.env),
|
||||
to simplify the process of setting up a non-default root path across the services.
|
||||
|
||||
In `docker/.env-local` set `SUPERSET_APP_ROOT` to the desired prefix and then bring the
|
||||
services up with `docker compose up --detach`.
|
||||
|
||||
## Custom OAuth2 Configuration
|
||||
|
||||
Superset is built on Flask-AppBuilder (FAB), which supports many providers out of the box
|
||||
(GitHub, Twitter, LinkedIn, Google, Azure, etc). Beyond those, Superset can be configured to connect
|
||||
with other OAuth2 Authorization Server implementations that support “code” authorization.
|
||||
|
||||
Make sure the pip package [`Authlib`](https://authlib.org/) is installed on the webserver.
|
||||
|
||||
First, configure authorization in Superset `superset_config.py`.
|
||||
|
||||
```python
|
||||
from flask_appbuilder.security.manager import AUTH_OAUTH
|
||||
|
||||
# Set the authentication type to OAuth
|
||||
AUTH_TYPE = AUTH_OAUTH
|
||||
|
||||
OAUTH_PROVIDERS = [
|
||||
{ 'name':'egaSSO',
|
||||
'token_key':'access_token', # Name of the token in the response of access_token_url
|
||||
'icon':'fa-address-card', # Icon for the provider
|
||||
'remote_app': {
|
||||
'client_id':'myClientId', # Client Id (Identify Superset application)
|
||||
'client_secret':'MySecret', # Secret for this Client Id (Identify Superset application)
|
||||
'client_kwargs':{
|
||||
'scope': 'read' # Scope for the Authorization
|
||||
},
|
||||
'access_token_method':'POST', # HTTP Method to call access_token_url
|
||||
'access_token_params':{ # Additional parameters for calls to access_token_url
|
||||
'client_id':'myClientId'
|
||||
},
|
||||
'jwks_uri':'https://myAuthorizationServe/adfs/discovery/keys', # may be required to generate token
|
||||
'access_token_headers':{ # Additional headers for calls to access_token_url
|
||||
'Authorization': 'Basic Base64EncodedClientIdAndSecret'
|
||||
},
|
||||
'api_base_url':'https://myAuthorizationServer/oauth2AuthorizationServer/',
|
||||
'access_token_url':'https://myAuthorizationServer/oauth2AuthorizationServer/token',
|
||||
'authorize_url':'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
# Will allow user self registration, allowing to create Flask users from Authorized User
|
||||
AUTH_USER_REGISTRATION = True
|
||||
|
||||
# The default user self registration role
|
||||
AUTH_USER_REGISTRATION_ROLE = "Public"
|
||||
```
|
||||
|
||||
In case you want to assign the `Admin` role on new user registration, it can be assigned as follows:
|
||||
```python
|
||||
AUTH_USER_REGISTRATION_ROLE = "Admin"
|
||||
```
|
||||
If you encounter the [issue](https://github.com/apache/superset/issues/13243) of not being able to list users from the Superset main page settings, although a newly registered user has an `Admin` role, please re-run `superset init` to sync the required permissions. Below is the command to re-run `superset init` using docker compose.
|
||||
```
|
||||
docker-compose exec superset superset init
|
||||
```
|
||||
|
||||
Then, create a `CustomSsoSecurityManager` that extends `SupersetSecurityManager` and overrides
|
||||
`oauth_user_info`:
|
||||
|
||||
```python
|
||||
import logging
|
||||
from superset.security import SupersetSecurityManager
|
||||
|
||||
class CustomSsoSecurityManager(SupersetSecurityManager):
|
||||
|
||||
def oauth_user_info(self, provider, response=None):
|
||||
logging.debug("Oauth2 provider: {0}.".format(provider))
|
||||
if provider == 'egaSSO':
|
||||
# As example, this line request a GET to base_url + '/' + userDetails with Bearer Authentication,
|
||||
# and expects that authorization server checks the token, and response with user details
|
||||
me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
|
||||
logging.debug("user_data: {0}".format(me))
|
||||
return { 'name' : me['name'], 'email' : me['email'], 'id' : me['user_name'], 'username' : me['user_name'], 'first_name':'', 'last_name':''}
|
||||
...
|
||||
```
|
||||
|
||||
This file must be located in the same directory as `superset_config.py` with the name
|
||||
`custom_sso_security_manager.py`. Finally, add the following 2 lines to `superset_config.py`:
|
||||
|
||||
```
|
||||
from custom_sso_security_manager import CustomSsoSecurityManager
|
||||
CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
|
||||
```
|
||||
|
||||
**Notes**
|
||||
|
||||
- The redirect URL will be `https://<superset-webserver>/oauth-authorized/<provider-name>`
|
||||
When configuring an OAuth2 authorization provider if needed. For instance, the redirect URL will
|
||||
be `https://<superset-webserver>/oauth-authorized/egaSSO` for the above configuration.
|
||||
|
||||
- If an OAuth2 authorization server supports OpenID Connect 1.0, you could configure its configuration
|
||||
document URL only without providing `api_base_url`, `access_token_url`, `authorize_url` and other
|
||||
required options like user info endpoint, jwks uri etc. For instance:
|
||||
|
||||
```python
|
||||
OAUTH_PROVIDERS = [
|
||||
{ 'name':'egaSSO',
|
||||
'token_key':'access_token', # Name of the token in the response of access_token_url
|
||||
'icon':'fa-address-card', # Icon for the provider
|
||||
'remote_app': {
|
||||
'client_id':'myClientId', # Client Id (Identify Superset application)
|
||||
'client_secret':'MySecret', # Secret for this Client Id (Identify Superset application)
|
||||
'server_metadata_url': 'https://myAuthorizationServer/.well-known/openid-configuration'
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### PKCE Support
|
||||
|
||||
For public OAuth2 clients that cannot securely store a client secret, enable Proof Key for Code Exchange (PKCE) by adding `code_challenge_method` to the `remote_app` configuration:
|
||||
|
||||
```python
|
||||
OAUTH_PROVIDERS = [
|
||||
{
|
||||
'name': 'myProvider',
|
||||
'remote_app': {
|
||||
'client_id': 'myClientId',
|
||||
'client_secret': 'mySecret', # may be empty for pure public clients
|
||||
'code_challenge_method': 'S256', # enables PKCE
|
||||
'server_metadata_url': 'https://myAuthorizationServer/.well-known/openid-configuration'
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
PKCE (`S256`) is recommended for all OAuth2 flows, even when a client secret is present, as it protects against authorization code interception attacks.
|
||||
|
||||
## LDAP Authentication
|
||||
|
||||
FAB supports authenticating user credentials against an LDAP server.
|
||||
To use LDAP you must install the [python-ldap](https://www.python-ldap.org/en/latest/installing.html) package.
|
||||
See [FAB's LDAP documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-ldap)
|
||||
for details.
|
||||
|
||||
## Mapping LDAP or OAUTH groups to Superset roles
|
||||
|
||||
AUTH_ROLES_MAPPING in Flask-AppBuilder is a dictionary that maps from LDAP/OAUTH group names to FAB roles.
|
||||
It is used to assign roles to users who authenticate using LDAP or OAuth.
|
||||
|
||||
### Mapping OAUTH groups to Superset roles
|
||||
|
||||
The following `AUTH_ROLES_MAPPING` dictionary would map the OAUTH group "superset_users" to the Superset roles "Gamma" as well as "Alpha", and the OAUTH group "superset_admins" to the Superset role "Admin".
|
||||
|
||||
```python
|
||||
AUTH_ROLES_MAPPING = {
|
||||
"superset_users": ["Gamma","Alpha"],
|
||||
"superset_admins": ["Admin"],
|
||||
}
|
||||
```
|
||||
|
||||
### Mapping LDAP groups to Superset roles
|
||||
|
||||
The following `AUTH_ROLES_MAPPING` dictionary would map the LDAP DN "cn=superset_users,ou=groups,dc=example,dc=com" to the Superset roles "Gamma" as well as "Alpha", and the LDAP DN "cn=superset_admins,ou=groups,dc=example,dc=com" to the Superset role "Admin".
|
||||
|
||||
```python
|
||||
AUTH_ROLES_MAPPING = {
|
||||
"cn=superset_users,ou=groups,dc=example,dc=com": ["Gamma","Alpha"],
|
||||
"cn=superset_admins,ou=groups,dc=example,dc=com": ["Admin"],
|
||||
}
|
||||
```
|
||||
|
||||
Note: This requires `AUTH_LDAP_SEARCH` to be set. For more details, please see the [FAB Security documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html).
|
||||
|
||||
### Syncing roles at login
|
||||
|
||||
You can also use the `AUTH_ROLES_SYNC_AT_LOGIN` configuration variable to control how often Flask-AppBuilder syncs the user's roles with the LDAP/OAUTH groups. If `AUTH_ROLES_SYNC_AT_LOGIN` is set to True, Flask-AppBuilder will sync the user's roles each time they log in. If `AUTH_ROLES_SYNC_AT_LOGIN` is set to False, Flask-AppBuilder will only sync the user's roles when they first register.
|
||||
|
||||
## Flask app Configuration Hook
|
||||
|
||||
`FLASK_APP_MUTATOR` is a configuration function that can be provided in your environment, receives
|
||||
the app object and can alter it in any way. For example, add `FLASK_APP_MUTATOR` into your
|
||||
`superset_config.py` to setup session cookie expiration time to 24 hours:
|
||||
|
||||
```python
|
||||
from flask import session
|
||||
from flask import Flask
|
||||
|
||||
|
||||
def make_session_permanent():
|
||||
'''
|
||||
Enable maxAge for the cookie 'session'
|
||||
'''
|
||||
session.permanent = True
|
||||
|
||||
# Set up max age of session to 24 hours
|
||||
PERMANENT_SESSION_LIFETIME = timedelta(hours=24)
|
||||
def FLASK_APP_MUTATOR(app: Flask) -> None:
|
||||
app.before_request_funcs.setdefault(None, []).append(make_session_permanent)
|
||||
```
|
||||
|
||||
## Feature Flags
|
||||
|
||||
To support a diverse set of users, Superset has some features that are not enabled by default. For
|
||||
example, some users have stronger security restrictions, while some others may not. So Superset
|
||||
allows users to enable or disable some features by config. For feature owners, you can add optional
|
||||
functionalities in Superset, but will be only affected by a subset of users.
|
||||
|
||||
You can enable or disable features with flag from `superset_config.py`:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
'PRESTO_EXPAND_DATA': False,
|
||||
}
|
||||
```
|
||||
|
||||
A current list of feature flags can be found in the [Feature Flags](/admin-docs/configuration/feature-flags) documentation.
|
||||
|
||||
:::resources
|
||||
- [Blog: Feature Flags in Apache Superset](https://preset.io/blog/feature-flags-in-apache-superset-and-preset/)
|
||||
:::
|
||||
@@ -0,0 +1,40 @@
|
||||
---
|
||||
title: Country Map Tools
|
||||
sidebar_position: 10
|
||||
version: 1
|
||||
---
|
||||
|
||||
import countriesData from '../../../data/countries.json';
|
||||
|
||||
# The Country Map Visualization
|
||||
|
||||
The Country Map visualization allows you to plot lightweight choropleth maps of
|
||||
your countries by province, states, or other subdivision types. It does not rely
|
||||
on any third-party map services but would require you to provide the
|
||||
[ISO-3166-2](https://en.wikipedia.org/wiki/ISO_3166-2) codes of your country's
|
||||
top-level subdivisions. Comparing to a province or state's full names, the ISO
|
||||
code is less ambiguous and is unique to all regions in the world.
|
||||
|
||||
## Included Maps
|
||||
|
||||
The current list of countries can be found in the src
|
||||
[legacy-plugin-chart-country-map/src/countries.ts](https://github.com/apache/superset/blob/master/superset-frontend/plugins/legacy-plugin-chart-country-map/src/countries.ts)
|
||||
|
||||
The Country Maps visualization already ships with the maps for the following countries:
|
||||
|
||||
<ul style={{columns: 3}}>
|
||||
{countriesData.countries.map((country, index) => (
|
||||
<li key={index}>{country}</li>
|
||||
))}
|
||||
</ul>
|
||||
|
||||
## Adding a New Country
|
||||
|
||||
To add a new country to the list, you'd have to edit files in
|
||||
[@superset-ui/legacy-plugin-chart-country-map](https://github.com/apache/superset/tree/master/superset-frontend/plugins/legacy-plugin-chart-country-map).
|
||||
|
||||
1. Generate a new GeoJSON file for your country following the guide in [this Jupyter notebook](https://github.com/apache/superset/blob/master/superset-frontend/plugins/legacy-plugin-chart-country-map/scripts/Country%20Map%20GeoJSON%20Generator.ipynb).
|
||||
2. Edit the countries list in [legacy-plugin-chart-country-map/src/countries.ts](https://github.com/apache/superset/blob/master/superset-frontend/plugins/legacy-plugin-chart-country-map/src/countries.ts).
|
||||
3. Install superset-frontend dependencies: `cd superset-frontend && npm install`
|
||||
4. Verify your countries in Superset plugins storybook: `npm run plugins:storybook`.
|
||||
5. Build and install Superset from source code.
|
||||
@@ -0,0 +1,62 @@
|
||||
---
|
||||
title: Event Logging
|
||||
sidebar_position: 9
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Logging
|
||||
|
||||
## Event Logging
|
||||
|
||||
Superset by default logs special action events in its internal database (DBEventLogger). These logs can be accessed
|
||||
on the UI by navigating to **Security > Action Log**. You can freely customize these logs by
|
||||
implementing your own event log class.
|
||||
**When custom log class is enabled DBEventLogger is disabled and logs
|
||||
stop being populated in UI logs view.**
|
||||
To achieve both, custom log class should extend built-in DBEventLogger log class.
|
||||
|
||||
Here's an example of a simple JSON-to-stdout class:
|
||||
|
||||
```python
|
||||
def log(self, user_id, action, *args, **kwargs):
|
||||
records = kwargs.get('records', list())
|
||||
dashboard_id = kwargs.get('dashboard_id')
|
||||
slice_id = kwargs.get('slice_id')
|
||||
duration_ms = kwargs.get('duration_ms')
|
||||
referrer = kwargs.get('referrer')
|
||||
|
||||
for record in records:
|
||||
log = dict(
|
||||
action=action,
|
||||
json=record,
|
||||
dashboard_id=dashboard_id,
|
||||
slice_id=slice_id,
|
||||
duration_ms=duration_ms,
|
||||
referrer=referrer,
|
||||
user_id=user_id
|
||||
)
|
||||
print(json.dumps(log))
|
||||
```
|
||||
|
||||
End by updating your config to pass in an instance of the logger you want to use:
|
||||
|
||||
```
|
||||
EVENT_LOGGER = JSONStdOutEventLogger()
|
||||
```
|
||||
|
||||
## StatsD Logging
|
||||
|
||||
Superset can be configured to log events to [StatsD](https://github.com/statsd/statsd)
|
||||
if desired. Most endpoints hit are logged as
|
||||
well as key events like query start and end in SQL Lab.
|
||||
|
||||
To setup StatsD logging, it’s a matter of configuring the logger in your `superset_config.py`.
|
||||
If not already present, you need to ensure that the `statsd`-package is installed in Superset's python environment.
|
||||
|
||||
```python
|
||||
from superset.stats_logger import StatsdStatsLogger
|
||||
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')
|
||||
```
|
||||
|
||||
Note that it’s also possible to implement your own logger by deriving
|
||||
`superset.stats_logger.BaseStatsLogger`.
|
||||
@@ -0,0 +1,107 @@
|
||||
---
|
||||
title: Feature Flags
|
||||
hide_title: true
|
||||
sidebar_position: 2
|
||||
version: 1
|
||||
---
|
||||
|
||||
import featureFlags from '@site/static/feature-flags.json';
|
||||
|
||||
export const FlagTable = ({flags}) => (
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Flag</th>
|
||||
<th>Default</th>
|
||||
<th>Description</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{flags.map((flag) => (
|
||||
<tr key={flag.name}>
|
||||
<td><code>{flag.name}</code></td>
|
||||
<td><code>{flag.default ? 'True' : 'False'}</code></td>
|
||||
<td>
|
||||
{flag.description}
|
||||
{flag.docs && (
|
||||
<> (<a href={flag.docs}>docs</a>)</>
|
||||
)}
|
||||
</td>
|
||||
</tr>
|
||||
))}
|
||||
</tbody>
|
||||
</table>
|
||||
);
|
||||
|
||||
# Feature Flags
|
||||
|
||||
Superset uses feature flags to control the availability of features. Feature flags allow
|
||||
gradual rollout of new functionality and provide a way to enable experimental features.
|
||||
|
||||
To enable a feature flag, add it to your `superset_config.py`:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
"ENABLE_TEMPLATE_PROCESSING": True,
|
||||
}
|
||||
```
|
||||
|
||||
## Lifecycle
|
||||
|
||||
Feature flags progress through lifecycle stages:
|
||||
|
||||
| Stage | Description |
|
||||
|-------|-------------|
|
||||
| **Development** | Experimental features under active development. May be incomplete or unstable. |
|
||||
| **Testing** | Feature complete but undergoing testing. Usable but may contain bugs. |
|
||||
| **Stable** | Production-ready features. Safe for all deployments. |
|
||||
| **Deprecated** | Features scheduled for removal. Migrate away from these. |
|
||||
|
||||
---
|
||||
|
||||
## Development
|
||||
|
||||
These features are experimental and under active development. Use only in development environments.
|
||||
|
||||
<FlagTable flags={featureFlags.flags.development} />
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
These features are complete but still being tested. They are usable but may have bugs.
|
||||
|
||||
<FlagTable flags={featureFlags.flags.testing} />
|
||||
|
||||
---
|
||||
|
||||
## Stable
|
||||
|
||||
These features are production-ready and safe to enable.
|
||||
|
||||
<FlagTable flags={featureFlags.flags.stable} />
|
||||
|
||||
---
|
||||
|
||||
## Deprecated
|
||||
|
||||
These features are scheduled for removal. Plan to migrate away from them.
|
||||
|
||||
<FlagTable flags={featureFlags.flags.deprecated} />
|
||||
|
||||
---
|
||||
|
||||
## Adding New Feature Flags
|
||||
|
||||
When adding a new feature flag to `superset/config.py`, include the following annotations:
|
||||
|
||||
```python
|
||||
# Description of what the feature does
|
||||
# @lifecycle: development | testing | stable | deprecated
|
||||
# @docs: https://superset.apache.org/docs/... (optional)
|
||||
# @category: runtime_config | path_to_deprecation (optional, for stable flags)
|
||||
"MY_NEW_FEATURE": False,
|
||||
```
|
||||
|
||||
This documentation is auto-generated from the annotations in
|
||||
[config.py](https://github.com/apache/superset/blob/master/superset/config.py).
|
||||
@@ -0,0 +1,157 @@
|
||||
---
|
||||
title: Importing and Exporting Datasources
|
||||
hide_title: true
|
||||
sidebar_position: 11
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Importing and Exporting Datasources
|
||||
|
||||
The superset cli allows you to import and export datasources from and to YAML. Datasources include
|
||||
databases. The data is expected to be organized in the following hierarchy:
|
||||
|
||||
:::info
|
||||
Superset's ZIP-based import/export also covers **dashboards**, **charts**, and **saved queries**, exercised through the UI and REST API. The [Dashboard Import Overwrite Behavior](#dashboard-import-overwrite-behavior) and [UUIDs in API Responses](#uuids-in-api-responses) sections below document the behavior shared across all asset types.
|
||||
:::
|
||||
|
||||
```text
|
||||
├──databases
|
||||
| ├──database_1
|
||||
| | ├──table_1
|
||||
| | | ├──columns
|
||||
| | | | ├──column_1
|
||||
| | | | ├──column_2
|
||||
| | | | └──... (more columns)
|
||||
| | | └──metrics
|
||||
| | | ├──metric_1
|
||||
| | | ├──metric_2
|
||||
| | | └──... (more metrics)
|
||||
| | └── ... (more tables)
|
||||
| └── ... (more databases)
|
||||
```
|
||||
|
||||
:::note
|
||||
When you export a database connection, the `masked_encrypted_extra` field (used for sensitive connection parameters such as service account JSON, OAuth tokens, and other encrypted credentials) is included in the export. When importing on another instance, these values are decrypted and re-encrypted using the destination instance's `SECRET_KEY`. Ensure the receiving instance has a valid `SECRET_KEY` configured before importing.
|
||||
:::
|
||||
|
||||
## Exporting Datasources to YAML
|
||||
|
||||
You can print your current datasources to stdout by running:
|
||||
|
||||
```bash
|
||||
superset export_datasources
|
||||
```
|
||||
|
||||
To save your datasources to a ZIP file run:
|
||||
|
||||
```bash
|
||||
superset export_datasources -f <filename>
|
||||
```
|
||||
|
||||
By default, default (null) values will be omitted. Use the -d flag to include them. If you want back
|
||||
references to be included (e.g. a column to include the table id it belongs to) use the -b flag.
|
||||
|
||||
Alternatively, you can export datasources using the UI:
|
||||
|
||||
1. Open **Sources -> Databases** to export all tables associated to a single or multiple databases.
|
||||
(**Tables** for one or more tables)
|
||||
2. Select the items you would like to export.
|
||||
3. Click **Actions -> Export** to YAML
|
||||
4. If you want to import an item that you exported through the UI, you will need to nest it inside
|
||||
its parent element, e.g. a database needs to be nested under databases a table needs to be nested
|
||||
inside a database element.
|
||||
|
||||
In order to obtain an **exhaustive list of all fields** you can import using the YAML import run:
|
||||
|
||||
```bash
|
||||
superset export_datasource_schema
|
||||
```
|
||||
|
||||
As a reminder, you can use the `-b` flag to include back references.
|
||||
|
||||
## Importing Datasources
|
||||
|
||||
In order to import datasources from a ZIP file, run:
|
||||
|
||||
```bash
|
||||
superset import_datasources -p <path / filename>
|
||||
```
|
||||
|
||||
The optional username flag **-u** sets the user used for the datasource import. The default is 'admin'. Example:
|
||||
|
||||
```bash
|
||||
superset import_datasources -p <path / filename> -u 'admin'
|
||||
```
|
||||
|
||||
## Dashboard Import Overwrite Behavior
|
||||
|
||||
When importing a dashboard ZIP with the **overwrite** option enabled, any existing charts that are part of the dashboard are **replaced** rather than duplicated. This applies to:
|
||||
|
||||
- Charts whose UUID matches a chart already present in the target instance
|
||||
- The full chart configuration (query, visualization type, columns, metrics) is replaced by the imported version
|
||||
|
||||
If you import without the overwrite flag, existing charts with conflicting UUIDs are left unchanged and the import skips those objects. Use overwrite when you want to push a fully updated dashboard (including chart definitions) from a development or staging environment to production.
|
||||
|
||||
## UUIDs in API Responses
|
||||
|
||||
The REST API POST endpoints for **datasets**, **charts**, and **dashboards** include the auto-generated `uuid` field in the response body:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": 42,
|
||||
"uuid": "b8a8d5c3-1234-4abc-8def-0123456789ab",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
UUIDs remain stable across import/export cycles and can be used for cross-environment workflows — for example, recording a UUID when creating a chart in development and using it to identify the matching chart after importing into production.
|
||||
|
||||
## Legacy Importing Datasources
|
||||
|
||||
### From older versions of Superset to current version
|
||||
|
||||
When using Superset version 4.x.x to import from an older version (2.x.x or 3.x.x) importing is supported as the command `legacy_import_datasources` and expects a JSON or directory of JSONs. The options are `-r` for recursive and `-u` for specifying a user. Example of legacy import without options:
|
||||
|
||||
```bash
|
||||
superset legacy_import_datasources -p <path or filename>
|
||||
```
|
||||
|
||||
### From older versions of Superset to older versions
|
||||
|
||||
When using an older Superset version (2.x.x & 3.x.x) of Superset, the command is `import_datasources`. ZIP and YAML files are supported and to switch between them the feature flag `VERSIONED_EXPORT` is used. When `VERSIONED_EXPORT` is `True`, `import_datasources` expects a ZIP file, otherwise YAML. Example:
|
||||
|
||||
```bash
|
||||
superset import_datasources -p <path or filename>
|
||||
```
|
||||
|
||||
When `VERSIONED_EXPORT` is `False`, if you supply a path all files ending with **yaml** or **yml** will be parsed. You can apply
|
||||
additional flags (e.g. to search the supplied path recursively):
|
||||
|
||||
```bash
|
||||
superset import_datasources -p <path> -r
|
||||
```
|
||||
|
||||
The sync flag **-s** takes parameters in order to sync the supplied elements with your file. Be
|
||||
careful this can delete the contents of your meta database. Example:
|
||||
|
||||
```bash
|
||||
superset import_datasources -p <path / filename> -s columns,metrics
|
||||
```
|
||||
|
||||
This will sync all metrics and columns for all datasources found in the `<path /filename>` in the
|
||||
Superset meta database. This means columns and metrics not specified in YAML will be deleted. If you
|
||||
would add tables to columns,metrics those would be synchronised as well.
|
||||
|
||||
If you don’t supply the sync flag (**-s**) importing will only add and update (override) fields.
|
||||
E.g. you can add a verbose_name to the column ds in the table random_time_series from the example
|
||||
datasets by saving the following YAML to file and then running the **import_datasources** command.
|
||||
|
||||
```yaml
|
||||
databases:
|
||||
- database_name: main
|
||||
tables:
|
||||
- table_name: random_time_series
|
||||
columns:
|
||||
- column_name: ds
|
||||
verbose_name: datetime
|
||||
```
|
||||
@@ -0,0 +1,78 @@
|
||||
---
|
||||
title: Map Tiles
|
||||
sidebar_position: 12
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Map tiles
|
||||
|
||||
Superset uses OSM and Mapbox tiles by default. OSM is free but you still need setting your MAPBOX_API_KEY if you want to use mapbox maps.
|
||||
|
||||
## Setting map tiles
|
||||
|
||||
Map tiles can be set with `DECKGL_BASE_MAP` in your `superset_config.py` or `superset_config_docker.py`
|
||||
For adding your own map tiles, you can use the following format.
|
||||
|
||||
```python
|
||||
DECKGL_BASE_MAP = [
|
||||
['tile://https://your_personal_url/{z}/{x}/{y}.png', 'MyTile']
|
||||
]
|
||||
```
|
||||
Openstreetmap tiles url can be added without prefix.
|
||||
```python
|
||||
DECKGL_BASE_MAP = [
|
||||
['https://c.tile.openstreetmap.org/{z}/{x}/{y}.png', 'OpenStreetMap']
|
||||
]
|
||||
```
|
||||
|
||||
Default values are:
|
||||
```python
|
||||
DECKGL_BASE_MAP = [
|
||||
['https://tile.openstreetmap.org/{z}/{x}/{y}.png', 'Streets (OSM)'],
|
||||
['https://tile.osm.ch/osm-swiss-style/{z}/{x}/{y}.png', 'Topography (OSM)'],
|
||||
['mapbox://styles/mapbox/streets-v9', 'Streets'],
|
||||
['mapbox://styles/mapbox/dark-v9', 'Dark'],
|
||||
['mapbox://styles/mapbox/light-v9', 'Light'],
|
||||
['mapbox://styles/mapbox/satellite-streets-v9', 'Satellite Streets'],
|
||||
['mapbox://styles/mapbox/satellite-v9', 'Satellite'],
|
||||
['mapbox://styles/mapbox/outdoors-v9', 'Outdoors'],
|
||||
]
|
||||
```
|
||||
|
||||
It is possible to set only mapbox by removing osm tiles and other way around.
|
||||
|
||||
:::warning
|
||||
Setting `DECKGL_BASE_MAP` overwrite default values
|
||||
:::
|
||||
|
||||
After defining your map tiles, set them in these variables:
|
||||
- `CORS_OPTIONS`
|
||||
- `connect-src` of `TALISMAN_CONFIG` and `TALISMAN_CONFIG_DEV` variables.
|
||||
|
||||
```python
|
||||
ENABLE_CORS = True
|
||||
CORS_OPTIONS: dict[Any, Any] = {
|
||||
"origins": [
|
||||
"https://tile.openstreetmap.org",
|
||||
"https://tile.osm.ch",
|
||||
"https://your_personal_url/{z}/{x}/{y}.png",
|
||||
]
|
||||
}
|
||||
|
||||
.
|
||||
.
|
||||
|
||||
TALISMAN_CONFIG = {
|
||||
"content_security_policy": {
|
||||
...
|
||||
"connect-src": [
|
||||
"'self'",
|
||||
"https://api.mapbox.com",
|
||||
"https://events.mapbox.com",
|
||||
"https://tile.openstreetmap.org",
|
||||
"https://tile.osm.ch",
|
||||
"https://your_personal_url/{z}/{x}/{y}.png",
|
||||
],
|
||||
...
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,845 @@
|
||||
---
|
||||
title: MCP Server Deployment & Authentication
|
||||
hide_title: true
|
||||
sidebar_position: 14
|
||||
version: 1
|
||||
---
|
||||
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one
|
||||
or more contributor license agreements. See the NOTICE file
|
||||
distributed with this work for additional information
|
||||
regarding copyright ownership. The ASF licenses this file
|
||||
to you under the Apache License, Version 2.0 (the
|
||||
"License"); you may not use this file except in compliance
|
||||
with the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing,
|
||||
software distributed under the License is distributed on an
|
||||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations
|
||||
under the License.
|
||||
-->
|
||||
|
||||
# MCP Server Deployment & Authentication
|
||||
|
||||
Superset includes a built-in [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that lets AI assistants -- Claude, ChatGPT, and other MCP-compatible clients -- interact with your Superset instance. Through MCP, clients can list dashboards, query datasets, execute SQL, create charts, and more.
|
||||
|
||||
This guide covers how to run, secure, and deploy the MCP server.
|
||||
|
||||
:::tip Looking for user docs?
|
||||
See **[Using AI with Superset](/user-docs/using-superset/using-ai-with-superset)** for a guide on what AI can do with Superset and how to connect your AI client.
|
||||
:::
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
A["AI Client<br/>(Claude, ChatGPT, etc.)"] -- "MCP protocol<br/>(HTTP + JSON-RPC)" --> B["MCP Server<br/>(:5008/mcp)"]
|
||||
B -- "Superset context<br/>(app, db, RBAC)" --> C["Superset<br/>(:8088)"]
|
||||
C --> D[("Database<br/>(Postgres)")]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
Get the MCP server running locally and connect an AI client in three steps.
|
||||
|
||||
### 1. Start the MCP server
|
||||
|
||||
The MCP server runs as a separate process alongside Superset:
|
||||
|
||||
```bash
|
||||
superset mcp run --host 127.0.0.1 --port 5008
|
||||
```
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `--host` | `127.0.0.1` | Host to bind to |
|
||||
| `--port` | `5008` | Port to bind to |
|
||||
| `--debug` | off | Enable debug logging |
|
||||
|
||||
The endpoint is available at `http://<host>:<port>/mcp`.
|
||||
|
||||
### 2. Set a development user
|
||||
|
||||
For local development, tell the MCP server which Superset user to impersonate (the user must already exist in your database):
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_DEV_USERNAME = "admin"
|
||||
```
|
||||
|
||||
### 3. Connect an AI client
|
||||
|
||||
Point your MCP client at the server. For **Claude Desktop**, edit the config file:
|
||||
|
||||
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
|
||||
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
|
||||
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"superset": {
|
||||
"url": "http://localhost:5008/mcp"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Restart Claude Desktop. The hammer icon in the chat bar confirms the connection.
|
||||
|
||||
See [Connecting AI Clients](#connecting-ai-clients) for Claude Code, Claude Web, ChatGPT, and raw HTTP examples.
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Apache Superset 5.0+ running and accessible
|
||||
- Python 3.10+
|
||||
- The `fastmcp` package (`pip install fastmcp`)
|
||||
|
||||
---
|
||||
|
||||
## Authentication
|
||||
|
||||
The MCP server supports multiple authentication methods depending on your deployment scenario.
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
R["Incoming MCP Request"] --> F{"MCP_AUTH_FACTORY<br/>set?"}
|
||||
F -- Yes --> CF["Custom Auth Provider"]
|
||||
F -- No --> AE{"MCP_AUTH_ENABLED?"}
|
||||
AE -- "True" --> JWT["JWT Validation"]
|
||||
AE -- "False" --> DU["Dev Mode<br/>(MCP_DEV_USERNAME)"]
|
||||
|
||||
JWT --> ALG{"MCP_JWT_ALGORITHM"}
|
||||
ALG -- "RS256 + JWKS" --> JWKS["Fetch keys from<br/>MCP_JWKS_URI"]
|
||||
ALG -- "RS256 + static" --> PK["Use<br/>MCP_JWT_PUBLIC_KEY"]
|
||||
ALG -- "HS256" --> SEC["Use<br/>MCP_JWT_SECRET"]
|
||||
|
||||
JWKS --> V["Validate token<br/>(exp, iss, aud, scopes)"]
|
||||
PK --> V
|
||||
SEC --> V
|
||||
V --> UR["Resolve Superset user<br/>from token claims"]
|
||||
UR --> OK["Authenticated request"]
|
||||
CF --> OK
|
||||
DU --> OK
|
||||
```
|
||||
|
||||
### Development Mode (No Auth)
|
||||
|
||||
Disable authentication and use a fixed user:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_AUTH_ENABLED = False
|
||||
MCP_DEV_USERNAME = "admin"
|
||||
```
|
||||
|
||||
All operations run as the configured user.
|
||||
|
||||
:::warning
|
||||
Never use development mode in production. Always enable authentication for any internet-facing deployment.
|
||||
:::
|
||||
|
||||
### JWT Authentication
|
||||
|
||||
For production, enable JWT-based authentication. The MCP server validates a Bearer token on every request.
|
||||
|
||||
#### Option A: RS256 with JWKS endpoint
|
||||
|
||||
The most common setup for OAuth 2.0 / OIDC providers that publish a JWKS (JSON Web Key Set) endpoint:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_AUTH_ENABLED = True
|
||||
MCP_JWT_ALGORITHM = "RS256"
|
||||
MCP_JWKS_URI = "https://your-identity-provider.com/.well-known/jwks.json"
|
||||
MCP_JWT_ISSUER = "https://your-identity-provider.com/"
|
||||
MCP_JWT_AUDIENCE = "your-superset-instance"
|
||||
```
|
||||
|
||||
#### Option B: RS256 with static public key
|
||||
|
||||
Use this when you have a fixed RSA key pair (e.g., self-signed tokens):
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_AUTH_ENABLED = True
|
||||
MCP_JWT_ALGORITHM = "RS256"
|
||||
MCP_JWT_PUBLIC_KEY = """-----BEGIN PUBLIC KEY-----
|
||||
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
|
||||
-----END PUBLIC KEY-----"""
|
||||
MCP_JWT_ISSUER = "your-issuer"
|
||||
MCP_JWT_AUDIENCE = "your-audience"
|
||||
```
|
||||
|
||||
#### Option C: HS256 with shared secret
|
||||
|
||||
Use this when both the token issuer and the MCP server share a symmetric secret:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_AUTH_ENABLED = True
|
||||
MCP_JWT_ALGORITHM = "HS256"
|
||||
MCP_JWT_SECRET = "your-shared-secret-key"
|
||||
MCP_JWT_ISSUER = "your-issuer"
|
||||
MCP_JWT_AUDIENCE = "your-audience"
|
||||
```
|
||||
|
||||
:::warning
|
||||
Store `MCP_JWT_SECRET` securely. Never commit it to version control. Use environment variables:
|
||||
```python
|
||||
import os
|
||||
MCP_JWT_SECRET = os.environ.get("MCP_JWT_SECRET")
|
||||
```
|
||||
:::
|
||||
|
||||
#### JWT claims
|
||||
|
||||
The MCP server validates these standard claims:
|
||||
|
||||
| Claim | Config Key | Description |
|
||||
|-------|-----------|-------------|
|
||||
| `exp` | -- | Expiration time (always validated) |
|
||||
| `iss` | `MCP_JWT_ISSUER` | Token issuer (optional but recommended) |
|
||||
| `aud` | `MCP_JWT_AUDIENCE` | Token audience (optional but recommended) |
|
||||
| `sub` | -- | Subject -- primary claim used to resolve the Superset user |
|
||||
|
||||
#### User resolution
|
||||
|
||||
After validating the token, the MCP server resolves a Superset username from the claims. It checks these in order, using the first non-empty value:
|
||||
|
||||
1. `subject` -- the standard `sub` claim (via the access token object)
|
||||
2. `client_id` -- for machine-to-machine tokens
|
||||
3. `payload["sub"]` -- fallback to raw payload
|
||||
4. `payload["email"]` -- email-based lookup
|
||||
5. `payload["username"]` -- explicit username claim
|
||||
|
||||
The resolved value must match a `username` in the Superset `ab_user` table.
|
||||
|
||||
#### Scoped access
|
||||
|
||||
Require specific scopes in the JWT to limit what MCP operations a token can perform:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_REQUIRED_SCOPES = ["mcp:read", "mcp:write"]
|
||||
```
|
||||
|
||||
Only tokens that include **all** required scopes are accepted.
|
||||
|
||||
### Custom Auth Provider
|
||||
|
||||
For advanced scenarios (e.g., a proprietary auth system), provide a factory function. This takes precedence over all built-in JWT configuration:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
def my_custom_auth_factory(app):
|
||||
"""Return a FastMCP auth provider instance."""
|
||||
from fastmcp.server.auth.providers.jwt import JWTVerifier
|
||||
return JWTVerifier(
|
||||
jwks_uri="https://my-auth.example.com/.well-known/jwks.json",
|
||||
issuer="https://my-auth.example.com/",
|
||||
audience="superset-mcp",
|
||||
)
|
||||
|
||||
MCP_AUTH_FACTORY = my_custom_auth_factory
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Connecting AI Clients
|
||||
|
||||
### Claude Desktop
|
||||
|
||||
**Local development (no auth):**
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"superset": {
|
||||
"url": "http://localhost:5008/mcp"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**With JWT authentication:**
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"superset": {
|
||||
"command": "npx",
|
||||
"args": [
|
||||
"-y",
|
||||
"mcp-remote@latest",
|
||||
"http://your-superset-host:5008/mcp",
|
||||
"--header",
|
||||
"Authorization: Bearer YOUR_TOKEN"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Claude Code (CLI)
|
||||
|
||||
Add to your project's `.mcp.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"superset": {
|
||||
"type": "url",
|
||||
"url": "http://localhost:5008/mcp"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
With authentication:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"superset": {
|
||||
"type": "url",
|
||||
"url": "http://localhost:5008/mcp",
|
||||
"headers": {
|
||||
"Authorization": "Bearer YOUR_TOKEN"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Claude Web (claude.ai)
|
||||
|
||||
1. Open [claude.ai](https://claude.ai)
|
||||
2. Click the **+** button (or your profile icon)
|
||||
3. Select **Connectors**
|
||||
4. Click **Manage Connectors** > **Add custom connector**
|
||||
5. Enter a name and your MCP URL (e.g., `https://your-superset-host/mcp`)
|
||||
6. Click **Add**
|
||||
|
||||
:::info
|
||||
Custom connectors on Claude Web require a Pro, Max, Team, or Enterprise plan.
|
||||
:::
|
||||
|
||||
### ChatGPT
|
||||
|
||||
1. Click your profile icon > **Settings** > **Apps and Connectors**
|
||||
2. Enable **Developer Mode** in Advanced Settings
|
||||
3. In the chat composer, press **+** > **Add sources** > **App** > **Connect more** > **Create app**
|
||||
4. Enter a name and your MCP server URL
|
||||
5. Click **I understand and continue**
|
||||
|
||||
:::info
|
||||
ChatGPT MCP connectors require a Pro, Team, Enterprise, or Edu plan.
|
||||
:::
|
||||
|
||||
### Direct HTTP requests
|
||||
|
||||
Call the MCP server directly with any HTTP client:
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:5008/mcp \
|
||||
-H 'Content-Type: application/json' \
|
||||
-H 'Authorization: Bearer YOUR_JWT_TOKEN' \
|
||||
-d '{"jsonrpc": "2.0", "method": "tools/list", "id": 1}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment
|
||||
|
||||
### Single Process
|
||||
|
||||
The simplest setup: run the MCP server alongside Superset on the same host.
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph host["Host / VM"]
|
||||
direction TB
|
||||
S["Superset<br/>:8088"] --> DB[("Postgres")]
|
||||
M["MCP Server<br/>:5008"] --> DB
|
||||
end
|
||||
C["AI Client"] -- "HTTPS" --> P["Reverse Proxy<br/>(Nginx / Caddy)"]
|
||||
U["Browser"] -- "HTTPS" --> P
|
||||
P -- ":8088" --> S
|
||||
P -- ":5008/mcp" --> M
|
||||
```
|
||||
|
||||
**superset_config.py:**
|
||||
|
||||
```python
|
||||
MCP_SERVICE_HOST = "0.0.0.0"
|
||||
MCP_SERVICE_PORT = 5008
|
||||
MCP_DEV_USERNAME = "admin" # or enable JWT auth
|
||||
|
||||
# If behind a reverse proxy, set the public-facing URL so
|
||||
# MCP-generated links (chart previews, SQL Lab URLs) resolve correctly:
|
||||
MCP_SERVICE_URL = "https://superset.example.com"
|
||||
```
|
||||
|
||||
**Start both processes:**
|
||||
|
||||
```bash
|
||||
# Terminal 1 -- Superset web server
|
||||
superset run -h 0.0.0.0 -p 8088
|
||||
|
||||
# Terminal 2 -- MCP server
|
||||
superset mcp run --host 0.0.0.0 --port 5008
|
||||
```
|
||||
|
||||
**Nginx reverse proxy with TLS:**
|
||||
|
||||
```nginx
|
||||
server {
|
||||
listen 443 ssl;
|
||||
server_name superset.example.com;
|
||||
|
||||
ssl_certificate /path/to/cert.pem;
|
||||
ssl_certificate_key /path/to/key.pem;
|
||||
|
||||
# Superset web UI
|
||||
location / {
|
||||
proxy_pass http://127.0.0.1:8088;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
}
|
||||
|
||||
# MCP endpoint
|
||||
location /mcp {
|
||||
proxy_pass http://127.0.0.1:5008/mcp;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header Authorization $http_authorization;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Docker Compose
|
||||
|
||||
Run Superset and the MCP server as separate containers sharing the same config:
|
||||
|
||||
```yaml
|
||||
# docker-compose.yml
|
||||
services:
|
||||
superset:
|
||||
image: apache/superset:latest
|
||||
ports:
|
||||
- "8088:8088"
|
||||
volumes:
|
||||
- ./superset_config.py:/app/superset_config.py
|
||||
environment:
|
||||
- SUPERSET_CONFIG_PATH=/app/superset_config.py
|
||||
|
||||
mcp:
|
||||
image: apache/superset:latest
|
||||
command: ["superset", "mcp", "run", "--host", "0.0.0.0", "--port", "5008"]
|
||||
ports:
|
||||
- "5008:5008"
|
||||
volumes:
|
||||
- ./superset_config.py:/app/superset_config.py
|
||||
environment:
|
||||
- SUPERSET_CONFIG_PATH=/app/superset_config.py
|
||||
depends_on:
|
||||
- superset
|
||||
```
|
||||
|
||||
Both containers share the same `superset_config.py`, so authentication settings, database connections, and feature flags stay in sync.
|
||||
|
||||
### Multi-Pod (Kubernetes)
|
||||
|
||||
For high-availability deployments, configure Redis so that replicas share session state:
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
LB["Load Balancer"] --> M1["MCP Pod 1"]
|
||||
LB --> M2["MCP Pod 2"]
|
||||
LB --> M3["MCP Pod 3"]
|
||||
M1 --> R[("Redis<br/>(session store)")]
|
||||
M2 --> R
|
||||
M3 --> R
|
||||
M1 --> DB[("Postgres")]
|
||||
M2 --> DB
|
||||
M3 --> DB
|
||||
```
|
||||
|
||||
**superset_config.py:**
|
||||
|
||||
```python
|
||||
MCP_STORE_CONFIG = {
|
||||
"enabled": True,
|
||||
"CACHE_REDIS_URL": "redis://redis-host:6379/0",
|
||||
"event_store_max_events": 100,
|
||||
"event_store_ttl": 3600,
|
||||
}
|
||||
```
|
||||
|
||||
When `CACHE_REDIS_URL` is set, the MCP server uses a Redis-backed EventStore for session management, allowing replicas to share state. Without Redis, each pod manages its own in-memory sessions and stateful MCP interactions may fail when requests hit different replicas.
|
||||
|
||||
---
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
All MCP settings go in `superset_config.py`. Defaults are defined in `superset/mcp_service/mcp_config.py`.
|
||||
|
||||
### Core
|
||||
|
||||
| Setting | Default | Description |
|
||||
|---------|---------|-------------|
|
||||
| `MCP_SERVICE_HOST` | `"localhost"` | Host the MCP server binds to |
|
||||
| `MCP_SERVICE_PORT` | `5008` | Port the MCP server binds to |
|
||||
| `MCP_SERVICE_URL` | `None` | Public base URL for MCP-generated links (set this when behind a reverse proxy) |
|
||||
| `MCP_DEBUG` | `False` | Enable debug logging |
|
||||
| `MCP_DEV_USERNAME` | -- | Superset username for development mode (no auth) |
|
||||
| `MCP_RBAC_ENABLED` | `True` | Enforce Superset's role-based access control on MCP tool calls. When `True`, each tool checks that the authenticated user has the required FAB permission before executing. Disable only for testing or trusted-network deployments. |
|
||||
|
||||
### Authentication
|
||||
|
||||
| Setting | Default | Description |
|
||||
|---------|---------|-------------|
|
||||
| `MCP_AUTH_ENABLED` | `False` | Enable JWT authentication |
|
||||
| `MCP_JWT_ALGORITHM` | `"RS256"` | JWT signing algorithm (`RS256` or `HS256`) |
|
||||
| `MCP_JWKS_URI` | `None` | JWKS endpoint URL (RS256) |
|
||||
| `MCP_JWT_PUBLIC_KEY` | `None` | Static RSA public key string (RS256) |
|
||||
| `MCP_JWT_SECRET` | `None` | Shared secret string (HS256) |
|
||||
| `MCP_JWT_ISSUER` | `None` | Expected `iss` claim |
|
||||
| `MCP_JWT_AUDIENCE` | `None` | Expected `aud` claim |
|
||||
| `MCP_REQUIRED_SCOPES` | `[]` | Required JWT scopes |
|
||||
| `MCP_JWT_DEBUG_ERRORS` | `False` | Log detailed JWT errors server-side (never exposed in HTTP responses per RFC 6750) |
|
||||
| `MCP_AUTH_FACTORY` | `None` | Custom auth provider factory `(flask_app) -> auth_provider`. Takes precedence over built-in JWT |
|
||||
| `MCP_USER_RESOLVER` | `None` | Custom function `(app, access_token) -> username` to extract a Superset username from a validated JWT token. When `None`, the default resolver checks `preferred_username`, `username`, `email`, and `sub` claims in that order. |
|
||||
|
||||
### Response Size Guard
|
||||
|
||||
Limits response sizes to prevent exceeding LLM context windows:
|
||||
|
||||
```python
|
||||
MCP_RESPONSE_SIZE_CONFIG = {
|
||||
"enabled": True,
|
||||
"token_limit": 25000,
|
||||
"warn_threshold_pct": 80,
|
||||
"excluded_tools": [
|
||||
"health_check",
|
||||
"get_chart_preview",
|
||||
"generate_explore_link",
|
||||
"open_sql_lab_with_context",
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `enabled` | `True` | Enable response size checking |
|
||||
| `token_limit` | `25000` | Maximum estimated token count per response |
|
||||
| `warn_threshold_pct` | `80` | Warn when response exceeds this percentage of the limit |
|
||||
| `excluded_tools` | See above | Tools exempt from size checking (e.g., tools that return URLs, not data) |
|
||||
|
||||
### Caching
|
||||
|
||||
Optional response caching for read-heavy workloads. Requires Redis when used with multiple replicas.
|
||||
|
||||
```python
|
||||
MCP_CACHE_CONFIG = {
|
||||
"enabled": False,
|
||||
"CACHE_KEY_PREFIX": None,
|
||||
"list_tools_ttl": 300, # 5 min
|
||||
"list_resources_ttl": 300,
|
||||
"list_prompts_ttl": 300,
|
||||
"read_resource_ttl": 3600, # 1 hour
|
||||
"get_prompt_ttl": 3600,
|
||||
"call_tool_ttl": 3600,
|
||||
"max_item_size": 1048576, # 1 MB
|
||||
"excluded_tools": [
|
||||
"execute_sql",
|
||||
"generate_dashboard",
|
||||
"generate_chart",
|
||||
"update_chart",
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `enabled` | `False` | Enable response caching |
|
||||
| `CACHE_KEY_PREFIX` | `None` | Optional prefix for cache keys (useful for shared Redis) |
|
||||
| `list_tools_ttl` | `300` | Cache TTL in seconds for `tools/list` |
|
||||
| `list_resources_ttl` | `300` | Cache TTL for `resources/list` |
|
||||
| `list_prompts_ttl` | `300` | Cache TTL for `prompts/list` |
|
||||
| `read_resource_ttl` | `3600` | Cache TTL for `resources/read` |
|
||||
| `get_prompt_ttl` | `3600` | Cache TTL for `prompts/get` |
|
||||
| `call_tool_ttl` | `3600` | Cache TTL for `tools/call` |
|
||||
| `max_item_size` | `1048576` | Maximum cached item size in bytes (1 MB) |
|
||||
| `excluded_tools` | See above | Tools that are never cached (mutating or non-deterministic) |
|
||||
|
||||
### Redis Store (Multi-Pod)
|
||||
|
||||
Enables Redis-backed session and event storage for multi-replica deployments:
|
||||
|
||||
```python
|
||||
MCP_STORE_CONFIG = {
|
||||
"enabled": False,
|
||||
"CACHE_REDIS_URL": None,
|
||||
"event_store_max_events": 100,
|
||||
"event_store_ttl": 3600,
|
||||
}
|
||||
```
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `enabled` | `False` | Enable Redis-backed store |
|
||||
| `CACHE_REDIS_URL` | `None` | Redis connection URL (e.g., `redis://redis-host:6379/0`) |
|
||||
| `event_store_max_events` | `100` | Maximum events retained per session |
|
||||
| `event_store_ttl` | `3600` | Event TTL in seconds |
|
||||
|
||||
### Tool Search
|
||||
|
||||
By default the MCP server exposes a lightweight tool-search interface instead of advertising every tool at once. This reduces the initial context sent to the LLM by ~70%, which lowers cost and latency. The AI client discovers tools on demand by calling `search_tools` and then invokes them via `call_tool`.
|
||||
|
||||
```python
|
||||
MCP_TOOL_SEARCH_CONFIG = {
|
||||
"enabled": True,
|
||||
"strategy": "bm25", # "bm25" (natural language) or "regex"
|
||||
"max_results": 5,
|
||||
"always_visible": [ # Tools always listed (pinned)
|
||||
"health_check",
|
||||
"get_instance_info",
|
||||
],
|
||||
"search_tool_name": "search_tools",
|
||||
"call_tool_name": "call_tool",
|
||||
"include_schemas": False, # False=summary mode (name + parameters_hint)
|
||||
"compact_schemas": True, # Strip $defs (only applies when include_schemas=True)
|
||||
"max_description_length": 300,
|
||||
}
|
||||
```
|
||||
|
||||
| Key | Default | Description |
|
||||
|-----|---------|-------------|
|
||||
| `enabled` | `True` | Enable tool search. When `False`, all tools are listed upfront |
|
||||
| `strategy` | `"bm25"` | Search ranking algorithm. `"bm25"` supports natural language; `"regex"` supports pattern matching |
|
||||
| `max_results` | `5` | Maximum tools returned per search query |
|
||||
| `always_visible` | See above | Tools that always appear in `list_tools`, regardless of search |
|
||||
| `include_schemas` | `False` | When `False` (default, "summary mode"), search results omit `inputSchema` entirely and include a lightweight `parameters_hint` listing top-level parameter names. Set to `True` to include the full `inputSchema` in search results. Full schemas are always used when a tool is actually invoked via `call_tool`. |
|
||||
| `compact_schemas` | `True` | Strip `$defs` / `$ref` and replace with `{"type": "object"}` in search results to reduce token cost. Only takes effect when `include_schemas=True` — ignored in summary mode. |
|
||||
| `max_description_length` | `300` | Truncate tool descriptions in search results (0 = no truncation). Applies in both summary and full-schema modes. |
|
||||
|
||||
:::tip
|
||||
Set `enabled: False` to revert to the traditional "show all tools at once" behavior, which some clients or workflows may prefer.
|
||||
:::
|
||||
|
||||
Tool search reduces the initial token cost from ~15–20K tokens (full catalog) down to ~4–5K tokens (pinned tools + search interface) — roughly 85% savings at the start of each conversation.
|
||||
|
||||
### Session & CSRF
|
||||
|
||||
These values are flat-merged into the Flask app config used by the MCP server process:
|
||||
|
||||
```python
|
||||
MCP_SESSION_CONFIG = {
|
||||
"SESSION_COOKIE_HTTPONLY": True,
|
||||
"SESSION_COOKIE_SECURE": False,
|
||||
"SESSION_COOKIE_SAMESITE": "Lax",
|
||||
"SESSION_COOKIE_NAME": "superset_session",
|
||||
"PERMANENT_SESSION_LIFETIME": 86400,
|
||||
}
|
||||
|
||||
MCP_CSRF_CONFIG = {
|
||||
"WTF_CSRF_ENABLED": True,
|
||||
"WTF_CSRF_TIME_LIMIT": None,
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Access Control
|
||||
|
||||
### RBAC Enforcement
|
||||
|
||||
The MCP server respects Superset's full role-based access control (RBAC). Every authenticated user can only access the data and operations their Superset roles permit — the same rules that apply in the Superset UI apply through MCP.
|
||||
|
||||
Each tool declares one or more required FAB permissions. The table below maps tool groups to their permission requirements:
|
||||
|
||||
| Tool group | Required FAB permission |
|
||||
|------------|------------------------|
|
||||
| `list_charts`, `get_chart_info`, `get_chart_data`, `get_chart_preview`, `generate_chart`, `update_chart` | `can_read` on `Chart` (read), `can_write` on `Chart` (mutate) |
|
||||
| `list_dashboards`, `get_dashboard_info`, `generate_dashboard`, `add_chart_to_existing_dashboard` | `can_read` on `Dashboard` (read), `can_write` on `Dashboard` (mutate) |
|
||||
| `list_datasets`, `get_dataset_info`, `create_virtual_dataset` | `can_read` on `Dataset` (read), `can_write` on `Dataset` (mutate) |
|
||||
| `list_databases`, `get_database_info` | `can_read` on `Database` |
|
||||
| `execute_sql` | `can_execute_sql_query` on `SQLLab` |
|
||||
| `open_sql_lab_with_context` | `can_read` on `SQLLab` |
|
||||
| `save_sql_query` | `can_write` on `SavedQuery` |
|
||||
| `health_check` | None (public) |
|
||||
|
||||
To disable RBAC checking globally (for trusted-network deployments or testing), set:
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
MCP_RBAC_ENABLED = False
|
||||
```
|
||||
|
||||
:::warning
|
||||
Disabling RBAC removes all permission checks from MCP tool calls. Only do this on isolated, internal deployments where all MCP users are trusted admins.
|
||||
:::
|
||||
|
||||
### Audit Log
|
||||
|
||||
All MCP tool calls are recorded in Superset's action log. You can view them at **Settings → Action Log** (admin only). Each log entry records:
|
||||
|
||||
- The tool name (e.g., `mcp.generate_chart.db_write`)
|
||||
- The authenticated user
|
||||
- A timestamp
|
||||
|
||||
This makes MCP activity fully auditable alongside regular Superset activity. The action log uses the same event logger as the rest of Superset, so existing log ingestion pipelines (e.g., sending logs to Elasticsearch or a SIEM) capture MCP events automatically.
|
||||
|
||||
### Middleware Pipeline
|
||||
|
||||
Every MCP request passes through a middleware stack before reaching the tool function. The default stack (assembled in `build_middleware_list()` in `server.py`) is:
|
||||
|
||||
| Middleware | Purpose | Default |
|
||||
|------------|---------|---------|
|
||||
| `StructuredContentStripperMiddleware` | Strips `structuredContent` from responses for Claude.ai bridge compatibility | Enabled |
|
||||
| `LoggingMiddleware` | Logs each tool call with user, parameters, and duration | Enabled |
|
||||
| `GlobalErrorHandlerMiddleware` | Catches unhandled exceptions and sanitizes sensitive data before it reaches the client | Enabled |
|
||||
| `ResponseSizeGuardMiddleware` | Estimates token count, warns at 80% of limit, blocks at limit | Enabled (configurable via `MCP_RESPONSE_SIZE_CONFIG`) |
|
||||
| `ResponseCachingMiddleware` | Caches read-heavy tool responses (in-memory or Redis) | Disabled (enable via `MCP_CACHE_CONFIG`) |
|
||||
|
||||
Additional middleware classes (`RateLimitMiddleware`, `FieldPermissionsMiddleware`, `PrivateToolMiddleware`) are implemented in `superset/mcp_service/middleware.py` but are not added to the default pipeline. They are available for operators who want to layer them in via a custom startup path.
|
||||
|
||||
### Error Sanitization
|
||||
|
||||
The `GlobalErrorHandlerMiddleware` automatically redacts sensitive information from all error messages before they reach the LLM client. The following are replaced with generic messages:
|
||||
|
||||
- **Database connection strings** — replaced with a generic connection error message
|
||||
- **API keys and tokens** — redacted from error traces
|
||||
- **File system paths** — stripped to prevent information disclosure
|
||||
- **IP addresses** — removed from error context
|
||||
|
||||
This ensures that a misconfigured database connection or an unexpected exception never leaks credentials or internal topology to the LLM or its users. All regex patterns used for redaction are bounded to prevent ReDoS attacks.
|
||||
|
||||
---
|
||||
|
||||
## Performance
|
||||
|
||||
### Connection Pooling
|
||||
|
||||
Each MCP server process maintains its own SQLAlchemy connection pool to the database. For multi-worker deployments, total open connections = **workers × pool size**.
|
||||
|
||||
```python
|
||||
# superset_config.py
|
||||
SQLALCHEMY_POOL_SIZE = 5
|
||||
SQLALCHEMY_MAX_OVERFLOW = 10
|
||||
SQLALCHEMY_POOL_TIMEOUT = 30
|
||||
SQLALCHEMY_POOL_RECYCLE = 3600 # Recycle connections after 1 hour
|
||||
```
|
||||
|
||||
For a 3-pod Kubernetes deployment with the defaults above, expect up to 3 × (5 + 10) = 45 connections. Size your database's `max_connections` accordingly.
|
||||
|
||||
### Response Caching
|
||||
|
||||
Enable response caching for read-heavy workloads (dashboards/datasets that don't change frequently). With the in-memory backend (default when `MCP_STORE_CONFIG` is disabled), caching is per-process. Use Redis-backed caching for consistent cache hits across multiple pods:
|
||||
|
||||
```python
|
||||
MCP_CACHE_CONFIG = {"enabled": True, "call_tool_ttl": 3600}
|
||||
MCP_STORE_CONFIG = {"enabled": True, "CACHE_REDIS_URL": "redis://redis:6379/0"}
|
||||
```
|
||||
|
||||
Mutating tools (`generate_chart`, `update_chart`, `execute_sql`, `generate_dashboard`) are always excluded from caching regardless of this setting.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Server won't start
|
||||
|
||||
- Verify `fastmcp` is installed: `pip install fastmcp`
|
||||
- Check that `MCP_DEV_USERNAME` is set if auth is disabled -- the server requires a user identity
|
||||
- Confirm the port is not already in use: `lsof -i :5008`
|
||||
|
||||
### 401 Unauthorized
|
||||
|
||||
- Verify your JWT token has not expired (`exp` claim)
|
||||
- Check that `MCP_JWT_ISSUER` and `MCP_JWT_AUDIENCE` match the token's `iss` and `aud` claims exactly
|
||||
- For RS256 with JWKS: confirm the JWKS URI is reachable from the MCP server
|
||||
- For RS256 with static key: confirm the public key string includes the `BEGIN`/`END` markers
|
||||
- For HS256: confirm the secret matches between the token issuer and `MCP_JWT_SECRET`
|
||||
- Enable `MCP_JWT_DEBUG_ERRORS = True` for detailed server-side logging (errors are never leaked to the client)
|
||||
|
||||
### Tool not found
|
||||
|
||||
- Ensure the MCP server and Superset share the same `superset_config.py`
|
||||
- Check server logs at startup -- tool registration errors are logged with the tool name and reason
|
||||
|
||||
### Client can't connect
|
||||
|
||||
- Verify the MCP server URL is reachable from the client machine
|
||||
- For Claude Desktop: fully quit the app (not just close the window) and restart after config changes
|
||||
- For remote access: ensure your firewall and reverse proxy allow traffic to the MCP port
|
||||
- Confirm the URL path ends with `/mcp` (e.g., `http://localhost:5008/mcp`)
|
||||
|
||||
### Permission errors on tool calls
|
||||
|
||||
- The MCP server enforces Superset's RBAC permissions -- the authenticated user must have the required roles
|
||||
- In development mode, ensure `MCP_DEV_USERNAME` maps to a user with appropriate roles (e.g., Admin)
|
||||
- Check `superset/security/manager.py` for the specific permission tuples required by each tool domain (e.g., `("can_execute_sql_query", "SQLLab")`)
|
||||
|
||||
### Response too large
|
||||
|
||||
- If a tool call returns an error about exceeding token limits, the response size guard is blocking an oversized result
|
||||
- Reduce `page_size` or `limit` parameters, use `select_columns` to exclude large fields, or add filters to narrow results
|
||||
- To adjust the threshold, change `token_limit` in `MCP_RESPONSE_SIZE_CONFIG`
|
||||
- To disable the guard entirely, set `MCP_RESPONSE_SIZE_CONFIG = {"enabled": False}`
|
||||
|
||||
---
|
||||
|
||||
## Audit Events
|
||||
|
||||
All MCP tool calls are logged to Superset's event logger, the same system used by the web UI (viewable at **Settings → Action Log**). Each event captures:
|
||||
|
||||
- **Action**: `mcp.<tool_name>.<phase>` (e.g., `mcp.list_databases.query`)
|
||||
- **User**: the resolved Superset username from the JWT or dev config
|
||||
- **Timestamp**: when the operation ran
|
||||
|
||||
This means MCP activity is auditable alongside normal user activity. No additional configuration is required — logging is on by default whenever the event logger is enabled in your Superset deployment.
|
||||
|
||||
## Tool Pagination
|
||||
|
||||
MCP list tools (`list_datasets`, `list_charts`, `list_dashboards`, `list_databases`) use **offset pagination** via `page` (1-based) and `page_size` parameters. Responses include `page`, `page_size`, `total_count`, `total_pages`, `has_previous`, and `has_next`. To iterate through all results:
|
||||
|
||||
```python
|
||||
# Example: fetch all charts across pages
|
||||
all_charts = []
|
||||
page = 1
|
||||
while True:
|
||||
result = mcp.list_charts(page=page, page_size=50)
|
||||
all_charts.extend(result["charts"])
|
||||
if not result.get("has_next"):
|
||||
break
|
||||
page += 1
|
||||
```
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
- **Use TLS** for all production MCP endpoints -- place the server behind a reverse proxy with HTTPS
|
||||
- **Enable JWT authentication** for any internet-facing deployment
|
||||
- **RBAC enforcement** -- The MCP server respects Superset's role-based access control. Users can only access data their roles permit
|
||||
- **Secrets management** -- Store `MCP_JWT_SECRET`, database credentials, and API keys in environment variables or a secrets manager, never in config files committed to version control
|
||||
- **Scoped tokens** -- Use `MCP_REQUIRED_SCOPES` to limit what operations a token can perform
|
||||
- **Network isolation** -- In Kubernetes, restrict MCP pod network policies to only allow traffic from your AI client endpoints
|
||||
- Review the **[Security documentation](/developer-docs/extensions/security)** for additional extension security guidance
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[Using AI with Superset](/user-docs/using-superset/using-ai-with-superset)** -- What AI can do with Superset and how to get started
|
||||
- **[MCP Integration](/developer-docs/extensions/mcp)** -- Build custom MCP tools and prompts via Superset extensions
|
||||
- **[Security](/developer-docs/extensions/security)** -- Security best practices for extensions
|
||||
- **[Deployment](/developer-docs/extensions/deployment)** -- Package and deploy Superset extensions
|
||||
@@ -0,0 +1,171 @@
|
||||
---
|
||||
title: Network and Security Settings
|
||||
sidebar_position: 7
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Network and Security Settings
|
||||
|
||||
## CORS
|
||||
|
||||
|
||||
:::note
|
||||
In Superset versions prior to `5.x` you have to install to install `flask-cors` with `pip install flask-cors` to enable CORS support.
|
||||
:::
|
||||
|
||||
|
||||
The following keys in `superset_config.py` can be specified to configure CORS:
|
||||
|
||||
- `ENABLE_CORS`: Must be set to `True` in order to enable CORS
|
||||
- `CORS_OPTIONS`: options passed to Flask-CORS
|
||||
([documentation](https://flask-cors.readthedocs.io/en/latest/api.html#extension))
|
||||
|
||||
## HTTP headers
|
||||
|
||||
Note that Superset bundles [flask-talisman](https://pypi.org/project/talisman/)
|
||||
Self-described as a small Flask extension that handles setting HTTP headers that can help
|
||||
protect against a few common web application security issues.
|
||||
|
||||
## HTML Embedding of Dashboards and Charts
|
||||
|
||||
There are two ways to embed a dashboard: Using the [SDK](https://www.npmjs.com/package/@superset-ui/embedded-sdk) or embedding a direct link. Note that in the latter case everybody who knows the link is able to access the dashboard.
|
||||
|
||||
### Embedding a Public Direct Link to a Dashboard
|
||||
|
||||
This works by first changing the content security policy (CSP) of [flask-talisman](https://github.com/GoogleCloudPlatform/flask-talisman) to allow for certain domains to display Superset content. Then a dashboard can be made publicly accessible, i.e. **bypassing authentication**. Once made public, the dashboard's URL can be added to an iframe in another website's HTML code.
|
||||
|
||||
#### Changing flask-talisman CSP
|
||||
|
||||
Add to `superset_config.py` the entire `TALISMAN_CONFIG` section from `config.py` and include a `frame-ancestors` section:
|
||||
|
||||
```python
|
||||
TALISMAN_ENABLED = True
|
||||
TALISMAN_CONFIG = {
|
||||
"content_security_policy": {
|
||||
...
|
||||
"frame-ancestors": ["*.my-domain.com", "*.another-domain.com"],
|
||||
...
|
||||
```
|
||||
|
||||
Restart Superset for this configuration change to take effect.
|
||||
|
||||
#### Making a Dashboard Public
|
||||
|
||||
There are two approaches to making dashboards publicly accessible:
|
||||
|
||||
**Option 1: Dataset-based access (simpler)**
|
||||
1. Set `PUBLIC_ROLE_LIKE = "Public"` in `superset_config.py`
|
||||
2. Grant the Public role access to the relevant datasets (Menu → Security → List Roles → Public)
|
||||
3. All published dashboards using those datasets become visible to anonymous users
|
||||
|
||||
**Option 2: Dashboard-level access (selective control)**
|
||||
1. Set `PUBLIC_ROLE_LIKE = "Public"` in `superset_config.py`
|
||||
2. Add the `'DASHBOARD_RBAC': True` [Feature Flag](/admin-docs/configuration/feature-flags)
|
||||
3. Edit each dashboard's properties and add the "Public" role
|
||||
4. Only dashboards with the Public role explicitly assigned are visible to anonymous users
|
||||
|
||||
See the [Public role documentation](/admin-docs/security/#public) for more details.
|
||||
|
||||
#### Embedding a Public Dashboard
|
||||
|
||||
Now anybody can directly access the dashboard's URL. You can embed it in an iframe like so:
|
||||
|
||||
```html
|
||||
<iframe
|
||||
width="600"
|
||||
height="400"
|
||||
seamless
|
||||
frameBorder="0"
|
||||
scrolling="no"
|
||||
src="https://superset.my-domain.com/superset/dashboard/10/?standalone=1&height=400"
|
||||
>
|
||||
</iframe>
|
||||
```
|
||||
|
||||
#### Embedding a Chart
|
||||
|
||||
A chart's embed code can be generated by going to a chart's edit view and then clicking at the top right on `...` > `Share` > `Embed code`
|
||||
|
||||
### Enabling Embedding via the SDK
|
||||
|
||||
Clicking on `...` next to `EDIT DASHBOARD` on the top right of the dashboard's overview page should yield a drop-down menu including the entry "Embed dashboard".
|
||||
|
||||
To enable this entry, add the following line to the `.env` file:
|
||||
|
||||
```text
|
||||
SUPERSET_FEATURE_EMBEDDED_SUPERSET=true
|
||||
```
|
||||
|
||||
### Hiding the Logout Button in Embedded Contexts
|
||||
|
||||
When Superset is embedded in an application that manages authentication via SSO (OAuth2, SAML, or JWT), the logout button should be hidden since session management is handled by the parent application.
|
||||
|
||||
To hide the logout button in embedded contexts, add to `superset_config.py`:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
"DISABLE_EMBEDDED_SUPERSET_LOGOUT": True,
|
||||
}
|
||||
```
|
||||
|
||||
This flag only hides the logout button when Superset detects it is running inside an iframe. Users accessing Superset directly (not embedded) will still see the logout button regardless of this setting.
|
||||
|
||||
:::note
|
||||
When embedding with SSO, also set `SESSION_COOKIE_SAMESITE = 'None'` and `SESSION_COOKIE_SECURE = True`. See [Security documentation](/admin-docs/security/securing_superset) for details.
|
||||
:::
|
||||
|
||||
## CSRF settings
|
||||
|
||||
Similarly, [flask-wtf](https://flask-wtf.readthedocs.io/en/0.15.x/config/) is used to manage
|
||||
some CSRF configurations. If you need to exempt endpoints from CSRF (e.g. if you are
|
||||
running a custom auth postback endpoint), you can add the endpoints to `WTF_CSRF_EXEMPT_LIST`:
|
||||
|
||||
## SSH Tunneling
|
||||
|
||||
1. Turn on feature flag
|
||||
- Change [`SSH_TUNNELING`](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L489) to `True`
|
||||
- If you want to add more security when establishing the tunnel we allow users to overwrite the `SSHTunnelManager` class [here](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L507)
|
||||
- You can also set the [`SSH_TUNNEL_LOCAL_BIND_ADDRESS`](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L508) this the host address where the tunnel will be accessible on your VPC
|
||||
|
||||
2. Create database w/ ssh tunnel enabled
|
||||
- With the feature flag enabled you should now see ssh tunnel toggle.
|
||||
- Click the toggle to enable SSH tunneling and add your credentials accordingly.
|
||||
- Superset allows for two different types of authentication (Basic + Private Key). These credentials should come from your service provider.
|
||||
|
||||
3. Verify data is flowing
|
||||
- Once SSH tunneling has been enabled, go to SQL Lab and write a query to verify data is properly flowing.
|
||||
|
||||
## Domain Sharding
|
||||
|
||||
:::note
|
||||
Domain Sharding is deprecated as of Superset 5.0.0, and will be removed in Superset 6.0.0. Please Enable HTTP2 to keep more open connections per domain.
|
||||
:::
|
||||
|
||||
Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in
|
||||
dashboard, a lot of time fetch requests are queued up and wait for next available socket.
|
||||
[PR 5039](https://github.com/apache/superset/pull/5039) adds domain sharding to Superset,
|
||||
and this feature will be enabled by configuration only (by default Superset doesn’t allow
|
||||
cross-domain request).
|
||||
|
||||
Add the following setting in your `superset_config.py` file:
|
||||
|
||||
- `SUPERSET_WEBSERVER_DOMAINS`: list of allowed hostnames for domain sharding feature.
|
||||
|
||||
Please create your domain shards as subdomains of your main domain for authorization to
|
||||
work properly on new domains. For Example:
|
||||
|
||||
- `SUPERSET_WEBSERVER_DOMAINS=['superset-1.mydomain.com','superset-2.mydomain.com','superset-3.mydomain.com','superset-4.mydomain.com']`
|
||||
|
||||
or add the following setting in your `superset_config.py` file if domain shards are not subdomains of main domain.
|
||||
|
||||
- `SESSION_COOKIE_DOMAIN = '.mydomain.com'`
|
||||
|
||||
## Middleware
|
||||
|
||||
Superset allows you to add your own middleware. To add your own middleware, update the
|
||||
`ADDITIONAL_MIDDLEWARE` key in your `superset_config.py`. `ADDITIONAL_MIDDLEWARE` should be a list
|
||||
of your additional middleware classes.
|
||||
|
||||
For example, to use `AUTH_REMOTE_USER` from behind a proxy server like nginx, you have to add a
|
||||
simple middleware class to add the value of `HTTP_X_PROXY_REMOTE_USER` (or any other custom header
|
||||
from the proxy) to Gunicorn’s `REMOTE_USER` environment variable.
|
||||
@@ -0,0 +1,629 @@
|
||||
---
|
||||
title: SQL Templating
|
||||
hide_title: true
|
||||
sidebar_position: 5
|
||||
version: 1
|
||||
---
|
||||
|
||||
# SQL Templating
|
||||
|
||||
:::tip Looking to use SQL templating?
|
||||
For a user-focused guide on writing Jinja templates in SQL Lab and virtual datasets, see the [SQL Templating User Guide](/user-docs/using-superset/sql-templating). This page covers administrator configuration options.
|
||||
:::
|
||||
|
||||
## Jinja Templates
|
||||
|
||||
SQL Lab and Explore supports [Jinja templating](https://jinja.palletsprojects.com/en/2.11.x/) in queries.
|
||||
To enable templating, the `ENABLE_TEMPLATE_PROCESSING` [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) needs to be enabled in `superset_config.py`.
|
||||
|
||||
:::warning[Security Warning]
|
||||
|
||||
While powerful, this feature executes template code on the server. Within the Superset security model, this is **intended functionality**, as users with permissions to edit charts and virtual datasets are considered **trusted users**.
|
||||
|
||||
If you grant these permissions to untrusted users, this feature can be exploited as a **Server-Side Template Injection (SSTI)** vulnerability. Do not enable `ENABLE_TEMPLATE_PROCESSING` unless you fully understand and accept the associated security risks.
|
||||
|
||||
Additionally:
|
||||
|
||||
- The `url_param()` macro allows URL parameters to influence the rendered SQL. Always validate or restrict `url_param()` values in your templates rather than interpolating them directly.
|
||||
- `filter.get('val')` returns raw filter values without escaping. Use the safe helpers described below (`|where_in`, `| replace("'", "''")`) rather than concatenating values directly into SQL strings.
|
||||
|
||||
:::
|
||||
|
||||
:::tip
|
||||
`ENABLE_TEMPLATE_PROCESSING` defaults to `False`. Only enable it if your deployment requires Jinja templates and all users with dataset/chart edit access are administrators or fully trusted internal users.
|
||||
:::
|
||||
|
||||
When templating is enabled, python code can be embedded in virtual datasets and
|
||||
in Custom SQL in the filter and metric controls in Explore. By default, the following variables are
|
||||
made available in the Jinja context:
|
||||
|
||||
- `columns`: columns which to group by in the query
|
||||
- `filter`: filters applied in the query
|
||||
- `from_dttm`: start `datetime` value from the selected time range (`None` if undefined). **Note:** Only available in virtual datasets when a time range filter is applied in Explore/Chart views—not available in standalone SQL Lab queries. (deprecated beginning in version 5.0, use `get_time_filter` instead)
|
||||
- `to_dttm`: end `datetime` value from the selected time range (`None` if undefined). **Note:** Only available in virtual datasets when a time range filter is applied in Explore/Chart views—not available in standalone SQL Lab queries. (deprecated beginning in version 5.0, use `get_time_filter` instead)
|
||||
- `groupby`: columns which to group by in the query (deprecated)
|
||||
- `metrics`: aggregate expressions in the query
|
||||
- `row_limit`: row limit of the query
|
||||
- `row_offset`: row offset of the query
|
||||
- `table_columns`: columns available in the dataset
|
||||
- `time_column`: temporal column of the query (`None` if undefined)
|
||||
- `time_grain`: selected time grain (`None` if undefined)
|
||||
|
||||
For example, to add a time range to a virtual dataset, you can write the following:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE dttm_col > '{{ from_dttm }}' and dttm_col < '{{ to_dttm }}'
|
||||
```
|
||||
|
||||
You can also use [Jinja's logic](https://jinja.palletsprojects.com/en/2.11.x/templates/#tests)
|
||||
to make your query robust to clearing the timerange filter:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE (
|
||||
{% if from_dttm is not none %}
|
||||
dttm_col > '{{ from_dttm }}' AND
|
||||
{% endif %}
|
||||
{% if to_dttm is not none %}
|
||||
dttm_col < '{{ to_dttm }}' AND
|
||||
{% endif %}
|
||||
1 = 1
|
||||
)
|
||||
```
|
||||
|
||||
The `1 = 1` at the end ensures a value is present for the `WHERE` clause even when
|
||||
the time filter is not set. For many database engines, this could be replaced with `true`.
|
||||
|
||||
Note that the Jinja parameters are called within _double_ brackets in the query and with
|
||||
_single_ brackets in the logic blocks.
|
||||
|
||||
### Understanding Context Availability
|
||||
|
||||
Some Jinja variables like `from_dttm`, `to_dttm`, and `filter` are **only available when a chart or dashboard provides them**. They are populated from:
|
||||
|
||||
- Time range filters applied in Explore/Chart views
|
||||
- Dashboard native filters
|
||||
- Filter components
|
||||
|
||||
**These variables are NOT available in standalone SQL Lab queries** because there's no filter context. If you try to use `{{ from_dttm }}` directly in SQL Lab, you'll get an "undefined parameter" error.
|
||||
|
||||
#### Testing Time-Filtered Queries in SQL Lab
|
||||
|
||||
To test queries that use time variables in SQL Lab, you have several options:
|
||||
|
||||
**Option 1: Use Jinja defaults (recommended)**
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE dttm_col > '{{ from_dttm | default("2024-01-01", true) }}'
|
||||
AND dttm_col < '{{ to_dttm | default("2024-12-31", true) }}'
|
||||
```
|
||||
|
||||
**Option 2: Use SQL Lab Parameters**
|
||||
|
||||
Set parameters in the SQL Lab UI (Parameters menu):
|
||||
```json
|
||||
{
|
||||
"from_dttm": "2024-01-01",
|
||||
"to_dttm": "2024-12-31"
|
||||
}
|
||||
```
|
||||
|
||||
**Option 3: Use `{% set %}` for testing**
|
||||
|
||||
```sql
|
||||
{% set from_dttm = "2024-01-01" %}
|
||||
{% set to_dttm = "2024-12-31" %}
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE dttm_col > '{{ from_dttm }}' AND dttm_col < '{{ to_dttm }}'
|
||||
```
|
||||
|
||||
:::tip
|
||||
When you save a SQL Lab query as a virtual dataset and use it in a chart with time filters,
|
||||
the actual filter values will override any defaults or test values you set.
|
||||
:::
|
||||
|
||||
To add custom functionality to the Jinja context, you need to overload the default Jinja
|
||||
context in your environment by defining the `JINJA_CONTEXT_ADDONS` in your superset configuration
|
||||
(`superset_config.py`). Objects referenced in this dictionary are made available for users to use
|
||||
where the Jinja context is made available.
|
||||
|
||||
```python
|
||||
JINJA_CONTEXT_ADDONS = {
|
||||
'my_crazy_macro': lambda x: x*2,
|
||||
}
|
||||
```
|
||||
|
||||
Default values for jinja templates can be specified via `Parameters` menu in the SQL Lab user interface.
|
||||
In the UI you can assign a set of parameters as JSON
|
||||
|
||||
```json
|
||||
{
|
||||
"my_table": "foo"
|
||||
}
|
||||
```
|
||||
|
||||
The parameters become available in your SQL (example: `SELECT * FROM {{ my_table }}` ) by using Jinja templating syntax.
|
||||
SQL Lab template parameters are stored with the dataset as `TEMPLATE PARAMETERS`.
|
||||
|
||||
There is a special ``_filters`` parameter which can be used to test filters used in the jinja template.
|
||||
|
||||
```json
|
||||
{
|
||||
"_filters": [
|
||||
{
|
||||
"col": "action_type",
|
||||
"op": "IN",
|
||||
"val": ["sell", "buy"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
```sql
|
||||
SELECT action, count(*) as times
|
||||
FROM logs
|
||||
WHERE action in {{ filter_values('action_type')|where_in }}
|
||||
GROUP BY action
|
||||
```
|
||||
|
||||
Note ``_filters`` is not stored with the dataset. It's only used within the SQL Lab UI.
|
||||
|
||||
Besides default Jinja templating, SQL lab also supports self-defined template processor by setting
|
||||
the `CUSTOM_TEMPLATE_PROCESSORS` in your superset configuration. The values in this dictionary
|
||||
overwrite the default Jinja template processors of the specified database engine. The example below
|
||||
configures a custom presto template processor which implements its own logic of processing macro
|
||||
template with regex parsing. It uses the `$` style macro instead of `{{ }}` style in Jinja
|
||||
templating.
|
||||
|
||||
By configuring it with `CUSTOM_TEMPLATE_PROCESSORS`, a SQL template on a presto database is
|
||||
processed by the custom one rather than the default one.
|
||||
|
||||
```python
|
||||
def DATE(
|
||||
ts: datetime, day_offset: SupportsInt = 0, hour_offset: SupportsInt = 0
|
||||
) -> str:
|
||||
"""Current day as a string."""
|
||||
day_offset, hour_offset = int(day_offset), int(hour_offset)
|
||||
offset_day = (ts + timedelta(days=day_offset, hours=hour_offset)).date()
|
||||
return str(offset_day)
|
||||
|
||||
class CustomPrestoTemplateProcessor(PrestoTemplateProcessor):
|
||||
"""A custom presto template processor."""
|
||||
|
||||
engine = "presto"
|
||||
|
||||
def process_template(self, sql: str, **kwargs) -> str:
|
||||
"""Processes a sql template with $ style macro using regex."""
|
||||
# Add custom macros functions.
|
||||
macros = {
|
||||
"DATE": partial(DATE, datetime.utcnow())
|
||||
} # type: Dict[str, Any]
|
||||
# Update with macros defined in context and kwargs.
|
||||
macros.update(self.context)
|
||||
macros.update(kwargs)
|
||||
|
||||
def replacer(match):
|
||||
"""Expand $ style macros with corresponding function calls."""
|
||||
macro_name, args_str = match.groups()
|
||||
args = [a.strip() for a in args_str.split(",")]
|
||||
if args == [""]:
|
||||
args = []
|
||||
f = macros[macro_name[1:]]
|
||||
return f(*args)
|
||||
|
||||
macro_names = ["$" + name for name in macros.keys()]
|
||||
pattern = r"(%s)\s*\(([^()]*)\)" % "|".join(map(re.escape, macro_names))
|
||||
return re.sub(pattern, replacer, sql)
|
||||
|
||||
CUSTOM_TEMPLATE_PROCESSORS = {
|
||||
CustomPrestoTemplateProcessor.engine: CustomPrestoTemplateProcessor
|
||||
}
|
||||
```
|
||||
|
||||
SQL Lab also includes a live query validation feature with pluggable backends. You can configure
|
||||
which validation implementation is used with which database engine by adding a block like the
|
||||
following to your configuration file:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
'SQL_VALIDATORS_BY_ENGINE': {
|
||||
'presto': 'PrestoDBSQLValidator',
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The available validators and names can be found in
|
||||
[sql_validators](https://github.com/apache/superset/tree/master/superset/sql_validators).
|
||||
|
||||
## Available Macros
|
||||
|
||||
In this section, we'll walkthrough the pre-defined Jinja macros in Superset.
|
||||
|
||||
### Current Username
|
||||
|
||||
The `{{ current_username() }}` macro returns the `username` of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the `username` value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the `username` value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```python
|
||||
{{ current_username(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
### Current User ID
|
||||
|
||||
The `{{ current_user_id() }}` macro returns the account ID of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the account `id` value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the account `id` value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```python
|
||||
{{ current_user_id(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
### Current User Email
|
||||
|
||||
The `{{ current_user_email() }}` macro returns the email address of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the email address value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the email value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```python
|
||||
{{ current_user_email(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
### Current User Roles
|
||||
|
||||
The `{{ current_user_roles() }}` macro returns an array of roles for the logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the roles value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the roles value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```python
|
||||
{{ current_user_roles(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
You can json-stringify the array by adding `|tojson` to your Jinja code:
|
||||
```python
|
||||
{{ current_user_roles()|tojson }}
|
||||
```
|
||||
|
||||
You can use the `|where_in` filter to use your roles in a SQL statement. For example, if `current_user_roles()` returns `['admin', 'viewer']`, the following template:
|
||||
```python
|
||||
SELECT * FROM users WHERE role IN {{ current_user_roles()|where_in }}
|
||||
```
|
||||
|
||||
Will be rendered as:
|
||||
```sql
|
||||
SELECT * FROM users WHERE role IN ('admin', 'viewer')
|
||||
```
|
||||
|
||||
### Current User RLS Rules
|
||||
|
||||
The `{{ current_user_rls_rules() }}` macro returns an array of RLS rules applied to the current dataset for the logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then the list of RLS Rules will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
### Custom URL Parameters
|
||||
|
||||
The `{{ url_param('custom_variable') }}` macro lets you define arbitrary URL
|
||||
parameters and reference them in your SQL code.
|
||||
|
||||
:::warning
|
||||
Always treat `url_param()` values as untrusted input. Escaping behaviour varies by context and configuration, so do not rely on it. Restrict values to an explicit allowlist before using them in SQL:
|
||||
|
||||
```sql
|
||||
{% set cc = url_param('countrycode') %}
|
||||
{% if cc not in ('US', 'ES', 'FR') %}{% set cc = 'US' %}{% endif %}
|
||||
WHERE country_code = '{{ cc }}'
|
||||
```
|
||||
:::
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
- You write the following query in SQL Lab:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = '{{ url_param('countrycode') }}'
|
||||
```
|
||||
|
||||
- You're hosting Superset at the domain www.example.com and you send your
|
||||
coworker in Spain the following SQL Lab URL `www.example.com/superset/sqllab?countrycode=ES`
|
||||
and your coworker in the USA the following SQL Lab URL `www.example.com/superset/sqllab?countrycode=US`
|
||||
- For your coworker in Spain, the SQL Lab query will be rendered as:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = 'ES'
|
||||
```
|
||||
|
||||
- For your coworker in the USA, the SQL Lab query will be rendered as:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = 'US'
|
||||
```
|
||||
|
||||
### Explicitly Including Values in Cache Key
|
||||
|
||||
The `{{ cache_key_wrapper() }}` function explicitly instructs Superset to add a value to the
|
||||
accumulated list of values used in the calculation of the cache key.
|
||||
|
||||
This function is only needed when you want to wrap your own custom function return values
|
||||
in the cache key. You can gain more context
|
||||
[here](https://github.com/apache/superset/blob/efd70077014cbed62e493372d33a2af5237eaadf/superset/jinja_context.py#L133-L148).
|
||||
|
||||
Note that this function powers the caching of the `user_id` and `username` values
|
||||
in the `current_user_id()` and `current_username()` function calls (if you have caching enabled).
|
||||
|
||||
### Filter Values
|
||||
|
||||
You can retrieve the value for a specific filter as a list using `{{ filter_values() }}`.
|
||||
|
||||
This is useful if:
|
||||
|
||||
- You want to use a filter component to filter a query where the name of filter component column doesn't match the one in the select statement
|
||||
- You want to have the ability to filter inside the main query for performance purposes
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
```sql
|
||||
SELECT action, count(*) as times
|
||||
FROM logs
|
||||
WHERE
|
||||
action in {{ filter_values('action_type')|where_in }}
|
||||
GROUP BY action
|
||||
```
|
||||
|
||||
There `where_in` filter converts the list of values from `filter_values('action_type')` into a string suitable for an `IN` expression.
|
||||
|
||||
### Filters for a Specific Column
|
||||
|
||||
The `{{ get_filters() }}` macro returns the filters applied to a given column. In addition to
|
||||
returning the values (similar to how `filter_values()` does), the `get_filters()` macro
|
||||
returns the operator specified in the Explore UI.
|
||||
|
||||
This is useful if:
|
||||
|
||||
- You want to handle more than the IN operator in your SQL clause
|
||||
- You want to handle generating custom SQL conditions for a filter
|
||||
- You want to have the ability to filter inside the main query for speed purposes
|
||||
|
||||
:::warning
|
||||
`filter.get('val')` returns the raw filter value without escaping. For multi-value filters, use the `|where_in` Jinja filter, which handles quoting safely. For single-value operators like `LIKE`, escape single quotes before interpolating:
|
||||
|
||||
```sql
|
||||
{%- if filter.get('op') == 'LIKE' -%}
|
||||
AND full_name LIKE '{{ filter.get('val') | replace("'", "''") }}'
|
||||
{%- endif -%}
|
||||
```
|
||||
:::
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
```sql
|
||||
WITH RECURSIVE
|
||||
superiors(employee_id, manager_id, full_name, level, lineage) AS (
|
||||
SELECT
|
||||
employee_id,
|
||||
manager_id,
|
||||
full_name,
|
||||
1 as level,
|
||||
employee_id as lineage
|
||||
FROM
|
||||
employees
|
||||
WHERE
|
||||
1=1
|
||||
|
||||
{# Render a blank line #}
|
||||
{%- for filter in get_filters('full_name', remove_filter=True) -%}
|
||||
|
||||
{%- if filter.get('op') == 'IN' -%}
|
||||
AND
|
||||
full_name IN {{ filter.get('val')|where_in }}
|
||||
{%- endif -%}
|
||||
|
||||
{%- if filter.get('op') == 'LIKE' -%}
|
||||
AND
|
||||
full_name LIKE '{{ filter.get('val') | replace("'", "''") }}'
|
||||
{%- endif -%}
|
||||
|
||||
{%- endfor -%}
|
||||
UNION ALL
|
||||
SELECT
|
||||
e.employee_id,
|
||||
e.manager_id,
|
||||
e.full_name,
|
||||
s.level + 1 as level,
|
||||
s.lineage
|
||||
FROM
|
||||
employees e,
|
||||
superiors s
|
||||
WHERE s.manager_id = e.employee_id
|
||||
)
|
||||
|
||||
SELECT
|
||||
employee_id, manager_id, full_name, level, lineage
|
||||
FROM
|
||||
superiors
|
||||
order by lineage, level
|
||||
```
|
||||
|
||||
### Time Filter
|
||||
|
||||
The `{{ get_time_filter() }}` macro returns the time filter applied to a specific column. This is useful if you want
|
||||
to handle time filters inside the virtual dataset, as by default the time filter is placed on the outer query. This can
|
||||
considerably improve performance, as many databases and query engines are able to optimize the query better
|
||||
if the temporal filter is placed on the inner query, as opposed to the outer query.
|
||||
|
||||
The macro takes the following parameters:
|
||||
|
||||
- `column`: Name of the temporal column. Leave undefined to reference the time range from a Dashboard Native Time Range
|
||||
filter (when present).
|
||||
- `default`: The default value to fall back to if the time filter is not present, or has the value `No filter`
|
||||
- `target_type`: The target temporal type as recognized by the target database (e.g. `TIMESTAMP`, `DATE` or
|
||||
`DATETIME`). If `column` is defined, the format will default to the type of the column. This is used to produce
|
||||
the format of the `from_expr` and `to_expr` properties of the returned `TimeFilter` object.
|
||||
- `strftime`: format using the `strftime` method of `datetime` for custom time formatting.
|
||||
([see docs for valid format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)).
|
||||
When defined `target_type` will be ignored.
|
||||
- `remove_filter`: When set to true, mark the filter as processed, removing it from the outer query. Useful when a
|
||||
filter should only apply to the inner query.
|
||||
|
||||
The return type has the following properties:
|
||||
|
||||
- `from_expr`: the start of the time filter (if any)
|
||||
- `to_expr`: the end of the time filter (if any)
|
||||
- `time_range`: The applied time range
|
||||
|
||||
Here's a concrete example using the `logs` table from the Superset metastore:
|
||||
|
||||
```
|
||||
{% set time_filter = get_time_filter("dttm", remove_filter=True) %}
|
||||
{% set from_expr = time_filter.from_expr %}
|
||||
{% set to_expr = time_filter.to_expr %}
|
||||
{% set time_range = time_filter.time_range %}
|
||||
SELECT
|
||||
*,
|
||||
'{{ time_range }}' as time_range
|
||||
FROM logs
|
||||
{% if from_expr or to_expr %}WHERE 1 = 1
|
||||
{% if from_expr %}AND dttm >= {{ from_expr }}{% endif %}
|
||||
{% if to_expr %}AND dttm < {{ to_expr }}{% endif %}
|
||||
{% endif %}
|
||||
```
|
||||
|
||||
Assuming we are creating a table chart with a simple `COUNT(*)` as the metric with a time filter `Last week` on the
|
||||
`dttm` column, this would render the following query on Postgres (note the formatting of the temporal filters, and
|
||||
the absence of time filters on the outer query):
|
||||
|
||||
```
|
||||
SELECT COUNT(*) AS count
|
||||
FROM
|
||||
(SELECT *,
|
||||
'Last week' AS time_range
|
||||
FROM public.logs
|
||||
WHERE 1 = 1
|
||||
AND dttm >= TO_TIMESTAMP('2024-08-27 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
|
||||
AND dttm < TO_TIMESTAMP('2024-09-03 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')) AS virtual_table
|
||||
ORDER BY count DESC
|
||||
LIMIT 1000;
|
||||
```
|
||||
|
||||
When using the `default` parameter, the templated query can be simplified, as the endpoints will always be defined
|
||||
(to use a fixed time range, you can also use something like `default="2024-08-27 : 2024-09-03"`)
|
||||
|
||||
```
|
||||
{% set time_filter = get_time_filter("dttm", default="Last week", remove_filter=True) %}
|
||||
SELECT
|
||||
*,
|
||||
'{{ time_filter.time_range }}' as time_range
|
||||
FROM logs
|
||||
WHERE
|
||||
dttm >= {{ time_filter.from_expr }}
|
||||
AND dttm < {{ time_filter.to_expr }}
|
||||
```
|
||||
|
||||
### Datasets
|
||||
|
||||
It's possible to query physical and virtual datasets using the `dataset` macro. This is useful if you've defined computed columns and metrics on your datasets, and want to reuse the definition in adhoc SQL Lab queries.
|
||||
|
||||
To use the macro, first you need to find the ID of the dataset. This can be done by going to the view showing all the datasets, hovering over the dataset you're interested in, and looking at its URL. For example, if the URL for a dataset is https://superset.example.org/explore/?dataset_type=table&dataset_id=42 its ID is 42.
|
||||
|
||||
Once you have the ID you can query it as if it were a table:
|
||||
|
||||
```sql
|
||||
SELECT * FROM {{ dataset(42) }} LIMIT 10
|
||||
```
|
||||
|
||||
If you want to select the metric definitions as well, in addition to the columns, you need to pass an additional keyword argument:
|
||||
|
||||
```sql
|
||||
SELECT * FROM {{ dataset(42, include_metrics=True) }} LIMIT 10
|
||||
```
|
||||
|
||||
Since metrics are aggregations, the resulting SQL expression will be grouped by all non-metric columns. You can specify a subset of columns to group by instead:
|
||||
|
||||
```sql
|
||||
SELECT * FROM {{ dataset(42, include_metrics=True, columns=["ds", "category"]) }} LIMIT 10
|
||||
```
|
||||
|
||||
### Metrics
|
||||
|
||||
The `{{ metric('metric_key', dataset_id) }}` macro can be used to retrieve the metric SQL syntax from a dataset. This can be useful for different purposes:
|
||||
|
||||
- Override the metric label in the chart level
|
||||
- Combine multiple metrics in a calculation
|
||||
- Retrieve a metric syntax in SQL lab
|
||||
- Re-use metrics across datasets
|
||||
|
||||
This macro avoids copy/paste, allowing users to centralize the metric definition in the dataset layer.
|
||||
|
||||
The `dataset_id` parameter is optional, and if not provided Superset will use the current dataset from context (for example, when using this macro in the Chart Builder, by default the `macro_key` will be searched in the dataset powering the chart).
|
||||
The parameter can be used in SQL Lab, or when fetching a metric from another dataset.
|
||||
|
||||
## Available Filters
|
||||
|
||||
Superset supports [builtin filters from the Jinja2 templating package](https://jinja.palletsprojects.com/en/stable/templates/#builtin-filters). Custom filters have also been implemented:
|
||||
|
||||
### Where In
|
||||
Parses a list into a SQL-compatible statement. This is useful with macros that return an array (for example the `filter_values` macro):
|
||||
|
||||
```
|
||||
Dashboard filter with "First", "Second" and "Third" options selected
|
||||
{{ filter_values('column') }} => ["First", "Second", "Third"]
|
||||
{{ filter_values('column')|where_in }} => ('First', 'Second', 'Third')
|
||||
```
|
||||
|
||||
By default, this filter returns `()` (as a string) in case the value is null. The `default_to_none` parameter can be se to `True` to return null in this case:
|
||||
|
||||
```
|
||||
Dashboard filter without any value applied
|
||||
{{ filter_values('column') }} => ()
|
||||
{{ filter_values('column')|where_in(default_to_none=True) }} => None
|
||||
```
|
||||
|
||||
### To Datetime
|
||||
|
||||
Loads a string as a `datetime` object. This is useful when performing date operations. For example:
|
||||
```
|
||||
{% set from_expr = get_time_filter("dttm", strftime="%Y-%m-%d").from_expr %}
|
||||
{% set to_expr = get_time_filter("dttm", strftime="%Y-%m-%d").to_expr %}
|
||||
{% if (to_expr|to_datetime(format="%Y-%m-%d") - from_expr|to_datetime(format="%Y-%m-%d")).days > 100 %}
|
||||
do something
|
||||
{% else %}
|
||||
do something else
|
||||
{% endif %}
|
||||
```
|
||||
|
||||
:::resources
|
||||
- [Blog: Intro to Jinja Templating in Apache Superset](https://preset.io/blog/intro-jinja-templating-apache-superset/)
|
||||
:::
|
||||
@@ -0,0 +1,448 @@
|
||||
---
|
||||
title: Theming
|
||||
hide_title: true
|
||||
sidebar_position: 12
|
||||
version: 1
|
||||
---
|
||||
# Theming Superset
|
||||
|
||||
:::note
|
||||
`apache-superset>=6.0`
|
||||
:::
|
||||
|
||||
Superset now rides on **Ant Design v5's token-based theming**.
|
||||
Every Antd token works, plus a handful of Superset-specific ones for charts and dashboard chrome.
|
||||
|
||||
## Managing Themes via UI
|
||||
|
||||
Superset includes a built-in **Theme Management** interface accessible from the admin menu under **Settings > Themes**.
|
||||
|
||||
### Creating a New Theme
|
||||
|
||||
1. Navigate to **Settings > Themes** in the Superset interface
|
||||
2. Click **+ Theme** to create a new theme
|
||||
3. Use the [Ant Design Theme Editor](https://ant.design/theme-editor) to design your theme:
|
||||
- Design your palette, typography, and component overrides
|
||||
- Open the `CONFIG` modal and copy the JSON configuration
|
||||
4. Paste the JSON into the theme definition field in Superset
|
||||
5. Give your theme a descriptive name and save
|
||||
|
||||
You can also extend with Superset-specific tokens (documented in the default theme object) before you import.
|
||||
|
||||
### System Theme Administration
|
||||
|
||||
When `ENABLE_UI_THEME_ADMINISTRATION = True` is configured, administrators can manage system-wide themes directly from the UI:
|
||||
|
||||
#### Setting System Themes
|
||||
- **System Default Theme**: Click the sun icon on any theme to set it as the system-wide default
|
||||
- **System Dark Theme**: Click the moon icon on any theme to set it as the system dark mode theme
|
||||
- **Automatic OS Detection**: When both default and dark themes are set, Superset automatically detects and applies the appropriate theme based on OS preferences
|
||||
|
||||
#### Managing System Themes
|
||||
- System themes are indicated with special badges in the theme list
|
||||
- Only administrators with write permissions can modify system theme settings
|
||||
- Removing a system theme designation reverts to configuration file defaults
|
||||
|
||||
### Applying Themes to Dashboards
|
||||
|
||||
Once created, themes can be applied to individual dashboards:
|
||||
- Edit any dashboard and select your custom theme from the theme dropdown
|
||||
- Each dashboard can have its own theme, allowing for branded or context-specific styling
|
||||
|
||||
## Configuration Options
|
||||
|
||||
### Python Configuration
|
||||
|
||||
Configure theme behavior via `superset_config.py`:
|
||||
|
||||
```python
|
||||
# Enable UI-based theme administration for admins
|
||||
ENABLE_UI_THEME_ADMINISTRATION = True
|
||||
|
||||
# Optional: Set initial default themes via configuration
|
||||
# These can be overridden via the UI when ENABLE_UI_THEME_ADMINISTRATION = True
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
"colorPrimary": "#2893B3",
|
||||
"colorSuccess": "#5ac189",
|
||||
# ... your theme JSON configuration
|
||||
}
|
||||
}
|
||||
|
||||
# Optional: Dark theme configuration
|
||||
THEME_DARK = {
|
||||
"algorithm": "dark",
|
||||
"token": {
|
||||
"colorPrimary": "#2893B3",
|
||||
# ... your dark theme overrides
|
||||
}
|
||||
}
|
||||
|
||||
# To force a single theme on all users, set THEME_DARK = None
|
||||
# When both themes are defined (via UI or config):
|
||||
# - Users can manually switch between themes
|
||||
# - OS preference detection is automatically enabled
|
||||
```
|
||||
|
||||
### App Branding
|
||||
|
||||
The application name shown in the browser title bar and navigation can be
|
||||
set via the `brandAppName` theme token:
|
||||
|
||||
```python
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
"brandAppName": "Acme Analytics",
|
||||
# ... other tokens
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Or in the theme CRUD UI JSON editor:
|
||||
|
||||
```json
|
||||
{
|
||||
"token": {
|
||||
"brandAppName": "Acme Analytics"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The existing `APP_NAME` Python config key continues to work for backward compatibility.
|
||||
`brandAppName` takes precedence when both are set, and allows different themes to carry different brand names.
|
||||
Email and alert/report notification subjects are driven by backend settings such as
|
||||
`EMAIL_REPORTS_SUBJECT_PREFIX` and `APP_NAME`, not by this theme token.
|
||||
|
||||
### Migration from Configuration to UI
|
||||
|
||||
When `ENABLE_UI_THEME_ADMINISTRATION = True`:
|
||||
|
||||
1. System themes set via the UI take precedence over configuration file settings
|
||||
2. The UI shows which themes are currently set as system defaults
|
||||
3. Administrators can change system themes without restarting Superset
|
||||
4. Configuration file themes serve as fallbacks when no UI themes are set
|
||||
|
||||
### Copying Themes Between Systems
|
||||
|
||||
To export a theme for use in configuration files or another instance:
|
||||
|
||||
1. Navigate to **Settings > Themes** and click the export icon on your desired theme
|
||||
2. Extract the JSON configuration from the exported YAML file
|
||||
3. Use this JSON in your `superset_config.py` or import it into another Superset instance
|
||||
|
||||
## Theme Development Workflow
|
||||
|
||||
1. **Design**: Use the [Ant Design Theme Editor](https://ant.design/theme-editor) to iterate on your design
|
||||
2. **Test**: Create themes in Superset's CRUD interface for testing
|
||||
3. **Apply**: Assign themes to specific dashboards or configure instance-wide
|
||||
4. **Iterate**: Modify theme JSON directly in the CRUD interface or re-import from the theme editor
|
||||
|
||||
## Custom Fonts
|
||||
|
||||
Superset supports custom fonts through the theme configuration, allowing you to use branded or custom typefaces without rebuilding the application.
|
||||
|
||||
### Default Fonts
|
||||
|
||||
By default, Superset uses Inter and Fira Code fonts which are bundled with the application via `@fontsource` packages. These fonts work offline and require no external network calls.
|
||||
|
||||
### Configuring Custom Fonts
|
||||
|
||||
To use custom fonts, add font URLs to your theme configuration using the `fontUrls` token:
|
||||
|
||||
```python
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
# Load fonts from external sources (e.g., Google Fonts, Adobe Fonts)
|
||||
"fontUrls": [
|
||||
"https://fonts.googleapis.com/css2?family=Roboto:wght@400;500;600;700&display=swap",
|
||||
"https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500&display=swap",
|
||||
],
|
||||
# Reference the loaded fonts
|
||||
"fontFamily": "Roboto, -apple-system, BlinkMacSystemFont, sans-serif",
|
||||
"fontFamilyCode": "JetBrains Mono, Monaco, monospace",
|
||||
# ... other theme tokens
|
||||
}
|
||||
}
|
||||
|
||||
# Update CSP to allow font sources
|
||||
TALISMAN_CONFIG = {
|
||||
"content_security_policy": {
|
||||
"font-src": ["'self'", "https://fonts.googleapis.com", "https://fonts.gstatic.com"],
|
||||
"style-src": ["'self'", "'unsafe-inline'", "https://fonts.googleapis.com"],
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Or in the CRUD interface theme JSON:
|
||||
|
||||
```json
|
||||
{
|
||||
"token": {
|
||||
"fontUrls": [
|
||||
"https://fonts.googleapis.com/css2?family=Roboto:wght@400;500;600;700&display=swap"
|
||||
],
|
||||
"fontFamily": "Roboto, -apple-system, BlinkMacSystemFont, sans-serif",
|
||||
"fontFamilyCode": "JetBrains Mono, Monaco, monospace"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
:::note
|
||||
Font URLs are validated against a configurable allowlist. By default, fonts from `fonts.googleapis.com`, `fonts.gstatic.com`, and `use.typekit.net` are allowed. Configure `THEME_FONT_URL_ALLOWED_DOMAINS` to customize the allowed domains.
|
||||
:::
|
||||
|
||||
### Font Sources
|
||||
|
||||
- **Google Fonts**: Free, CDN-hosted fonts with wide variety
|
||||
- **Adobe Fonts**: Premium fonts (requires subscription and kit ID)
|
||||
- **Self-hosted**: Place font files in `/static/assets/fonts/` and reference via CSS
|
||||
|
||||
This feature works with the stock Docker image - no custom build required!
|
||||
|
||||
## ECharts Configuration Overrides
|
||||
|
||||
:::note
|
||||
Available since Superset 6.0
|
||||
:::
|
||||
|
||||
Superset provides fine-grained control over ECharts visualizations through theme-level configuration overrides. This allows you to customize the appearance and behavior of all ECharts-based charts without modifying individual chart configurations.
|
||||
|
||||
### Global ECharts Overrides
|
||||
|
||||
Apply settings to all ECharts visualizations using `echartsOptionsOverrides`:
|
||||
|
||||
```python
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
"colorPrimary": "#2893B3",
|
||||
# ... other Ant Design tokens
|
||||
},
|
||||
"echartsOptionsOverrides": {
|
||||
"grid": {
|
||||
"left": "10%",
|
||||
"right": "10%",
|
||||
"top": "15%",
|
||||
"bottom": "15%"
|
||||
},
|
||||
"tooltip": {
|
||||
"backgroundColor": "rgba(0, 0, 0, 0.8)",
|
||||
"borderColor": "#ccc",
|
||||
"textStyle": {
|
||||
"color": "#fff"
|
||||
}
|
||||
},
|
||||
"legend": {
|
||||
"textStyle": {
|
||||
"fontSize": 14,
|
||||
"fontWeight": "bold"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Chart-Specific Overrides
|
||||
|
||||
Target specific chart types using `echartsOptionsOverridesByChartType`:
|
||||
|
||||
```python
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
"colorPrimary": "#2893B3",
|
||||
# ... other tokens
|
||||
},
|
||||
"echartsOptionsOverridesByChartType": {
|
||||
"echarts_pie": {
|
||||
"legend": {
|
||||
"orient": "vertical",
|
||||
"right": 10,
|
||||
"top": "center"
|
||||
}
|
||||
},
|
||||
"echarts_timeseries": {
|
||||
"xAxis": {
|
||||
"axisLabel": {
|
||||
"rotate": 45,
|
||||
"fontSize": 12
|
||||
}
|
||||
},
|
||||
"dataZoom": [{
|
||||
"type": "slider",
|
||||
"show": True,
|
||||
"start": 0,
|
||||
"end": 100
|
||||
}]
|
||||
},
|
||||
"echarts_bubble": {
|
||||
"grid": {
|
||||
"left": "15%",
|
||||
"bottom": "20%"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### UI Configuration
|
||||
|
||||
You can also configure ECharts overrides through the theme CRUD interface:
|
||||
|
||||
```json
|
||||
{
|
||||
"token": {
|
||||
"colorPrimary": "#2893B3"
|
||||
},
|
||||
"echartsOptionsOverrides": {
|
||||
"grid": {
|
||||
"left": "10%",
|
||||
"right": "10%"
|
||||
},
|
||||
"tooltip": {
|
||||
"backgroundColor": "rgba(0, 0, 0, 0.8)"
|
||||
}
|
||||
},
|
||||
"echartsOptionsOverridesByChartType": {
|
||||
"echarts_pie": {
|
||||
"legend": {
|
||||
"orient": "vertical",
|
||||
"right": 10
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Override Precedence
|
||||
|
||||
The system applies overrides in the following order (last wins):
|
||||
|
||||
1. **Base ECharts theme** - Default Superset styling
|
||||
2. **Plugin options** - Chart-specific configurations
|
||||
3. **Global overrides** - `echartsOptionsOverrides`
|
||||
4. **Chart-specific overrides** - `echartsOptionsOverridesByChartType[chartType]`
|
||||
|
||||
This ensures chart-specific overrides take precedence over global ones.
|
||||
|
||||
### Common Chart Types
|
||||
|
||||
Available chart types for `echartsOptionsOverridesByChartType`:
|
||||
|
||||
- `echarts_timeseries` - Time series/line charts
|
||||
- `echarts_pie` - Pie and donut charts
|
||||
- `echarts_bubble` - Bubble/scatter charts
|
||||
- `echarts_funnel` - Funnel charts
|
||||
- `echarts_gauge` - Gauge charts
|
||||
- `echarts_radar` - Radar charts
|
||||
- `echarts_boxplot` - Box plot charts
|
||||
- `echarts_treemap` - Treemap charts
|
||||
- `echarts_sunburst` - Sunburst charts
|
||||
- `echarts_graph` - Network/graph charts
|
||||
- `echarts_sankey` - Sankey diagrams
|
||||
- `echarts_heatmap` - Heatmaps
|
||||
- `echarts_mixed_timeseries` - Mixed time series
|
||||
|
||||
### Array Property Overrides
|
||||
|
||||
Array properties (such as color palettes) are fully supported in overrides. Arrays are **replaced entirely** rather than merged, so specify the complete array:
|
||||
|
||||
```python
|
||||
THEME_DEFAULT = {
|
||||
"token": { ... },
|
||||
"echartsOptionsOverrides": {
|
||||
# Replace the default color palette for all ECharts visualizations
|
||||
"color": ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Best Practices
|
||||
|
||||
1. **Start with global overrides** for consistent styling across all charts
|
||||
2. **Use chart-specific overrides** for unique requirements per visualization type
|
||||
3. **Test thoroughly** as overrides use deep merge for objects, but arrays are completely replaced — always specify the full array value
|
||||
4. **Document your overrides** to help team members understand custom styling
|
||||
5. **Consider performance** - complex overrides may impact chart rendering speed
|
||||
|
||||
### Example: Corporate Branding
|
||||
|
||||
```python
|
||||
# Complete corporate theme with ECharts customization
|
||||
THEME_DEFAULT = {
|
||||
"token": {
|
||||
"colorPrimary": "#1B4D3E",
|
||||
"fontFamily": "Corporate Sans, Arial, sans-serif"
|
||||
},
|
||||
"echartsOptionsOverrides": {
|
||||
"grid": {
|
||||
"left": "8%",
|
||||
"right": "8%",
|
||||
"top": "12%",
|
||||
"bottom": "12%"
|
||||
},
|
||||
"textStyle": {
|
||||
"fontFamily": "Corporate Sans, Arial, sans-serif"
|
||||
},
|
||||
"title": {
|
||||
"textStyle": {
|
||||
"color": "#1B4D3E",
|
||||
"fontSize": 18,
|
||||
"fontWeight": "bold"
|
||||
}
|
||||
}
|
||||
},
|
||||
"echartsOptionsOverridesByChartType": {
|
||||
"echarts_timeseries": {
|
||||
"xAxis": {
|
||||
"axisLabel": {
|
||||
"color": "#666",
|
||||
"fontSize": 11
|
||||
}
|
||||
}
|
||||
},
|
||||
"echarts_pie": {
|
||||
"legend": {
|
||||
"textStyle": {
|
||||
"fontSize": 12
|
||||
},
|
||||
"itemGap": 20
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This feature provides powerful theming capabilities while maintaining the flexibility of ECharts' extensive configuration options.
|
||||
|
||||
## Advanced Features
|
||||
|
||||
- **System Themes**: Manage system-wide default and dark themes via UI or configuration
|
||||
- **Per-Dashboard Theming**: Each dashboard can have its own visual identity
|
||||
- **JSON Editor**: Edit theme configurations directly within Superset's interface
|
||||
- **Custom Fonts**: Load external fonts via configuration without rebuilding
|
||||
- **OS Dark Mode Detection**: Automatically switches themes based on system preferences
|
||||
- **Theme Import/Export**: Share themes between instances via YAML files
|
||||
|
||||
## API Access
|
||||
|
||||
For programmatic theme management, Superset provides REST endpoints:
|
||||
|
||||
- `GET /api/v1/theme/` - List all themes
|
||||
- `POST /api/v1/theme/` - Create a new theme
|
||||
- `PUT /api/v1/theme/{id}` - Update a theme
|
||||
- `DELETE /api/v1/theme/{id}` - Delete a theme
|
||||
- `PUT /api/v1/theme/{id}/set_system_default` - Set as system default theme (admin only)
|
||||
- `PUT /api/v1/theme/{id}/set_system_dark` - Set as system dark theme (admin only)
|
||||
- `DELETE /api/v1/theme/unset_system_default` - Remove system default designation
|
||||
- `DELETE /api/v1/theme/unset_system_dark` - Remove system dark designation
|
||||
- `GET /api/v1/theme/export/` - Export themes as YAML
|
||||
- `POST /api/v1/theme/import/` - Import themes from YAML
|
||||
|
||||
These endpoints require appropriate permissions and are subject to RBAC controls.
|
||||
|
||||
:::resources
|
||||
- [Video: Live Demo — Theming Apache Superset](https://www.youtube.com/watch?v=XsZAsO9tC3o)
|
||||
- [CSS and Theming](https://docs.preset.io/docs/css-and-theming) - Additional theming techniques and CSS customization
|
||||
- [Blog: Customizing Apache Superset Dashboards with CSS](https://preset.io/blog/customizing-superset-dashboards-with-css/)
|
||||
- [Blog: Customizing Dashboards with CSS — Tips and Tricks](https://preset.io/blog/customizing-apache-superset-dashboards-with-css-additional-tips-and-tricks/)
|
||||
- [Blog: Customizing Chart Colors](https://preset.io/blog/customizing-chart-colors-with-superset-and-preset/)
|
||||
:::
|
||||
@@ -0,0 +1,50 @@
|
||||
---
|
||||
title: Timezones
|
||||
hide_title: true
|
||||
sidebar_position: 6
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Timezones
|
||||
|
||||
There are four distinct timezone components which relate to Apache Superset,
|
||||
|
||||
1. The timezone that the underlying data is encoded in.
|
||||
2. The timezone of the database engine.
|
||||
3. The timezone of the Apache Superset backend.
|
||||
4. The timezone of the Apache Superset client.
|
||||
|
||||
where if a temporal field (`DATETIME`, `TIME`, `TIMESTAMP`, etc.) does not explicitly define a timezone it defaults to the underlying timezone of the component.
|
||||
|
||||
To help make the problem somewhat tractable—given that Apache Superset has no control on either how the data is ingested (1) or the timezone of the client (4)—from a consistency standpoint it is highly recommended that both (2) and (3) are configured to use the same timezone with a strong preference given to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) to ensure temporal fields without an explicit timestamp are not incorrectly coerced into the wrong timezone. Actually Apache Superset currently has implicit assumptions that timestamps are in UTC and thus configuring (3) to a non-UTC timezone could be problematic.
|
||||
|
||||
To strive for data consistency (regardless of the timezone of the client) the Apache Superset backend tries to ensure that any timestamp sent to the client has an explicit (or semi-explicit as in the case with [Epoch time](https://en.wikipedia.org/wiki/Unix_time) which is always in reference to UTC) timezone encoded within.
|
||||
|
||||
The challenge however lies with the slew of [database engines](/user-docs/databases#installing-drivers-in-docker) which Apache Superset supports and various inconsistencies between their [Python Database API (DB-API)](https://www.python.org/dev/peps/pep-0249/) implementations combined with the fact that we use [Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to serializing to JSON. Regrettably Pandas ignores the DB-API [type_code](https://www.python.org/dev/peps/pep-0249/#type-objects) relying by default on the underlying Python type returned by the DB-API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., ensuring timestamps without an explicit timestamp are serialized to JSON with the server timezone, thus guaranteeing the client will display timestamps in a consistent manner irrespective of the client's timezone.
|
||||
|
||||
For example the following is a comparison of MySQL and Presto,
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
from sqlalchemy import create_engine
|
||||
|
||||
pd.read_sql_query(
|
||||
sql="SELECT TIMESTAMP('2022-01-01 00:00:00') AS ts",
|
||||
con=create_engine("mysql://root@localhost:3360"),
|
||||
).to_json()
|
||||
|
||||
pd.read_sql_query(
|
||||
sql="SELECT TIMESTAMP '2022-01-01 00:00:00' AS ts",
|
||||
con=create_engine("presto://localhost:8080"),
|
||||
).to_json()
|
||||
```
|
||||
|
||||
which outputs `{"ts":{"0":1640995200000}}` (which infers the UTC timezone per the Epoch time definition) and `{"ts":{"0":"2022-01-01 00:00:00.000"}}` (without an explicit timezone) respectively and thus are treated differently in JavaScript:
|
||||
|
||||
```js
|
||||
new Date(1640995200000)
|
||||
> Sat Jan 01 2022 13:00:00 GMT+1300 (New Zealand Daylight Time)
|
||||
|
||||
new Date("2022-01-01 00:00:00.000")
|
||||
> Sat Jan 01 2022 00:00:00 GMT+1300 (New Zealand Daylight Time)
|
||||
```
|
||||
Reference in New Issue
Block a user