mirror of
https://github.com/apache/superset.git
synced 2026-05-30 04:39:20 +00:00
docs: Refactor Documentation Structure (#28161)
Co-authored-by: Evan Rusackas <evan@preset.io> Co-authored-by: Sam Firke <sfirke@users.noreply.github.com>
This commit is contained in:
@@ -1,4 +0,0 @@
|
||||
{
|
||||
"label": "Installation and Configuration",
|
||||
"position": 3
|
||||
}
|
||||
@@ -1,407 +0,0 @@
|
||||
---
|
||||
title: Alerts and Reports
|
||||
hide_title: true
|
||||
sidebar_position: 10
|
||||
version: 2
|
||||
---
|
||||
|
||||
## Alerts and Reports
|
||||
|
||||
Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel.
|
||||
|
||||
- *Alerts* are sent when a SQL condition is reached
|
||||
- *Reports* are sent on a schedule
|
||||
|
||||
Alerts and reports are disabled by default. To turn them on, you need to do some setup, described here.
|
||||
|
||||
### Requirements
|
||||
|
||||
#### Commons
|
||||
|
||||
##### In your `superset_config.py` or `superset_config_docker.py`
|
||||
|
||||
- `"ALERT_REPORTS"` [feature flag](/docs/installation/configuring-superset#feature-flags) must be turned to True.
|
||||
- `beat_schedule` in CeleryConfig must contain schedule for `reports.scheduler`.
|
||||
- At least one of those must be configured, depending on what you want to use:
|
||||
- emails: `SMTP_*` settings
|
||||
- Slack messages: `SLACK_API_TOKEN`
|
||||
|
||||
###### Disable dry-run mode
|
||||
|
||||
Screenshots will be taken but no messages actually sent as long as `ALERT_REPORTS_NOTIFICATION_DRY_RUN = True`, its default value in `docker/pythonpath_dev/superset_config.py`. To disable dry-run mode and start receiving email/Slack notifications, set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py).
|
||||
|
||||
##### In your `Dockerfile`
|
||||
|
||||
- You must install a headless browser, for taking screenshots of the charts and dashboards. Only Firefox and Chrome are currently supported.
|
||||
> If you choose Chrome, you must also change the value of `WEBDRIVER_TYPE` to `"chrome"` in your `superset_config.py`.
|
||||
|
||||
Note: All the components required (Firefox headless browser, Redis, Postgres db, celery worker and celery beat) are present in the *dev* docker image if you are following [Installing Superset Locally](/docs/installation/installing-superset-using-docker-compose/).
|
||||
All you need to do is add the required config variables described in this guide (See `Detailed Config`).
|
||||
|
||||
If you are running a non-dev docker image, e.g., a stable release like `apache/superset:3.1.0`, that image does not include a headless browser. Only the `superset_worker` container needs this headless browser to browse to the target chart or dashboard.
|
||||
You can either install and configure the headless browser - see "Custom Dockerfile" section below - or when deploying via `docker compose`, modify your `docker-compose.yml` file to use a dev image for the worker container and a stable release image for the `superset_app` container.
|
||||
|
||||
*Note*: In this context, a "dev image" is the same application software as its corresponding non-dev image, just bundled with additional tools. So an image like `3.1.0-dev` is identical to `3.1.0` when it comes to stability, functionality, and running in production. The actual "in-development" versions of Superset - cutting-edge and unstable - are not tagged with version numbers on Docker Hub and will display version `0.0.0-dev` within the Superset UI.
|
||||
|
||||
#### Slack integration
|
||||
|
||||
To send alerts and reports to Slack channels, you need to create a new Slack Application on your workspace.
|
||||
|
||||
1. Connect to your Slack workspace, then head to <https://api.slack.com/apps>.
|
||||
2. Create a new app.
|
||||
3. Go to "OAuth & Permissions" section, and give the following scopes to your app:
|
||||
- `incoming-webhook`
|
||||
- `files:write`
|
||||
- `chat:write`
|
||||
4. At the top of the "OAuth and Permissions" section, click "install to workspace".
|
||||
5. Select a default channel for your app and continue.
|
||||
(You can post to any channel by inviting your Superset app into that channel).
|
||||
6. The app should now be installed in your workspace, and a "Bot User OAuth Access Token" should have been created. Copy that token in the `SLACK_API_TOKEN` variable of your `superset_config.py`.
|
||||
7. Restart the service (or run `superset init`) to pull in the new configuration.
|
||||
|
||||
Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading '#' e.g. use `alerts` instead of `#alerts`.
|
||||
|
||||
#### Kubernetes-specific
|
||||
|
||||
- You must have a `celery beat` pod running. If you're using the chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset), you need to put `supersetCeleryBeat.enabled = true` in your values override.
|
||||
- You can see the dedicated docs about [Kubernetes installation](/docs/installation/running-on-kubernetes) for more generic details.
|
||||
|
||||
#### Docker Compose specific
|
||||
|
||||
##### You must have in your `docker-compose.yml`
|
||||
|
||||
- A Redis message broker
|
||||
- PostgreSQL DB instead of SQLlite
|
||||
- One or more `celery worker`
|
||||
- A single `celery beat`
|
||||
|
||||
This process also works in a Docker swarm environment, you would just need to add `Deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm.
|
||||
|
||||
### Detailed config
|
||||
|
||||
The following configurations need to be added to the `superset_config.py` file. This file is loaded when the image runs, and any configurations in it will override the default configurations found in the `config.py`.
|
||||
|
||||
You can find documentation about each field in the default `config.py` in the GitHub repository under [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py).
|
||||
|
||||
You need to replace default values with your custom Redis, Slack and/or SMTP config.
|
||||
|
||||
Superset uses Celery beat and Celery worker(s) to send alerts and reports.
|
||||
- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
|
||||
- The worker will process the tasks that need to be performed when an alert or report is fired.
|
||||
|
||||
In the `CeleryConfig`, only the `beat_schedule` is relevant to this feature, the rest of the `CeleryConfig` can be changed for your needs.
|
||||
|
||||
```python
|
||||
from celery.schedules import crontab
|
||||
|
||||
FEATURE_FLAGS = {
|
||||
"ALERT_REPORTS": True
|
||||
}
|
||||
|
||||
REDIS_HOST = "superset_cache"
|
||||
REDIS_PORT = "6379"
|
||||
|
||||
class CeleryConfig:
|
||||
broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
beat_schedule = {
|
||||
"reports.scheduler": {
|
||||
"task": "reports.scheduler",
|
||||
"schedule": crontab(minute="*", hour="*"),
|
||||
},
|
||||
"reports.prune_log": {
|
||||
"task": "reports.prune_log",
|
||||
"schedule": crontab(minute=0, hour=0),
|
||||
},
|
||||
}
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
|
||||
SCREENSHOT_LOCATE_WAIT = 100
|
||||
SCREENSHOT_LOAD_WAIT = 600
|
||||
|
||||
# Slack configuration
|
||||
SLACK_API_TOKEN = "xoxb-"
|
||||
|
||||
# Email configuration
|
||||
SMTP_HOST = "smtp.sendgrid.net" # change to your host
|
||||
SMTP_PORT = 2525 # your port, e.g. 587
|
||||
SMTP_STARTTLS = True
|
||||
SMTP_SSL_SERVER_AUTH = True # If your using an SMTP server with a valid certificate
|
||||
SMTP_SSL = False
|
||||
SMTP_USER = "your_user" # use the empty string "" if using an unauthenticated SMTP server
|
||||
SMTP_PASSWORD = "your_password" # use the empty string "" if using an unauthenticated SMTP server
|
||||
SMTP_MAIL_FROM = "noreply@youremail.com"
|
||||
EMAIL_REPORTS_SUBJECT_PREFIX = "[Superset] " # optional - overwrites default value in config.py of "[Report] "
|
||||
|
||||
# WebDriver configuration
|
||||
# If you use Firefox, you can stick with default values
|
||||
# If you use Chrome, then add the following WEBDRIVER_TYPE and WEBDRIVER_OPTION_ARGS
|
||||
WEBDRIVER_TYPE = "chrome"
|
||||
WEBDRIVER_OPTION_ARGS = [
|
||||
"--force-device-scale-factor=2.0",
|
||||
"--high-dpi-support=2.0",
|
||||
"--headless",
|
||||
"--disable-gpu",
|
||||
"--disable-dev-shm-usage",
|
||||
"--no-sandbox",
|
||||
"--disable-setuid-sandbox",
|
||||
"--disable-extensions",
|
||||
]
|
||||
|
||||
# This is for internal use, you can keep http
|
||||
WEBDRIVER_BASEURL = "http://superset:8088"
|
||||
# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com
|
||||
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"
|
||||
```
|
||||
|
||||
You also need
|
||||
to specify on behalf of which username to render the dashboards. In general dashboards and charts
|
||||
are not accessible to unauthorized requests, that is why the worker needs to take over credentials
|
||||
of an existing user to take a snapshot.
|
||||
|
||||
By default, Alerts and Reports are executed as the owner of the alert/report object. To use a fixed user account,
|
||||
just change the config as follows (`admin` in this example):
|
||||
|
||||
```python
|
||||
from superset.tasks.types import ExecutorType
|
||||
|
||||
THUMBNAIL_SELENIUM_USER = 'admin'
|
||||
ALERT_REPORTS_EXECUTE_AS = [ExecutorType.SELENIUM]
|
||||
```
|
||||
|
||||
Please refer to `ExecutorType` in the codebase for other executor types.
|
||||
|
||||
|
||||
**Important notes**
|
||||
|
||||
- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can
|
||||
consume a lot of CPU / memory on your servers.
|
||||
- In some cases, if you notice a lot of leaked geckodriver processes, try running your celery
|
||||
processes with `celery worker --pool=prefork --max-tasks-per-child=128 ...`
|
||||
- It is recommended to run separate workers for the `sql_lab` and `email_reports` tasks. This can be
|
||||
done using the `queue` field in `task_annotations`.
|
||||
- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can’t access Superset via
|
||||
its default value of `http://0.0.0.0:8080/`.
|
||||
|
||||
|
||||
### Custom Dockerfile
|
||||
|
||||
If you're running the dev version of a released Superset image, like `apache/superset:3.1.0-dev`, you should be set with the above.
|
||||
|
||||
But if you're building your own image, or starting with a non-dev version, a webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards which are then sent to the recipient.
|
||||
Here's how you can modify your Dockerfile to take the screenshots either with Firefox or Chrome.
|
||||
|
||||
#### Using Firefox
|
||||
|
||||
```docker
|
||||
FROM apache/superset:3.1.0
|
||||
|
||||
USER root
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get install --no-install-recommends -y firefox-esr
|
||||
|
||||
ENV GECKODRIVER_VERSION=0.29.0
|
||||
RUN wget -q https://github.com/mozilla/geckodriver/releases/download/v${GECKODRIVER_VERSION}/geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz && \
|
||||
tar -x geckodriver -zf geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz -O > /usr/bin/geckodriver && \
|
||||
chmod 755 /usr/bin/geckodriver && \
|
||||
rm geckodriver-v${GECKODRIVER_VERSION}-linux64.tar.gz
|
||||
|
||||
RUN pip install --no-cache gevent psycopg2 redis
|
||||
|
||||
USER superset
|
||||
```
|
||||
|
||||
#### Using Chrome
|
||||
|
||||
```docker
|
||||
FROM apache/superset:3.1.0
|
||||
|
||||
USER root
|
||||
|
||||
RUN apt-get update && \
|
||||
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
|
||||
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb && \
|
||||
rm -f google-chrome-stable_current_amd64.deb
|
||||
|
||||
RUN export CHROMEDRIVER_VERSION=$(curl --silent https://chromedriver.storage.googleapis.com/LATEST_RELEASE_102) && \
|
||||
wget -q https://chromedriver.storage.googleapis.com/${CHROMEDRIVER_VERSION}/chromedriver_linux64.zip && \
|
||||
unzip chromedriver_linux64.zip -d /usr/bin && \
|
||||
chmod 755 /usr/bin/chromedriver && \
|
||||
rm -f chromedriver_linux64.zip
|
||||
|
||||
RUN pip install --no-cache gevent psycopg2 redis
|
||||
|
||||
USER superset
|
||||
```
|
||||
|
||||
Don't forget to set `WEBDRIVER_TYPE` and `WEBDRIVER_OPTION_ARGS` in your config if you use Chrome.
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
There are many reasons that reports might not be working. Try these steps to check for specific issues.
|
||||
|
||||
#### Confirm feature flag is enabled and you have sufficient permissions
|
||||
|
||||
If you don't see "Alerts & Reports" under the *Manage* section of the Settings dropdown in the Superset UI, you need to enable the `ALERT_REPORTS` feature flag (see above). Enable another feature flag and check to see that it took effect, to verify that your config file is getting loaded.
|
||||
|
||||
Log in as an admin user to ensure you have adequate permissions.
|
||||
|
||||
#### Check the logs of your Celery worker
|
||||
|
||||
This is the best source of information about the problem. In a docker compose deployment, you can do this with a command like `docker logs superset_worker --since 1h`.
|
||||
|
||||
#### Check web browser and webdriver installation
|
||||
|
||||
To take a screenshot, the worker visits the dashboard or chart using a headless browser, then takes a screenshot. If you are able to send a chart as CSV or text but can't send as PNG, your problem may lie with the browser.
|
||||
|
||||
Superset docker images that have a tag ending with `-dev` have the Firefox headless browser and geckodriver already installed. You can test that these are installed and in the proper path by entering your Superset worker and running `firefox --headless` and then `geckodriver`. Both commands should start those applications.
|
||||
|
||||
If you are handling the installation of that software on your own, or wish to use Chromium instead, do your own verification to ensure that the headless browser opens successfully in the worker environment.
|
||||
|
||||
#### Send a test email
|
||||
|
||||
One symptom of an invalid connection to an email server is receiving an error of `[Errno 110] Connection timed out` in your logs when the report tries to send.
|
||||
|
||||
Confirm via testing that your outbound email configuration is correct. Here is the simplest test, for an un-authenticated email SMTP email service running on port 25. If you are sending over SSL, for instance, study how [Superset's codebase sends emails](https://github.com/apache/superset/blob/master/superset/utils/core.py#L818) and then test with those commands and arguments.
|
||||
|
||||
Start Python in your worker environment, replace all example values, and run:
|
||||
```python
|
||||
import smtplib
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
from email.mime.text import MIMEText
|
||||
|
||||
from_email = 'superset_emails@example.com'
|
||||
to_email = 'your_email@example.com'
|
||||
msg = MIMEMultipart()
|
||||
msg['From'] = from_email
|
||||
msg['To'] = to_email
|
||||
msg['Subject'] = 'Superset SMTP config test'
|
||||
message = 'It worked'
|
||||
msg.attach(MIMEText(message))
|
||||
mailserver = smtplib.SMTP('smtpmail.example.com', 25)
|
||||
mailserver.sendmail(from_email, to_email, msg.as_string())
|
||||
mailserver.quit()
|
||||
```
|
||||
|
||||
This should send an email.
|
||||
|
||||
Possible fixes:
|
||||
- Some cloud hosts disable outgoing unauthenticated SMTP email to prevent spam. For instance, [Azure blocks port 25 by default on some machines](https://learn.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity). Enable that port or use another sending method.
|
||||
- Use another set of SMTP credentials that you verify works in this setup.
|
||||
|
||||
#### Browse to your report from the worker
|
||||
|
||||
The worker may be unable to reach the report. It will use the value of `WEBDRIVER_BASEURL` to browse to the report. If that route is invalid, or presents an authentication challenge that the worker can't pass, the report screenshot will fail.
|
||||
|
||||
Check this by attempting to `curl` the URL of a report that you see in the error logs of your worker. For instance, from the worker environment, run `curl http://superset_app:8088/superset/dashboard/1/`. You may get different responses depending on whether the dashboard exists - for example, you may need to change the `1` in that URL. If there's a URL in your logs from a failed report screenshot, that's a good place to start. The goal is to determine a valid value for `WEBDRIVER_BASEURL` and determine if an issue like HTTPS or authentication is redirecting your worker.
|
||||
|
||||
In a deployment with authentication measures enabled like HTTPS and Single Sign-On, it may make sense to have the worker navigate directly to the Superset application running in the same location, avoiding the need to sign in. For instance, you could use `WEBDRIVER_BASEURL="http://superset_app:8088"` for a docker compose deployment, and set `"force_https": False,` in your `TALISMAN_CONFIG`.
|
||||
|
||||
### Scheduling Queries as Reports
|
||||
|
||||
You can optionally allow your users to schedule queries directly in SQL Lab. This is done by adding
|
||||
extra metadata to saved queries, which are then picked up by an external scheduled (like
|
||||
[Apache Airflow](https://airflow.apache.org/)).
|
||||
|
||||
To allow scheduled queries, add the following to `SCHEDULED_QUERIES` in your configuration file:
|
||||
|
||||
```python
|
||||
SCHEDULED_QUERIES = {
|
||||
# This information is collected when the user clicks "Schedule query",
|
||||
# and saved into the `extra` field of saved queries.
|
||||
# See: https://github.com/mozilla-services/react-jsonschema-form
|
||||
'JSONSCHEMA': {
|
||||
'title': 'Schedule',
|
||||
'description': (
|
||||
'In order to schedule a query, you need to specify when it '
|
||||
'should start running, when it should stop running, and how '
|
||||
'often it should run. You can also optionally specify '
|
||||
'dependencies that should be met before the query is '
|
||||
'executed. Please read the documentation for best practices '
|
||||
'and more information on how to specify dependencies.'
|
||||
),
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'output_table': {
|
||||
'type': 'string',
|
||||
'title': 'Output table name',
|
||||
},
|
||||
'start_date': {
|
||||
'type': 'string',
|
||||
'title': 'Start date',
|
||||
# date-time is parsed using the chrono library, see
|
||||
# https://www.npmjs.com/package/chrono-node#usage
|
||||
'format': 'date-time',
|
||||
'default': 'tomorrow at 9am',
|
||||
},
|
||||
'end_date': {
|
||||
'type': 'string',
|
||||
'title': 'End date',
|
||||
# date-time is parsed using the chrono library, see
|
||||
# https://www.npmjs.com/package/chrono-node#usage
|
||||
'format': 'date-time',
|
||||
'default': '9am in 30 days',
|
||||
},
|
||||
'schedule_interval': {
|
||||
'type': 'string',
|
||||
'title': 'Schedule interval',
|
||||
},
|
||||
'dependencies': {
|
||||
'type': 'array',
|
||||
'title': 'Dependencies',
|
||||
'items': {
|
||||
'type': 'string',
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
'UISCHEMA': {
|
||||
'schedule_interval': {
|
||||
'ui:placeholder': '@daily, @weekly, etc.',
|
||||
},
|
||||
'dependencies': {
|
||||
'ui:help': (
|
||||
'Check the documentation for the correct format when '
|
||||
'defining dependencies.'
|
||||
),
|
||||
},
|
||||
},
|
||||
'VALIDATION': [
|
||||
# ensure that start_date <= end_date
|
||||
{
|
||||
'name': 'less_equal',
|
||||
'arguments': ['start_date', 'end_date'],
|
||||
'message': 'End date cannot be before start date',
|
||||
# this is where the error message is shown
|
||||
'container': 'end_date',
|
||||
},
|
||||
],
|
||||
# link to the scheduler; this example links to an Airflow pipeline
|
||||
# that uses the query id and the output table as its name
|
||||
'linkback': (
|
||||
'https://airflow.example.com/admin/airflow/tree?'
|
||||
'dag_id=query_${id}_${extra_json.schedule_info.output_table}'
|
||||
),
|
||||
}
|
||||
```
|
||||
|
||||
This configuration is based on
|
||||
[react-jsonschema-form](https://github.com/mozilla-services/react-jsonschema-form) and will add a
|
||||
menu item called “Schedule” to SQL Lab. When the menu item is clicked, a modal will show up where
|
||||
the user can add the metadata required for scheduling the query.
|
||||
|
||||
This information can then be retrieved from the endpoint `/savedqueryviewapi/api/read` and used to
|
||||
schedule the queries that have `scheduled_queries` in their JSON metadata. For schedulers other than
|
||||
Airflow, additional fields can be easily added to the configuration file above.
|
||||
@@ -1,104 +0,0 @@
|
||||
---
|
||||
title: Async Queries via Celery
|
||||
hide_title: true
|
||||
sidebar_position: 9
|
||||
version: 1
|
||||
---
|
||||
|
||||
## Async Queries via Celery
|
||||
|
||||
### Celery
|
||||
|
||||
On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable
|
||||
support for long running queries that execute beyond the typical web request’s timeout (30-60
|
||||
seconds), it is necessary to configure an asynchronous backend for Superset which consists of:
|
||||
|
||||
- one or many Superset workers (which is implemented as a Celery worker), and can be started with
|
||||
the `celery worker` command, run `celery worker --help` to view the related options.
|
||||
- a celery broker (message queue) for which we recommend using Redis or RabbitMQ
|
||||
- a results backend that defines where the worker will persist the query results
|
||||
|
||||
Configuring Celery requires defining a `CELERY_CONFIG` in your `superset_config.py`. Both the worker
|
||||
and web server processes should have the same configuration.
|
||||
|
||||
```python
|
||||
class CeleryConfig(object):
|
||||
broker_url = "redis://localhost:6379/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = "redis://localhost:6379/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
```
|
||||
|
||||
To start a Celery worker to leverage the configuration, run the following command:
|
||||
|
||||
```
|
||||
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
|
||||
```
|
||||
|
||||
To start a job which schedules periodic background jobs, run the following command:
|
||||
|
||||
```
|
||||
celery --app=superset.tasks.celery_app:app beat
|
||||
```
|
||||
|
||||
To setup a result backend, you need to pass an instance of a derivative of from
|
||||
from flask_caching.backends.base import BaseCache to the RESULTS_BACKEND configuration key in your superset_config.py. You can
|
||||
use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache), memory or the file system
|
||||
(in a single server-type setup or for testing), or to write your own caching interface. Your
|
||||
`superset_config.py` may look something like:
|
||||
|
||||
```python
|
||||
# On S3
|
||||
from s3cache.s3cache import S3Cache
|
||||
S3_CACHE_BUCKET = 'foobar-superset'
|
||||
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
|
||||
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)
|
||||
|
||||
# On Redis
|
||||
from flask_caching.backends.rediscache import RedisCache
|
||||
RESULTS_BACKEND = RedisCache(
|
||||
host='localhost', port=6379, key_prefix='superset_results')
|
||||
```
|
||||
|
||||
For performance gains, [MessagePack](https://github.com/msgpack/msgpack-python) and
|
||||
[PyArrow](https://arrow.apache.org/docs/python/) are now used for results serialization. This can be
|
||||
disabled by setting `RESULTS_BACKEND_USE_MSGPACK = False` in your `superset_config.py`, should any
|
||||
issues arise. Please clear your existing results cache store when upgrading an existing environment.
|
||||
|
||||
**Important Notes**
|
||||
|
||||
- It is important that all the worker nodes and web servers in the Superset cluster _share a common
|
||||
metadata database_. This means that SQLite will not work in this context since it has limited
|
||||
support for concurrency and typically lives on the local file system.
|
||||
|
||||
- There should _only be one instance of celery beat running_ in your entire setup. If not,
|
||||
background jobs can get scheduled multiple times resulting in weird behaviors like duplicate
|
||||
delivery of reports, higher than expected load / traffic etc.
|
||||
|
||||
- SQL Lab will _only run your queries asynchronously if_ you enable **Asynchronous Query Execution**
|
||||
in your database settings (Sources > Databases > Edit record).
|
||||
|
||||
### Celery Flower
|
||||
|
||||
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:
|
||||
|
||||
```python
|
||||
pip install flower
|
||||
```
|
||||
|
||||
You can run flower using:
|
||||
|
||||
```
|
||||
celery --app=superset.tasks.celery_app:app flower
|
||||
```
|
||||
@@ -1,157 +0,0 @@
|
||||
---
|
||||
title: Caching
|
||||
hide_title: true
|
||||
sidebar_position: 6
|
||||
version: 1
|
||||
---
|
||||
|
||||
## Caching
|
||||
|
||||
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes.
|
||||
Flask-Caching supports various caching backends, including Redis (recommended), Memcached,
|
||||
SimpleCache (in-memory), or the local filesystem.
|
||||
[Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends)
|
||||
are also supported.
|
||||
|
||||
Caching can be configured by providing a dictionaries in
|
||||
`superset_config.py` that comply with[the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
|
||||
|
||||
The following cache configurations can be customized in this way:
|
||||
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
|
||||
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
|
||||
- Metadata cache (optional): `CACHE_CONFIG`
|
||||
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
|
||||
|
||||
For example, to configure the filter state cache using redis:
|
||||
|
||||
```python
|
||||
FILTER_STATE_CACHE_CONFIG = {
|
||||
'CACHE_TYPE': 'RedisCache',
|
||||
'CACHE_DEFAULT_TIMEOUT': 86400,
|
||||
'CACHE_KEY_PREFIX': 'superset_filter_cache',
|
||||
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
|
||||
}
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
In order to use dedicated cache stores, additional python libraries must be installed
|
||||
|
||||
- For Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
|
||||
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
|
||||
`python-memcached` does not handle storing binary data correctly.
|
||||
|
||||
These libraries can be installed using pip.
|
||||
|
||||
### Fallback Metastore Cache
|
||||
|
||||
Note, that some form of Filter State and Explore caching are required. If either of these caches
|
||||
are undefined, Superset falls back to using a built-in cache that stores data in the metadata
|
||||
database. While it is recommended to use a dedicated cache, the built-in cache can also be used
|
||||
to cache other data.
|
||||
|
||||
For example, to use the built-in cache to store chart data, use the following config:
|
||||
|
||||
```python
|
||||
DATA_CACHE_CONFIG = {
|
||||
"CACHE_TYPE": "SupersetMetastoreCache",
|
||||
"CACHE_KEY_PREFIX": "superset_results", # make sure this string is unique to avoid collisions
|
||||
"CACHE_DEFAULT_TIMEOUT": 86400, # 60 seconds * 60 minutes * 24 hours
|
||||
}
|
||||
```
|
||||
|
||||
### Chart Cache Timeout
|
||||
|
||||
The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or
|
||||
database. Each of these configurations will be checked in order before falling back to the default
|
||||
value defined in `DATA_CACHE_CONFIG`.
|
||||
|
||||
Note, that by setting the cache timeout to `-1`, caching for charting data can be disabled, either
|
||||
per chart, dataset or database, or by default if set in `DATA_CACHE_CONFIG`.
|
||||
|
||||
### SQL Lab Query Results
|
||||
|
||||
Caching for SQL Lab query results is used when async queries are enabled and is configured using
|
||||
`RESULTS_BACKEND`.
|
||||
|
||||
Note that this configuration does not use a flask-caching dictionary for its configuration, but
|
||||
instead requires a cachelib object.
|
||||
|
||||
See [Async Queries via Celery](/docs/installation/async-queries-celery) for details.
|
||||
|
||||
### Caching Thumbnails
|
||||
|
||||
This is an optional feature that can be turned on by activating it’s [feature flag](/docs/installation/configuring-superset#feature-flags) on config:
|
||||
|
||||
```
|
||||
FEATURE_FLAGS = {
|
||||
"THUMBNAILS": True,
|
||||
"THUMBNAILS_SQLA_LISTENERS": True,
|
||||
}
|
||||
```
|
||||
|
||||
By default thumbnails are rendered per user, and will fall back to the Selenium user for anonymous users.
|
||||
To always render thumbnails as a fixed user (`admin` in this example), use the following configuration:
|
||||
|
||||
```python
|
||||
from superset.tasks.types import ExecutorType
|
||||
|
||||
THUMBNAIL_SELENIUM_USER = "admin"
|
||||
THUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM]
|
||||
```
|
||||
|
||||
|
||||
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
|
||||
and are processed asynchronously by the workers.
|
||||
|
||||
An example config where images are stored on S3 could be:
|
||||
|
||||
```python
|
||||
from flask import Flask
|
||||
from s3cache.s3cache import S3Cache
|
||||
|
||||
...
|
||||
|
||||
class CeleryConfig(object):
|
||||
broker_url = "redis://localhost:6379/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.thumbnails",
|
||||
)
|
||||
result_backend = "redis://localhost:6379/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
|
||||
def init_thumbnail_cache(app: Flask) -> S3Cache:
|
||||
return S3Cache("bucket_name", 'thumbs_cache/')
|
||||
|
||||
|
||||
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
|
||||
# Async selenium thumbnail task will use the following user
|
||||
THUMBNAIL_SELENIUM_USER = "Admin"
|
||||
```
|
||||
|
||||
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
|
||||
override the base URL for selenium using:
|
||||
|
||||
```
|
||||
WEBDRIVER_BASEURL = "https://superset.company.com"
|
||||
```
|
||||
|
||||
Additional selenium web drive configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
|
||||
implement a custom function to authenticate selenium. The default function uses the `flask-login`
|
||||
session cookie. Here's an example of a custom function signature:
|
||||
|
||||
```python
|
||||
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
|
||||
pass
|
||||
```
|
||||
|
||||
Then on configuration:
|
||||
|
||||
```
|
||||
WEBDRIVER_AUTH_FUNC = auth_driver
|
||||
```
|
||||
@@ -1,371 +0,0 @@
|
||||
---
|
||||
title: Configuring Superset
|
||||
hide_title: true
|
||||
sidebar_position: 4
|
||||
version: 1
|
||||
---
|
||||
|
||||
## Configuring Superset
|
||||
|
||||
### Configuration
|
||||
|
||||
To configure your application, you need to create a file `superset_config.py`. Add this file to your
|
||||
|
||||
`PYTHONPATH` or create an environment variable `SUPERSET_CONFIG_PATH` specifying the full path of the `superset_config.py`.
|
||||
|
||||
For example, if deploying on Superset directly on a Linux-based system where your `superset_config.py` is under `/app` directory, you can run:
|
||||
|
||||
```bash
|
||||
export SUPERSET_CONFIG_PATH=/app/superset_config.py
|
||||
```
|
||||
|
||||
If you are using your own custom Dockerfile with official Superset image as base image, then you can add your overrides as shown below:
|
||||
|
||||
```bash
|
||||
COPY --chown=superset superset_config.py /app/
|
||||
ENV SUPERSET_CONFIG_PATH /app/superset_config.py
|
||||
```
|
||||
|
||||
Docker compose deployments handle application configuration differently. See [https://github.com/apache/superset/tree/master/docker#readme](https://github.com/apache/superset/tree/master/docker#readme) for details.
|
||||
|
||||
The following is an example of just a few of the parameters you can set in your `superset_config.py` file:
|
||||
|
||||
```
|
||||
# Superset specific config
|
||||
ROW_LIMIT = 5000
|
||||
|
||||
# Flask App Builder configuration
|
||||
# Your App secret key will be used for securely signing the session cookie
|
||||
# and encrypting sensitive information on the database
|
||||
# Make sure you are changing this key for your deployment with a strong key.
|
||||
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
|
||||
# You MUST set this for production environments or the server will refuse
|
||||
# to start and you will see an error in the logs accordingly.
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
|
||||
# The SQLAlchemy connection string to your database backend
|
||||
# This connection defines the path to the database that stores your
|
||||
# superset metadata (slices, connections, tables, dashboards, ...).
|
||||
# Note that the connection information to connect to the datasources
|
||||
# you want to explore are managed directly in the web UI
|
||||
# The check_same_thread=false property ensures the sqlite client does not attempt
|
||||
# to enforce single-threaded access, which may be problematic in some edge cases
|
||||
SQLALCHEMY_DATABASE_URI = 'sqlite:////path/to/superset.db?check_same_thread=false'
|
||||
|
||||
# Flask-WTF flag for CSRF
|
||||
WTF_CSRF_ENABLED = True
|
||||
# Add endpoints that need to be exempt from CSRF protection
|
||||
WTF_CSRF_EXEMPT_LIST = []
|
||||
# A CSRF token that expires in 1 year
|
||||
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365
|
||||
|
||||
# Set this API key to enable Mapbox visualizations
|
||||
MAPBOX_API_KEY = ''
|
||||
```
|
||||
|
||||
All the parameters and default values defined in
|
||||
[https://github.com/apache/superset/blob/master/superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py)
|
||||
can be altered in your local `superset_config.py`. Administrators will want to read through the file
|
||||
to understand what can be configured locally as well as the default values in place.
|
||||
|
||||
Since `superset_config.py` acts as a Flask configuration module, it can be used to alter the
|
||||
settings Flask itself, as well as Flask extensions like `flask-wtf`, `flask-caching`, `flask-migrate`,
|
||||
and `flask-appbuilder`. Flask App Builder, the web framework used by Superset, offers many
|
||||
configuration settings. Please consult the
|
||||
[Flask App Builder Documentation](https://flask-appbuilder.readthedocs.org/en/latest/config.html)
|
||||
for more information on how to configure it.
|
||||
|
||||
Make sure to change:
|
||||
|
||||
- `SQLALCHEMY_DATABASE_URI`: by default it is stored at ~/.superset/superset.db
|
||||
- `SECRET_KEY`: to a long random string
|
||||
|
||||
If you need to exempt endpoints from CSRF (e.g. if you are running a custom auth postback endpoint),
|
||||
you can add the endpoints to `WTF_CSRF_EXEMPT_LIST`:
|
||||
|
||||
```
|
||||
WTF_CSRF_EXEMPT_LIST = [‘’]
|
||||
```
|
||||
|
||||
### Specifying a SECRET_KEY
|
||||
|
||||
#### Adding an initial SECRET_KEY
|
||||
|
||||
Superset requires a user-specified SECRET_KEY to start up. This requirement was [added in version 2.1.0 to force secure configurations](https://preset.io/blog/superset-security-update-default-secret_key-vulnerability/). Add a strong SECRET_KEY to your `superset_config.py` file like:
|
||||
|
||||
```python
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
```
|
||||
|
||||
You can generate a strong secure key with `openssl rand -base64 42`.
|
||||
|
||||
:::caution Your secret key will be used for securely signing session cookies
|
||||
and encrypting sensitive information stored in Superset's application metadata database.
|
||||
Make sure you are changing this key for your deployment with a strong key.
|
||||
|
||||
#### Rotating to a newer SECRET_KEY
|
||||
|
||||
If you wish to change your existing SECRET_KEY, add the existing SECRET_KEY to your `superset_config.py` file as
|
||||
`PREVIOUS_SECRET_KEY = `and provide your new key as `SECRET_KEY =`. You can find your current SECRET_KEY with these
|
||||
commands - if running Superset with Docker, execute from within the Superset application container:
|
||||
|
||||
```python
|
||||
superset shell
|
||||
from flask import current_app; print(current_app.config["SECRET_KEY"])
|
||||
```
|
||||
|
||||
Save your `superset_config.py` with these values and then run `superset re-encrypt-secrets`.
|
||||
|
||||
### Using a production metastore
|
||||
|
||||
By default, Superset is configured to use SQLite, which is a simple and fast way to get started
|
||||
(without requiring any installation). However, for production environments,
|
||||
using SQLite is highly discouraged due to security, scalability, and data integrity reasons.
|
||||
It's important to use only the supported database engines and consider using a different
|
||||
database engine on a separate host or container.
|
||||
|
||||
Superset supports the following database engines/versions:
|
||||
|
||||
| Database Engine | Supported Versions |
|
||||
| ----------------------------------------- | ---------------------------------- |
|
||||
| [PostgreSQL](https://www.postgresql.org/) | 10.X, 11.X, 12.X, 13.X, 14.X, 15.X |
|
||||
| [MySQL](https://www.mysql.com/) | 5.7, 8.X |
|
||||
|
||||
Use the following database drivers and connection strings:
|
||||
|
||||
| Database | PyPI package | Connection String |
|
||||
| ----------------------------------------- | ------------------------- | ---------------------------------------------------------------------- |
|
||||
| [PostgreSQL](https://www.postgresql.org/) | `pip install psycopg2` | `postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
|
||||
| [MySQL](https://www.mysql.com/) | `pip install mysqlclient` | `mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
|
||||
|
||||
To configure Superset metastore set `SQLALCHEMY_DATABASE_URI` config key on `superset_config`
|
||||
to the appropriate connection string.
|
||||
|
||||
### Running on a WSGI HTTP Server
|
||||
|
||||
While you can run Superset on NGINX or Apache, we recommend using Gunicorn in async mode. This
|
||||
enables impressive concurrency even and is fairly easy to install and configure. Please refer to the
|
||||
documentation of your preferred technology to set up this Flask WSGI application in a way that works
|
||||
well in your environment. Here’s an async setup known to work well in production:
|
||||
|
||||
```
|
||||
-w 10 \
|
||||
-k gevent \
|
||||
--worker-connections 1000 \
|
||||
--timeout 120 \
|
||||
-b 0.0.0.0:6666 \
|
||||
--limit-request-line 0 \
|
||||
--limit-request-field_size 0 \
|
||||
--statsd-host localhost:8125 \
|
||||
"superset.app:create_app()"
|
||||
```
|
||||
|
||||
Refer to the [Gunicorn documentation](https://docs.gunicorn.org/en/stable/design.html) for more
|
||||
information. _Note that the development web server (`superset run` or `flask run`) is not intended
|
||||
for production use._
|
||||
|
||||
If you're not using Gunicorn, you may want to disable the use of `flask-compress` by setting
|
||||
`COMPRESS_REGISTER = False` in your `superset_config.py`.
|
||||
|
||||
Currently, Google BigQuery python sdk is not compatible with `gevent`, due to some dynamic monkeypatching on python core library by `gevent`.
|
||||
So, when you use `BigQuery` datasource on Superset, you have to use `gunicorn` worker type except `gevent`.
|
||||
|
||||
### HTTPS Configuration
|
||||
|
||||
You can configure HTTPS upstream via a load balancer or a reverse proxy (such as nginx) and do SSL/TLS Offloading before traffic reaches the Superset application. In this setup, local traffic from a Celery worker taking a snapshot of a chart for Alerts & Reports can access Superset at a `http://` URL, from behind the ingress point.
|
||||
You can also configure [SSL in Gunicorn](https://docs.gunicorn.org/en/stable/settings.html#ssl) (the Python webserver) if you are using an official Superset Docker image.
|
||||
|
||||
### Configuration Behind a Load Balancer
|
||||
|
||||
If you are running superset behind a load balancer or reverse proxy (e.g. NGINX or ELB on AWS), you
|
||||
may need to utilize a healthcheck endpoint so that your load balancer knows if your superset
|
||||
instance is running. This is provided at `/health` which will return a 200 response containing “OK”
|
||||
if the webserver is running.
|
||||
|
||||
If the load balancer is inserting `X-Forwarded-For/X-Forwarded-Proto` headers, you should set
|
||||
`ENABLE_PROXY_FIX = True` in the superset config file (`superset_config.py`) to extract and use the
|
||||
headers.
|
||||
|
||||
In case the reverse proxy is used for providing SSL encryption, an explicit definition of the
|
||||
`X-Forwarded-Proto` may be required. For the Apache webserver this can be set as follows:
|
||||
|
||||
```
|
||||
RequestHeader set X-Forwarded-Proto "https"
|
||||
```
|
||||
|
||||
### Custom OAuth2 Configuration
|
||||
|
||||
Superset is built on Flask-AppBuilder (FAB), which supports many providers out of the box
|
||||
(GitHub, Twitter, LinkedIn, Google, Azure, etc). Beyond those, Superset can be configured to connect
|
||||
with other OAuth2 Authorization Server implementations that support “code” authorization.
|
||||
|
||||
Make sure the pip package [`Authlib`](https://authlib.org/) is installed on the webserver.
|
||||
|
||||
First, configure authorization in Superset `superset_config.py`.
|
||||
|
||||
```python
|
||||
from flask_appbuilder.security.manager import AUTH_OAUTH
|
||||
|
||||
# Set the authentication type to OAuth
|
||||
AUTH_TYPE = AUTH_OAUTH
|
||||
|
||||
OAUTH_PROVIDERS = [
|
||||
{ 'name':'egaSSO',
|
||||
'token_key':'access_token', # Name of the token in the response of access_token_url
|
||||
'icon':'fa-address-card', # Icon for the provider
|
||||
'remote_app': {
|
||||
'client_id':'myClientId', # Client Id (Identify Superset application)
|
||||
'client_secret':'MySecret', # Secret for this Client Id (Identify Superset application)
|
||||
'client_kwargs':{
|
||||
'scope': 'read' # Scope for the Authorization
|
||||
},
|
||||
'access_token_method':'POST', # HTTP Method to call access_token_url
|
||||
'access_token_params':{ # Additional parameters for calls to access_token_url
|
||||
'client_id':'myClientId'
|
||||
},
|
||||
'jwks_uri':'https://myAuthorizationServe/adfs/discovery/keys', # may be required to generate token
|
||||
'access_token_headers':{ # Additional headers for calls to access_token_url
|
||||
'Authorization': 'Basic Base64EncodedClientIdAndSecret'
|
||||
},
|
||||
'api_base_url':'https://myAuthorizationServer/oauth2AuthorizationServer/',
|
||||
'access_token_url':'https://myAuthorizationServer/oauth2AuthorizationServer/token',
|
||||
'authorize_url':'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
|
||||
}
|
||||
}
|
||||
]
|
||||
|
||||
# Will allow user self registration, allowing to create Flask users from Authorized User
|
||||
AUTH_USER_REGISTRATION = True
|
||||
|
||||
# The default user self registration role
|
||||
AUTH_USER_REGISTRATION_ROLE = "Public"
|
||||
```
|
||||
|
||||
Then, create a `CustomSsoSecurityManager` that extends `SupersetSecurityManager` and overrides
|
||||
`oauth_user_info`:
|
||||
|
||||
```python
|
||||
import logging
|
||||
from superset.security import SupersetSecurityManager
|
||||
|
||||
class CustomSsoSecurityManager(SupersetSecurityManager):
|
||||
|
||||
def oauth_user_info(self, provider, response=None):
|
||||
logging.debug("Oauth2 provider: {0}.".format(provider))
|
||||
if provider == 'egaSSO':
|
||||
# As example, this line request a GET to base_url + '/' + userDetails with Bearer Authentication,
|
||||
# and expects that authorization server checks the token, and response with user details
|
||||
me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
|
||||
logging.debug("user_data: {0}".format(me))
|
||||
return { 'name' : me['name'], 'email' : me['email'], 'id' : me['user_name'], 'username' : me['user_name'], 'first_name':'', 'last_name':''}
|
||||
...
|
||||
```
|
||||
|
||||
This file must be located at the same directory than `superset_config.py` with the name
|
||||
`custom_sso_security_manager.py`. Finally, add the following 2 lines to `superset_config.py`:
|
||||
|
||||
```
|
||||
from custom_sso_security_manager import CustomSsoSecurityManager
|
||||
CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
|
||||
```
|
||||
|
||||
**Notes**
|
||||
|
||||
- The redirect URL will be `https://<superset-webserver>/oauth-authorized/<provider-name>`
|
||||
When configuring an OAuth2 authorization provider if needed. For instance, the redirect URL will
|
||||
be `https://<superset-webserver>/oauth-authorized/egaSSO` for the above configuration.
|
||||
|
||||
- If an OAuth2 authorization server supports OpenID Connect 1.0, you could configure its configuration
|
||||
document URL only without providing `api_base_url`, `access_token_url`, `authorize_url` and other
|
||||
required options like user info endpoint, jwks uri etc. For instance:
|
||||
```python
|
||||
OAUTH_PROVIDERS = [
|
||||
{ 'name':'egaSSO',
|
||||
'token_key':'access_token', # Name of the token in the response of access_token_url
|
||||
'icon':'fa-address-card', # Icon for the provider
|
||||
'remote_app': {
|
||||
'client_id':'myClientId', # Client Id (Identify Superset application)
|
||||
'client_secret':'MySecret', # Secret for this Client Id (Identify Superset application)
|
||||
'server_metadata_url': 'https://myAuthorizationServer/.well-known/openid-configuration'
|
||||
}
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### LDAP Authentication
|
||||
|
||||
FAB supports authenticating user credentials against an LDAP server.
|
||||
To use LDAP you must install the [python-ldap](https://www.python-ldap.org/en/latest/installing.html) package.
|
||||
See [FAB's LDAP documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-ldap)
|
||||
for details.
|
||||
|
||||
### Mapping LDAP or OAUTH groups to Superset roles
|
||||
|
||||
AUTH_ROLES_MAPPING in Flask-AppBuilder is a dictionary that maps from LDAP/OAUTH group names to FAB roles.
|
||||
It is used to assign roles to users who authenticate using LDAP or OAuth.
|
||||
|
||||
#### Mapping OAUTH groups to Superset roles
|
||||
|
||||
The following `AUTH_ROLES_MAPPING` dictionary would map the OAUTH group "superset_users" to the Superset roles "Gamma" as well as "Alpha", and the OAUTH group "superset_admins" to the Superset role "Admin".
|
||||
|
||||
```python
|
||||
AUTH_ROLES_MAPPING = {
|
||||
"superset_users": ["Gamma","Alpha"],
|
||||
"superset_admins": ["Admin"],
|
||||
}
|
||||
```
|
||||
#### Mapping LDAP groups to Superset roles
|
||||
|
||||
The following `AUTH_ROLES_MAPPING` dictionary would map the LDAP DN "cn=superset_users,ou=groups,dc=example,dc=com" to the Superset roles "Gamma" as well as "Alpha", and the LDAP DN "cn=superset_admins,ou=groups,dc=example,dc=com" to the Superset role "Admin".
|
||||
|
||||
```python
|
||||
AUTH_ROLES_MAPPING = {
|
||||
"cn=superset_users,ou=groups,dc=example,dc=com": ["Gamma","Alpha"],
|
||||
"cn=superset_admins,ou=groups,dc=example,dc=com": ["Admin"],
|
||||
}
|
||||
```
|
||||
Note: This requires `AUTH_LDAP_SEARCH` to be set. For more details, please see the [FAB Security documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html).
|
||||
|
||||
#### Syncing roles at login
|
||||
|
||||
You can also use the `AUTH_ROLES_SYNC_AT_LOGIN` configuration variable to control how often Flask-AppBuilder syncs the user's roles with the LDAP/OAUTH groups. If `AUTH_ROLES_SYNC_AT_LOGIN` is set to True, Flask-AppBuilder will sync the user's roles each time they log in. If `AUTH_ROLES_SYNC_AT_LOGIN` is set to False, Flask-AppBuilder will only sync the user's roles when they first register.
|
||||
|
||||
### Flask app Configuration Hook
|
||||
|
||||
`FLASK_APP_MUTATOR` is a configuration function that can be provided in your environment, receives
|
||||
the app object and can alter it in any way. For example, add `FLASK_APP_MUTATOR` into your
|
||||
`superset_config.py` to setup session cookie expiration time to 24 hours:
|
||||
|
||||
```python
|
||||
from flask import session
|
||||
from flask import Flask
|
||||
|
||||
|
||||
def make_session_permanent():
|
||||
'''
|
||||
Enable maxAge for the cookie 'session'
|
||||
'''
|
||||
session.permanent = True
|
||||
|
||||
# Set up max age of session to 24 hours
|
||||
PERMANENT_SESSION_LIFETIME = timedelta(hours=24)
|
||||
def FLASK_APP_MUTATOR(app: Flask) -> None:
|
||||
app.before_request_funcs.setdefault(None, []).append(make_session_permanent)
|
||||
```
|
||||
|
||||
### Feature Flags
|
||||
|
||||
To support a diverse set of users, Superset has some features that are not enabled by default. For
|
||||
example, some users have stronger security restrictions, while some others may not. So Superset
|
||||
allow users to enable or disable some features by config. For feature owners, you can add optional
|
||||
functionalities in Superset, but will be only affected by a subset of users.
|
||||
|
||||
You can enable or disable features with flag from `superset_config.py`:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
'PRESTO_EXPAND_DATA': False,
|
||||
}
|
||||
```
|
||||
|
||||
A current list of feature flags can be found in [RESOURCES/FEATURE_FLAGS.md](https://github.com/apache/superset/blob/master/RESOURCES/FEATURE_FLAGS.md).
|
||||
@@ -1,3 +1,10 @@
|
||||
---
|
||||
title: Docker Builds
|
||||
hide_title: true
|
||||
sidebar_position: 5
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Docker builds, images and tags
|
||||
|
||||
The Apache Superset community extensively uses Docker for development, release,
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
title: Installing Locally Using Docker Compose
|
||||
title: Docker Compose
|
||||
hide_title: true
|
||||
sidebar_position: 3
|
||||
version: 1
|
||||
@@ -1,60 +0,0 @@
|
||||
---
|
||||
title: Event Logging
|
||||
hide_title: true
|
||||
sidebar_position: 7
|
||||
version: 1
|
||||
---
|
||||
|
||||
## Logging
|
||||
|
||||
### Event Logging
|
||||
|
||||
Superset by default logs special action events in its internal database (DBEventLogger). These logs can be accessed
|
||||
on the UI by navigating to **Security > Action Log**. You can freely customize these logs by
|
||||
implementing your own event log class.
|
||||
**When custom log class is enabled DBEventLogger is disabled and logs stop being populated in UI logs view.**
|
||||
To achieve both, custom log class should extend built-in DBEventLogger log class.
|
||||
|
||||
Here's an example of a simple JSON-to-stdout class:
|
||||
|
||||
```python
|
||||
def log(self, user_id, action, *args, **kwargs):
|
||||
records = kwargs.get('records', list())
|
||||
dashboard_id = kwargs.get('dashboard_id')
|
||||
slice_id = kwargs.get('slice_id')
|
||||
duration_ms = kwargs.get('duration_ms')
|
||||
referrer = kwargs.get('referrer')
|
||||
|
||||
for record in records:
|
||||
log = dict(
|
||||
action=action,
|
||||
json=record,
|
||||
dashboard_id=dashboard_id,
|
||||
slice_id=slice_id,
|
||||
duration_ms=duration_ms,
|
||||
referrer=referrer,
|
||||
user_id=user_id
|
||||
)
|
||||
print(json.dumps(log))
|
||||
```
|
||||
|
||||
End by updating your config to pass in an instance of the logger you want to use:
|
||||
|
||||
```
|
||||
EVENT_LOGGER = JSONStdOutEventLogger()
|
||||
```
|
||||
|
||||
### StatsD Logging
|
||||
|
||||
Superset can be instrumented to log events to StatsD if desired. Most endpoints hit are logged as
|
||||
well as key events like query start and end in SQL Lab.
|
||||
|
||||
To setup StatsD logging, it’s a matter of configuring the logger in your `superset_config.py`.
|
||||
|
||||
```python
|
||||
from superset.stats_logger import StatsdStatsLogger
|
||||
STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')
|
||||
```
|
||||
|
||||
Note that it’s also possible to implement your own logger by deriving
|
||||
`superset.stats_logger.BaseStatsLogger`.
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
title: Installing on Kubernetes
|
||||
title: Kubernetes
|
||||
hide_title: true
|
||||
sidebar_position: 1
|
||||
version: 1
|
||||
@@ -304,7 +304,7 @@ configOverrides:
|
||||
|
||||
#### Enable Alerts and Reports
|
||||
|
||||
For this, as per the [Alerts and Reports doc](/docs/installation/email-reports), you will need to:
|
||||
For this, as per the [Alerts and Reports doc](/docs/configuration/alerts-reports), you will need to:
|
||||
|
||||
##### Install a supported webdriver in the Celery worker
|
||||
|
||||
@@ -1,53 +0,0 @@
|
||||
---
|
||||
title: Additional Networking Settings
|
||||
hide_title: true
|
||||
sidebar_position: 5
|
||||
version: 1
|
||||
---
|
||||
|
||||
## Additional Networking Settings
|
||||
|
||||
### CORS
|
||||
|
||||
To configure CORS, or cross-origin resource sharing, the following dependency must be installed:
|
||||
|
||||
```python
|
||||
pip install apache-superset[cors]
|
||||
```
|
||||
|
||||
The following keys in `superset_config.py` can be specified to configure CORS:
|
||||
|
||||
- `ENABLE_CORS`: Must be set to `True` in order to enable CORS
|
||||
- `CORS_OPTIONS`: options passed to Flask-CORS
|
||||
([documentation](https://flask-cors.corydolphin.com/en/latest/api.html#extension))
|
||||
|
||||
### Domain Sharding
|
||||
|
||||
Chrome allows up to 6 open connections per domain at a time. When there are more than 6 slices in
|
||||
dashboard, a lot of time fetch requests are queued up and wait for next available socket.
|
||||
[PR 5039](https://github.com/apache/superset/pull/5039) adds domain sharding to Superset,
|
||||
and this feature will be enabled by configuration only (by default Superset doesn’t allow
|
||||
cross-domain request).
|
||||
|
||||
Add the following setting in your `superset_config.py` file:
|
||||
|
||||
- `SUPERSET_WEBSERVER_DOMAINS`: list of allowed hostnames for domain sharding feature.
|
||||
|
||||
Please create your domain shards as subdomains of your main domain for authorization to
|
||||
work properly on new domains. For Example:
|
||||
|
||||
- `SUPERSET_WEBSERVER_DOMAINS=['superset-1.mydomain.com','superset-2.mydomain.com','superset-3.mydomain.com','superset-4.mydomain.com']`
|
||||
|
||||
or add the following setting in your `superset_config.py` file if domain shards are not subdomains of main domain.
|
||||
|
||||
- `SESSION_COOKIE_DOMAIN = '.mydomain.com'`
|
||||
|
||||
### Middleware
|
||||
|
||||
Superset allows you to add your own middleware. To add your own middleware, update the
|
||||
`ADDITIONAL_MIDDLEWARE` key in your `superset_config.py`. `ADDITIONAL_MIDDLEWARE` should be a list
|
||||
of your additional middleware classes.
|
||||
|
||||
For example, to use `AUTH_REMOTE_USER` from behind a proxy server like nginx, you have to add a
|
||||
simple middleware class to add the value of `HTTP_X_PROXY_REMOTE_USER` (or any other custom header
|
||||
from the proxy) to Gunicorn’s `REMOTE_USER` environment variable:
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
title: Installing from PyPI
|
||||
title: PyPI
|
||||
hide_title: true
|
||||
sidebar_position: 2
|
||||
version: 1
|
||||
@@ -136,7 +136,7 @@ superset db upgrade
|
||||
```
|
||||
|
||||
:::tip
|
||||
Note that some configuration is mandatory for production instances of Superset. In particular, Superset will not start without a user-specified value of SECRET_KEY. Please see [Configuring Superset](/docs/installation/configuring-superset).
|
||||
Note that some configuration is mandatory for production instances of Superset. In particular, Superset will not start without a user-specified value of SECRET_KEY. Please see [Configuring Superset](/docs/configuration/configuring-superset).
|
||||
:::
|
||||
|
||||
Finish installing by running through the following commands:
|
||||
@@ -1,21 +0,0 @@
|
||||
---
|
||||
title: Setup SSH Tunneling
|
||||
hide_title: true
|
||||
sidebar_position: 12
|
||||
version: 1
|
||||
---
|
||||
|
||||
## SSH Tunneling
|
||||
|
||||
1. Turn on feature flag
|
||||
- Change [`SSH_TUNNELING`](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L489) to `True`
|
||||
- If you want to add more security when establishing the tunnel we allow users to overwrite the `SSHTunnelManager` class [here](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L507)
|
||||
- You can also set the [`SSH_TUNNEL_LOCAL_BIND_ADDRESS`](https://github.com/apache/superset/blob/eb8386e3f0647df6d1bbde8b42073850796cc16f/superset/config.py#L508) this the host address where the tunnel will be accessible on your VPC
|
||||
|
||||
2. Create database w/ ssh tunnel enabled
|
||||
- With the feature flag enabled you should now see ssh tunnel toggle.
|
||||
- Click the toggle to enables ssh tunneling and add your credentials accordingly.
|
||||
- Superset allows for 2 different type authentication (Basic + Private Key). These credentials should come from your service provider.
|
||||
|
||||
3. Verify data is flowing
|
||||
- Once SSH tunneling has been enabled, go to SQL Lab and write a query to verify data is properly flowing.
|
||||
@@ -1,385 +0,0 @@
|
||||
---
|
||||
title: SQL Templating
|
||||
hide_title: true
|
||||
sidebar_position: 11
|
||||
version: 1
|
||||
---
|
||||
|
||||
## SQL Templating
|
||||
|
||||
### Jinja Templates
|
||||
|
||||
SQL Lab and Explore supports [Jinja templating](https://jinja.palletsprojects.com/en/2.11.x/) in queries.
|
||||
To enable templating, the `ENABLE_TEMPLATE_PROCESSING` [feature flag](/docs/installation/configuring-superset#feature-flags) needs to be enabled in
|
||||
`superset_config.py`. When templating is enabled, python code can be embedded in virtual datasets and
|
||||
in Custom SQL in the filter and metric controls in Explore. By default, the following variables are
|
||||
made available in the Jinja context:
|
||||
|
||||
- `columns`: columns which to group by in the query
|
||||
- `filter`: filters applied in the query
|
||||
- `from_dttm`: start `datetime` value from the selected time range (`None` if undefined)
|
||||
- `to_dttm`: end `datetime` value from the selected time range (`None` if undefined)
|
||||
- `groupby`: columns which to group by in the query (deprecated)
|
||||
- `metrics`: aggregate expressions in the query
|
||||
- `row_limit`: row limit of the query
|
||||
- `row_offset`: row offset of the query
|
||||
- `table_columns`: columns available in the dataset
|
||||
- `time_column`: temporal column of the query (`None` if undefined)
|
||||
- `time_grain`: selected time grain (`None` if undefined)
|
||||
|
||||
For example, to add a time range to a virtual dataset, you can write the following:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE dttm_col > '{{ from_dttm }}' and dttm_col < '{{ to_dttm }}'
|
||||
```
|
||||
|
||||
You can also use [Jinja's logic](https://jinja.palletsprojects.com/en/2.11.x/templates/#tests)
|
||||
to make your query robust to clearing the timerange filter:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM tbl
|
||||
WHERE (
|
||||
{% if from_dttm is not none %}
|
||||
dttm_col > '{{ from_dttm }}' AND
|
||||
{% endif %}
|
||||
{% if to_dttm is not none %}
|
||||
dttm_col < '{{ to_dttm }}' AND
|
||||
{% endif %}
|
||||
true
|
||||
)
|
||||
```
|
||||
|
||||
Note how the Jinja parameters are called within double brackets in the query, and without in the
|
||||
logic blocks.
|
||||
|
||||
To add custom functionality to the Jinja context, you need to overload the default Jinja
|
||||
context in your environment by defining the `JINJA_CONTEXT_ADDONS` in your superset configuration
|
||||
(`superset_config.py`). Objects referenced in this dictionary are made available for users to use
|
||||
where the Jinja context is made available.
|
||||
|
||||
```python
|
||||
JINJA_CONTEXT_ADDONS = {
|
||||
'my_crazy_macro': lambda x: x*2,
|
||||
}
|
||||
```
|
||||
|
||||
Default values for jinja templates can be specified via `Parameters` menu in the SQL Lab user interface.
|
||||
In the UI you can assign a set of parameters as JSON
|
||||
|
||||
```json
|
||||
{
|
||||
"my_table": "foo"
|
||||
}
|
||||
```
|
||||
The parameters become available in your SQL (example: `SELECT * FROM {{ my_table }}` ) by using Jinja templating syntax.
|
||||
SQL Lab template parameters are stored with the dataset as `TEMPLATE PARAMETERS`.
|
||||
|
||||
There is a special ``_filters`` parameter which can be used to test filters used in the jinja template.
|
||||
|
||||
```json
|
||||
{
|
||||
"_filters": [
|
||||
{
|
||||
"col": "action_type",
|
||||
"op": "IN",
|
||||
"val": ["sell", "buy"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
```sql
|
||||
SELECT action, count(*) as times
|
||||
FROM logs
|
||||
WHERE action in {{ filter_values('action_type'))|where_in }}
|
||||
GROUP BY action
|
||||
```
|
||||
|
||||
Note ``_filters`` is not stored with the dataset. It's only used within the SQL Lab UI.
|
||||
|
||||
|
||||
Besides default Jinja templating, SQL lab also supports self-defined template processor by setting
|
||||
the `CUSTOM_TEMPLATE_PROCESSORS` in your superset configuration. The values in this dictionary
|
||||
overwrite the default Jinja template processors of the specified database engine. The example below
|
||||
configures a custom presto template processor which implements its own logic of processing macro
|
||||
template with regex parsing. It uses the `$` style macro instead of `{{ }}` style in Jinja
|
||||
templating.
|
||||
|
||||
By configuring it with `CUSTOM_TEMPLATE_PROCESSORS`, a SQL template on a presto database is
|
||||
processed by the custom one rather than the default one.
|
||||
|
||||
```python
|
||||
def DATE(
|
||||
ts: datetime, day_offset: SupportsInt = 0, hour_offset: SupportsInt = 0
|
||||
) -> str:
|
||||
"""Current day as a string."""
|
||||
day_offset, hour_offset = int(day_offset), int(hour_offset)
|
||||
offset_day = (ts + timedelta(days=day_offset, hours=hour_offset)).date()
|
||||
return str(offset_day)
|
||||
|
||||
class CustomPrestoTemplateProcessor(PrestoTemplateProcessor):
|
||||
"""A custom presto template processor."""
|
||||
|
||||
engine = "presto"
|
||||
|
||||
def process_template(self, sql: str, **kwargs) -> str:
|
||||
"""Processes a sql template with $ style macro using regex."""
|
||||
# Add custom macros functions.
|
||||
macros = {
|
||||
"DATE": partial(DATE, datetime.utcnow())
|
||||
} # type: Dict[str, Any]
|
||||
# Update with macros defined in context and kwargs.
|
||||
macros.update(self.context)
|
||||
macros.update(kwargs)
|
||||
|
||||
def replacer(match):
|
||||
"""Expand $ style macros with corresponding function calls."""
|
||||
macro_name, args_str = match.groups()
|
||||
args = [a.strip() for a in args_str.split(",")]
|
||||
if args == [""]:
|
||||
args = []
|
||||
f = macros[macro_name[1:]]
|
||||
return f(*args)
|
||||
|
||||
macro_names = ["$" + name for name in macros.keys()]
|
||||
pattern = r"(%s)\s*\(([^()]*)\)" % "|".join(map(re.escape, macro_names))
|
||||
return re.sub(pattern, replacer, sql)
|
||||
|
||||
CUSTOM_TEMPLATE_PROCESSORS = {
|
||||
CustomPrestoTemplateProcessor.engine: CustomPrestoTemplateProcessor
|
||||
}
|
||||
```
|
||||
|
||||
SQL Lab also includes a live query validation feature with pluggable backends. You can configure
|
||||
which validation implementation is used with which database engine by adding a block like the
|
||||
following to your configuration file:
|
||||
|
||||
```python
|
||||
FEATURE_FLAGS = {
|
||||
'SQL_VALIDATORS_BY_ENGINE': {
|
||||
'presto': 'PrestoDBSQLValidator',
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The available validators and names can be found in
|
||||
[sql_validators](https://github.com/apache/superset/tree/master/superset/sql_validators).
|
||||
|
||||
### Available Macros
|
||||
|
||||
In this section, we'll walkthrough the pre-defined Jinja macros in Superset.
|
||||
|
||||
**Current Username**
|
||||
|
||||
The `{{ current_username() }}` macro returns the `username` of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the `username` value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the `username` value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```
|
||||
{{ current_username(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
**Current User ID**
|
||||
|
||||
The `{{ current_user_id() }}` macro returns the account ID of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the account `id` value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the account `id` value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```
|
||||
{{ current_user_id(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
**Current User Email**
|
||||
|
||||
The `{{ current_user_email() }}` macro returns the email address of the currently logged in user.
|
||||
|
||||
If you have caching enabled in your Superset configuration, then by default the email address value will be used
|
||||
by Superset when calculating the cache key. A cache key is a unique identifier that determines if there's a
|
||||
cache hit in the future and Superset can retrieve cached data.
|
||||
|
||||
You can disable the inclusion of the email value in the calculation of the
|
||||
cache key by adding the following parameter to your Jinja code:
|
||||
|
||||
```
|
||||
{{ current_user_email(add_to_cache_keys=False) }}
|
||||
```
|
||||
|
||||
**Custom URL Parameters**
|
||||
|
||||
The `{{ url_param('custom_variable') }}` macro lets you define arbitrary URL
|
||||
parameters and reference them in your SQL code.
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
- You write the following query in SQL Lab:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = '{{ url_param('countrycode') }}'
|
||||
```
|
||||
|
||||
- You're hosting Superset at the domain www.example.com and you send your
|
||||
coworker in Spain the following SQL Lab URL `www.example.com/superset/sqllab?countrycode=ES`
|
||||
and your coworker in the USA the following SQL Lab URL `www.example.com/superset/sqllab?countrycode=US`
|
||||
- For your coworker in Spain, the SQL Lab query will be rendered as:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = 'ES'
|
||||
```
|
||||
|
||||
- For your coworker in the USA, the SQL Lab query will be rendered as:
|
||||
|
||||
```sql
|
||||
SELECT count(*)
|
||||
FROM ORDERS
|
||||
WHERE country_code = 'US'
|
||||
```
|
||||
|
||||
**Explicitly Including Values in Cache Key**
|
||||
|
||||
The `{{ cache_key_wrapper() }}` function explicitly instructs Superset to add a value to the
|
||||
accumulated list of values used in the calculation of the cache key.
|
||||
|
||||
This function is only needed when you want to wrap your own custom function return values
|
||||
in the cache key. You can gain more context
|
||||
[here](https://github.com/apache/superset/blob/efd70077014cbed62e493372d33a2af5237eaadf/superset/jinja_context.py#L133-L148).
|
||||
|
||||
Note that this function powers the caching of the `user_id` and `username` values
|
||||
in the `current_user_id()` and `current_username()` function calls (if you have caching enabled).
|
||||
|
||||
**Filter Values**
|
||||
|
||||
You can retrieve the value for a specific filter as a list using `{{ filter_values() }}`.
|
||||
|
||||
This is useful if:
|
||||
|
||||
- You want to use a filter component to filter a query where the name of filter component column doesn't match the one in the select statement
|
||||
- You want to have the ability for filter inside the main query for performance purposes
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
```sql
|
||||
SELECT action, count(*) as times
|
||||
FROM logs
|
||||
WHERE
|
||||
action in {{ filter_values('action_type')|where_in }}
|
||||
GROUP BY action
|
||||
```
|
||||
|
||||
There `where_in` filter converts the list of values from `filter_values('action_type')` into a string suitable for an `IN` expression.
|
||||
|
||||
**Filters for a Specific Column**
|
||||
|
||||
The `{{ get_filters() }}` macro returns the filters applied to a given column. In addition to
|
||||
returning the values (similar to how `filter_values()` does), the `get_filters()` macro
|
||||
returns the operator specified in the Explore UI.
|
||||
|
||||
This is useful if:
|
||||
|
||||
- You want to handle more than the IN operator in your SQL clause
|
||||
- You want to handle generating custom SQL conditions for a filter
|
||||
- You want to have the ability to filter inside the main query for speed purposes
|
||||
|
||||
Here's a concrete example:
|
||||
|
||||
```
|
||||
WITH RECURSIVE
|
||||
superiors(employee_id, manager_id, full_name, level, lineage) AS (
|
||||
SELECT
|
||||
employee_id,
|
||||
manager_id,
|
||||
full_name,
|
||||
1 as level,
|
||||
employee_id as lineage
|
||||
FROM
|
||||
employees
|
||||
WHERE
|
||||
1=1
|
||||
|
||||
{# Render a blank line #}
|
||||
{%- for filter in get_filters('full_name', remove_filter=True) -%}
|
||||
|
||||
{%- if filter.get('op') == 'IN' -%}
|
||||
AND
|
||||
full_name IN {{ filter.get('val')|where_in }}
|
||||
{%- endif -%}
|
||||
|
||||
{%- if filter.get('op') == 'LIKE' -%}
|
||||
AND
|
||||
full_name LIKE {{ "'" + filter.get('val') + "'" }}
|
||||
{%- endif -%}
|
||||
|
||||
{%- endfor -%}
|
||||
UNION ALL
|
||||
SELECT
|
||||
e.employee_id,
|
||||
e.manager_id,
|
||||
e.full_name,
|
||||
s.level + 1 as level,
|
||||
s.lineage
|
||||
FROM
|
||||
employees e,
|
||||
superiors s
|
||||
WHERE s.manager_id = e.employee_id
|
||||
)
|
||||
|
||||
SELECT
|
||||
employee_id, manager_id, full_name, level, lineage
|
||||
FROM
|
||||
superiors
|
||||
order by lineage, level
|
||||
```
|
||||
|
||||
**Datasets**
|
||||
|
||||
It's possible to query physical and virtual datasets using the `dataset` macro. This is useful if you've defined computed columns and metrics on your datasets, and want to reuse the definition in adhoc SQL Lab queries.
|
||||
|
||||
To use the macro, first you need to find the ID of the dataset. This can be done by going to the view showing all the datasets, hovering over the dataset you're interested in, and looking at its URL. For example, if the URL for a dataset is https://superset.example.org/explore/?dataset_type=table&dataset_id=42 its ID is 42.
|
||||
|
||||
Once you have the ID you can query it as if it were a table:
|
||||
|
||||
```
|
||||
SELECT * FROM {{ dataset(42) }} LIMIT 10
|
||||
```
|
||||
|
||||
If you want to select the metric definitions as well, in addition to the columns, you need to pass an additional keyword argument:
|
||||
|
||||
```
|
||||
SELECT * FROM {{ dataset(42, include_metrics=True) }} LIMIT 10
|
||||
```
|
||||
|
||||
Since metrics are aggregations, the resulting SQL expression will be grouped by all non-metric columns. You can specify a subset of columns to group by instead:
|
||||
|
||||
```
|
||||
SELECT * FROM {{ dataset(42, include_metrics=True, columns=["ds", "category"]) }} LIMIT 10
|
||||
```
|
||||
|
||||
**Metrics**
|
||||
|
||||
The `{{ metric('metric_key', dataset_id) }}` macro can be used to retrieve the metric SQL syntax from a dataset. This can be useful for different purposes:
|
||||
|
||||
- Override the metric label in the chart level
|
||||
- Combine multiple metrics in a calculation
|
||||
- Retrieve a metric syntax in SQL lab
|
||||
- Re-use metrics across datasets
|
||||
|
||||
This macro avoids copy/paste, allowing users to centralize the metric definition in the dataset layer.
|
||||
|
||||
The `dataset_id` parameter is optional, and if not provided Superset will use the current dataset from context (for example, when using this macro in the Chart Builder, by default the `macro_key` will be searched in the dataset powering the chart).
|
||||
The parameter can be used in SQL Lab, or when fetching a metric from another dataset.
|
||||
@@ -1,7 +1,7 @@
|
||||
---
|
||||
title: Upgrading Superset
|
||||
hide_title: true
|
||||
sidebar_position: 8
|
||||
sidebar_position: 4
|
||||
version: 1
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user