chore: various markdown warnings resolved (#30657)

Co-authored-by: Evan Rusackas <evan@preset.io>
This commit is contained in:
Emad Rad
2025-03-04 23:15:49 +03:30
committed by GitHub
parent 807dcddc28
commit 2b53b1800e
20 changed files with 158 additions and 159 deletions

View File

@@ -92,6 +92,7 @@ You can find documentation about each field in the default `config.py` in the Gi
You need to replace default values with your custom Redis, Slack and/or SMTP config.
Superset uses Celery beat and Celery worker(s) to send alerts and reports.
- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.
- The worker will process the tasks that need to be performed when an alert or report is fired.
@@ -187,7 +188,6 @@ ALERT_REPORTS_EXECUTORS = [FixedExecutor("admin")]
Please refer to `ExecutorType` in the codebase for other executor types.
**Important notes**
- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can
@@ -199,7 +199,6 @@ Please refer to `ExecutorType` in the codebase for other executor types.
- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers cant access Superset via
its default value of `http://0.0.0.0:8080/`.
It's also possible to specify a minimum interval between each report's execution through the config file:
``` python
@@ -305,6 +304,7 @@ One symptom of an invalid connection to an email server is receiving an error of
Confirm via testing that your outbound email configuration is correct. Here is the simplest test, for an un-authenticated email SMTP email service running on port 25. If you are sending over SSL, for instance, study how [Superset's codebase sends emails](https://github.com/apache/superset/blob/master/superset/utils/core.py#L818) and then test with those commands and arguments.
Start Python in your worker environment, replace all example values, and run:
```python
import smtplib
from email.mime.multipart import MIMEMultipart
@@ -326,6 +326,7 @@ mailserver.quit()
This should send an email.
Possible fixes:
- Some cloud hosts disable outgoing unauthenticated SMTP email to prevent spam. For instance, [Azure blocks port 25 by default on some machines](https://learn.microsoft.com/en-us/azure/virtual-network/troubleshoot-outbound-smtp-connectivity). Enable that port or use another sending method.
- Use another set of SMTP credentials that you verify works in this setup.

View File

@@ -42,13 +42,13 @@ CELERY_CONFIG = CeleryConfig
To start a Celery worker to leverage the configuration, run the following command:
```
```bash
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
```
To start a job which schedules periodic background jobs, run the following command:
```
```bash
celery --app=superset.tasks.celery_app:app beat
```
@@ -93,12 +93,12 @@ issues arise. Please clear your existing results cache store when upgrading an e
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:
```python
```bash
pip install flower
```
You can run flower using:
```
```bash
celery --app=superset.tasks.celery_app:app flower
```

View File

@@ -17,6 +17,7 @@ Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
The following cache configurations can be customized in this way:
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
@@ -99,7 +100,6 @@ from superset.tasks.types import FixedExecutor
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
```
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
and are processed asynchronously by the workers.

View File

@@ -117,7 +117,7 @@ Your deployment must use a complex, unique key.
### Rotating to a newer SECRET_KEY
If you wish to change your existing SECRET_KEY, add the existing SECRET_KEY to your `superset_config.py` file as
`PREVIOUS_SECRET_KEY = `and provide your new key as `SECRET_KEY =`. You can find your current SECRET_KEY with these
`PREVIOUS_SECRET_KEY =`and provide your new key as `SECRET_KEY =`. You can find your current SECRET_KEY with these
commands - if running Superset with Docker, execute from within the Superset application container:
```python
@@ -300,6 +300,7 @@ CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
- If an OAuth2 authorization server supports OpenID Connect 1.0, you could configure its configuration
document URL only without providing `api_base_url`, `access_token_url`, `authorize_url` and other
required options like user info endpoint, jwks uri etc. For instance:
```python
OAUTH_PROVIDERS = [
{ 'name':'egaSSO',
@@ -313,12 +314,15 @@ CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
}
]
```
### Keycloak-Specific Configuration using Flask-OIDC
If you are using Keycloak as OpenID Connect 1.0 Provider, the above configuration based on [`Authlib`](https://authlib.org/) might not work. In this case using [`Flask-OIDC`](https://pypi.org/project/flask-oidc/) is a viable option.
Make sure the pip package [`Flask-OIDC`](https://pypi.org/project/flask-oidc/) is installed on the webserver. This was successfully tested using version 2.2.0. This package requires [`Flask-OpenID`](https://pypi.org/project/Flask-OpenID/) as a dependency.
The following code defines a new security manager. Add it to a new file named `keycloak_security_manager.py`, placed in the same directory as your `superset_config.py` file.
```python
from flask_appbuilder.security.manager import AUTH_OID
from superset.security import SupersetSecurityManager
@@ -373,7 +377,9 @@ class AuthOIDCView(AuthOIDView):
return redirect(
oidc.client_secrets.get('issuer') + '/protocol/openid-connect/logout?redirect_uri=' + quote(redirect_url))
```
Then add to your `superset_config.py` file:
```python
from keycloak_security_manager import OIDCSecurityManager
from flask_appbuilder.security.manager import AUTH_OID, AUTH_REMOTE_USER, AUTH_DB, AUTH_LDAP, AUTH_OAUTH
@@ -393,7 +399,9 @@ AUTH_USER_REGISTRATION = True
# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = 'Public'
```
Store your client-specific OpenID information in a file called `client_secret.json`. Create this file in the same directory as `superset_config.py`:
```json
{
"<myOpenIDProvider>": {
@@ -410,6 +418,7 @@ Store your client-specific OpenID information in a file called `client_secret.js
}
}
```
## LDAP Authentication
FAB supports authenticating user credentials against an LDAP server.
@@ -432,6 +441,7 @@ AUTH_ROLES_MAPPING = {
"superset_admins": ["Admin"],
}
```
### Mapping LDAP groups to Superset roles
The following `AUTH_ROLES_MAPPING` dictionary would map the LDAP DN "cn=superset_users,ou=groups,dc=example,dc=com" to the Superset roles "Gamma" as well as "Alpha", and the LDAP DN "cn=superset_admins,ou=groups,dc=example,dc=com" to the Superset role "Admin".
@@ -442,6 +452,7 @@ AUTH_ROLES_MAPPING = {
"cn=superset_admins,ou=groups,dc=example,dc=com": ["Admin"],
}
```
Note: This requires `AUTH_LDAP_SEARCH` to be set. For more details, please see the [FAB Security documentation](https://flask-appbuilder.readthedocs.io/en/latest/security.html).
### Syncing roles at login

View File

@@ -31,16 +31,15 @@ install new database drivers into your Superset configuration.
### Supported Databases and Dependencies
Some of the recommended packages are shown below. Please refer to
[pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml) for the versions that
are compatible with Superset.
| <div style={{width: '150px'}}>Database</div> | PyPI package | Connection String |
| --------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [AWS Athena](/docs/configuration/databases#aws-athena) | `pip install pyathena[pandas]` , `pip install PyAthenaJDBC` | `awsathena+rest://{access_key_id}:{access_key}@athena.{region}.amazonaws.com/{schema}?s3_staging_dir={s3_staging_dir}&... ` |
| [AWS Athena](/docs/configuration/databases#aws-athena) | `pip install pyathena[pandas]` , `pip install PyAthenaJDBC` | `awsathena+rest://{access_key_id}:{access_key}@athena.{region}.amazonaws.com/{schema}?s3_staging_dir={s3_staging_dir}&...` |
| [AWS DynamoDB](/docs/configuration/databases#aws-dynamodb) | `pip install pydynamodb` | `dynamodb://{access_key_id}:{secret_access_key}@dynamodb.{region_name}.amazonaws.com?connector=superset` |
| [AWS Redshift](/docs/configuration/databases#aws-redshift) | `pip install sqlalchemy-redshift` | ` redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>` |
| [AWS Redshift](/docs/configuration/databases#aws-redshift) | `pip install sqlalchemy-redshift` | `redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>` |
| [Apache Doris](/docs/configuration/databases#apache-doris) | `pip install pydoris` | `doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>` |
| [Apache Drill](/docs/configuration/databases#apache-drill) | `pip install sqlalchemy-drill` | `drill+sadrill://<username>:<password>@<host>:<port>/<storage_plugin>`, often useful: `?use_ssl=True/False` |
| [Apache Druid](/docs/configuration/databases#apache-druid) | `pip install pydruid` | `druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql` |
@@ -85,6 +84,7 @@ are compatible with Superset.
| [Vertica](/docs/configuration/databases#vertica) | `pip install sqlalchemy-vertica-python` | `vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
| [YDB](/docs/configuration/databases#ydb) | `pip install ydb-sqlalchemy` | `ydb://{host}:{port}/{database_name}` |
| [YugabyteDB](/docs/configuration/databases#yugabytedb) | `pip install psycopg2` | `postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
---
Note that many other databases are supported, the main criteria being the existence of a functional
@@ -185,7 +185,6 @@ purposes of isolating the problem.
Repeat this process for each type of database you want Superset to connect to.
### Database-specific Instructions
#### Ascend.io
@@ -211,14 +210,12 @@ You'll need the following setting values to form the connection string:
- **Catalog**: Catalog Name
- **Database**: Database Name
Here's what the connection string looks like:
```
doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>
```
#### AWS Athena
##### PyAthenaJDBC
@@ -248,6 +245,7 @@ awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name
```
The PyAthena library also allows to assume a specific IAM role which you can define by adding following parameters in Superset's Athena database connection UI under ADVANCED --> Other --> ENGINE PARAMETERS.
```json
{
"connect_args": {
@@ -270,7 +268,6 @@ dynamodb://{aws_access_key_id}:{aws_secret_access_key}@dynamodb.{region_name}.am
To get more documentation, please visit: [PyDynamoDB WIKI](https://github.com/passren/PyDynamoDB/wiki/5.-Superset).
#### AWS Redshift
The [sqlalchemy-redshift](https://pypi.org/project/sqlalchemy-redshift/) library is the recommended
@@ -286,7 +283,6 @@ You'll need to set the following values to form the connection string:
- **Database Name**: Database Name
- **Port**: default 5439
##### psycopg2
Here's what the SQLALCHEMY URI looks like:
@@ -295,7 +291,6 @@ Here's what the SQLALCHEMY URI looks like:
redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>
```
##### redshift_connector
Here's what the SQLALCHEMY URI looks like:
@@ -304,8 +299,7 @@ Here's what the SQLALCHEMY URI looks like:
redshift+redshift_connector://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>
```
###### Using IAM-based credentials with Redshift cluster:
###### Using IAM-based credentials with Redshift cluster
[Amazon redshift cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html) also supports generating temporary IAM-based database user credentials.
@@ -316,10 +310,10 @@ You have to define the following arguments in Superset's redshift database conne
```
{"connect_args":{"iam":true,"database":"<database>","cluster_identifier":"<cluster_identifier>","db_user":"<db_user>"}}
```
and SQLALCHEMY URI should be set to `redshift+redshift_connector://`
###### Using IAM-based credentials with Redshift serverless:
###### Using IAM-based credentials with Redshift serverless
[Redshift serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-whatis.html) supports connection using IAM roles.
@@ -331,8 +325,6 @@ You have to define the following arguments in Superset's redshift database conne
{"connect_args":{"iam":true,"is_serverless":true,"serverless_acct_id":"<aws account number>","serverless_work_group":"<redshift work group>","database":"<database>","user":"IAMR:<superset iam role name>"}}
```
#### ClickHouse
To use ClickHouse with Superset, you will need to install the `clickhouse-connect` Python library:
@@ -365,8 +357,6 @@ uses the default user without a password (and doesn't encrypt the connection):
clickhousedb://localhost/default
```
#### CockroachDB
The recommended connector library for CockroachDB is
@@ -378,13 +368,12 @@ The expected connection string is formatted as follows:
cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable
```
#### Couchbase
The Couchbase's Superset connection is designed to support two services: Couchbase Analytics and Couchbase Columnar.
The recommended connector library for couchbase is
[couchbase-sqlalchemy](https://github.com/couchbase/couchbase-sqlalchemy).
```
pip install couchbase-sqlalchemy
```
@@ -395,22 +384,25 @@ The expected connection string is formatted as follows:
couchbase://{username}:{password}@{hostname}:{port}?truststorepath={certificate path}?ssl={true/false}
```
#### CrateDB
The connector library for CrateDB is [sqlalchemy-cratedb].
We recommend to add the following item to your `requirements.txt` file:
```
sqlalchemy-cratedb>=0.40.1,<1
```
An SQLAlchemy connection string for [CrateDB Self-Managed] on localhost,
for evaluation purposes, looks like this:
```
crate://crate@127.0.0.1:4200
```
An SQLAlchemy connection string for connecting to [CrateDB Cloud] looks like
this:
```
crate://<username>:<password>@<clustername>.cratedb.net:4200/?ssl=true
```
@@ -418,6 +410,7 @@ crate://<username>:<password>@<clustername>.cratedb.net:4200/?ssl=true
Follow the steps [here](/docs/configuration/databases#installing-database-drivers)
to install the CrateDB connector package when setting up Superset locally using
Docker Compose.
```
echo "sqlalchemy-cratedb" >> ./docker/requirements-local.txt
```
@@ -426,7 +419,6 @@ echo "sqlalchemy-cratedb" >> ./docker/requirements-local.txt
[CrateDB Self-Managed]: https://cratedb.com/product/self-managed
[sqlalchemy-cratedb]: https://pypi.org/project/sqlalchemy-cratedb/
#### Databend
The recommended connector library for Databend is [databend-sqlalchemy](https://pypi.org/project/databend-sqlalchemy/).
@@ -444,7 +436,6 @@ Here's a connection string example of Superset connecting to a Databend database
databend://user:password@localhost:8000/default?secure=false
```
#### Databricks
Databricks now offer a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. You can install both with:
@@ -528,7 +519,6 @@ For a connection to a SQL endpoint you need to use the HTTP path from the endpoi
{"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
```
#### Denodo
The recommended connector library for Denodo is
@@ -540,7 +530,6 @@ The expected connection string is formatted as follows (default port is 9996):
denodo://{username}:{password}@{hostname}:{port}/{database}
```
#### Dremio
The recommended connector library for Dremio is
@@ -561,7 +550,6 @@ dremio+flight://{username}:{password}@{host}:{port}/dremio
This [blog post by Dremio](https://www.dremio.com/tutorials/dremio-apache-superset/) has some
additional helpful instructions on connecting Superset to Dremio.
#### Apache Drill
##### SQLAlchemy
@@ -603,8 +591,6 @@ We recommend reading the
the [GitHub README](https://github.com/JohnOmernik/sqlalchemy-drill#usage-with-odbc) to learn how to
work with Drill through ODBC.
import useBaseUrl from "@docusaurus/useBaseUrl";
#### Apache Druid
@@ -618,6 +604,7 @@ The connection string looks like:
```
druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
```
Here's a breakdown of the key components of this connection string:
- `User`: username portion of the credentials needed to connect to your database
@@ -646,7 +633,7 @@ To disable SSL verification, add the following to the **Extras** field:
```
engine_params:
{"connect_args":
{"scheme": "https", "ssl_verify_cert": false}}
{"scheme": "https", "ssl_verify_cert": false}}
```
##### Aggregations
@@ -670,7 +657,6 @@ much like you would create an aggregation manually, but specify `postagg` as a `
then have to provide a valid json post-aggregation definition (as specified in the Druid docs) in
the JSON field.
#### Elasticsearch
The recommended connector library for Elasticsearch is
@@ -719,7 +705,7 @@ Then register your table with the alias name logstash_all
By default, Superset uses UTC time zone for elasticsearch query. If you need to specify a time zone,
please edit your Database and enter the settings of your specified time zone in the Other > ENGINE PARAMETERS:
```
```json
{
"connect_args": {
"time_zone": "Asia/Shanghai"
@@ -741,8 +727,6 @@ To disable SSL verification, add the following to the **SQLALCHEMY URI** field:
elasticsearch+https://{user}:{password}@{host}:9200/?verify_certs=False
```
#### Exasol
The recommended connector library for Exasol is
@@ -754,7 +738,6 @@ The connection string for Exasol looks like this:
exa+pyodbc://{username}:{password}@{hostname}:{port}/my_schema?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC
```
#### Firebird
The recommended connector library for Firebird is [sqlalchemy-firebird](https://pypi.org/project/sqlalchemy-firebird/).
@@ -772,7 +755,6 @@ Here's a connection string example of Superset connecting to a local Firebird da
firebird+fdb://SYSDBA:masterkey@192.168.86.38:3050//Library/Frameworks/Firebird.framework/Versions/A/Resources/examples/empbuild/employee.fdb
```
#### Firebolt
The recommended connector library for Firebolt is [firebolt-sqlalchemy](https://pypi.org/project/firebolt-sqlalchemy/).
@@ -803,7 +785,7 @@ The recommended connector library for BigQuery is
Follow the steps [here](/docs/configuration/databases#installing-drivers-in-docker-images) about how to
install new database drivers when setting up Superset locally via docker compose.
```
```bash
echo "sqlalchemy-bigquery" >> ./docker/requirements-local.txt
```
@@ -816,7 +798,7 @@ credentials file (as a JSON).
appropriate BigQuery datasets, and download the JSON configuration file for the service account.
2. In Superset, you can either upload that JSON or add the JSON blob in the following format (this should be the content of your credential JSON file):
```
```json
{
"type": "service_account",
"project_id": "...",
@@ -844,7 +826,7 @@ credentials file (as a JSON).
Go to the **Advanced** tab, Add a JSON blob to the **Secure Extra** field in the database configuration form with
the following format:
```
```json
{
"credentials_info": <contents of credentials JSON file>
}
@@ -852,7 +834,7 @@ credentials file (as a JSON).
The resulting file should have this structure:
```
```json
{
"credentials_info": {
"type": "service_account",
@@ -879,8 +861,6 @@ To be able to upload CSV or Excel files to BigQuery in Superset, you'll need to
Currently, the Google BigQuery Python SDK is not compatible with `gevent`, due to some dynamic monkeypatching on python core library by `gevent`.
So, when you deploy Superset with `gunicorn` server, you have to use worker type except `gevent`.
#### Google Sheets
Google Sheets has a very limited
@@ -891,7 +871,6 @@ There are a few steps involved in connecting Superset to Google Sheets. This
[tutorial](https://preset.io/blog/2020-06-01-connect-superset-google-sheets/) has the most up to date
instructions on setting up this connection.
#### Hana
The recommended connector library is [sqlalchemy-hana](https://github.com/SAP/sqlalchemy-hana).
@@ -902,7 +881,6 @@ The connection string is formatted as follows:
hana://{username}:{password}@{host}:{port}
```
#### Apache Hive
The [pyhive](https://pypi.org/project/PyHive/) library is the recommended way to connect to Hive through SQLAlchemy.
@@ -913,7 +891,6 @@ The expected connection string is formatted as follows:
hive://hive@{hostname}:{port}/{database}
```
#### Hologres
Hologres is a real-time interactive analytics service developed by Alibaba Cloud. It is fully compatible with PostgreSQL 11 and integrates seamlessly with the big data ecosystem.
@@ -932,7 +909,6 @@ The connection string looks like:
postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}
```
#### IBM DB2
The [IBM_DB_SA](https://github.com/ibmdb/python-ibmdbsa/tree/master/ibm_db_sa) library provides a
@@ -950,7 +926,6 @@ There are two DB2 dialect versions implemented in SQLAlchemy. If you are connect
ibm_db_sa://{username}:{passport}@{hostname}:{port}/{database}
```
#### Apache Impala
The recommended connector library to Apache Impala is [impyla](https://github.com/cloudera/impyla).
@@ -961,7 +936,6 @@ The expected connection string is formatted as follows:
impala://{hostname}:{port}/{database}
```
#### Kusto
The recommended connector library for Kusto is
@@ -982,7 +956,6 @@ kustokql+https://{cluster_url}/{database}?azure_ad_client_id={azure_ad_client_id
Make sure the user has privileges to access and use all required
databases/tables/views.
#### Apache Kylin
The recommended connector library for Apache Kylin is
@@ -994,10 +967,6 @@ The expected connection string is formatted as follows:
kylin://<username>:<password>@<hostname>:<port>/<project>?<param1>=<value1>&<param2>=<value2>
```
#### MySQL
The recommended connector library for MySQL is [mysqlclient](https://pypi.org/project/mysqlclient/).
@@ -1022,7 +991,6 @@ One problem with `mysqlclient` is that it will fail to connect to newer MySQL da
mysql+mysqlconnector://{username}:{password}@{host}/{database}
```
#### IBM Netezza Performance Server
The [nzalchemy](https://pypi.org/project/nzalchemy/) library provides a
@@ -1039,21 +1007,19 @@ netezza+nzpy://{username}:{password}@{hostname}:{port}/{database}
The [sqlalchemy-oceanbase](https://pypi.org/project/oceanbase_py/) library is the recommended
way to connect to OceanBase through SQLAlchemy.
The connection string for OceanBase looks like this:
```
oceanbase://<User>:<Password>@<Host>:<Port>/<Database>
```
#### Ocient DB
The recommended connector library for Ocient is [sqlalchemy-ocient](https://pypi.org/project/sqlalchemy-ocient).
##### Install the Ocient Driver
```
```bash
pip install sqlalchemy-ocient
```
@@ -1094,7 +1060,7 @@ parseable://admin:admin@demo.parseable.com:443/ingress-nginx
Note: The stream_name in the URI represents the Parseable logstream you want to query. You can use both HTTP (port 80) and HTTPS (port 443) connections.
>>>>>>>
#### Apache Pinot
The recommended connector library for Apache Pinot is [pinotdb](https://pypi.org/project/pinotdb/).
@@ -1113,7 +1079,8 @@ pinot://<username>:<password>@<pinot-broker-host>:<pinot-broker-port>/query/sql?
If you want to use explore view or joins, window functions, etc. then enable [multi-stage query engine](https://docs.pinot.apache.org/reference/multi-stage-engine).
Add below argument while creating database connection in Advanced -> Other -> ENGINE PARAMETERS
```
```json
{"connect_args":{"use_multistage_engine":"true"}}
```
@@ -1153,7 +1120,6 @@ More information about PostgreSQL connection options can be found in the
and the
[PostgreSQL docs](https://www.postgresql.org/docs/9.1/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS).
#### Presto
The [pyhive](https://pypi.org/project/PyHive/) library is the recommended way to connect to Presto through SQLAlchemy.
@@ -1179,7 +1145,7 @@ presto://datascientist:securepassword@presto.example.com:8080/hive
By default Superset assumes the most recent version of Presto is being used when querying the
datasource. If youre using an older version of Presto, you can configure it in the extra parameter:
```
```json
{
"version": "0.123"
}
@@ -1187,7 +1153,7 @@ datasource. If youre using an older version of Presto, you can configure it i
SSL Secure extra add json config to extra connection information.
```
```json
{
"connect_args":
{"protocol": "https",
@@ -1196,8 +1162,6 @@ SSL Secure extra add json config to extra connection information.
}
```
#### RisingWave
The recommended connector library for RisingWave is
@@ -1209,7 +1173,6 @@ The expected connection string is formatted as follows:
risingwave://root@{hostname}:{port}/{database}?sslmode=disable
```
#### Rockset
The connection string for Rockset is:
@@ -1229,7 +1192,6 @@ rockset://{api key}:@{api server}/{VI ID}
For more complete instructions, we recommend the [Rockset documentation](https://docs.rockset.com/apache-superset/).
#### Snowflake
##### Install Snowflake Driver
@@ -1237,7 +1199,7 @@ For more complete instructions, we recommend the [Rockset documentation](https:/
Follow the steps [here](/docs/configuration/databases#installing-database-drivers) about how to
install new database drivers when setting up Superset locally via docker compose.
```
```bash
echo "snowflake-sqlalchemy" >> ./docker/requirements-local.txt
```
@@ -1270,7 +1232,7 @@ To connect Snowflake with Key Pair Authentication, you need to add the following
***Please note that you need to merge multi-line private key content to one line and insert `\n` between each line***
```
```json
{
"auth_method": "keypair",
"auth_params": {
@@ -1282,7 +1244,7 @@ To connect Snowflake with Key Pair Authentication, you need to add the following
If your private key is stored on server, you can replace "privatekey_body" with “privatekey_path” in parameter.
```
```json
{
"auth_method": "keypair",
"auth_params": {
@@ -1303,7 +1265,6 @@ The connection string for Solr looks like this:
solr://{username}:{password}@{host}:{port}/{server_path}/{collection}[/?use_ssl=true|false]
```
#### Apache Spark SQL
The recommended connector library for Apache Spark SQL [pyhive](https://pypi.org/project/PyHive/).
@@ -1327,6 +1288,7 @@ mssql+pymssql://<Username>:<Password>@<Host>:<Port-default:1433>/<Database Name>
It is also possible to connect using [pyodbc](https://pypi.org/project/pyodbc) with the parameter [odbc_connect](https://docs.sqlalchemy.org/en/14/dialects/mssql.html#pass-through-exact-pyodbc-string)
The connection string for SQL Server looks like this:
```
mssql+pyodbc:///?odbc_connect=Driver%3D%7BODBC+Driver+17+for+SQL+Server%7D%3BServer%3Dtcp%3A%3Cmy_server%3E%2C1433%3BDatabase%3Dmy_database%3BUid%3Dmy_user_name%3BPwd%3Dmy_password%3BEncrypt%3Dyes%3BConnection+Timeout%3D30
```
@@ -1394,7 +1356,7 @@ here: https://downloads.teradata.com/download/connectivity/odbc-driver/linux
Here are the required environment variables:
```
```bash
export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini
```
@@ -1403,8 +1365,8 @@ We recommend using the first library because of the
lack of requirement around ODBC drivers and
because it's more regularly updated.
#### TimescaleDB
[TimescaleDB](https://www.timescale.com) is the open-source relational database for time-series and analytics to build powerful data-intensive applications.
TimescaleDB is a PostgreSQL extension, and you can use the standard PostgreSQL connector library, [psycopg2](https://www.psycopg.org/docs/), to connect to the database.
@@ -1436,31 +1398,38 @@ postgresql://{username}:{password}@{host}:{port}/{database name}?sslmode=require
[Learn more about TimescaleDB!](https://docs.timescale.com/)
#### Trino
Supported trino version 352 and higher
##### Connection String
The connection string format is as follows:
```
trino://{username}:{password}@{hostname}:{port}/{catalog}
```
If you are running Trino with docker on local machine, please use the following connection URL
```
trino://trino@host.docker.internal:8080
```
##### Authentications
###### 1. Basic Authentication
You can provide `username`/`password` in the connection string or in the `Secure Extra` field at `Advanced / Security`
* In Connection String
- In Connection String
```
trino://{username}:{password}@{hostname}:{port}/{catalog}
```
* In `Secure Extra` field
- In `Secure Extra` field
```json
{
"auth_method": "basic",
@@ -1474,7 +1443,9 @@ You can provide `username`/`password` in the connection string or in the `Secure
NOTE: if both are provided, `Secure Extra` always takes higher priority.
###### 2. Kerberos Authentication
In `Secure Extra` field, config as following example:
```json
{
"auth_method": "kerberos",
@@ -1491,7 +1462,9 @@ All fields in `auth_params` are passed directly to the [`KerberosAuthentication`
NOTE: Kerberos authentication requires installing the [`trino-python-client`](https://github.com/trinodb/trino-python-client) locally with either the `all` or `kerberos` optional features, i.e., installing `trino[all]` or `trino[kerberos]` respectively.
###### 3. Certificate Authentication
In `Secure Extra` field, config as following example:
```json
{
"auth_method": "certificate",
@@ -1505,7 +1478,9 @@ In `Secure Extra` field, config as following example:
All fields in `auth_params` are passed directly to the [`CertificateAuthentication`](https://github.com/trinodb/trino-python-client/blob/0.315.0/trino/auth.py#L416) class.
###### 4. JWT Authentication
Config `auth_method` and provide token in `Secure Extra` field
```json
{
"auth_method": "jwt",
@@ -1516,8 +1491,10 @@ Config `auth_method` and provide token in `Secure Extra` field
```
###### 5. Custom Authentication
To use custom authentication, first you need to add it into
`ALLOWED_EXTRA_AUTHENTICATIONS` allow list in Superset config file:
```python
from your.module import AuthClass
from another.extra import auth_method
@@ -1531,6 +1508,7 @@ ALLOWED_EXTRA_AUTHENTICATIONS: Dict[str, Dict[str, Callable[..., Any]]] = {
```
Then in `Secure Extra` field:
```json
{
"auth_method": "custom_auth",
@@ -1546,8 +1524,8 @@ or factory function (which returns an `Authentication` instance) to `auth_method
All fields in `auth_params` are passed directly to your class/function.
**Reference**:
* [Trino-Superset-Podcast](https://trino.io/episodes/12.html)
- [Trino-Superset-Podcast](https://trino.io/episodes/12.html)
#### Vertica
@@ -1574,8 +1552,6 @@ Other parameters:
- Load Balancer - Backup Host
#### YDB
The recommended connector library for [YDB](https://ydb.tech/) is
@@ -1590,6 +1566,7 @@ ydb://{host}:{port}/{database_name}
```
##### Protocol
You can specify `protocol` in the `Secure Extra` field at `Advanced / Security`:
```
@@ -1600,9 +1577,10 @@ You can specify `protocol` in the `Secure Extra` field at `Advanced / Security`:
Default is `grpc`.
##### Authentication Methods
###### Static Credentials
To use `Static Credentials` you should provide `username`/`password` in the `Secure Extra` field at `Advanced / Security`:
```
@@ -1614,8 +1592,8 @@ To use `Static Credentials` you should provide `username`/`password` in the `Sec
}
```
###### Access Token Credentials
To use `Access Token Credentials` you should provide `token` in the `Secure Extra` field at `Advanced / Security`:
```
@@ -1626,8 +1604,8 @@ To use `Access Token Credentials` you should provide `token` in the `Secure Extr
}
```
##### Service Account Credentials
To use Service Account Credentials, you should provide `service_account_json` in the `Secure Extra` field at `Advanced / Security`:
```
@@ -1645,8 +1623,6 @@ To use Service Account Credentials, you should provide `service_account_json` in
}
```
#### YugabyteDB
[YugabyteDB](https://www.yugabyte.com/) is a distributed SQL database built on top of PostgreSQL.
@@ -1661,8 +1637,6 @@ The connection string looks like:
postgresql://{username}:{password}@{host}:{port}/{database}
```
## Connecting through the UI
Here is the documentation on how to leverage the new DB Connection UI. This will provide admins the ability to enhance the UX for users who want to connect to new databases.
@@ -1735,9 +1709,6 @@ For databases like MySQL and Postgres that use the standard format of `engine+dr
For other databases you need to implement these methods yourself. The BigQuery DB engine spec is a good example of how to do that.
### Extra Database Settings
##### Deeper SQLAlchemy Integration
@@ -1801,9 +1772,7 @@ You can use the `Extra` field in the **Edit Databases** form to configure SSL:
}
```
## Misc.
## Misc
### Querying across databases

View File

@@ -10,7 +10,7 @@ version: 1
The superset cli allows you to import and export datasources from and to YAML. Datasources include
databases. The data is expected to be organized in the following hierarchy:
```
```text
├──databases
| ├──database_1
| | ├──table_1
@@ -30,13 +30,13 @@ databases. The data is expected to be organized in the following hierarchy:
You can print your current datasources to stdout by running:
```
```bash
superset export_datasources
```
To save your datasources to a ZIP file run:
```
```bash
superset export_datasources -f <filename>
```
@@ -55,7 +55,7 @@ Alternatively, you can export datasources using the UI:
In order to obtain an **exhaustive list of all fields** you can import using the YAML import run:
```
```bash
superset export_datasource_schema
```
@@ -65,13 +65,13 @@ As a reminder, you can use the `-b` flag to include back references.
In order to import datasources from a ZIP file, run:
```
```bash
superset import_datasources -p <path / filename>
```
The optional username flag **-u** sets the user used for the datasource import. The default is 'admin'. Example:
```
```bash
superset import_datasources -p <path / filename> -u 'admin'
```
@@ -81,7 +81,7 @@ superset import_datasources -p <path / filename> -u 'admin'
When using Superset version 4.x.x to import from an older version (2.x.x or 3.x.x) importing is supported as the command `legacy_import_datasources` and expects a JSON or directory of JSONs. The options are `-r` for recursive and `-u` for specifying a user. Example of legacy import without options:
```
```bash
superset legacy_import_datasources -p <path or filename>
```
@@ -89,21 +89,21 @@ superset legacy_import_datasources -p <path or filename>
When using an older Superset version (2.x.x & 3.x.x) of Superset, the command is `import_datasources`. ZIP and YAML files are supported and to switch between them the feature flag `VERSIONED_EXPORT` is used. When `VERSIONED_EXPORT` is `True`, `import_datasources` expects a ZIP file, otherwise YAML. Example:
```
```bash
superset import_datasources -p <path or filename>
```
When `VERSIONED_EXPORT` is `False`, if you supply a path all files ending with **yaml** or **yml** will be parsed. You can apply
additional flags (e.g. to search the supplied path recursively):
```
```bash
superset import_datasources -p <path> -r
```
The sync flag **-s** takes parameters in order to sync the supplied elements with your file. Be
careful this can delete the contents of your meta database. Example:
```
```bash
superset import_datasources -p <path / filename> -s columns,metrics
```
@@ -115,7 +115,7 @@ If you dont supply the sync flag (**-s**) importing will only add and update
E.g. you can add a verbose_name to the column ds in the table random_time_series from the example
datasets by saving the following YAML to file and then running the **import_datasources** command.
```
```yaml
databases:
- database_name: main
tables:

View File

@@ -20,14 +20,12 @@ The following keys in `superset_config.py` can be specified to configure CORS:
- `CORS_OPTIONS`: options passed to Flask-CORS
([documentation](https://flask-cors.readthedocs.io/en/latest/api.html#extension))
## HTTP headers
Note that Superset bundles [flask-talisman](https://pypi.org/project/talisman/)
Self-described as a small Flask extension that handles setting HTTP headers that can help
protect against a few common web application security issues.
## HTML Embedding of Dashboards and Charts
There are two ways to embed a dashboard: Using the [SDK](https://www.npmjs.com/package/@superset-ui/embedded-sdk) or embedding a direct link. Note that in the latter case everybody who knows the link is able to access the dashboard.
@@ -39,14 +37,16 @@ This works by first changing the content security policy (CSP) of [flask-talisma
#### Changing flask-talisman CSP
Add to `superset_config.py` the entire `TALISMAN_CONFIG` section from `config.py` and include a `frame-ancestors` section:
```python
TALISMAN_ENABLED = True
TALISMAN_CONFIG = {
"content_security_policy": {
...
"frame-ancestors": ["*.my-domain.com", "*.another-domain.com"],
"frame-ancestors": ["*.my-domain.com", "*.another-domain.com"],
...
```
Restart Superset for this configuration change to take effect.
#### Making a Dashboard Public
@@ -69,6 +69,7 @@ Now anybody can directly access the dashboard's URL. You can embed it in an ifra
>
</iframe>
```
#### Embedding a Chart
A chart's embed code can be generated by going to a chart's edit view and then clicking at the top right on `...` > `Share` > `Embed code`
@@ -89,7 +90,6 @@ Similarly, [flask-wtf](https://flask-wtf.readthedocs.io/en/0.15.x/config/) is us
some CSRF configurations. If you need to exempt endpoints from CSRF (e.g. if you are
running a custom auth postback endpoint), you can add the endpoints to `WTF_CSRF_EXEMPT_LIST`:
## SSH Tunneling
1. Turn on feature flag
@@ -105,7 +105,6 @@ running a custom auth postback endpoint), you can add the endpoints to `WTF_CSRF
3. Verify data is flowing
- Once SSH tunneling has been enabled, go to SQL Lab and write a query to verify data is properly flowing.
## Domain Sharding
:::note

View File

@@ -77,6 +77,7 @@ In the UI you can assign a set of parameters as JSON
"my_table": "foo"
}
```
The parameters become available in your SQL (example: `SELECT * FROM {{ my_table }}` ) by using Jinja templating syntax.
SQL Lab template parameters are stored with the dataset as `TEMPLATE PARAMETERS`.
@@ -103,7 +104,6 @@ GROUP BY action
Note ``_filters`` is not stored with the dataset. It's only used within the SQL Lab UI.
Besides default Jinja templating, SQL lab also supports self-defined template processor by setting
the `CUSTOM_TEMPLATE_PROCESSORS` in your superset configuration. The values in this dictionary
overwrite the default Jinja template processors of the specified database engine. The example below
@@ -186,7 +186,7 @@ cache hit in the future and Superset can retrieve cached data.
You can disable the inclusion of the `username` value in the calculation of the
cache key by adding the following parameter to your Jinja code:
```
```python
{{ current_username(add_to_cache_keys=False) }}
```
@@ -201,7 +201,7 @@ cache hit in the future and Superset can retrieve cached data.
You can disable the inclusion of the account `id` value in the calculation of the
cache key by adding the following parameter to your Jinja code:
```
```python
{{ current_user_id(add_to_cache_keys=False) }}
```
@@ -216,7 +216,7 @@ cache hit in the future and Superset can retrieve cached data.
You can disable the inclusion of the email value in the calculation of the
cache key by adding the following parameter to your Jinja code:
```
```python
{{ current_user_email(add_to_cache_keys=False) }}
```
@@ -301,7 +301,7 @@ This is useful if:
Here's a concrete example:
```
```sql
WITH RECURSIVE
superiors(employee_id, manager_id, full_name, level, lineage) AS (
SELECT
@@ -357,6 +357,7 @@ considerably improve performance, as many databases and query engines are able t
if the temporal filter is placed on the inner query, as opposed to the outer query.
The macro takes the following parameters:
- `column`: Name of the temporal column. Leave undefined to reference the time range from a Dashboard Native Time Range
filter (when present).
- `default`: The default value to fall back to if the time filter is not present, or has the value `No filter`
@@ -370,6 +371,7 @@ The macro takes the following parameters:
filter should only apply to the inner query.
The return type has the following properties:
- `from_expr`: the start of the time filter (if any)
- `to_expr`: the end of the time filter (if any)
- `time_range`: The applied time range
@@ -410,6 +412,7 @@ LIMIT 1000;
When using the `default` parameter, the templated query can be simplified, as the endpoints will always be defined
(to use a fixed time range, you can also use something like `default="2024-08-27 : 2024-09-03"`)
```
{% set time_filter = get_time_filter("dttm", default="Last week", remove_filter=True) %}
SELECT
@@ -429,19 +432,19 @@ To use the macro, first you need to find the ID of the dataset. This can be done
Once you have the ID you can query it as if it were a table:
```
```sql
SELECT * FROM {{ dataset(42) }} LIMIT 10
```
If you want to select the metric definitions as well, in addition to the columns, you need to pass an additional keyword argument:
```
```sql
SELECT * FROM {{ dataset(42, include_metrics=True) }} LIMIT 10
```
Since metrics are aggregations, the resulting SQL expression will be grouped by all non-metric columns. You can specify a subset of columns to group by instead:
```
```sql
SELECT * FROM {{ dataset(42, include_metrics=True, columns=["ds", "category"]) }} LIMIT 10
```

View File

@@ -24,7 +24,7 @@ The challenge however lies with the slew of [database engines](/docs/configurati
For example the following is a comparison of MySQL and Presto,
```
```python
import pandas as pd
from sqlalchemy import create_engine
@@ -41,7 +41,7 @@ pd.read_sql_query(
which outputs `{"ts":{"0":1640995200000}}` (which infers the UTC timezone per the Epoch time definition) and `{"ts":{"0":"2022-01-01 00:00:00.000"}}` (without an explicit timezone) respectively and thus are treated differently in JavaScript:
```
```js
new Date(1640995200000)
> Sat Jan 01 2022 13:00:00 GMT+1300 (New Zealand Daylight Time)