mirror of
https://github.com/apache/superset.git
synced 2026-04-19 08:04:53 +00:00
[docs] Add SSL config options for postgres (#9767)
* [docs] add postgres SSL documentation
* move caching section to where it makes more sense
@@ -333,6 +333,144 @@ auth postback endpoint, you can add them to *WTF_CSRF_EXEMPT_LIST*
|
||||
|
||||
.. _ref_database_deps:
|
||||
|
||||
Caching
-------

Superset uses `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ for
caching purposes. Configuring your caching backend is as easy as providing
a ``CACHE_CONFIG`` constant in your ``superset_config.py`` that
complies with the Flask-Cache specifications.

Flask-Cache supports multiple caching backends (Redis, Memcached,
SimpleCache (in-memory), or the local filesystem). If you are going to use
Memcached, please use the ``pylibmc`` client library, as ``python-memcached`` does
not handle storing binary data correctly. If you use Redis, please install
the `redis <https://pypi.python.org/pypi/redis>`_ Python package: ::

    pip install redis
|
||||
|
||||
Timeouts are set in the Superset metadata and are resolved by going
up the "timeout searchpath": from your slice configuration, to your
data source's configuration, to your database's, and ultimately falling back
to the global default defined in ``CACHE_CONFIG``.

.. code-block:: python

    CACHE_CONFIG = {
        'CACHE_TYPE': 'redis',
        'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
        'CACHE_KEY_PREFIX': 'superset_results',
        'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    }
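The "timeout searchpath" described above can be read as a first-match lookup. The following is a minimal sketch of that reading, not Superset's actual implementation: the function name ``resolve_cache_timeout`` and its parameters are hypothetical, and it assumes an unset timeout is represented as ``None``.

```python
from typing import Optional


def resolve_cache_timeout(
    slice_timeout: Optional[int],
    datasource_timeout: Optional[int],
    database_timeout: Optional[int],
    default_timeout: int,
) -> int:
    """Return the first timeout that is set, walking up the searchpath.

    Hypothetical helper for illustration only. The order mirrors the text:
    slice, then data source, then database, then the global default
    (CACHE_DEFAULT_TIMEOUT in CACHE_CONFIG).
    """
    for candidate in (slice_timeout, datasource_timeout, database_timeout):
        # Assumption: None means "not configured at this level".
        if candidate is not None:
            return candidate
    return default_timeout


# Nothing set on the slice, so the data source's value wins.
timeout = resolve_cache_timeout(None, 300, 600, 60 * 60 * 24)
```

With no value set at any level, the call falls through to the global default passed as the last argument.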
|
||||
|
||||
It is also possible to pass a custom cache initialization function in the
config to handle additional caching use cases. The function must return an
object that is compatible with the `Flask-Cache <https://pythonhosted.org/Flask-Cache/>`_ API.

.. code-block:: python

    from custom_caching import CustomCache

    def init_cache(app):
        """Takes an app instance and returns a custom cache backend"""
        config = {
            'CACHE_DEFAULT_TIMEOUT': 60 * 60 * 24, # 1 day default (in secs)
            'CACHE_KEY_PREFIX': 'superset_results',
        }
        return CustomCache(app, config)

    CACHE_CONFIG = init_cache
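The ``custom_caching`` module imported above is not shown in this document. As a rough illustration of what such a backend might look like, here is a dict-backed sketch exposing ``get``, ``set``, ``delete`` and ``clear``. The class body is an assumption made for illustration: it covers only a small part of the cache interface, and it treats a timeout of ``0`` as "never expires".

```python
import time


class CustomCache:
    """Hypothetical in-memory cache with a Flask-Cache-like surface."""

    def __init__(self, app, config):
        # `app` is accepted to match the init_cache example; unused here.
        self.default_timeout = config.get('CACHE_DEFAULT_TIMEOUT', 300)
        self.key_prefix = config.get('CACHE_KEY_PREFIX', '')
        self._store = {}

    def _key(self, key):
        return f"{self.key_prefix}{key}"

    def get(self, key):
        entry = self._store.get(self._key(key))
        if entry is None:
            return None
        value, expires_at = entry
        # Drop the entry lazily once its deadline has passed.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[self._key(key)]
            return None
        return value

    def set(self, key, value, timeout=None):
        timeout = self.default_timeout if timeout is None else timeout
        # Assumption: a timeout of 0 means the entry never expires.
        expires_at = time.monotonic() + timeout if timeout > 0 else None
        self._store[self._key(key)] = (value, expires_at)
        return True

    def delete(self, key):
        return self._store.pop(self._key(key), None) is not None

    def clear(self):
        self._store.clear()
        return True


cache = CustomCache(None, {'CACHE_KEY_PREFIX': 'superset_results'})
cache.set('chart-1', {'rows': 3})
```

A production backend would more likely wrap an external store, but the same four methods are the part a caller interacts with.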
|
||||
|
||||
Superset has a Celery task that will periodically warm up the cache based on
different strategies. To use it, add the following to the ``CELERYBEAT_SCHEDULE``
section in ``config.py``:

.. code-block:: python

    from celery.schedules import crontab

    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute=0, hour='*'), # hourly
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 5,
                'since': '7 days ago',
            },
        },
    }

This will cache all the charts in the top 5 most popular dashboards every hour.
For other strategies, check the ``superset/tasks/cache.py`` file.
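To make the ``top_n`` and ``since`` arguments concrete, the sketch below shows one way "the top N most popular dashboards since a cutoff" could be computed from a list of view events. It is an illustration only and not the code in ``superset/tasks/cache.py``; the function name, the ``view_log`` structure and the use of ``timedelta`` instead of a string such as ``'7 days ago'`` are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_n_dashboard_ids(view_log, top_n, since, now):
    """Return the ids of the most viewed dashboards within a time window.

    view_log: iterable of (dashboard_id, viewed_at) tuples (hypothetical).
    since:    timedelta describing how far back to look.
    """
    cutoff = now - since
    # Count only the views that happened inside the window.
    counts = Counter(
        dashboard_id for dashboard_id, viewed_at in view_log
        if viewed_at >= cutoff
    )
    return [dashboard_id for dashboard_id, _ in counts.most_common(top_n)]


now = datetime(2020, 5, 1, 12, 0)
view_log = [
    (1, now - timedelta(days=1)),
    (1, now - timedelta(days=2)),
    (1, now - timedelta(days=3)),
    (2, now - timedelta(days=1)),
    (2, now - timedelta(days=4)),
    (3, now - timedelta(days=30)),  # outside the 7-day window
    (4, now - timedelta(days=6)),
]
popular = top_n_dashboard_ids(view_log, top_n=2, since=timedelta(days=7), now=now)
```

A warm-up task would then request every chart on the returned dashboards so that the results are already cached when users open them.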
|
||||
|
||||
Caching Thumbnails
------------------

This is an optional feature that can be turned on by activating its feature flag in the config:

.. code-block:: python

    FEATURE_FLAGS = {
        "THUMBNAILS": True,
        "THUMBNAILS_SQLA_LISTENERS": True,
    }

For this feature you will need a cache system and Celery workers. All thumbnails are stored in the cache and are processed
asynchronously by the workers.
|
||||
|
||||
An example config where images are stored on S3 could be:

.. code-block:: python

    from flask import Flask
    from s3cache.s3cache import S3Cache

    ...

    class CeleryConfig(object):
        BROKER_URL = "redis://localhost:6379/0"
        CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
        CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
        CELERYD_PREFETCH_MULTIPLIER = 10
        CELERY_ACKS_LATE = True


    CELERY_CONFIG = CeleryConfig

    def init_thumbnail_cache(app: Flask) -> S3Cache:
        return S3Cache("bucket_name", 'thumbs_cache/')


    THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
    # Async selenium thumbnail task will use the following user
    THUMBNAIL_SELENIUM_USER = "Admin"

Using the above example, cache keys for dashboards will be ``superset_thumb__dashboard__{ID}``.
|
||||
|
||||
You can override the base URL for Selenium using:

.. code-block:: python

    WEBDRIVER_BASEURL = "https://superset.company.com"


Additional Selenium webdriver configuration can be set using ``WEBDRIVER_CONFIGURATION``.
|
||||
|
||||
You can implement a custom function to authenticate Selenium; the default uses the flask-login session cookie.
An example of a custom function signature:

.. code-block:: python

    from selenium.webdriver.remote.webdriver import WebDriver

    def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
        pass


Then in the config:

.. code-block:: python

    WEBDRIVER_AUTH_FUNC = auth_driver
|
||||
|
||||
Database dependencies
|
||||
---------------------
|
||||
|
||||
@@ -424,8 +562,40 @@ The connection string for PostgreSQL looks like this ::
|
||||
|
||||
    postgresql+psycopg2://{username}:{password}@{host}:{port}/{database}
|
||||
|
||||
See `psycopg2 SQLAlchemy <https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.psycopg2>`_.
|
||||
Additional connection arguments may be configured via the ``extra`` field under ``engine_params``.
If you would like to require SSL for the connection, here is a sample configuration:

.. code-block:: json

    {
        "metadata_params": {},
        "engine_params": {
            "connect_args": {
                "sslmode": "require",
                "sslrootcert": "/path/to/root_cert"
            }
        }
    }

If the key ``sslrootcert`` is present, the server's certificate will be verified to be signed by the Certificate Authority (CA) stored at that path.
|
||||
|
||||
If you would like to enable mutual SSL here is a sample configuration:
|
||||
|
||||
.. code-block:: json

    {
        "metadata_params": {},
        "engine_params": {
            "connect_args": {
                "sslmode": "require",
                "sslcert": "/path/to/client_cert",
                "sslkey": "/path/to/client_key",
                "sslrootcert": "/path/to/root_cert"
            }
        }
    }
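The two sample configurations above differ only in whether the client certificate and key are present. If you generate the ``extra`` JSON programmatically, a small helper along these lines may be convenient. This is a sketch under assumptions: the function name ``build_postgres_ssl_extra`` is hypothetical, and the only keys it emits are the ones shown in the samples.

```python
import json


def build_postgres_ssl_extra(root_cert=None, client_cert=None,
                             client_key=None, sslmode="require"):
    """Build the JSON for the ``extra`` field (hypothetical helper).

    Passing both client_cert and client_key yields the mutual SSL layout;
    passing neither yields the server-verification layout.
    """
    if (client_cert is None) != (client_key is None):
        # A client certificate is only usable together with its key.
        raise ValueError("client_cert and client_key must be given together")

    connect_args = {"sslmode": sslmode}
    if client_cert is not None:
        connect_args["sslcert"] = client_cert
        connect_args["sslkey"] = client_key
    if root_cert is not None:
        connect_args["sslrootcert"] = root_cert

    extra = {
        "metadata_params": {},
        "engine_params": {"connect_args": connect_args},
    }
    return json.dumps(extra, indent=4)


mutual_ssl_extra = build_postgres_ssl_extra(
    root_cert="/path/to/root_cert",
    client_cert="/path/to/client_cert",
    client_key="/path/to/client_key",
)
```

The returned string can be pasted into the database's ``extra`` field as is.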
|
||||
|
||||
See `psycopg2 SQLAlchemy <https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.psycopg2>`_.
|
||||
|
||||
Hana
|
||||
------------
|
||||
@@ -588,144 +758,6 @@ If you are using JDBC to connect to Drill, the connection string looks like this
|
||||
For a complete tutorial about how to use Apache Drill with Superset, see this tutorial:
|
||||
`Visualize Anything with Superset and Drill <http://thedataist.com/visualize-anything-with-superset-and-drill/>`_
|
||||
|
||||
Deeper SQLAlchemy integration
|
||||
-----------------------------
|
||||
|
||||
|
||||