Files
superset2/docs/docs/installation/cache.mdx
2022-03-21 19:46:56 +02:00

127 lines
4.5 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Caching
hide_title: true
sidebar_position: 5
version: 1
---
## Caching
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes. Configuring caching is as easy as providing a custom cache config in your
`superset_config.py` that complies with [the Flask-Caching specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
local filesystem. Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.
The following cache configurations can be customized:
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
- SQL Lab query results (optional): `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
Please note, that Dashboard and Explore caching is required. If these caches are undefined, Superset falls back to using a built-in cache that stores data
in the metadata database. While it is recommended to use a dedicated cache, the built-in cache can also be used to cache other data.
For example, to use the built-in cache to store chart data, use the following config:
```python
DATA_CACHE_CONFIG = {
"CACHE_TYPE": "SupersetMetastoreCache",
"CACHE_KEY_PREFIX": "superset_results", # make sure this string is unique to avoid collisions
"CACHE_DEFAULT_TIMEOUT": 86400, # 60 seconds * 60 minutes * 24 hours
}
```
- Redis (recommended): we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.
Both of these libraries can be installed using pip.
For chart data, Superset goes up a “timeout search path”, from a slice's configuration
to the datasources, the databases, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.
## Celery beat
Superset has a Celery task that will periodically warm up the cache based on different strategies.
To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:
```python
CELERYBEAT_SCHEDULE = {
'cache-warmup-hourly': {
'task': 'cache-warmup',
'schedule': crontab(minute=0, hour='*'), # hourly
'kwargs': {
'strategy_name': 'top_n_dashboards',
'top_n': 5,
'since': '7 days ago',
},
},
}
```
This will cache all the charts in the top 5 most popular dashboards every hour. For other
strategies, check the `superset/tasks/cache.py` file.
### Caching Thumbnails
This is an optional feature that can be turned on by activating its feature flag on config:
```
FEATURE_FLAGS = {
"THUMBNAILS": True,
"THUMBNAILS_SQLA_LISTENERS": True,
}
```
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
and are processed asynchronously by the workers.
An example config where images are stored on S3 could be:
```python
from flask import Flask
from s3cache.s3cache import S3Cache
...
class CeleryConfig(object):
BROKER_URL = "redis://localhost:6379/0"
CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
CELERYD_PREFETCH_MULTIPLIER = 10
CELERY_ACKS_LATE = True
CELERY_CONFIG = CeleryConfig
def init_thumbnail_cache(app: Flask) -> S3Cache:
return S3Cache("bucket_name", 'thumbs_cache/')
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
# Async selenium thumbnail task will use the following user
THUMBNAIL_SELENIUM_USER = "Admin"
```
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
override the base URL for selenium using:
```
WEBDRIVER_BASEURL = "https://superset.company.com"
```
Additional selenium web drive configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:
```python
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
pass
```
Then on configuration:
```
WEBDRIVER_AUTH_FUNC = auth_driver
```