mirror of
https://github.com/apache/superset.git
synced 2026-04-07 18:35:15 +00:00
---
title: Caching
hide_title: true
sidebar_position: 3
version: 1
---

# Caching

:::note
When a cache backend is configured, Superset expects it to remain available. Operations will
fail if the configured backend becomes unavailable rather than silently degrading. This
fail-fast behavior ensures operators are immediately aware of infrastructure issues.
:::

Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes.
Flask-Caching supports various caching backends, including Redis (recommended), Memcached,
SimpleCache (in-memory), or the local filesystem.
[Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends)
are also supported.

Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

The following cache configurations can be customized in this way:

- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`

For example, to configure the filter state cache using Redis:

```python
FILTER_STATE_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 86400,
    'CACHE_KEY_PREFIX': 'superset_filter_cache',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
```

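
The same pattern extends to the other three dictionaries. The following is a minimal sketch, assuming a single local Redis instance shared by all caches; the helper `redis_cache_config`, the `REDIS_URL` constant, every key prefix other than `superset_filter_cache`, and the timeouts are illustrative choices, not values Superset prescribes.

```python
from datetime import timedelta

# Assumed Redis location; replace with the URL of your own deployment.
REDIS_URL = "redis://localhost:6379/0"


def redis_cache_config(key_prefix: str, timeout: timedelta) -> dict:
    """Build a Flask-Caching config dict for a Redis-backed cache (hypothetical helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": int(timeout.total_seconds()),
        "CACHE_KEY_PREFIX": key_prefix,
        "CACHE_REDIS_URL": REDIS_URL,
    }


# Required caches.
FILTER_STATE_CACHE_CONFIG = redis_cache_config("superset_filter_cache", timedelta(days=1))
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache_config("superset_explore_form_data", timedelta(days=1))

# Optional caches.
CACHE_CONFIG = redis_cache_config("superset_metadata", timedelta(hours=1))
DATA_CACHE_CONFIG = redis_cache_config("superset_data", timedelta(days=1))
```

Giving each cache its own `CACHE_KEY_PREFIX` keeps entries from different caches from colliding when they share one Redis database.
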
## Dependencies

To use a dedicated cache store, additional Python libraries must be installed:

- Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend the [pylibmc](https://pypi.org/project/pylibmc/) client library, as
  `python-memcached` does not handle storing binary data correctly

These libraries can be installed using pip.

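
To confirm that the client libraries are importable from the environment Superset runs in, a small check such as the following can help. This is only a sketch: the function name `missing_client_libraries` and the mapping are hypothetical, and the import names `redis` and `pylibmc` are those of the packages recommended above.

```python
from importlib.util import find_spec
from typing import Dict, List

# Backend name -> import name of the recommended client library.
CLIENT_LIBRARIES = {"Redis": "redis", "Memcached": "pylibmc"}


def missing_client_libraries(libraries: Dict[str, str]) -> List[str]:
    """Return the backends whose client library cannot be found on the import path."""
    return [backend for backend, module in libraries.items() if find_spec(module) is None]


if __name__ == "__main__":
    for backend in missing_client_libraries(CLIENT_LIBRARIES):
        print(f"{backend}: client library not installed")
```
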
## Fallback Metastore Cache

Note that some form of Filter State and Explore caching is required. If either of these caches
is undefined, Superset falls back to a built-in cache that stores data in the metadata
database. While a dedicated cache is recommended, the built-in cache can also be used
to cache other data.

For example, to use the built-in cache to store chart data, use the following config:

```python
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "SupersetMetastoreCache",
    "CACHE_KEY_PREFIX": "superset_results",  # make sure this string is unique to avoid collisions
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours
}
```

## Chart Cache Timeout

The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or
database. Each of these configurations is checked in that order before falling back to the default
value defined in `DATA_CACHE_CONFIG`.

Setting the cache timeout to `-1` disables caching of charting data, either per chart, dataset, or
database, or by default if set in `DATA_CACHE_CONFIG`.

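
The lookup order described above can be sketched as a simple fallback chain. This illustrates the documented order only and is not Superset's actual implementation; the function name, the constant, and the use of `None` for "not set" are assumptions.

```python
from typing import Optional

CACHING_DISABLED = -1  # per the text above, -1 disables caching of charting data


def resolve_cache_timeout(
    chart: Optional[int],
    dataset: Optional[int],
    database: Optional[int],
    default: int,
) -> int:
    """Return the first timeout that is set, checking chart, dataset, then database."""
    for timeout in (chart, dataset, database):
        if timeout is not None:
            return timeout
    return default


# The chart has no override, so the dataset's 600 seconds wins over the database's 3600.
effective = resolve_cache_timeout(chart=None, dataset=600, database=3600, default=86400)
caching_enabled = effective != CACHING_DISABLED
```
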
## SQL Lab Query Results

Caching for SQL Lab query results is used when async queries are enabled and is configured using
`RESULTS_BACKEND`.

Note that `RESULTS_BACKEND` does not take a Flask-Caching configuration dictionary; it
requires a cachelib object instead.

See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details.

## Caching Thumbnails

This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) in the config:

```python
FEATURE_FLAGS = {
    "THUMBNAILS": True,
    "THUMBNAILS_SQLA_LISTENERS": True,
}
```

By default, thumbnails are rendered per user and fall back to the Selenium user for anonymous users.
To always render thumbnails as a fixed user (`admin` in this example), use the following configuration:

```python
from superset.tasks.types import FixedExecutor

THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
```

For this feature you will need a cache system and Celery workers. All thumbnails are stored in the cache
and are processed asynchronously by the workers.

An example config where images are stored on S3 could be:

```python
from flask import Flask
from s3cache.s3cache import S3Cache

...

class CeleryConfig(object):
    broker_url = "redis://localhost:6379/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.thumbnails",
    )
    result_backend = "redis://localhost:6379/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True


CELERY_CONFIG = CeleryConfig

def init_thumbnail_cache(app: Flask) -> S3Cache:
    return S3Cache("bucket_name", 'thumbs_cache/')


THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
```

Using the above example, cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
override the base URL for Selenium using:

```python
WEBDRIVER_BASEURL = "https://superset.company.com"
```

Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate Selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:

```python
from selenium.webdriver.remote.webdriver import WebDriver

def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
    pass
```

Then, in the configuration:

```python
WEBDRIVER_AUTH_FUNC = auth_driver
```

## Distributed Coordination Backend

Superset supports an optional distributed coordination backend (`DISTRIBUTED_COORDINATION_CONFIG`) for
high-performance distributed operations. This configuration enables:

- **Distributed locking**: Moves lock operations from the metadata database to Redis, improving
  performance and reducing metastore load
- **Real-time event notifications**: Enables instant pub/sub messaging for task abort signals and
  completion notifications instead of polling-based approaches

:::note
This requires Redis or Valkey specifically, because it uses Redis-specific features (pub/sub, `SET NX EX`)
that are not available in general Flask-Caching backends.
:::

### Configuration

The distributed coordination backend uses Flask-Caching style configuration for consistency with other cache
backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "localhost",
    "CACHE_REDIS_PORT": 6379,
    "CACHE_REDIS_DB": 0,
    "CACHE_REDIS_PASSWORD": "",  # Optional
}
```

For Redis Sentinel deployments:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisSentinelCache",
    "CACHE_REDIS_SENTINELS": [("sentinel1", 26379), ("sentinel2", 26379)],
    "CACHE_REDIS_SENTINEL_MASTER": "mymaster",
    "CACHE_REDIS_SENTINEL_PASSWORD": None,  # Sentinel password (if different)
    "CACHE_REDIS_PASSWORD": "",  # Redis password
    "CACHE_REDIS_DB": 0,
}
```

For SSL/TLS connections:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "redis.example.com",
    "CACHE_REDIS_PORT": 6380,
    "CACHE_REDIS_SSL": True,
    "CACHE_REDIS_SSL_CERTFILE": "/path/to/client.crt",
    "CACHE_REDIS_SSL_KEYFILE": "/path/to/client.key",
    "CACHE_REDIS_SSL_CA_CERTS": "/path/to/ca.crt",
}
```

### Distributed Lock TTL

You can configure the default lock TTL (time-to-live) in seconds. Locks automatically expire after
this duration to prevent deadlocks from crashed processes:

```python
DISTRIBUTED_LOCK_DEFAULT_TTL = 30  # Default: 30 seconds
```

Individual lock acquisitions can override this value when needed.

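
To see why an expiring lock prevents deadlocks, the following toy model mimics the "set if absent, with expiry" semantics of Redis `SET NX EX` in plain Python. It is an illustration only, not Superset's locking code: the class `InMemoryLockStore`, its methods, and the injected clock are all hypothetical, and a real deployment relies on Redis itself for atomicity.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryLockStore:
    """Toy stand-in for a store that only supports SET NX EX style locking."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expiry time)

    def acquire(self, key: str, owner: str, ttl: float = 30.0) -> bool:
        """Take the lock only if it is free or its TTL has elapsed."""
        now = self._clock()
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False  # still held and not yet expired
        self._locks[key] = (owner, now + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Release the lock only if this owner still holds an unexpired lock."""
        held = self._locks.get(key)
        if held is None or held[0] != owner or held[1] <= self._clock():
            return False
        del self._locks[key]
        return True


# Usage with a fake clock so the expiry is visible without sleeping.
now = [0.0]
store = InMemoryLockStore(clock=lambda: now[0])
first = store.acquire("report", "worker-a", ttl=30)   # True: lock was free
second = store.acquire("report", "worker-b", ttl=30)  # False: still held by worker-a
now[0] = 31.0                                         # worker-a crashed and never released
third = store.acquire("report", "worker-b", ttl=30)   # True: the TTL has elapsed
```

Without the TTL, the second worker would wait forever for a lock whose owner no longer exists. A shorter TTL recovers faster from crashes but must still exceed the longest expected critical section.
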
### Database-Only Mode

When `DISTRIBUTED_COORDINATION_CONFIG` is not configured, Superset uses database-backed operations:

- **Locking**: Uses the KeyValue table with periodic cleanup of expired entries
- **Event notifications**: Uses database polling instead of pub/sub

While database-backed operations work reliably, the Redis backend is recommended for production
deployments where low latency and reduced database load are important.

:::resources
- [Blog: The Data Engineer's Guide to Lightning-Fast Superset Dashboards](https://preset.io/blog/the-data-engineers-guide-to-lightning-fast-apache-superset-dashboards/)
- [Blog: Accelerating Dashboards with Materialized Views](https://preset.io/blog/accelerating-apache-superset-dashboards-with-materialized-views/)
:::