---
title: Caching
hide_title: true
sidebar_position: 3
version: 1
---

# Caching

:::note
When a cache backend is configured, Superset expects it to remain available. Operations will
fail if the configured backend becomes unavailable rather than silently degrading. This
fail-fast behavior ensures operators are immediately aware of infrastructure issues.
:::

Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching.
Flask-Caching supports various caching backends, including Redis (recommended), Memcached,
SimpleCache (in-memory), and the local filesystem.
[Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends)
are also supported.

Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

The following cache configurations can be customized in this way:

- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`

For example, to configure the filter state cache using Redis:

```python
FILTER_STATE_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 86400,
    'CACHE_KEY_PREFIX': 'superset_filter_cache',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
```

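As a rough sketch of how the four settings listed above could sit side by side in `superset_config.py`, the snippet below builds one dictionary per cache from a shared helper. The helper name `redis_cache`, the `REDIS_BASE_URL` constant, every key prefix other than `superset_filter_cache`, and the Redis database numbers are assumptions chosen for illustration, not recommended values.

```python
# Illustrative superset_config.py fragment; values are placeholders, not defaults.
REDIS_BASE_URL = "redis://localhost:6379"  # assumed local Redis instance


def redis_cache(prefix: str, db: int, timeout: int = 86400) -> dict:
    """Build a Flask-Caching style dict for a Redis-backed cache (hypothetical helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout,
        "CACHE_KEY_PREFIX": prefix,  # distinct prefixes keep the caches from colliding
        "CACHE_REDIS_URL": f"{REDIS_BASE_URL}/{db}",
    }


# Required caches
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_cache", db=0)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_cache", db=0)

# Optional caches
CACHE_CONFIG = redis_cache("superset_metadata_cache", db=1)
DATA_CACHE_CONFIG = redis_cache("superset_data_cache", db=1)
```

Giving each cache its own key prefix means the caches can share one Redis instance without overwriting each other's entries.
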
## Dependencies

To use a dedicated cache store, additional Python libraries must be installed:

- For Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package.
- For Memcached: we recommend the [pylibmc](https://pypi.org/project/pylibmc/) client library, as
  `python-memcached` does not handle storing binary data correctly.

These libraries can be installed using pip.

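Before pointing a cache configuration at Redis or Memcached, it can help to confirm that the client library is importable in the environment that runs Superset. The following is a minimal sketch using only the standard library. The function name `missing_cache_clients` is hypothetical, and `redis` and `pylibmc` are the import names of the packages recommended above.

```python
from importlib.util import find_spec


def missing_cache_clients(modules=("redis", "pylibmc")) -> list[str]:
    """Return the names of client modules that cannot be imported."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_cache_clients()
    if missing:
        print("Missing cache client libraries:", ", ".join(missing))
    else:
        print("All cache client libraries are importable.")
```
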
## Fallback Metastore Cache

Note that some form of Filter State and Explore caching is required. If either of these caches
is undefined, Superset falls back to a built-in cache that stores data in the metadata
database. While a dedicated cache is recommended, the built-in cache can also be used
to cache other data.

For example, to use the built-in cache to store chart data, use the following config:

```python
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "SupersetMetastoreCache",
    "CACHE_KEY_PREFIX": "superset_results",  # make sure this string is unique to avoid collisions
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours
}
```

## Chart Cache Timeout

The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or
database. Each of these configurations is checked in order before falling back to the default
value defined in `DATA_CACHE_CONFIG`.

Note that setting the cache timeout to `-1` disables caching for charting data, either
per chart, dataset, or database, or by default if set in `DATA_CACHE_CONFIG`.

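The lookup order described above can be sketched as a small function. This is an illustration of the rule as stated here, not Superset's actual implementation: the names `resolve_cache_timeout` and `is_cached` are hypothetical, and the sketch assumes an unset timeout is represented as `None`.

```python
from typing import Optional

CACHE_DISABLED = -1  # per the text above, a timeout of -1 turns caching off


def resolve_cache_timeout(
    chart: Optional[int],
    dataset: Optional[int],
    database: Optional[int],
    default: int,
) -> int:
    """Return the first timeout that is set: chart, then dataset, then database, then default."""
    for timeout in (chart, dataset, database):
        if timeout is not None:
            return timeout
    return default


def is_cached(timeout: int) -> bool:
    """A resolved timeout of -1 means the chart data should not be cached."""
    return timeout != CACHE_DISABLED
```

Under these assumptions, a chart with no timeout of its own inherits the dataset's value, and a chart set to `-1` opts out of caching even when its dataset and database define positive timeouts.
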
## SQL Lab Query Results

Caching for SQL Lab query results is used when async queries are enabled and is configured using
`RESULTS_BACKEND`.

Note that this setting does not take a Flask-Caching dictionary; it requires a cachelib object
instead.

See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details.

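To make the difference from the dictionary-based settings concrete, here is a minimal sketch of a `RESULTS_BACKEND` built from cachelib's `RedisCache`. It assumes the `cachelib` and `redis` packages are installed and that a Redis instance is reachable on `localhost:6379`; the host, port, and key prefix are placeholders.

```python
from cachelib.redis import RedisCache

# RESULTS_BACKEND is a cachelib object, not a Flask-Caching dictionary.
RESULTS_BACKEND = RedisCache(
    host="localhost",  # assumed Redis host
    port=6379,  # assumed Redis port
    key_prefix="superset_results",  # placeholder prefix for stored query results
)
```
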
## Caching Thumbnails

This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) in the configuration:

```python
FEATURE_FLAGS = {
    "THUMBNAILS": True,
    "THUMBNAILS_SQLA_LISTENERS": True,
}
```

By default, thumbnails are rendered per user, and will fall back to the Selenium user for anonymous users.
To always render thumbnails as a fixed user (`admin` in this example), use the following configuration:

```python
from superset.tasks.types import FixedExecutor

THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
```

For this feature you will need a cache system and Celery workers. All thumbnails are stored in the cache
and are processed asynchronously by the workers.

An example config where images are stored on S3 could be:

```python
from flask import Flask
from s3cache.s3cache import S3Cache

...

class CeleryConfig(object):
    broker_url = "redis://localhost:6379/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.thumbnails",
    )
    result_backend = "redis://localhost:6379/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True


CELERY_CONFIG = CeleryConfig

def init_thumbnail_cache(app: Flask) -> S3Cache:
    return S3Cache("bucket_name", "thumbs_cache/")


THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
```

Using the above example, cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
override the base URL for Selenium using:

```python
WEBDRIVER_BASEURL = "https://superset.company.com"
```

To control which user account is used for rendering thumbnails and warming up caches, configure
`THUMBNAIL_EXECUTORS` and `CACHE_WARMUP_EXECUTORS`. Each accepts a list of executor types (which
resolve to an owner, creator, modifier, or the currently logged-in user) and/or a `FixedExecutor`
pinned to a specific username. By default, thumbnails render as the current user
(`ExecutorType.CURRENT_USER`) and cache warmup runs as the chart/dashboard owner
(`ExecutorType.OWNER`).

To force both to run as a dedicated service account (`admin` in this example):

```python
from superset.tasks.types import ExecutorType, FixedExecutor

THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
CACHE_WARMUP_EXECUTORS = [FixedExecutor("admin")]
```

Use a dedicated read-only service account here rather than a personal admin account, so that
thumbnail rendering and cache warmup tasks don't fail if a specific user's credentials change.

Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate Selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:

```python
from selenium.webdriver.remote.webdriver import WebDriver

def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
    pass  # authenticate `driver` as `user`, then return the driver
```

Then, in the configuration:

```python
WEBDRIVER_AUTH_FUNC = auth_driver
```

## ETag Support for Thumbnails

Thumbnail and screenshot endpoints return `ETag` response headers based on the cached content digest. Clients can use conditional requests to avoid downloading unchanged images:

```
GET /api/v1/chart/42/thumbnail/
If-None-Match: "abc123..."

→ 304 Not Modified (if unchanged)
→ 200 OK (with new image if changed)
```

This is particularly useful for embedded dashboards and external integrations that periodically poll for updated screenshots: unchanged thumbnails return immediately with no payload.

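A polling client could implement the conditional request above roughly as follows. This is a minimal standard-library sketch, not part of Superset: the function names `build_conditional_request` and `fetch_thumbnail` are hypothetical, and authentication (which a real deployment would need to add as request headers or cookies) is left out for brevity.

```python
import urllib.request
from typing import Optional, Tuple
from urllib.error import HTTPError


def build_conditional_request(url: str, etag: Optional[str]) -> urllib.request.Request:
    """Create a GET request that sends If-None-Match when an ETag is already known."""
    request = urllib.request.Request(url, method="GET")
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_thumbnail(
    url: str, etag: Optional[str] = None
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (image_bytes, etag); image_bytes is None when the server answers 304."""
    try:
        with urllib.request.urlopen(build_conditional_request(url, etag)) as response:
            return response.read(), response.headers.get("ETag")
    except HTTPError as error:
        if error.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return None, etag
        raise
```

The caller keeps the returned ETag alongside the image and passes it back on the next poll; a `None` payload means the stored image is still current.
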
## Distributed Coordination Backend

Superset supports an optional distributed coordination backend (`DISTRIBUTED_COORDINATION_CONFIG`) for
high-performance distributed operations. This configuration enables:

- **Distributed locking**: Moves lock operations from the metadata database to Redis, improving
  performance and reducing metastore load
- **Real-time event notifications**: Enables instant pub/sub messaging for task abort signals and
  completion notifications instead of polling-based approaches

:::note
This requires Redis or Valkey specifically, because it uses Redis-specific features (pub/sub, `SET NX EX`)
that are not available in general Flask-Caching backends.
:::

### Configuration

The distributed coordination backend uses Flask-Caching style configuration for consistency with other cache
backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "localhost",
    "CACHE_REDIS_PORT": 6379,
    "CACHE_REDIS_DB": 0,
    "CACHE_REDIS_PASSWORD": "",  # Optional
}
```

For Redis Sentinel deployments:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisSentinelCache",
    "CACHE_REDIS_SENTINELS": [("sentinel1", 26379), ("sentinel2", 26379)],
    "CACHE_REDIS_SENTINEL_MASTER": "mymaster",
    "CACHE_REDIS_SENTINEL_PASSWORD": None,  # Sentinel password (if different)
    "CACHE_REDIS_PASSWORD": "",  # Redis password
    "CACHE_REDIS_DB": 0,
}
```

For SSL/TLS connections:

```python
DISTRIBUTED_COORDINATION_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "redis.example.com",
    "CACHE_REDIS_PORT": 6380,
    "CACHE_REDIS_SSL": True,
    "CACHE_REDIS_SSL_CERTFILE": "/path/to/client.crt",
    "CACHE_REDIS_SSL_KEYFILE": "/path/to/client.key",
    "CACHE_REDIS_SSL_CA_CERTS": "/path/to/ca.crt",
}
```

### Distributed Lock TTL

You can configure the default lock TTL (time-to-live) in seconds. Locks automatically expire after
this duration to prevent deadlocks from crashed processes:

```python
DISTRIBUTED_LOCK_DEFAULT_TTL = 30  # Default: 30 seconds
```

Individual lock acquisitions can override this value when needed.

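To illustrate why an expiring lock cannot deadlock, here is a minimal sketch of the general Redis `SET NX EX` pattern mentioned earlier. It is not Superset's implementation: `acquire_lock`, the `lock:` key prefix, and the `InMemoryRedis` stand-in are all hypothetical, and the stand-in only mimics the `set(name, value, nx=..., ex=...)` call of redis-py so the example runs without a server. A real `redis.Redis` client could be passed to `acquire_lock` instead.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class InMemoryRedis:
    """Tiny stand-in that mimics only redis-py's `set(name, value, ex=..., nx=...)`."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(
        self, name: str, value: str, ex: Optional[int] = None, nx: bool = False
    ) -> Optional[bool]:
        entry = self._data.get(name)
        expired = entry is not None and entry[1] is not None and entry[1] <= self._clock()
        if nx and entry is not None and not expired:
            return None  # redis-py returns None when NX prevents the write
        deadline = self._clock() + ex if ex is not None else None
        self._data[name] = (value, deadline)
        return True


def acquire_lock(client, name: str, ttl_seconds: int = 30) -> Optional[str]:
    """Try to take the lock; return an owner token on success, None if it is held."""
    token = uuid.uuid4().hex
    # SET key value NX EX ttl: write only if the key is absent, and expire it automatically.
    if client.set(f"lock:{name}", token, nx=True, ex=ttl_seconds):
        return token
    return None
```

Because the key carries its own expiry, a process that crashes while holding the lock simply stops renewing it, and the next caller succeeds once the TTL has elapsed.
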
### Database-Only Mode

When `DISTRIBUTED_COORDINATION_CONFIG` is not configured, Superset uses database-backed operations:

- **Locking**: Uses the KeyValue table with periodic cleanup of expired entries
- **Event notifications**: Uses database polling instead of pub/sub

While database-backed operations work reliably, the Redis backend is recommended for production
deployments where low latency and reduced database load are important.

:::resources
- [Blog: The Data Engineer's Guide to Lightning-Fast Superset Dashboards](https://preset.io/blog/the-data-engineers-guide-to-lightning-fast-apache-superset-dashboards/)
- [Blog: Accelerating Dashboards with Materialized Views](https://preset.io/blog/accelerating-apache-superset-dashboards-with-materialized-views/)
:::