Files
superset2/docs/admin_docs_versioned_docs/version-6.1.0/configuration/cache.mdx
Claude Code 4266d36cf4 docs: cut 6.1.0 versions for docs, admin_docs, developer_docs, components
Snapshots all four versioned Docusaurus sections at v6.1.0, cut from
master after the version-cutting tooling (#39837) and broken-internal-
links fixes (#40102) landed. Captures fresh auto-generated content and
freezes data dependencies so the historical snapshot stays correct.

Versioning behavior: lastVersion stays at current for every section,
so the canonical URLs (/docs/..., /admin-docs/..., /developer-docs/...,
/components/...) continue to render content from master. The current
version is consistently labeled "Next" with an unreleased banner, and
6.1.0 is a historical pin accessible only via its explicit version
segment.

Component playground: previously disabled: true in versions-config.json,
now enabled and versioned. The plugin block in docusaurus.config.ts
was already gated only by the disabled flag, so no other code changes
were needed to bring it back online.

Snapshot includes:
- All MDX content for the four sections.
- Auto-gen captured fresh: 74 database pages (engine spec metadata),
  ~1,800 API reference files (openapi.json), 59 component pages
  (Storybook stories).
- Data imports frozen at cut time into snapshot-local _versioned_data/
  dirs:
    versioned_docs/version-6.1.0/_versioned_data/src/data/databases.json
      (canonical 80-database diagnostics from master, preserved by the
      generator's input-hash cache)
    admin_docs_versioned_docs/version-6.1.0/_versioned_data/data/countries.json
    admin_docs_versioned_docs/version-6.1.0/_versioned_data/static/feature-flags.json
    developer_docs_versioned_docs/version-6.1.0/_versioned_data/static/data/components.json
- Import paths in deeply-nested files rewritten so they still resolve
  from one directory deeper inside the snapshot.

Verified via full yarn build: exit 0, no broken links surfaced by
onBrokenLinks: throw. Anchor warnings present are pre-existing on
master (community#superset-community-calendar) and unrelated.
2026-05-14 13:55:27 -07:00

277 lines
9.6 KiB
Plaintext

---
title: Caching
hide_title: true
sidebar_position: 3
version: 1
---
# Caching
:::note
When a cache backend is configured, Superset expects it to remain available. Operations will
fail if the configured backend becomes unavailable rather than silently degrading. This
fail-fast behavior ensures operators are immediately aware of infrastructure issues.
:::
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes.
Flask-Caching supports various caching backends, including Redis (recommended), Memcached,
SimpleCache (in-memory), or the local filesystem.
[Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends)
are also supported.
Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
The following cache configurations can be customized in this way:
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`.
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
For example, to configure the filter state cache using Redis:
```python
FILTER_STATE_CACHE_CONFIG = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': 86400,
'CACHE_KEY_PREFIX': 'superset_filter_cache',
'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
```
## Dependencies
In order to use dedicated cache stores, additional python libraries must be installed
- For Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.
These libraries can be installed using pip.
## Fallback Metastore Cache
Note, that some form of Filter State and Explore caching are required. If either of these caches
are undefined, Superset falls back to using a built-in cache that stores data in the metadata
database. While it is recommended to use a dedicated cache, the built-in cache can also be used
to cache other data.
For example, to use the built-in cache to store chart data, use the following config:
```python
DATA_CACHE_CONFIG = {
"CACHE_TYPE": "SupersetMetastoreCache",
"CACHE_KEY_PREFIX": "superset_results", # make sure this string is unique to avoid collisions
"CACHE_DEFAULT_TIMEOUT": 86400, # 60 seconds * 60 minutes * 24 hours
}
```
## Chart Cache Timeout
The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or
database. Each of these configurations will be checked in order before falling back to the default
value defined in `DATA_CACHE_CONFIG`.
Note, that by setting the cache timeout to `-1`, caching for charting data can be disabled, either
per chart, dataset or database, or by default if set in `DATA_CACHE_CONFIG`.
## SQL Lab Query Results
Caching for SQL Lab query results is used when async queries are enabled and is configured using
`RESULTS_BACKEND`.
Note that this configuration does not use a flask-caching dictionary for its configuration, but
instead requires a cachelib object.
See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details.
## Caching Thumbnails
This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) on config:
```
FEATURE_FLAGS = {
"THUMBNAILS": True,
"THUMBNAILS_SQLA_LISTENERS": True,
}
```
By default thumbnails are rendered per user, and will fall back to the Selenium user for anonymous users.
To always render thumbnails as a fixed user (`admin` in this example), use the following configuration:
```python
from superset.tasks.types import FixedExecutor
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
```
For this feature you will need a cache system and celery workers. All thumbnails are stored on cache
and are processed asynchronously by the workers.
An example config where images are stored on S3 could be:
```python
from flask import Flask
from s3cache.s3cache import S3Cache
...
class CeleryConfig(object):
broker_url = "redis://localhost:6379/0"
imports = (
"superset.sql_lab",
"superset.tasks.thumbnails",
)
result_backend = "redis://localhost:6379/0"
worker_prefetch_multiplier = 10
task_acks_late = True
CELERY_CONFIG = CeleryConfig
def init_thumbnail_cache(app: Flask) -> S3Cache:
return S3Cache("bucket_name", 'thumbs_cache/')
THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache
```
Using the above example cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can
override the base URL for Selenium using:
```
WEBDRIVER_BASEURL = "https://superset.company.com"
```
To control which user account is used for rendering thumbnails and warming up caches, configure
`THUMBNAIL_EXECUTORS` and `CACHE_WARMUP_EXECUTORS`. Each accepts a list of executor types (which
resolve to an owner, creator, modifier, or the currently-logged-in user) and/or a `FixedExecutor`
pinned to a specific username. By default, thumbnails render as the current user
(`ExecutorType.CURRENT_USER`) and cache warmup runs as the chart/dashboard owner
(`ExecutorType.OWNER`).
To force both to run as a dedicated service account (`admin` in this example):
```python
from superset.tasks.types import ExecutorType, FixedExecutor
THUMBNAIL_EXECUTORS = [FixedExecutor("admin")]
CACHE_WARMUP_EXECUTORS = [FixedExecutor("admin")]
```
Use a dedicated read-only service account here rather than a personal admin account, so that
thumbnail rendering and cache warmup tasks don't fail if a specific user's credentials change.
Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can
implement a custom function to authenticate Selenium. The default function uses the `flask-login`
session cookie. Here's an example of a custom function signature:
```python
def auth_driver(driver: WebDriver, user: "User") -> WebDriver:
pass
```
Then on configuration:
```
WEBDRIVER_AUTH_FUNC = auth_driver
```
## ETag Support for Thumbnails
Thumbnail and screenshot endpoints return `ETag` response headers based on the cached content digest. Clients can use conditional requests to avoid downloading unchanged images:
```
GET /api/v1/chart/42/thumbnail/
If-None-Match: "abc123..."
→ 304 Not Modified (if unchanged)
→ 200 OK (with new image if changed)
```
This is particularly useful for embedded dashboards and external integrations that periodically poll for updated screenshots — unchanged thumbnails return immediately with no payload.
## Distributed Coordination Backend
Superset supports an optional distributed coordination (`DISTRIBUTED_COORDINATION_CONFIG`) for
high-performance distributed operations. This configuration enables:
- **Distributed locking**: Moves lock operations from the metadata database to Redis, improving
performance and reducing metastore load
- **Real-time event notifications**: Enables instant pub/sub messaging for task abort signals and
completion notifications instead of polling-based approaches
:::note
This requires Redis or Valkey specifically—it uses Redis-specific features (pub/sub, `SET NX EX`)
that are not available in general Flask-Caching backends.
:::
### Configuration
The distributed coordination uses Flask-Caching style configuration for consistency with other cache
backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`:
```python
DISTRIBUTED_COORDINATION_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_HOST": "localhost",
"CACHE_REDIS_PORT": 6379,
"CACHE_REDIS_DB": 0,
"CACHE_REDIS_PASSWORD": "", # Optional
}
```
For Redis Sentinel deployments:
```python
DISTRIBUTED_COORDINATION_CONFIG = {
"CACHE_TYPE": "RedisSentinelCache",
"CACHE_REDIS_SENTINELS": [("sentinel1", 26379), ("sentinel2", 26379)],
"CACHE_REDIS_SENTINEL_MASTER": "mymaster",
"CACHE_REDIS_SENTINEL_PASSWORD": None, # Sentinel password (if different)
"CACHE_REDIS_PASSWORD": "", # Redis password
"CACHE_REDIS_DB": 0,
}
```
For SSL/TLS connections:
```python
DISTRIBUTED_COORDINATION_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_HOST": "redis.example.com",
"CACHE_REDIS_PORT": 6380,
"CACHE_REDIS_SSL": True,
"CACHE_REDIS_SSL_CERTFILE": "/path/to/client.crt",
"CACHE_REDIS_SSL_KEYFILE": "/path/to/client.key",
"CACHE_REDIS_SSL_CA_CERTS": "/path/to/ca.crt",
}
```
### Distributed Lock TTL
You can configure the default lock TTL (time-to-live) in seconds. Locks automatically expire after
this duration to prevent deadlocks from crashed processes:
```python
DISTRIBUTED_LOCK_DEFAULT_TTL = 30 # Default: 30 seconds
```
Individual lock acquisitions can override this value when needed.
### Database-Only Mode
When `DISTRIBUTED_COORDINATION_CONFIG` is not configured, Superset uses database-backed operations:
- **Locking**: Uses the KeyValue table with periodic cleanup of expired entries
- **Event notifications**: Uses database polling instead of pub/sub
While database-backed operations work reliably, the Redis backend is recommended for production
deployments where low latency and reduced database load are important.
:::resources
- [Blog: The Data Engineer's Guide to Lightning-Fast Superset Dashboards](https://preset.io/blog/the-data-engineers-guide-to-lightning-fast-apache-superset-dashboards/)
- [Blog: Accelerating Dashboards with Materialized Views](https://preset.io/blog/accelerating-apache-superset-dashboards-with-materialized-views/)
:::