Files
superset2/docs/admin_docs_versioned_docs/version-6.1.0/configuration/async-queries-celery.mdx
Claude Code cbcfd9599f docs: cut 6.1.0 versions for docs, admin_docs, developer_docs, components
Snapshots all four versioned Docusaurus sections at v6.1.0. Built on
top of the version-cutting tooling work in chore/docs-cut-6.1.0-versions
so the snapshot benefits from:

- Auto-gen refresh before snapshotting (database pages from engine
  spec metadata, API reference from openapi.json, component pages
  from Storybook stories) — captured at the SHA we cut from rather
  than whatever happened to be on disk.
- Data-import freeze: country list, feature flag table, database
  diagnostics, and component metadata are copied into snapshot-local
  `_versioned_data/` dirs so the historical version doesn't silently
  mutate when the source files change.
- Depth-aware import-path rewriter that handles deeply-nested
  component MDX files referencing `../../../src/` from the snapshot.

Versioning behavior: `lastVersion` stays at `current` for every
section, so the canonical URLs (`/docs/...`, `/admin-docs/...`,
`/developer-docs/...`, `/components/...`) continue to render content
from master. The `current` version is consistently labeled "Next"
with an `unreleased` banner, and `6.1.0` is a historical pin
accessible only via its explicit version segment.

Component playground: previously `disabled: true` in versions-config.json,
now enabled and versioned. The plugin block in docusaurus.config.ts
was already gated only by the `disabled` flag, so no other code
changes were needed to bring it back online.

The frozen `databases.json` in the snapshot is the canonical 80-database
artifact from the latest committed state in master (preserved by the
generator's input-hash cache), not a fallback regenerated from a
local Flask environment.
2026-05-13 17:15:46 -07:00

109 lines
3.9 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Async Queries via Celery
hide_title: true
sidebar_position: 4
version: 1
---
# Async Queries via Celery
## Celery
On large analytic databases, its common to run queries that execute for minutes or hours. To enable
support for long running queries that execute beyond the typical web requests timeout (30-60
seconds), it is necessary to configure an asynchronous backend for Superset which consists of:
- one or many Superset workers (which is implemented as a Celery worker), and can be started with
the `celery worker` command, run `celery worker --help` to view the related options.
- a celery broker (message queue) for which we recommend using Redis or RabbitMQ
- a results backend that defines where the worker will persist the query results
Configuring Celery requires defining a `CELERY_CONFIG` in your `superset_config.py`. Both the worker
and web server processes should have the same configuration.
```python
class CeleryConfig(object):
broker_url = "redis://localhost:6379/0"
imports = (
"superset.sql_lab",
"superset.tasks.scheduler",
)
result_backend = "redis://localhost:6379/0"
worker_prefetch_multiplier = 10
task_acks_late = True
task_annotations = {
"sql_lab.get_sql_results": {
"rate_limit": "100/s",
},
}
CELERY_CONFIG = CeleryConfig
```
To start a Celery worker to leverage the configuration, run the following command:
```bash
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
```
To start a job which schedules periodic background jobs, run the following command:
```bash
celery --app=superset.tasks.celery_app:app beat
```
To setup a result backend, you need to pass an instance of a derivative of `BaseCache` (`from
flask_caching.backends.base import BaseCache`) to the RESULTS_BACKEND configuration key in your
superset_config.py. You can use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache),
memory or the file system (in a single server-type setup or for testing), or to write your own
caching interface. Your `superset_config.py` may look something like:
```python
# On S3
from s3cache.s3cache import S3Cache
S3_CACHE_BUCKET = 'foobar-superset'
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)
# On Redis
from flask_caching.backends.rediscache import RedisCache
RESULTS_BACKEND = RedisCache(
host='localhost', port=6379, key_prefix='superset_results')
```
For performance gains, [MessagePack](https://github.com/msgpack/msgpack-python) and
[PyArrow](https://arrow.apache.org/docs/python/) are now used for results serialization. This can be
disabled by setting `RESULTS_BACKEND_USE_MSGPACK = False` in your `superset_config.py`, should any
issues arise. Please clear your existing results cache store when upgrading an existing environment.
**Important Notes**
- It is important that all the worker nodes and web servers in the Superset cluster _share a common
metadata database_. This means that SQLite will not work in this context since it has limited
support for concurrency and typically lives on the local file system.
- There should _only be one instance of celery beat running_ in your entire setup. If not,
background jobs can get scheduled multiple times resulting in weird behaviors like duplicate
delivery of reports, higher than expected load / traffic etc.
- SQL Lab will _only run your queries asynchronously if_ you enable **Asynchronous Query Execution**
in your database settings (Sources > Databases > Edit record).
## Celery Flower
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:
```bash
pip install flower
```
You can run flower using:
```bash
celery --app=superset.tasks.celery_app:app flower
```
:::resources
- [Blog: How to Set Up Global Async Queries (GAQ) in Apache Superset](https://medium.com/@ngigilevis/how-to-set-up-global-async-queries-gaq-in-apache-superset-a-complete-guide-9d2f4a047559)
:::