mirror of
https://github.com/apache/superset.git
synced 2026-05-23 16:55:19 +00:00
docs: cut 6.1.0 versions for docs, admin_docs, developer_docs, components
Snapshots all four versioned Docusaurus sections at v6.1.0. Built on top of the version-cutting tooling work in chore/docs-cut-6.1.0-versions so the snapshot benefits from: - Auto-gen refresh before snapshotting (database pages from engine spec metadata, API reference from openapi.json, component pages from Storybook stories) — captured at the SHA we cut from rather than whatever happened to be on disk. - Data-import freeze: country list, feature flag table, database diagnostics, and component metadata are copied into snapshot-local `_versioned_data/` dirs so the historical version doesn't silently mutate when the source files change. - Depth-aware import-path rewriter that handles deeply-nested component MDX files referencing `../../../src/` from the snapshot. Versioning behavior: `lastVersion` stays at `current` for every section, so the canonical URLs (`/docs/...`, `/admin-docs/...`, `/developer-docs/...`, `/components/...`) continue to render content from master. The `current` version is consistently labeled "Next" with an `unreleased` banner, and `6.1.0` is a historical pin accessible only via its explicit version segment. Component playground: previously `disabled: true` in versions-config.json, now enabled and versioned. The plugin block in docusaurus.config.ts was already gated only by the `disabled` flag, so no other code changes were needed to bring it back online. The frozen `databases.json` in the snapshot is the canonical 80-database artifact from the latest committed state in master (preserved by the generator's input-hash cache), not a fallback regenerated from a local Flask environment.
This commit is contained in:
@@ -0,0 +1,108 @@
|
||||
---
|
||||
title: Async Queries via Celery
|
||||
hide_title: true
|
||||
sidebar_position: 4
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Async Queries via Celery
|
||||
|
||||
## Celery
|
||||
|
||||
On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable
|
||||
support for long running queries that execute beyond the typical web request’s timeout (30-60
|
||||
seconds), it is necessary to configure an asynchronous backend for Superset which consists of:
|
||||
|
||||
- one or many Superset workers (which is implemented as a Celery worker), and can be started with
|
||||
the `celery worker` command, run `celery worker --help` to view the related options.
|
||||
- a celery broker (message queue) for which we recommend using Redis or RabbitMQ
|
||||
- a results backend that defines where the worker will persist the query results
|
||||
|
||||
Configuring Celery requires defining a `CELERY_CONFIG` in your `superset_config.py`. Both the worker
|
||||
and web server processes should have the same configuration.
|
||||
|
||||
```python
|
||||
class CeleryConfig(object):
|
||||
broker_url = "redis://localhost:6379/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = "redis://localhost:6379/0"
|
||||
worker_prefetch_multiplier = 10
|
||||
task_acks_late = True
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
```
|
||||
|
||||
To start a Celery worker to leverage the configuration, run the following command:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
|
||||
```
|
||||
|
||||
To start a job which schedules periodic background jobs, run the following command:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app beat
|
||||
```
|
||||
|
||||
To setup a result backend, you need to pass an instance of a derivative of `BaseCache` (`from
|
||||
flask_caching.backends.base import BaseCache`) to the RESULTS_BACKEND configuration key in your
|
||||
superset_config.py. You can use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache),
|
||||
memory or the file system (in a single server-type setup or for testing), or to write your own
|
||||
caching interface. Your `superset_config.py` may look something like:
|
||||
|
||||
```python
|
||||
# On S3
|
||||
from s3cache.s3cache import S3Cache
|
||||
S3_CACHE_BUCKET = 'foobar-superset'
|
||||
S3_CACHE_KEY_PREFIX = 'sql_lab_result'
|
||||
RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX)
|
||||
|
||||
# On Redis
|
||||
from flask_caching.backends.rediscache import RedisCache
|
||||
RESULTS_BACKEND = RedisCache(
|
||||
host='localhost', port=6379, key_prefix='superset_results')
|
||||
```
|
||||
|
||||
For performance gains, [MessagePack](https://github.com/msgpack/msgpack-python) and
|
||||
[PyArrow](https://arrow.apache.org/docs/python/) are now used for results serialization. This can be
|
||||
disabled by setting `RESULTS_BACKEND_USE_MSGPACK = False` in your `superset_config.py`, should any
|
||||
issues arise. Please clear your existing results cache store when upgrading an existing environment.
|
||||
|
||||
**Important Notes**
|
||||
|
||||
- It is important that all the worker nodes and web servers in the Superset cluster _share a common
|
||||
metadata database_. This means that SQLite will not work in this context since it has limited
|
||||
support for concurrency and typically lives on the local file system.
|
||||
|
||||
- There should _only be one instance of celery beat running_ in your entire setup. If not,
|
||||
background jobs can get scheduled multiple times resulting in weird behaviors like duplicate
|
||||
delivery of reports, higher than expected load / traffic etc.
|
||||
|
||||
- SQL Lab will _only run your queries asynchronously if_ you enable **Asynchronous Query Execution**
|
||||
in your database settings (Sources > Databases > Edit record).
|
||||
|
||||
## Celery Flower
|
||||
|
||||
Flower is a web based tool for monitoring the Celery cluster which you can install from pip:
|
||||
|
||||
```bash
|
||||
pip install flower
|
||||
```
|
||||
|
||||
You can run flower using:
|
||||
|
||||
```bash
|
||||
celery --app=superset.tasks.celery_app:app flower
|
||||
```
|
||||
|
||||
:::resources
|
||||
- [Blog: How to Set Up Global Async Queries (GAQ) in Apache Superset](https://medium.com/@ngigilevis/how-to-set-up-global-async-queries-gaq-in-apache-superset-a-complete-guide-9d2f4a047559)
|
||||
:::
|
||||
Reference in New Issue
Block a user