[SIP-3] Scheduled email reports for Slices / Dashboards (#5294)
* [scheduled reports] Add support for scheduled reports

  * Scheduled email reports for slice and dashboard visualization (attachment or inline)
  * Scheduled email reports for slice data (CSV attachment or inline table)
  * Each schedule has a list of recipients (all of them can receive a single mail, or separate mails)
  * All outgoing mails can have a mandatory bcc - for audit purposes.
  * Each dashboard/slice can have multiple schedules.

  In addition, this PR also makes a few minor improvements to the celery infrastructure.

  * Create a common celery app
  * Added more celery annotations for the tasks
  * Introduced celery beat
  * Update docs about concurrency / pools

* [scheduled reports] - Debug mode for scheduled emails
* [scheduled reports] - Ability to send test mails
* [scheduled reports] - Test email functionality - minor improvements
* [scheduled reports] - Rebase with master. Minor fixes
* [scheduled reports] - Add warning messages
* [scheduled reports] - flake8
* [scheduled reports] - fix rebase
* [scheduled reports] - fix rebase
* [scheduled reports] - fix flake8
* [scheduled reports] Rebase in prep for merge
* Fixed alembic tree after rebase
* Updated requirements to latest version of packages (and tested)
* Removed py2 stuff
* [scheduled reports] - fix flake8
* [scheduled reports] - address review comments
* [scheduled reports] - rebase with master
Committed by Maxime Beauchemin
parent f366bbe735
commit 808622414c
@@ -88,7 +88,7 @@ It's easy: use the ``Filter Box`` widget, build a slice, and add it to your
dashboard.

The ``Filter Box`` widget allows you to define a query to populate dropdowns
that can be use for filtering. To build the list of distinct values, we
that can be used for filtering. To build the list of distinct values, we
run a query, and sort the result by the metric you provide, sorting
descending.
@@ -603,14 +603,12 @@ Upgrading should be as straightforward as running::
    superset db upgrade
    superset init

SQL Lab
-------
SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible
databases. By default, queries are executed in the scope of a web
request so they
may eventually timeout as queries exceed the maximum duration of a web
request in your environment, whether it'd be a reverse proxy or the Superset
server itself.

Celery Tasks
------------
On large analytic databases, it's common to run background jobs, reports
and/or queries that execute for minutes or hours. In certain cases, we need
to support long running tasks that execute beyond the typical web request's
timeout (30-60 seconds).

On large analytic databases, it's common to run queries that
execute for minutes or hours.
@@ -634,15 +632,41 @@ have the same configuration.

    class CeleryConfig(object):
        BROKER_URL = 'redis://localhost:6379/0'
        CELERY_IMPORTS = ('superset.sql_lab', )
        CELERY_IMPORTS = (
            'superset.sql_lab',
            'superset.tasks',
        )
        CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
        CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
        CELERYD_LOG_LEVEL = 'DEBUG'
        CELERYD_PREFETCH_MULTIPLIER = 10
        CELERY_ACKS_LATE = True
        CELERY_ANNOTATIONS = {
            'sql_lab.get_sql_results': {
                'rate_limit': '100/s',
            },
            'email_reports.send': {
                'rate_limit': '1/s',
                'time_limit': 120,
                'soft_time_limit': 150,
                'ignore_result': True,
            },
        }
        CELERYBEAT_SCHEDULE = {
            'email_reports.schedule_hourly': {
                'task': 'email_reports.schedule_hourly',
                'schedule': crontab(minute=1, hour='*'),
            },
        }

    CELERY_CONFIG = CeleryConfig
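
Note that the ``crontab`` helper used in ``CELERYBEAT_SCHEDULE`` above is not a
builtin; the snippet assumes it has been imported near the top of your
``superset_config.py``, for example: ::

    from celery.schedules import crontab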

To start a Celery worker to leverage the configuration run: ::
* To start a Celery worker to leverage the configuration run: ::

    celery worker --app=superset.sql_lab:celery_app --pool=gevent -Ofair
    celery worker --app=superset.tasks.celery_app:app --pool=prefork -Ofair -c 4

* To start a job which schedules periodic background jobs, run ::

    celery beat --app=superset.tasks.celery_app:app

To setup a result backend, you need to pass an instance of a derivative
of ``werkzeug.contrib.cache.BaseCache`` to the ``RESULTS_BACKEND``
@@ -665,11 +689,65 @@ look something like:

    RESULTS_BACKEND = RedisCache(
        host='localhost', port=6379, key_prefix='superset_results')
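
For reference, a complete version of that snippet in ``superset_config.py`` would
look roughly as follows; the only addition is the import the excerpt relies on
(``RedisCache`` being the ``werkzeug.contrib.cache`` class mentioned above): ::

    from werkzeug.contrib.cache import RedisCache

    RESULTS_BACKEND = RedisCache(
        host='localhost', port=6379, key_prefix='superset_results')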

Note that it's important that all the worker nodes and web servers in
the Superset cluster share a common metadata database.
This means that SQLite will not work in this context since it has
limited support for concurrency and
typically lives on the local file system.

**Important notes**

* It is important that all the worker nodes and web servers in
  the Superset cluster share a common metadata database.
  This means that SQLite will not work in this context since it has
  limited support for concurrency and
  typically lives on the local file system.

* There should only be one instance of ``celery beat`` running in your
  entire setup. If not, background jobs can get scheduled multiple times
  resulting in weird behaviors like duplicate delivery of reports,
  higher than expected load / traffic etc.

Email Reports
-------------
Email reports allow users to schedule email reports for

* slice and dashboard visualization (attachment or inline)
* slice data (CSV attachment or inline table)

Schedules are defined in crontab format and each schedule
can have a list of recipients (all of them can receive a single mail,
or separate mails). For audit purposes, all outgoing mails can have a
mandatory bcc.
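
For mails to actually be delivered, Superset's SMTP settings also need to be set
in ``superset_config.py``. A minimal sketch is shown below; the host, credentials
and from-address are placeholders to adjust for your environment, and other
report-specific options (for example the audit bcc address mentioned above) live
in the same file: ::

    # Placeholder SMTP settings -- point these at your own mail server
    SMTP_HOST = 'smtp.example.com'
    SMTP_PORT = 587
    SMTP_STARTTLS = True
    SMTP_SSL = False
    SMTP_USER = 'superset'
    SMTP_PASSWORD = 'changeme'
    SMTP_MAIL_FROM = 'superset-reports@example.com'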

**Requirements**

* A Selenium-compatible driver & headless browser

  * `geckodriver <https://github.com/mozilla/geckodriver>`_ and Firefox are preferred
  * `chromedriver <http://chromedriver.chromium.org/>`_ is a good option too

* Run ``celery worker`` and ``celery beat`` as follows ::

    celery worker --app=superset.tasks.celery_app:app --pool=prefork -Ofair -c 4
    celery beat --app=superset.tasks.celery_app:app

**Important notes**

* Be mindful of the concurrency setting for celery (using ``-c 4``).
  Selenium/webdriver instances can consume a lot of CPU / memory on your servers.

* In some cases, if you notice a lot of leaked ``geckodriver`` processes, try running
  your celery processes with ::

    celery worker --pool=prefork --max-tasks-per-child=128 ...

* It is recommended to run separate workers for ``sql_lab`` and
  ``email_reports`` tasks. This can be done by using the ``queue`` field in
  ``CELERY_ANNOTATIONS`` (see the sketch below).
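
A minimal sketch of that setup, building on the annotations shown earlier (only the
routing-relevant keys are repeated, and the queue names ``sql_lab`` and
``email_reports`` are arbitrary examples, not defaults): ::

    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
            'queue': 'sql_lab',          # route SQL Lab queries to their own queue
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'queue': 'email_reports',    # route report mails to a separate queue
        },
    }

Each worker can then be pinned to a single queue with celery's ``-Q`` flag, e.g.
``celery worker --app=superset.tasks.celery_app:app -Q email_reports``.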

SQL Lab
-------
SQL Lab is a powerful SQL IDE that works with all SQLAlchemy compatible
databases. By default, queries are executed in the scope of a web
request so they may eventually time out as queries exceed the maximum duration of a web
request in your environment, whether it be a reverse proxy or the Superset
server itself. In such cases, it is preferred to use ``celery`` to run the queries
in the background. Please follow the examples/notes mentioned above to get your
celery setup working.

Also note that SQL Lab supports Jinja templating in queries and that it's
possible to overload
@@ -684,6 +762,8 @@ in this dictionary are made available for users to use in their SQL.
    }
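
For context, the dictionary referred to here is ``JINJA_CONTEXT_ADDONS`` in your
superset configuration; a minimal sketch of what it can look like (the macro name
and lambda below are made-up examples): ::

    JINJA_CONTEXT_ADDONS = {
        'my_crazy_macro': lambda x: x * 2,  # exposed to users as a Jinja callable
    }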

Celery Flower
-------------
Flower is a web based tool for monitoring the Celery cluster which you can
install from pip: ::
@@ -691,7 +771,7 @@ install from pip: ::

and run via: ::

    celery flower --app=superset.sql_lab:celery_app
    celery flower --app=superset.tasks.celery_app:app

Building from source
---------------------