mirror of
https://github.com/apache/superset.git
synced 2026-04-17 23:25:05 +00:00
docs: bifurcate documentation into user and admin sections (#38196)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
72
docs/admin_docs/installation/architecture.mdx
Normal file
72
docs/admin_docs/installation/architecture.mdx
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
title: Architecture
|
||||
hide_title: true
|
||||
sidebar_position: 1
|
||||
version: 1
|
||||
---
|
||||
|
||||
import useBaseUrl from "@docusaurus/useBaseUrl";
|
||||
|
||||
# Architecture
|
||||
|
||||
This page is meant to give new administrators an understanding of Superset's components.
|
||||
|
||||
## Components
|
||||
|
||||
A Superset installation is made up of these components:
|
||||
|
||||
1. The Superset application itself
|
||||
2. A metadata database
|
||||
3. A caching layer (optional, but necessary for some features)
|
||||
4. A worker & beat (optional, but necessary for some features)
|
||||
|
||||
### Optional components and associated features
|
||||
|
||||
The optional components above are necessary to enable these features:
|
||||
|
||||
- [Alerts and Reports](/admin-docs/configuration/alerts-reports)
|
||||
- [Caching](/admin-docs/configuration/cache)
|
||||
- [Async Queries](/admin-docs/configuration/async-queries-celery/)
|
||||
- [Dashboard Thumbnails](/admin-docs/configuration/cache/#caching-thumbnails)
|
||||
|
||||
If you install with Kubernetes or Docker Compose, all of these components will be created.
|
||||
|
||||
However, installing from PyPI only creates the application itself. Users installing from PyPI will need to configure a caching layer, worker, and beat on their own if they wish to enable the above features. Configuration of those components for a PyPI install is not currently covered in this documentation.
|
||||
|
||||
Here are further details on each component.
|
||||
|
||||
### The Superset Application
|
||||
|
||||
This is the core application. Superset operates like this:
|
||||
|
||||
- A user visits a chart or dashboard
|
||||
- That triggers a SQL query to the data warehouse holding the underlying dataset
|
||||
- The resulting data is served up in a data visualization
|
||||
- The Superset application is comprised of the Python (Flask) backend application (server), API layer, and the React frontend, built via Webpack, and static assets needed for the application to work
|
||||
|
||||
### Metadata Database
|
||||
|
||||
This is where chart and dashboard definitions, user information, logs, etc. are stored. Superset is tested to work with PostgreSQL and MySQL databases as the metadata database (not be confused with a data source like your data warehouse, which could be a much greater variety of options like Snowflake, Redshift, etc.).
|
||||
|
||||
Some installation methods like our Quickstart and PyPI come configured by default to use a SQLite on-disk database. And in a Docker Compose installation, the data would be stored in a PostgreSQL container volume. Neither of these cases are recommended for production instances of Superset.
|
||||
|
||||
For production, a properly-configured, managed, standalone database is recommended. No matter what database you use, you should plan to back it up regularly.
|
||||
|
||||
### Caching Layer
|
||||
|
||||
The caching layer serves two main functions:
|
||||
|
||||
- Store the results of queries to your data warehouse so that when a chart is loaded twice, it pulls from the cache the second time, speeding up the application and reducing load on your data warehouse.
|
||||
- Act as a message broker for the worker, enabling the Alerts & Reports, async queries, and thumbnail caching features.
|
||||
|
||||
Most people use Redis for their cache, but Superset supports other options too. See the [cache docs](/admin-docs/configuration/cache/) for more.
|
||||
|
||||
### Worker and Beat
|
||||
|
||||
This is one or more workers who execute tasks like run async queries or take snapshots of reports and send emails, and a "beat" that acts as the scheduler and tells workers when to perform their tasks. Most installations use Celery for these components.
|
||||
|
||||
## Other components
|
||||
|
||||
Other components can be incorporated into Superset. The best place to learn about additional configurations is the [Configuration page](/admin-docs/configuration/configuring-superset). For instance, you could set up a load balancer or reverse proxy to implement HTTPS in front of your Superset application, or specify a Mapbox URL to enable geospatial charts, etc.
|
||||
|
||||
Superset won't even start without certain configuration settings established, so it's essential to review that page.
|
||||
173
docs/admin_docs/installation/docker-builds.mdx
Normal file
173
docs/admin_docs/installation/docker-builds.mdx
Normal file
@@ -0,0 +1,173 @@
|
||||
---
|
||||
title: Docker Builds
|
||||
hide_title: true
|
||||
sidebar_position: 7
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Docker builds, images and tags
|
||||
|
||||
The Apache Superset community extensively uses Docker for development, release,
|
||||
and productionizing Superset. This page details our Docker builds and tag naming
|
||||
schemes to help users navigate our offerings.
|
||||
|
||||
Images are built and pushed to the [Superset Docker Hub repository](
|
||||
https://hub.docker.com/r/apache/superset) using GitHub Actions.
|
||||
Different sets of images are built and/or published at different times:
|
||||
|
||||
- **Published releases** (`release`): published using
|
||||
tags like `5.0.0` and the `latest` tag.
|
||||
- **Pull request iterations** (`pull_request`): for each pull request, while
|
||||
we actively build the docker to validate the build, we do
|
||||
not publish those images for security reasons, we simply `docker build --load`
|
||||
- **Merges to the main branch** (`push`): resulting in new SHAs, with tags
|
||||
prefixed with `master` for the latest `master` version.
|
||||
|
||||
## Build presets
|
||||
|
||||
We have a set of build "presets" that each represent a combination of
|
||||
parameters for the build, mostly pointing to either different target layer
|
||||
for the build, and/or base image.
|
||||
|
||||
Here are the build presets that are exposed through the `supersetbot docker` utility:
|
||||
|
||||
- `lean`: The default Docker image, including both frontend and backend. Tags
|
||||
without a build_preset are lean builds (ie: `latest`, `5.0.0`, `4.1.2`, ...). `lean`
|
||||
builds do not contain database
|
||||
drivers, meaning you need to install your own. That applies to analytics databases **AND
|
||||
the metadata database**. You'll likely want to layer either `mysqlclient` or `psycopg2-binary`
|
||||
depending on the metadata database you choose for your installation, plus the required
|
||||
drivers to connect to your analytics database(s).
|
||||
- `dev`: For development, with a headless browser, dev-related utilities and root access. This
|
||||
includes some commonly used database drivers like `mysqlclient`, `psycopg2-binary` and
|
||||
some other used for development/CI
|
||||
- `py311`, e.g., Py311: Similar to lean but with a different Python version (in this example, 3.11).
|
||||
- `ci`: For certain CI workloads.
|
||||
- `websocket`: For Superset clusters supporting advanced features.
|
||||
- `dockerize`: Used by Helm in initContainers to wait for database dependencies to be available.
|
||||
|
||||
## Key tags examples
|
||||
|
||||
- `latest`: The latest official release build
|
||||
- `latest-dev`: the `-dev` image of the latest official release build, with a
|
||||
headless browser and root access.
|
||||
- `master`: The latest build from the `master` branch, implicitly the lean build
|
||||
preset
|
||||
- `master-dev`: Similar to `master` but includes a headless browser and root access.
|
||||
- `pr-5252`: The latest commit in PR 5252.
|
||||
- `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: A build for
|
||||
this specific SHA, which could be from a `master` merge, or release.
|
||||
- `websocket-latest`: The WebSocket image for use in a Superset cluster.
|
||||
|
||||
For insights or modifications to the build matrix and tagging conventions,
|
||||
check the [supersetbot docker](https://github.com/apache-superset/supersetbot)
|
||||
subcommand and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml)
|
||||
GitHub action.
|
||||
|
||||
## Building your own production Docker image
|
||||
|
||||
Every Superset deployment will require its own set of drivers depending on the data warehouse(s),
|
||||
etc. so we recommend that users build their own Docker image by extending the `lean` image.
|
||||
|
||||
Here's an example Dockerfile that does this. Follow the in-line comments to customize it for
|
||||
your desired Superset version and database drivers. The comments also note that a certain feature flag will
|
||||
have to be enabled in your config file.
|
||||
|
||||
You would build the image with `docker build -t mysuperset:latest .` or `docker build -t ourcompanysuperset:5.0.0 .`
|
||||
|
||||
```Dockerfile
|
||||
# change this to apache/superset:5.0.0 or whatever version you want to build from;
|
||||
# otherwise the default is the latest commit on GitHub master branch
|
||||
FROM apache/superset:master
|
||||
|
||||
USER root
|
||||
|
||||
# Set environment variable for Playwright
|
||||
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers
|
||||
|
||||
# Install packages using uv into the virtual environment
|
||||
RUN . /app/.venv/bin/activate && \
|
||||
uv pip install \
|
||||
# install psycopg2 for using PostgreSQL metadata store - could be a MySQL package if using that backend:
|
||||
psycopg2-binary \
|
||||
# add the driver(s) for your data warehouse(s), in this example we're showing for Microsoft SQL Server:
|
||||
pymssql \
|
||||
# package needed for using single-sign on authentication:
|
||||
Authlib \
|
||||
# openpyxl to be able to upload Excel files
|
||||
openpyxl \
|
||||
# Pillow for Alerts & Reports to generate PDFs of dashboards
|
||||
Pillow \
|
||||
# install Playwright for taking screenshots for Alerts & Reports. This assumes the feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled
|
||||
# That feature flag will default to True starting in 6.0.0
|
||||
# Playwright works only with Chrome.
|
||||
# If you are still using Selenium instead of Playwright, you would instead install here the selenium package and a headless browser & webdriver
|
||||
playwright \
|
||||
&& playwright install-deps \
|
||||
&& PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers playwright install chromium
|
||||
|
||||
# Switch back to the superset user
|
||||
USER superset
|
||||
|
||||
CMD ["/app/docker/entrypoints/run-server.sh"]
|
||||
```
|
||||
|
||||
## Key ARGs in Dockerfile
|
||||
|
||||
- `BUILD_TRANSLATIONS`: whether to build the translations into the image. For the
|
||||
frontend build this tells webpack to strip out all locales other than `en` from
|
||||
the `moment-timezone` library. For the backendthis skips compiling the
|
||||
`*.po` translation files
|
||||
- `DEV_MODE`: whether to skip the frontend build, this is used by our `docker-compose` dev setup
|
||||
where we mount the local volume and build using `webpack` in `--watch` mode, meaning as you
|
||||
alter the code in the local file system, webpack, from within a docker image used for this
|
||||
purpose, will constantly rebuild the frontend as you go. This ARG enables the initial
|
||||
`docker-compose` build to take much less time and resources
|
||||
- `INCLUDE_CHROMIUM`: whether to include chromium in the backend build so that it can be
|
||||
used as a headless browser for workloads related to "Alerts & Reports" and thumbnail generation
|
||||
- `INCLUDE_FIREFOX`: same as above, but for firefox
|
||||
- `PY_VER`: specifying the base image for the python backend, we don't recommend altering
|
||||
this setting if you're not working on forwards or backwards compatibility
|
||||
|
||||
## Caching
|
||||
|
||||
To accelerate builds, we follow Docker best practices and use `apache/superset-cache`.
|
||||
|
||||
## About database drivers
|
||||
|
||||
Our docker images come with little to zero database driver support since
|
||||
each environment requires different drivers, and maintaining a build with
|
||||
wide database support would be both challenging (dozens of databases,
|
||||
python drivers, and os dependencies) and inefficient (longer
|
||||
build times, larger images, lower layer cache hit rate, ...).
|
||||
|
||||
For production use cases, we recommend that you derive our `lean` image(s) and
|
||||
add database support for the database you need.
|
||||
|
||||
## On supporting different platforms (namely arm64 AND amd64)
|
||||
|
||||
Currently all automated builds are multi-platform, supporting both `linux/arm64`
|
||||
and `linux/amd64`. This enables higher level constructs like `helm` and
|
||||
`docker compose` to point to these images and effectively be multi-platform
|
||||
as well.
|
||||
|
||||
Pull requests and master builds
|
||||
are one-image-per-platform so that they can be parallelized and the
|
||||
build matrix for those is more sparse as we don't need to build every
|
||||
build preset on every platform, and generally can be more selective here.
|
||||
For those builds, we suffix tags with `-arm` where it applies.
|
||||
|
||||
### Working with Apple silicon
|
||||
|
||||
Apple's current generation of computers uses ARM-based CPUs, and Docker
|
||||
running on MACs seem to require `linux/arm64/v8` (at least one user's M2 was
|
||||
configured in that way). Setting the environment
|
||||
variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to function in
|
||||
term of leveraging, and building upon the Superset builds provided here.
|
||||
|
||||
```bash
|
||||
export DOCKER_DEFAULT_PLATFORM=linux/amd64
|
||||
```
|
||||
|
||||
Presumably, `linux/arm64/v8` would be more optimized for this generation
|
||||
of chips, but less compatible across the ARM ecosystem.
|
||||
286
docs/admin_docs/installation/docker-compose.mdx
Normal file
286
docs/admin_docs/installation/docker-compose.mdx
Normal file
@@ -0,0 +1,286 @@
|
||||
---
|
||||
title: Docker Compose
|
||||
hide_title: true
|
||||
sidebar_position: 5
|
||||
version: 1
|
||||
---
|
||||
|
||||
import useBaseUrl from "@docusaurus/useBaseUrl";
|
||||
|
||||
# Using Docker Compose
|
||||
|
||||
<img src={useBaseUrl("/img/docker-compose.webp" )} width="150" />
|
||||
<br /><br />
|
||||
|
||||
:::caution
|
||||
Since `docker compose` is primarily designed to run a set of containers on **a single host**
|
||||
and can't support requirements for **high availability**, we do not support nor recommend
|
||||
using our `docker compose` constructs to support production-type use-cases. For single host
|
||||
environments, we recommend using [minikube](https://minikube.sigs.k8s.io/docs/start/) along
|
||||
with our [installing on k8s](https://superset.apache.org/admin-docs/installation/running-on-kubernetes)
|
||||
documentation.
|
||||
:::
|
||||
|
||||
As mentioned in our [quickstart guide](/user-docs/quickstart), the fastest way to try
|
||||
Superset locally is using Docker Compose on a Linux or Mac OSX
|
||||
computer. Superset does not have official support for Windows. It's also the easiest
|
||||
way to launch a fully functioning **development environment** quickly.
|
||||
|
||||
Note that there are 4 major ways we support to run `docker compose`:
|
||||
|
||||
1. **docker-compose.yml:** for interactive development, where we mount your local folder with the
|
||||
frontend/backend files that you can edit and experience the changes you
|
||||
make in the app in real time
|
||||
1. **docker-compose-light.yml:** a lightweight configuration with minimal services (database,
|
||||
Superset app, and frontend dev server) for development. Uses in-memory caching instead of Redis
|
||||
and is designed for running multiple instances simultaneously
|
||||
1. **docker-compose-non-dev.yml** where we just build a more immutable image based on the
|
||||
local branch and get all the required images running. Changes in the local branch
|
||||
at the time you fire this up will be reflected, but changes to the code
|
||||
while `up` won't be reflected in the app
|
||||
1. **docker-compose-image-tag.yml** where we fetch an image from docker-hub say for the
|
||||
`5.0.0` release for instance, and fire it up so you can try it. Here what's in
|
||||
the local branch has no effects on what's running, we just fetch and run
|
||||
pre-built images from docker-hub. For `docker compose` to work along with the
|
||||
Postgres image it boots up, you'll want to point to a `-dev`-suffixed TAG, as in
|
||||
`export TAG=5.0.0-dev` or `export TAG=4.1.2-dev`, with `latest-dev` being the default.
|
||||
The `dev` builds include the `psycopg2-binary` required to connect
|
||||
to the Postgres database launched as part of the `docker compose` builds.
|
||||
|
||||
More on these approaches after setting up the requirements for either.
|
||||
|
||||
## Requirements
|
||||
|
||||
Note that this documentation assumes that you have [Docker](https://www.docker.com) and
|
||||
[git](https://git-scm.com/) installed. Note also that we used to use `docker-compose` but that
|
||||
is on the path to deprecation so we now use `docker compose` instead.
|
||||
|
||||
## 1. Clone Superset's GitHub repository
|
||||
|
||||
[Clone Superset's repo](https://github.com/apache/superset) in your terminal with the
|
||||
following command:
|
||||
|
||||
```bash
|
||||
git clone --depth=1 https://github.com/apache/superset.git
|
||||
```
|
||||
|
||||
Once that command completes successfully, you should see a new `superset` folder in your
|
||||
current directory.
|
||||
|
||||
## 2. Launch Superset Through Docker Compose
|
||||
|
||||
First let's assume you're familiar with `docker compose` mechanics. Here we'll refer generally
|
||||
to `docker compose up` even though in some cases you may want to force a check for newer remote
|
||||
images using `docker compose pull`, force a build with `docker compose build` or force a build
|
||||
on latest base images using `docker compose build --pull`. In most cases though, the simple
|
||||
`up` command should do just fine. Refer to docker compose docs for more information on the topic.
|
||||
|
||||
### Option #1 - for an interactive development environment
|
||||
|
||||
```bash
|
||||
# The --build argument insures all the layers are up-to-date
|
||||
docker compose up --build
|
||||
```
|
||||
|
||||
:::tip
|
||||
When running in development mode the `superset-node`
|
||||
container needs to finish building assets in order for the UI to render properly. If you would just
|
||||
like to try out Superset without making any code changes follow the steps documented for
|
||||
`production` or a specific version below.
|
||||
:::
|
||||
|
||||
:::tip
|
||||
By default, we mount the local superset-frontend folder here and run `npm install` as well
|
||||
as `npm run dev` which triggers webpack to compile/bundle the frontend code. Depending
|
||||
on your local setup, especially if you have less than 16GB of memory, it may be very slow to
|
||||
perform those operations. In this case, we recommend you set the env var
|
||||
`BUILD_SUPERSET_FRONTEND_IN_DOCKER` to `false`, and to run this locally instead in a terminal.
|
||||
Simply trigger `npm i && npm run dev`, this should be MUCH faster.
|
||||
:::
|
||||
|
||||
:::tip
|
||||
Sometimes, your npm-related state can get out-of-wack, running `npm run prune` from
|
||||
the `superset-frontend/` folder will nuke the various' packages `node_module/` folders
|
||||
and help you start fresh. In the context of `docker compose` setting
|
||||
`export NPM_RUN_PRUNE=true` prior to running `docker compose up` will trigger that
|
||||
from within docker. This will slow down the startup, but will fix various npm-related issues.
|
||||
:::
|
||||
|
||||
### Option #2 - lightweight development with multiple instances
|
||||
|
||||
For a lighter development setup that uses fewer resources and supports running multiple instances:
|
||||
|
||||
```bash
|
||||
# Single lightweight instance (default port 9001)
|
||||
docker compose -f docker-compose-light.yml up
|
||||
|
||||
# Multiple instances with different ports
|
||||
NODE_PORT=9001 docker compose -p superset-1 -f docker-compose-light.yml up
|
||||
NODE_PORT=9002 docker compose -p superset-2 -f docker-compose-light.yml up
|
||||
NODE_PORT=9003 docker compose -p superset-3 -f docker-compose-light.yml up
|
||||
```
|
||||
|
||||
This configuration includes:
|
||||
- PostgreSQL database (internal network only)
|
||||
- Superset application server
|
||||
- Frontend development server with webpack hot reloading
|
||||
- In-memory caching (no Redis)
|
||||
- Isolated volumes and networks per instance
|
||||
|
||||
Access each instance at `http://localhost:{NODE_PORT}` (e.g., `http://localhost:9001`).
|
||||
|
||||
### Option #3 - build a set of immutable images from the local branch
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose-non-dev.yml up
|
||||
```
|
||||
|
||||
### Option #4 - boot up an official release
|
||||
|
||||
```bash
|
||||
# Set the version you want to run
|
||||
export TAG=5.0.0
|
||||
# Fetch the tag you're about to check out (assuming you shallow-cloned the repo)
|
||||
git fetch --depth=1 origin tag $TAG
|
||||
# Could also fetch all tags too if you've got bandwidth to spare
|
||||
# git fetch --tags
|
||||
# Checkout the corresponding git ref
|
||||
git checkout $TAG
|
||||
# Fire up docker compose
|
||||
docker compose -f docker-compose-image-tag.yml up
|
||||
```
|
||||
|
||||
Here various release tags, github SHA, and latest `master` can be referenced by the TAG env var.
|
||||
Refer to the docker-related documentation to learn more about existing tags you can point to
|
||||
from Docker Hub.
|
||||
|
||||
:::note
|
||||
For option #2 and #3, we recommend checking out the release tag from the git repository
|
||||
(ie: `git checkout 5.0.0`) for more guaranteed results. This ensures that the `docker-compose.*.yml`
|
||||
configurations and that the mounted `docker/` scripts are in sync with the image you are
|
||||
looking to fire up.
|
||||
:::
|
||||
|
||||
## `docker compose` tips & configuration
|
||||
|
||||
:::caution
|
||||
All of the content belonging to a Superset instance - charts, dashboards, users, etc. - is stored in
|
||||
its metadata database. In production, this database should be backed up. The default installation
|
||||
with docker compose will store that data in a PostgreSQL database contained in a Docker
|
||||
[volume](https://docs.docker.com/storage/volumes/), which is not backed up.
|
||||
|
||||
Again, **THE DOCKER-COMPOSE INSTALLATION IS NOT PRODUCTION-READY OUT OF THE BOX.**
|
||||
|
||||
:::
|
||||
|
||||
You should see a stream of logging output from the containers being launched on your machine. Once
|
||||
this output slows, you should have a running instance of Superset on your local machine! To avoid
|
||||
the wall of text on future runs, add the `-d` option to the end of the `docker compose up` command.
|
||||
|
||||
### Configuring Further
|
||||
|
||||
The following is for users who want to configure how Superset runs in Docker Compose; otherwise, you
|
||||
can skip to the next section.
|
||||
|
||||
You can install additional python packages and apply config overrides by following the steps
|
||||
mentioned in [docker/README.md](https://github.com/apache/superset/tree/master/docker#configuration)
|
||||
|
||||
Note that `docker/.env` sets the default environment variables for all the docker images
|
||||
used by `docker compose`, and that `docker/.env-local` can be used to override those defaults.
|
||||
Also note that `docker/.env-local` is referenced in our `.gitignore`,
|
||||
preventing developers from risking committing potentially sensitive configuration to the repository.
|
||||
|
||||
One important variable is `SUPERSET_LOAD_EXAMPLES` which determines whether the `superset_init`
|
||||
container will populate example data and visualizations into the metadata database. These examples
|
||||
are helpful for learning and testing out Superset but unnecessary for experienced users and
|
||||
production deployments. The loading process can sometimes take a few minutes and a good amount of
|
||||
CPU, so you may want to disable it on a resource-constrained device.
|
||||
|
||||
For more advanced or dynamic configurations that are typically managed in a `superset_config.py` file
|
||||
located in your `PYTHONPATH`, note that it can be done by providing a
|
||||
`docker/pythonpath_dev/superset_config_docker.py` that will be ignored by git
|
||||
(preventing you to commit/push your local configuration back to the repository).
|
||||
The mechanics of this are in `docker/pythonpath_dev/superset_config.py` where you can see
|
||||
that the logic runs a `from superset_config_docker import *`
|
||||
|
||||
:::note
|
||||
Users often want to connect to other databases from Superset. Currently, the easiest way to
|
||||
do this is to modify the `docker-compose-non-dev.yml` file and add your database as a service that
|
||||
the other services depend on (via `x-superset-depends-on`). Others have attempted to set
|
||||
`network_mode: host` on the Superset services, but these generally break the installation,
|
||||
because the configuration requires use of the Docker Compose DNS resolver for the service names.
|
||||
If you have a good solution for this, let us know!
|
||||
:::
|
||||
|
||||
:::note
|
||||
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry
|
||||
data. Knowing the installation counts for different Superset versions informs the project's
|
||||
decisions about patching and long-term support. Scarf purges personally identifiable information
|
||||
(PII) and provides only aggregated statistics.
|
||||
|
||||
To opt-out of this data collection for packages downloaded through the Scarf Gateway by your docker
|
||||
compose based installation, edit the `x-superset-image:` line in your `docker-compose.yml` and
|
||||
`docker-compose-non-dev.yml` files, replacing `apachesuperset.docker.scarf.sh/apache/superset` with
|
||||
`apache/superset` to pull the image directly from Docker Hub.
|
||||
|
||||
To disable the Scarf telemetry pixel, set the `SCARF_ANALYTICS` environment variable to `False` in
|
||||
your terminal and/or in your `docker/.env` file.
|
||||
:::
|
||||
|
||||
## 3. Log in to Superset
|
||||
|
||||
Your local Superset instance also includes a Postgres server to store your data and is already
|
||||
pre-loaded with some example datasets that ship with Superset. You can access Superset now via your
|
||||
web browser by visiting `http://localhost:8088`. Note that many browsers now default to `https` - if
|
||||
yours is one of them, please make sure it uses `http`.
|
||||
|
||||
Log in with the default username and password:
|
||||
|
||||
```bash
|
||||
username: admin
|
||||
```
|
||||
|
||||
```bash
|
||||
password: admin
|
||||
```
|
||||
|
||||
## 4. Connecting Superset to your local database instance
|
||||
|
||||
When running Superset using `docker` or `docker compose` it runs in its own docker container, as if
|
||||
the Superset was running in a separate machine entirely. Therefore attempts to connect to your local
|
||||
database with the hostname `localhost` won't work as `localhost` refers to the docker container
|
||||
Superset is running in, and not your actual host machine. Fortunately, docker provides an easy way
|
||||
to access network resources in the host machine from inside a container, and we will leverage this
|
||||
capability to connect to our local database instance.
|
||||
|
||||
Here the instructions are for connecting to postgresql (which is running on your host machine) from
|
||||
Superset (which is running in its docker container). Other databases may have slightly different
|
||||
configurations but gist would be same and boils down to 2 steps -
|
||||
|
||||
1. **(Mac users may skip this step)** Configuring the local postgresql/database instance to accept
|
||||
public incoming connections. By default, postgresql only allows incoming connections from
|
||||
`localhost` and under Docker, unless you use `--network=host`, `localhost` will refer to different
|
||||
endpoints on the host machine and in a docker container respectively. Allowing postgresql to accept
|
||||
connections from the Docker involves making one-line changes to the files `postgresql.conf` and
|
||||
`pg_hba.conf`; you can find helpful links tailored to your OS / PG version on the web easily for
|
||||
this task. For Docker it suffices to only whitelist IPs `172.0.0.0/8` instead of `*`, but in any
|
||||
case you are _warned_ that doing this in a production database _may_ have disastrous consequences as
|
||||
you are opening your database to the public internet.
|
||||
1. Instead of `localhost`, try using `host.docker.internal` (Mac users, Ubuntu) or `172.18.0.1`
|
||||
(Linux users) as the hostname when attempting to connect to the database. This is a Docker internal
|
||||
detail -- what is happening is that, in Mac systems, Docker Desktop creates a dns entry for the
|
||||
hostname `host.docker.internal` which resolves to the correct address for the host machine, whereas
|
||||
in Linux this is not the case (at least by default). If neither of these 2 hostnames work then you
|
||||
may want to find the exact hostname you want to use, for that you can do `ifconfig` or
|
||||
`ip addr show` and look at the IP address of `docker0` interface that must have been created by
|
||||
Docker for you. Alternately if you don't even see the `docker0` interface try (if needed with sudo)
|
||||
`docker network inspect bridge` and see if there is an entry for `"Gateway"` and note the IP
|
||||
address.
|
||||
|
||||
## 4. To build or not to build
|
||||
|
||||
When running `docker compose up`, docker will build what is required behind the scene, but
|
||||
may use the docker cache if assets already exist. Running `docker compose build` prior to
|
||||
`docker compose up` or the equivalent shortcut `docker compose up --build` ensures that your
|
||||
docker images match the definition in the repository. This should only apply to the main
|
||||
docker-compose.yml file (default) and not to the alternative methods defined above.
|
||||
56
docs/admin_docs/installation/installation-methods.mdx
Normal file
56
docs/admin_docs/installation/installation-methods.mdx
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
title: Installation Methods
|
||||
hide_title: true
|
||||
sidebar_position: 2
|
||||
version: 1
|
||||
---
|
||||
|
||||
import useBaseUrl from "@docusaurus/useBaseUrl";
|
||||
|
||||
# Installation Methods
|
||||
|
||||
How should you install Superset? Here's a comparison of the different options. It will help if you've first read the [Architecture](/admin-docs/installation/architecture) page to understand Superset's different components.
|
||||
|
||||
The fundamental trade-off is between you needing to do more of the detail work yourself vs. using a more complex deployment route that handles those details.
|
||||
|
||||
## [Docker Compose](/admin-docs/installation/docker-compose)
|
||||
|
||||
**Summary:** This takes advantage of containerization while remaining simpler than Kubernetes. This is the best way to try out Superset; it's also useful for developing & contributing back to Superset.
|
||||
|
||||
If you're not just demoing the software, you'll need a moderate understanding of Docker to customize your deployment and avoid a few risks. Even when fully-optimized this is not as robust a method as Kubernetes when it comes to large-scale production deployments.
|
||||
|
||||
You manage a superset-config.py file and a docker-compose.yml file. Docker Compose brings up all the needed services - the Superset application, a Postgres metadata DB, Redis cache, Celery worker and beat. They are automatically connected to each other.
|
||||
|
||||
**Responsibilities**
|
||||
|
||||
You will need to back up your metadata DB. That could mean backing up the service running as a Docker container and its volume; ideally you are running Postgres as a service outside of that container and backing up that service.
|
||||
|
||||
You will also need to extend the Superset docker image. The default `lean` images do not contain drivers needed to access your metadata database (Postgres or MySQL), nor to access your data warehouse, nor the headless browser needed for Alerts & Reports. You could run a `-dev` image while demoing Superset, which has some of this, but you'll still need to install the driver for your data warehouse. The `-dev` images run as root, which is not recommended for production.
|
||||
|
||||
Ideally you will build your own image of Superset that extends `lean`, adding what your deployment needs. See [Building your own production Docker image](/admin-docs/installation/docker-builds/#building-your-own-production-docker-image).
|
||||
|
||||
## [Kubernetes (K8s)](/admin-docs/installation/kubernetes)
|
||||
|
||||
**Summary:** This is the best-practice way to deploy a production instance of Superset, but has the steepest skill requirement - someone who knows Kubernetes.
|
||||
|
||||
You will deploy Superset into a K8s cluster. The most common method is using the community-maintained Helm chart, though work is now underway to implement [SIP-149 - a Kubernetes Operator for Superset](https://github.com/apache/superset/issues/31408).
|
||||
|
||||
A K8s deployment can scale up and down based on usage and deploy rolling updates with zero downtime - features that big deployments appreciate.
|
||||
|
||||
**Responsibilities**
|
||||
|
||||
You will need to build your own Docker image, and back up your metadata DB, both as described in Docker Compose above. You'll also need to customize your Helm chart values and deploy and maintain your Kubernetes cluster.
|
||||
|
||||
## [PyPI (Python)](/admin-docs/installation/pypi)
|
||||
|
||||
**Summary:** This is the only method that requires no knowledge of containers. It requires the most hands-on work to deploy, connect, and maintain each component.
|
||||
|
||||
You install Superset as a Python package and run it that way, providing your own metadata database. Superset has documentation on how to install this way, but it is updated infrequently.
|
||||
|
||||
If you want caching, you'll set up Redis or RabbitMQ. If you want Alerts & Reports, you'll set up Celery.
|
||||
|
||||
**Responsibilities**
|
||||
|
||||
You will need to get the component services running and communicating with each other. You'll need to arrange backups of your metadata database.
|
||||
|
||||
When upgrading, you'll need to manage the system environment and packages and ensure all components have functional dependencies.
|
||||
451
docs/admin_docs/installation/kubernetes.mdx
Normal file
451
docs/admin_docs/installation/kubernetes.mdx
Normal file
@@ -0,0 +1,451 @@
|
||||
---
|
||||
title: Kubernetes
|
||||
hide_title: true
|
||||
sidebar_position: 3
|
||||
version: 1
|
||||
---
|
||||
|
||||
import useBaseUrl from "@docusaurus/useBaseUrl";
|
||||
|
||||
# Installing on Kubernetes
|
||||
|
||||
<img src={useBaseUrl("/img/k8s.png" )} width="150" />
|
||||
<br /><br />
|
||||
|
||||
Running Superset on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart
|
||||
found in the official [Superset helm repository](https://apache.github.io/superset/index.yaml).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- A Kubernetes cluster
|
||||
- Helm installed
|
||||
|
||||
:::note
|
||||
For simpler, single host environments, we recommend using
|
||||
[minikube](https://minikube.sigs.k8s.io/docs/start/) which is easy to setup on many platforms
|
||||
and works fantastically well with the Helm chart referenced here.
|
||||
:::
|
||||
|
||||
## Running
|
||||
|
||||
1. Add the Superset helm repository
|
||||
|
||||
```sh
|
||||
helm repo add superset https://apache.github.io/superset
|
||||
"superset" has been added to your repositories
|
||||
```
|
||||
|
||||
2. View charts in repo
|
||||
|
||||
```sh
|
||||
helm search repo superset
|
||||
NAME CHART VERSION APP VERSION DESCRIPTION
|
||||
superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b...
|
||||
```
|
||||
|
||||
3. Configure your setting overrides
|
||||
|
||||
Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on:
|
||||
|
||||
- [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis)
|
||||
- [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql)
|
||||
|
||||
More info down below on some important overrides you might need.
|
||||
|
||||
4. Install and run
|
||||
|
||||
```sh
|
||||
helm upgrade --install --values my-values.yaml superset superset/superset
|
||||
```
|
||||
|
||||
You should see various pods popping up, such as:
|
||||
|
||||
```sh
|
||||
kubectl get pods
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s
|
||||
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s
|
||||
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s
|
||||
superset-init-db-zlm9z 0/1 Completed 0 111s
|
||||
superset-postgresql-0 1/1 Running 0 6d20h
|
||||
superset-redis-master-0 1/1 Running 0 6d20h
|
||||
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s
|
||||
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s
|
||||
```
|
||||
|
||||
The exact list will depend on some of your specific configuration overrides but you should generally expect:
|
||||
|
||||
- N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `supersetNode.replicaCount` and `supersetWorker.replicaCount` values)
|
||||
- 1 `superset-postgresql-0` depending on your postgres settings
|
||||
- 1 `superset-redis-master-0` depending on your redis settings
|
||||
- 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides
|
||||
|
||||
1. Access it
|
||||
|
||||
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:
|
||||
|
||||
- Configure the Service as a `LoadBalancer` or `NodePort`
|
||||
- Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...)
|
||||
- Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost
|
||||
|
||||
Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with:
|
||||
|
||||
- user: `admin`
|
||||
- password: `admin`
|
||||
|
||||
## Important settings
|
||||
|
||||
### Security settings
|
||||
|
||||
Default security settings and passwords are included but you **MUST** update them to run `prod` instances, in particular:
|
||||
|
||||
```yaml
|
||||
postgresql:
|
||||
postgresqlPassword: superset
|
||||
```
|
||||
|
||||
Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate
|
||||
a sufficiently random sequence.
|
||||
|
||||
- To generate a good key you can run, `openssl rand -base64 42`
|
||||
|
||||
```yaml
|
||||
configOverrides:
|
||||
secret: |
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
```
|
||||
|
||||
If you want to change the previous secret key then you should rotate the keys.
|
||||
Default secret key for kubernetes deployment is `thisISaSECRET_1234`
|
||||
|
||||
```yaml
|
||||
configOverrides:
|
||||
my_override: |
|
||||
PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY'
|
||||
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
|
||||
init:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- |
|
||||
. {{ .Values.configMountPath }}/superset_bootstrap.sh
|
||||
superset re-encrypt-secrets
|
||||
. {{ .Values.configMountPath }}/superset_init.sh
|
||||
```
|
||||
|
||||
:::note
|
||||
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data. Knowing the installation counts for different Superset versions informs the project's decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.
|
||||
|
||||
To opt-out of this data collection in your Helm-based installation, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub.
|
||||
:::
|
||||
|
||||
### Dependencies
|
||||
|
||||
Install additional packages and do any other bootstrap configuration in the bootstrap script.
|
||||
For production clusters it's recommended to build own image with this step done in CI.
|
||||
|
||||
:::note
|
||||
|
||||
Superset requires a Python DB-API database driver and a SQLAlchemy
|
||||
dialect to be installed for each datastore you want to connect to.
|
||||
|
||||
See [Install Database Drivers](/admin-docs/databases#installing-database-drivers) for more information.
|
||||
It is recommended that you refer to versions listed in
|
||||
[pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml)
|
||||
instead of hard-coding them in your bootstrap script, as seen below.
|
||||
|
||||
:::
|
||||
|
||||
The following example installs the drivers for BigQuery and Elasticsearch, allowing you to connect to these data sources within your Superset setup:
|
||||
|
||||
```yaml
|
||||
bootstrapScript: |
|
||||
#!/bin/bash
|
||||
uv pip install .[postgres] \
|
||||
.[bigquery] \
|
||||
.[elasticsearch] &&\
|
||||
if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
|
||||
```
|
||||
|
||||
### superset_config.py
|
||||
|
||||
The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.:
|
||||
|
||||
```yaml
|
||||
configOverrides:
|
||||
my_override: |
|
||||
# This will make sure the redirect_uri is properly computed, even with SSL offloading
|
||||
ENABLE_PROXY_FIX = True
|
||||
FEATURE_FLAGS = {
|
||||
"DYNAMIC_PLUGINS": True
|
||||
}
|
||||
```
|
||||
|
||||
Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain.
|
||||
|
||||
The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that.
|
||||
|
||||
Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py`
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`.
|
||||
|
||||
```yaml
|
||||
extraEnv:
|
||||
SMTP_HOST: smtp.gmail.com
|
||||
SMTP_USER: user@gmail.com
|
||||
SMTP_PORT: "587"
|
||||
SMTP_MAIL_FROM: user@gmail.com
|
||||
|
||||
extraSecretEnv:
|
||||
SMTP_PASSWORD: xxxx
|
||||
|
||||
configOverrides:
|
||||
smtp: |
|
||||
import ast
|
||||
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
|
||||
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
|
||||
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
|
||||
SMTP_USER = os.getenv("SMTP_USER","superset")
|
||||
SMTP_PORT = os.getenv("SMTP_PORT",25)
|
||||
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
|
||||
```
|
||||
|
||||
### System packages
|
||||
|
||||
If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.:
|
||||
|
||||
```yaml
|
||||
supersetWorker:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- |
|
||||
apt update
|
||||
apt install -y somepackage
|
||||
apt autoremove -yqq --purge
|
||||
apt clean
|
||||
|
||||
# Run celery worker
|
||||
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
|
||||
```
|
||||
|
||||
### Data sources
|
||||
|
||||
Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`:
|
||||
|
||||
```yaml
|
||||
extraConfigs:
|
||||
import_datasources.yaml: |
|
||||
databases:
|
||||
- allow_file_upload: true
|
||||
allow_ctas: true
|
||||
allow_cvas: true
|
||||
database_name: example-db
|
||||
extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\
|
||||
metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\
|
||||
}"
|
||||
sqlalchemy_uri: example://example-db.local
|
||||
tables: []
|
||||
```
|
||||
|
||||
Those will also be mounted as secrets and can include sensitive parameters.
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Setting up OAuth
|
||||
|
||||
:::note
|
||||
|
||||
OAuth setup requires that the [authlib](https://authlib.org/) Python library is installed. This can
|
||||
be done using `pip` by updating the `bootstrapScript`. See the [Dependencies](#dependencies) section
|
||||
for more information.
|
||||
|
||||
:::
|
||||
|
||||
```yaml
|
||||
extraEnv:
|
||||
AUTH_DOMAIN: example.com
|
||||
|
||||
extraSecretEnv:
|
||||
GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
|
||||
GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx
|
||||
|
||||
configOverrides:
|
||||
enable_oauth: |
|
||||
# This will make sure the redirect_uri is properly computed, even with SSL offloading
|
||||
ENABLE_PROXY_FIX = True
|
||||
|
||||
from flask_appbuilder.security.manager import AUTH_OAUTH
|
||||
AUTH_TYPE = AUTH_OAUTH
|
||||
OAUTH_PROVIDERS = [
|
||||
{
|
||||
"name": "google",
|
||||
"icon": "fa-google",
|
||||
"token_key": "access_token",
|
||||
"remote_app": {
|
||||
"client_id": os.getenv("GOOGLE_KEY"),
|
||||
"client_secret": os.getenv("GOOGLE_SECRET"),
|
||||
"api_base_url": "https://www.googleapis.com/oauth2/v2/",
|
||||
"client_kwargs": {"scope": "email profile"},
|
||||
"request_token_url": None,
|
||||
"access_token_url": "https://accounts.google.com/o/oauth2/token",
|
||||
"authorize_url": "https://accounts.google.com/o/oauth2/auth",
|
||||
"authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
|
||||
},
|
||||
}
|
||||
]
|
||||
|
||||
# Map Authlib roles to superset roles
|
||||
AUTH_ROLE_ADMIN = 'Admin'
|
||||
AUTH_ROLE_PUBLIC = 'Public'
|
||||
|
||||
# Will allow user self registration, allowing to create Flask users from Authorized User
|
||||
AUTH_USER_REGISTRATION = True
|
||||
|
||||
# The default user self registration role
|
||||
AUTH_USER_REGISTRATION_ROLE = "Admin"
|
||||
```
|
||||
|
||||
### Enable Alerts and Reports
|
||||
|
||||
For this, as per the [Alerts and Reports doc](/admin-docs/configuration/alerts-reports), you will need to:
|
||||
|
||||
#### Install a supported webdriver in the Celery worker
|
||||
|
||||
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`:
|
||||
|
||||
```yaml
|
||||
supersetWorker:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- |
|
||||
# Install chrome webdriver
|
||||
# See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/admin-docs/installation/email_reports.mdx
|
||||
apt-get update
|
||||
apt-get install -y wget
|
||||
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
|
||||
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
|
||||
wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
|
||||
apt-get install -y zip
|
||||
unzip chromedriver_linux64.zip
|
||||
chmod +x chromedriver
|
||||
mv chromedriver /usr/bin
|
||||
apt-get autoremove -yqq --purge
|
||||
apt-get clean
|
||||
rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
|
||||
|
||||
# Run
|
||||
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
|
||||
```
|
||||
|
||||
#### Run the Celery beat
|
||||
|
||||
This pod will trigger the scheduled tasks configured in the alerts and reports UI section:
|
||||
|
||||
```yaml
|
||||
supersetCeleryBeat:
|
||||
enabled: true
|
||||
```
|
||||
|
||||
#### Configure the appropriate Celery jobs and SMTP/Slack settings
|
||||
|
||||
```yaml
|
||||
extraEnv:
|
||||
SMTP_HOST: smtp.gmail.com
|
||||
SMTP_USER: user@gmail.com
|
||||
SMTP_PORT: "587"
|
||||
SMTP_MAIL_FROM: user@gmail.com
|
||||
|
||||
extraSecretEnv:
|
||||
SLACK_API_TOKEN: xoxb-xxxx-yyyy
|
||||
SMTP_PASSWORD: xxxx-yyyy
|
||||
|
||||
configOverrides:
|
||||
feature_flags: |
|
||||
import ast
|
||||
|
||||
FEATURE_FLAGS = {
|
||||
"ALERT_REPORTS": True
|
||||
}
|
||||
|
||||
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
|
||||
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
|
||||
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
|
||||
SMTP_USER = os.getenv("SMTP_USER","superset")
|
||||
SMTP_PORT = os.getenv("SMTP_PORT",25)
|
||||
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
|
||||
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com")
|
||||
|
||||
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None)
|
||||
celery_conf: |
|
||||
from celery.schedules import crontab
|
||||
|
||||
class CeleryConfig:
|
||||
broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
|
||||
imports = (
|
||||
"superset.sql_lab",
|
||||
"superset.tasks.cache",
|
||||
"superset.tasks.scheduler",
|
||||
)
|
||||
result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
|
||||
task_annotations = {
|
||||
"sql_lab.get_sql_results": {
|
||||
"rate_limit": "100/s",
|
||||
},
|
||||
}
|
||||
beat_schedule = {
|
||||
"reports.scheduler": {
|
||||
"task": "reports.scheduler",
|
||||
"schedule": crontab(minute="*", hour="*"),
|
||||
},
|
||||
"reports.prune_log": {
|
||||
"task": "reports.prune_log",
|
||||
'schedule': crontab(minute=0, hour=0),
|
||||
},
|
||||
'cache-warmup-hourly': {
|
||||
"task": "cache-warmup",
|
||||
"schedule": crontab(minute="*/30", hour="*"),
|
||||
"kwargs": {
|
||||
"strategy_name": "top_n_dashboards",
|
||||
"top_n": 10,
|
||||
"since": "7 days ago",
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
CELERY_CONFIG = CeleryConfig
|
||||
reports: |
|
||||
EMAIL_PAGE_RENDER_WAIT = 60
|
||||
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
|
||||
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
|
||||
WEBDRIVER_TYPE= "chrome"
|
||||
WEBDRIVER_OPTION_ARGS = [
|
||||
"--force-device-scale-factor=2.0",
|
||||
"--high-dpi-support=2.0",
|
||||
"--headless",
|
||||
"--disable-gpu",
|
||||
"--disable-dev-shm-usage",
|
||||
# This is required because our process runs as root (in order to install pip packages)
|
||||
"--no-sandbox",
|
||||
"--disable-setuid-sandbox",
|
||||
"--disable-extensions",
|
||||
]
|
||||
```
|
||||
|
||||
### Load the Examples data and dashboards
|
||||
|
||||
If you are trying Superset out and want some data and dashboards to explore, you can load some examples by creating a `my_values.yaml` and deploying it as described above in the **Configure your setting overrides** step of the **Running** section.
|
||||
To load the examples, add the following to the `my_values.yaml` file:
|
||||
|
||||
```yaml
|
||||
init:
|
||||
loadExamples: true
|
||||
```
|
||||
|
||||
:::resources
|
||||
- [Tutorial: Mastering Data Visualization — Installing Superset on Kubernetes with Helm Chart](https://mahira-technology.medium.com/mastering-data-visualization-installing-superset-on-kubernetes-cluster-using-helm-chart-e4ec99199e1e)
|
||||
- [Tutorial: Installing Apache Superset in Kubernetes](https://aws.plainenglish.io/installing-apache-superset-in-kubernetes-1aec192ac495)
|
||||
:::
|
||||
164
docs/admin_docs/installation/pypi.mdx
Normal file
164
docs/admin_docs/installation/pypi.mdx
Normal file
@@ -0,0 +1,164 @@
|
||||
---
|
||||
title: PyPI
|
||||
hide_title: true
|
||||
sidebar_position: 4
|
||||
version: 1
|
||||
---
|
||||
|
||||
import useBaseUrl from "@docusaurus/useBaseUrl";
|
||||
|
||||
# Installing Superset from PyPI
|
||||
|
||||
<img src={useBaseUrl("/img/pypi.png" )} width="150" />
|
||||
<br /><br />
|
||||
|
||||
This page describes how to install Superset using the `apache_superset` package [published on PyPI](https://pypi.org/project/apache_superset/).
|
||||
|
||||
## OS Dependencies
|
||||
|
||||
Superset stores database connection information in its metadata database. For that purpose, we use
|
||||
the cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS
|
||||
level dependencies.
|
||||
|
||||
**Debian and Ubuntu**
|
||||
|
||||
Ubuntu **24.04** uses python 3.12 per default, which currently is not supported by Superset. You need to add a second python installation of 3.11 and install the required additional dependencies.
|
||||
```bash
|
||||
sudo add-apt-repository ppa:deadsnakes/ppa
|
||||
sudo apt update
|
||||
sudo apt install python3.11 python3.11-dev python3.11-venv build-essential libssl-dev libffi-dev libsasl2-dev libldap2-dev default-libmysqlclient-dev
|
||||
```
|
||||
|
||||
In Ubuntu **20.04 and 22.04** the following command will ensure that the required dependencies are installed:
|
||||
|
||||
```bash
|
||||
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev
|
||||
```
|
||||
|
||||
In Ubuntu **before 20.04** the following command will ensure that the required dependencies are installed:
|
||||
|
||||
```bash
|
||||
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev
|
||||
```
|
||||
|
||||
**Fedora and RHEL-derivative Linux distributions**
|
||||
|
||||
Install the following packages using the `yum` package manager:
|
||||
|
||||
```bash
|
||||
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
|
||||
```
|
||||
|
||||
In more recent versions of CentOS and Fedora, you may need to install a slightly different set of packages using `dnf`:
|
||||
|
||||
```bash
|
||||
sudo dnf install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel
|
||||
```
|
||||
|
||||
Also, on CentOS, you may need to upgrade pip for the install to work:
|
||||
|
||||
```bash
|
||||
pip3 install --upgrade pip
|
||||
```
|
||||
|
||||
**Mac OS X**
|
||||
|
||||
If you're not on the latest version of OS X, we recommend upgrading because we've found that many
|
||||
issues people have run into are linked to older versions of Mac OS X. After updating, install the
|
||||
latest version of XCode command line tools:
|
||||
|
||||
```bash
|
||||
xcode-select --install
|
||||
```
|
||||
|
||||
We don't recommend using the system installed Python. Instead, first install the
|
||||
[homebrew](https://brew.sh/) manager and then run the following commands:
|
||||
|
||||
```bash
|
||||
brew install readline pkg-config libffi openssl mysql postgresql@14
|
||||
```
|
||||
|
||||
You should install a recent version of Python. Refer to the
|
||||
[pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml) file for a list of Python
|
||||
versions officially supported by Superset. We'd recommend using a Python version manager
|
||||
like [pyenv](https://github.com/pyenv/pyenv)
|
||||
(and also [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv)).
|
||||
|
||||
Let's also make sure we have the latest version of `pip` and `setuptools`:
|
||||
|
||||
```bash
|
||||
pip install --upgrade setuptools pip
|
||||
```
|
||||
|
||||
Lastly, you may need to set LDFLAGS and CFLAGS for certain Python packages to properly build. You can export these variables with:
|
||||
|
||||
```bash
|
||||
export LDFLAGS="-L$(brew --prefix openssl)/lib"
|
||||
export CFLAGS="-I$(brew --prefix openssl)/include"
|
||||
```
|
||||
|
||||
These will now be available when pip installing requirements.
|
||||
|
||||
## Python Virtual Environment
|
||||
|
||||
We highly recommend installing Superset inside of a virtual environment.
|
||||
|
||||
You can create and activate a virtual environment using the following commands. Ensure you are using a compatible version of python. You might have to explicitly use for example `python3.11` instead of `python3`.
|
||||
|
||||
```bash
|
||||
# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
|
||||
# See https://docs.python.org/3.6/library/venv.html
|
||||
python3 -m venv venv
|
||||
. venv/bin/activate
|
||||
```
|
||||
|
||||
Or with pyenv-virtualenv:
|
||||
|
||||
```bash
|
||||
# Here we name the virtual env 'superset'
|
||||
pyenv virtualenv superset
|
||||
pyenv activate superset
|
||||
```
|
||||
|
||||
Once you activated your virtual environment, all of the Python packages you install or uninstall
|
||||
will be confined to this environment. You can exit the environment by running `deactivate` on the
|
||||
command line.
|
||||
|
||||
### Installing and Initializing Superset
|
||||
|
||||
First, start by installing `apache_superset`:
|
||||
|
||||
```bash
|
||||
pip install apache_superset
|
||||
```
|
||||
|
||||
Then, define mandatory configurations, SECRET_KEY and FLASK_APP:
|
||||
```bash
|
||||
export SUPERSET_SECRET_KEY=YOUR-SECRET-KEY # For production use, make sure this is a strong key, for example generated using `openssl rand -base64 42`. See https://superset.apache.org/admin-docs/configuration/configuring-superset#specifying-a-secret_key
|
||||
export FLASK_APP=superset
|
||||
```
|
||||
|
||||
Then, you need to initialize the database:
|
||||
|
||||
```bash
|
||||
superset db upgrade
|
||||
```
|
||||
|
||||
Finish installing by running through the following commands:
|
||||
|
||||
```bash
|
||||
# Create an admin user in your metadata database (use `admin` as username to be able to load the examples)
|
||||
superset fab create-admin
|
||||
|
||||
# Load some data to play with
|
||||
superset load_examples
|
||||
|
||||
# Create default roles and permissions
|
||||
superset init
|
||||
|
||||
# To start a development web server on port 8088, use -p to bind to another port
|
||||
superset run -p 8088 --with-threads --reload --debugger
|
||||
```
|
||||
|
||||
If everything worked, you should be able to navigate to `hostname:port` in your browser (e.g.
|
||||
locally by default at `localhost:8088`) and login using the username and password you created.
|
||||
61
docs/admin_docs/installation/upgrading-superset.mdx
Normal file
61
docs/admin_docs/installation/upgrading-superset.mdx
Normal file
@@ -0,0 +1,61 @@
|
||||
---
|
||||
title: Upgrading Superset
|
||||
hide_title: true
|
||||
sidebar_position: 6
|
||||
version: 1
|
||||
---
|
||||
|
||||
# Upgrading Superset
|
||||
|
||||
## Docker Compose
|
||||
|
||||
First, make sure to shut down the running containers in Docker Compose:
|
||||
|
||||
```bash
|
||||
docker compose down
|
||||
```
|
||||
|
||||
Next, update the folder that mirrors the `superset` repo through git:
|
||||
|
||||
```bash
|
||||
git pull origin master
|
||||
```
|
||||
|
||||
Then, restart the containers and any changed Docker images will be automatically pulled down:
|
||||
|
||||
```bash
|
||||
docker compose up
|
||||
```
|
||||
|
||||
## Updating Superset Manually
|
||||
|
||||
To upgrade superset in a native installation, run the following commands:
|
||||
|
||||
```bash
|
||||
pip install apache_superset --upgrade
|
||||
```
|
||||
|
||||
## Upgrading the Metadata Database
|
||||
|
||||
Migrate the metadata database by running:
|
||||
|
||||
```bash
|
||||
superset db upgrade
|
||||
superset init
|
||||
```
|
||||
|
||||
While upgrading superset should not delete your charts and dashboards, we recommend following best
|
||||
practices and to backup your metadata database before upgrading. Before upgrading production, we
|
||||
recommend upgrading in a staging environment and upgrading production finally during off-peak usage.
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
For a detailed list of breaking changes and migration notes for each version, see
|
||||
[UPDATING.md](https://github.com/apache/superset/blob/master/UPDATING.md).
|
||||
|
||||
This file documents backwards-incompatible changes and provides guidance for migrating between
|
||||
major versions, including:
|
||||
- Configuration changes
|
||||
- API changes
|
||||
- Database migrations
|
||||
- Deprecated features
|
||||
Reference in New Issue
Block a user