docs: bifurcate documentation into user and admin sections (#38196)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Evan Rusackas
2026-02-26 16:29:08 -05:00
committed by GitHub
parent 8a053bbe07
commit 6589ee48f9
171 changed files with 10899 additions and 2866 deletions

View File

@@ -0,0 +1,72 @@
---
title: Architecture
hide_title: true
sidebar_position: 1
version: 1
---
import useBaseUrl from "@docusaurus/useBaseUrl";
# Architecture
This page is meant to give new administrators an understanding of Superset's components.
## Components
A Superset installation is made up of these components:
1. The Superset application itself
2. A metadata database
3. A caching layer (optional, but necessary for some features)
4. A worker & beat (optional, but necessary for some features)
### Optional components and associated features
The optional components above are necessary to enable these features:
- [Alerts and Reports](/admin-docs/configuration/alerts-reports)
- [Caching](/admin-docs/configuration/cache)
- [Async Queries](/admin-docs/configuration/async-queries-celery/)
- [Dashboard Thumbnails](/admin-docs/configuration/cache/#caching-thumbnails)
If you install with Kubernetes or Docker Compose, all of these components will be created.
However, installing from PyPI only creates the application itself. Users installing from PyPI will need to configure a caching layer, worker, and beat on their own if they wish to enable the above features. Configuration of those components for a PyPI install is not currently covered in this documentation.
Here are further details on each component.
### The Superset Application
This is the core application. Superset operates like this:
- A user visits a chart or dashboard
- That triggers a SQL query to the data warehouse holding the underlying dataset
- The resulting data is served up in a data visualization
- The Superset application is comprised of the Python (Flask) backend application (server), API layer, and the React frontend, built via Webpack, and static assets needed for the application to work
### Metadata Database
This is where chart and dashboard definitions, user information, logs, etc. are stored. Superset is tested to work with PostgreSQL and MySQL databases as the metadata database (not be confused with a data source like your data warehouse, which could be a much greater variety of options like Snowflake, Redshift, etc.).
Some installation methods like our Quickstart and PyPI come configured by default to use a SQLite on-disk database. And in a Docker Compose installation, the data would be stored in a PostgreSQL container volume. Neither of these cases are recommended for production instances of Superset.
For production, a properly-configured, managed, standalone database is recommended. No matter what database you use, you should plan to back it up regularly.
### Caching Layer
The caching layer serves two main functions:
- Store the results of queries to your data warehouse so that when a chart is loaded twice, it pulls from the cache the second time, speeding up the application and reducing load on your data warehouse.
- Act as a message broker for the worker, enabling the Alerts & Reports, async queries, and thumbnail caching features.
Most people use Redis for their cache, but Superset supports other options too. See the [cache docs](/admin-docs/configuration/cache/) for more.
### Worker and Beat
This is one or more workers who execute tasks like run async queries or take snapshots of reports and send emails, and a "beat" that acts as the scheduler and tells workers when to perform their tasks. Most installations use Celery for these components.
## Other components
Other components can be incorporated into Superset. The best place to learn about additional configurations is the [Configuration page](/admin-docs/configuration/configuring-superset). For instance, you could set up a load balancer or reverse proxy to implement HTTPS in front of your Superset application, or specify a Mapbox URL to enable geospatial charts, etc.
Superset won't even start without certain configuration settings established, so it's essential to review that page.

View File

@@ -0,0 +1,173 @@
---
title: Docker Builds
hide_title: true
sidebar_position: 7
version: 1
---
# Docker builds, images and tags
The Apache Superset community extensively uses Docker for development, release,
and productionizing Superset. This page details our Docker builds and tag naming
schemes to help users navigate our offerings.
Images are built and pushed to the [Superset Docker Hub repository](
https://hub.docker.com/r/apache/superset) using GitHub Actions.
Different sets of images are built and/or published at different times:
- **Published releases** (`release`): published using
tags like `5.0.0` and the `latest` tag.
- **Pull request iterations** (`pull_request`): for each pull request, while
we actively build the docker to validate the build, we do
not publish those images for security reasons, we simply `docker build --load`
- **Merges to the main branch** (`push`): resulting in new SHAs, with tags
prefixed with `master` for the latest `master` version.
## Build presets
We have a set of build "presets" that each represent a combination of
parameters for the build, mostly pointing to either different target layer
for the build, and/or base image.
Here are the build presets that are exposed through the `supersetbot docker` utility:
- `lean`: The default Docker image, including both frontend and backend. Tags
without a build_preset are lean builds (ie: `latest`, `5.0.0`, `4.1.2`, ...). `lean`
builds do not contain database
drivers, meaning you need to install your own. That applies to analytics databases **AND
the metadata database**. You'll likely want to layer either `mysqlclient` or `psycopg2-binary`
depending on the metadata database you choose for your installation, plus the required
drivers to connect to your analytics database(s).
- `dev`: For development, with a headless browser, dev-related utilities and root access. This
includes some commonly used database drivers like `mysqlclient`, `psycopg2-binary` and
some other used for development/CI
- `py311`, e.g., Py311: Similar to lean but with a different Python version (in this example, 3.11).
- `ci`: For certain CI workloads.
- `websocket`: For Superset clusters supporting advanced features.
- `dockerize`: Used by Helm in initContainers to wait for database dependencies to be available.
## Key tags examples
- `latest`: The latest official release build
- `latest-dev`: the `-dev` image of the latest official release build, with a
headless browser and root access.
- `master`: The latest build from the `master` branch, implicitly the lean build
preset
- `master-dev`: Similar to `master` but includes a headless browser and root access.
- `pr-5252`: The latest commit in PR 5252.
- `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: A build for
this specific SHA, which could be from a `master` merge, or release.
- `websocket-latest`: The WebSocket image for use in a Superset cluster.
For insights or modifications to the build matrix and tagging conventions,
check the [supersetbot docker](https://github.com/apache-superset/supersetbot)
subcommand and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml)
GitHub action.
## Building your own production Docker image
Every Superset deployment will require its own set of drivers depending on the data warehouse(s),
etc. so we recommend that users build their own Docker image by extending the `lean` image.
Here's an example Dockerfile that does this. Follow the in-line comments to customize it for
your desired Superset version and database drivers. The comments also note that a certain feature flag will
have to be enabled in your config file.
You would build the image with `docker build -t mysuperset:latest .` or `docker build -t ourcompanysuperset:5.0.0 .`
```Dockerfile
# change this to apache/superset:5.0.0 or whatever version you want to build from;
# otherwise the default is the latest commit on GitHub master branch
FROM apache/superset:master
USER root
# Set environment variable for Playwright
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers
# Install packages using uv into the virtual environment
RUN . /app/.venv/bin/activate && \
uv pip install \
# install psycopg2 for using PostgreSQL metadata store - could be a MySQL package if using that backend:
psycopg2-binary \
# add the driver(s) for your data warehouse(s), in this example we're showing for Microsoft SQL Server:
pymssql \
# package needed for using single-sign on authentication:
Authlib \
# openpyxl to be able to upload Excel files
openpyxl \
# Pillow for Alerts & Reports to generate PDFs of dashboards
Pillow \
# install Playwright for taking screenshots for Alerts & Reports. This assumes the feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled
# That feature flag will default to True starting in 6.0.0
# Playwright works only with Chrome.
# If you are still using Selenium instead of Playwright, you would instead install here the selenium package and a headless browser & webdriver
playwright \
&& playwright install-deps \
&& PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers playwright install chromium
# Switch back to the superset user
USER superset
CMD ["/app/docker/entrypoints/run-server.sh"]
```
## Key ARGs in Dockerfile
- `BUILD_TRANSLATIONS`: whether to build the translations into the image. For the
frontend build this tells webpack to strip out all locales other than `en` from
the `moment-timezone` library. For the backendthis skips compiling the
`*.po` translation files
- `DEV_MODE`: whether to skip the frontend build, this is used by our `docker-compose` dev setup
where we mount the local volume and build using `webpack` in `--watch` mode, meaning as you
alter the code in the local file system, webpack, from within a docker image used for this
purpose, will constantly rebuild the frontend as you go. This ARG enables the initial
`docker-compose` build to take much less time and resources
- `INCLUDE_CHROMIUM`: whether to include chromium in the backend build so that it can be
used as a headless browser for workloads related to "Alerts & Reports" and thumbnail generation
- `INCLUDE_FIREFOX`: same as above, but for firefox
- `PY_VER`: specifying the base image for the python backend, we don't recommend altering
this setting if you're not working on forwards or backwards compatibility
## Caching
To accelerate builds, we follow Docker best practices and use `apache/superset-cache`.
## About database drivers
Our docker images come with little to zero database driver support since
each environment requires different drivers, and maintaining a build with
wide database support would be both challenging (dozens of databases,
python drivers, and os dependencies) and inefficient (longer
build times, larger images, lower layer cache hit rate, ...).
For production use cases, we recommend that you derive our `lean` image(s) and
add database support for the database you need.
## On supporting different platforms (namely arm64 AND amd64)
Currently all automated builds are multi-platform, supporting both `linux/arm64`
and `linux/amd64`. This enables higher level constructs like `helm` and
`docker compose` to point to these images and effectively be multi-platform
as well.
Pull requests and master builds
are one-image-per-platform so that they can be parallelized and the
build matrix for those is more sparse as we don't need to build every
build preset on every platform, and generally can be more selective here.
For those builds, we suffix tags with `-arm` where it applies.
### Working with Apple silicon
Apple's current generation of computers uses ARM-based CPUs, and Docker
running on MACs seem to require `linux/arm64/v8` (at least one user's M2 was
configured in that way). Setting the environment
variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to function in
term of leveraging, and building upon the Superset builds provided here.
```bash
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```
Presumably, `linux/arm64/v8` would be more optimized for this generation
of chips, but less compatible across the ARM ecosystem.

View File

@@ -0,0 +1,286 @@
---
title: Docker Compose
hide_title: true
sidebar_position: 5
version: 1
---
import useBaseUrl from "@docusaurus/useBaseUrl";
# Using Docker Compose
<img src={useBaseUrl("/img/docker-compose.webp" )} width="150" />
<br /><br />
:::caution
Since `docker compose` is primarily designed to run a set of containers on **a single host**
and can't support requirements for **high availability**, we do not support nor recommend
using our `docker compose` constructs to support production-type use-cases. For single host
environments, we recommend using [minikube](https://minikube.sigs.k8s.io/docs/start/) along
with our [installing on k8s](https://superset.apache.org/admin-docs/installation/running-on-kubernetes)
documentation.
:::
As mentioned in our [quickstart guide](/user-docs/quickstart), the fastest way to try
Superset locally is using Docker Compose on a Linux or Mac OSX
computer. Superset does not have official support for Windows. It's also the easiest
way to launch a fully functioning **development environment** quickly.
Note that there are 4 major ways we support to run `docker compose`:
1. **docker-compose.yml:** for interactive development, where we mount your local folder with the
frontend/backend files that you can edit and experience the changes you
make in the app in real time
1. **docker-compose-light.yml:** a lightweight configuration with minimal services (database,
Superset app, and frontend dev server) for development. Uses in-memory caching instead of Redis
and is designed for running multiple instances simultaneously
1. **docker-compose-non-dev.yml** where we just build a more immutable image based on the
local branch and get all the required images running. Changes in the local branch
at the time you fire this up will be reflected, but changes to the code
while `up` won't be reflected in the app
1. **docker-compose-image-tag.yml** where we fetch an image from docker-hub say for the
`5.0.0` release for instance, and fire it up so you can try it. Here what's in
the local branch has no effects on what's running, we just fetch and run
pre-built images from docker-hub. For `docker compose` to work along with the
Postgres image it boots up, you'll want to point to a `-dev`-suffixed TAG, as in
`export TAG=5.0.0-dev` or `export TAG=4.1.2-dev`, with `latest-dev` being the default.
The `dev` builds include the `psycopg2-binary` required to connect
to the Postgres database launched as part of the `docker compose` builds.
More on these approaches after setting up the requirements for either.
## Requirements
Note that this documentation assumes that you have [Docker](https://www.docker.com) and
[git](https://git-scm.com/) installed. Note also that we used to use `docker-compose` but that
is on the path to deprecation so we now use `docker compose` instead.
## 1. Clone Superset's GitHub repository
[Clone Superset's repo](https://github.com/apache/superset) in your terminal with the
following command:
```bash
git clone --depth=1 https://github.com/apache/superset.git
```
Once that command completes successfully, you should see a new `superset` folder in your
current directory.
## 2. Launch Superset Through Docker Compose
First let's assume you're familiar with `docker compose` mechanics. Here we'll refer generally
to `docker compose up` even though in some cases you may want to force a check for newer remote
images using `docker compose pull`, force a build with `docker compose build` or force a build
on latest base images using `docker compose build --pull`. In most cases though, the simple
`up` command should do just fine. Refer to docker compose docs for more information on the topic.
### Option #1 - for an interactive development environment
```bash
# The --build argument insures all the layers are up-to-date
docker compose up --build
```
:::tip
When running in development mode the `superset-node`
container needs to finish building assets in order for the UI to render properly. If you would just
like to try out Superset without making any code changes follow the steps documented for
`production` or a specific version below.
:::
:::tip
By default, we mount the local superset-frontend folder here and run `npm install` as well
as `npm run dev` which triggers webpack to compile/bundle the frontend code. Depending
on your local setup, especially if you have less than 16GB of memory, it may be very slow to
perform those operations. In this case, we recommend you set the env var
`BUILD_SUPERSET_FRONTEND_IN_DOCKER` to `false`, and to run this locally instead in a terminal.
Simply trigger `npm i && npm run dev`, this should be MUCH faster.
:::
:::tip
Sometimes, your npm-related state can get out-of-wack, running `npm run prune` from
the `superset-frontend/` folder will nuke the various' packages `node_module/` folders
and help you start fresh. In the context of `docker compose` setting
`export NPM_RUN_PRUNE=true` prior to running `docker compose up` will trigger that
from within docker. This will slow down the startup, but will fix various npm-related issues.
:::
### Option #2 - lightweight development with multiple instances
For a lighter development setup that uses fewer resources and supports running multiple instances:
```bash
# Single lightweight instance (default port 9001)
docker compose -f docker-compose-light.yml up
# Multiple instances with different ports
NODE_PORT=9001 docker compose -p superset-1 -f docker-compose-light.yml up
NODE_PORT=9002 docker compose -p superset-2 -f docker-compose-light.yml up
NODE_PORT=9003 docker compose -p superset-3 -f docker-compose-light.yml up
```
This configuration includes:
- PostgreSQL database (internal network only)
- Superset application server
- Frontend development server with webpack hot reloading
- In-memory caching (no Redis)
- Isolated volumes and networks per instance
Access each instance at `http://localhost:{NODE_PORT}` (e.g., `http://localhost:9001`).
### Option #3 - build a set of immutable images from the local branch
```bash
docker compose -f docker-compose-non-dev.yml up
```
### Option #4 - boot up an official release
```bash
# Set the version you want to run
export TAG=5.0.0
# Fetch the tag you're about to check out (assuming you shallow-cloned the repo)
git fetch --depth=1 origin tag $TAG
# Could also fetch all tags too if you've got bandwidth to spare
# git fetch --tags
# Checkout the corresponding git ref
git checkout $TAG
# Fire up docker compose
docker compose -f docker-compose-image-tag.yml up
```
Here various release tags, github SHA, and latest `master` can be referenced by the TAG env var.
Refer to the docker-related documentation to learn more about existing tags you can point to
from Docker Hub.
:::note
For option #2 and #3, we recommend checking out the release tag from the git repository
(ie: `git checkout 5.0.0`) for more guaranteed results. This ensures that the `docker-compose.*.yml`
configurations and that the mounted `docker/` scripts are in sync with the image you are
looking to fire up.
:::
## `docker compose` tips & configuration
:::caution
All of the content belonging to a Superset instance - charts, dashboards, users, etc. - is stored in
its metadata database. In production, this database should be backed up. The default installation
with docker compose will store that data in a PostgreSQL database contained in a Docker
[volume](https://docs.docker.com/storage/volumes/), which is not backed up.
Again, **THE DOCKER-COMPOSE INSTALLATION IS NOT PRODUCTION-READY OUT OF THE BOX.**
:::
You should see a stream of logging output from the containers being launched on your machine. Once
this output slows, you should have a running instance of Superset on your local machine! To avoid
the wall of text on future runs, add the `-d` option to the end of the `docker compose up` command.
### Configuring Further
The following is for users who want to configure how Superset runs in Docker Compose; otherwise, you
can skip to the next section.
You can install additional python packages and apply config overrides by following the steps
mentioned in [docker/README.md](https://github.com/apache/superset/tree/master/docker#configuration)
Note that `docker/.env` sets the default environment variables for all the docker images
used by `docker compose`, and that `docker/.env-local` can be used to override those defaults.
Also note that `docker/.env-local` is referenced in our `.gitignore`,
preventing developers from risking committing potentially sensitive configuration to the repository.
One important variable is `SUPERSET_LOAD_EXAMPLES` which determines whether the `superset_init`
container will populate example data and visualizations into the metadata database. These examples
are helpful for learning and testing out Superset but unnecessary for experienced users and
production deployments. The loading process can sometimes take a few minutes and a good amount of
CPU, so you may want to disable it on a resource-constrained device.
For more advanced or dynamic configurations that are typically managed in a `superset_config.py` file
located in your `PYTHONPATH`, note that it can be done by providing a
`docker/pythonpath_dev/superset_config_docker.py` that will be ignored by git
(preventing you to commit/push your local configuration back to the repository).
The mechanics of this are in `docker/pythonpath_dev/superset_config.py` where you can see
that the logic runs a `from superset_config_docker import *`
:::note
Users often want to connect to other databases from Superset. Currently, the easiest way to
do this is to modify the `docker-compose-non-dev.yml` file and add your database as a service that
the other services depend on (via `x-superset-depends-on`). Others have attempted to set
`network_mode: host` on the Superset services, but these generally break the installation,
because the configuration requires use of the Docker Compose DNS resolver for the service names.
If you have a good solution for this, let us know!
:::
:::note
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry
data. Knowing the installation counts for different Superset versions informs the project's
decisions about patching and long-term support. Scarf purges personally identifiable information
(PII) and provides only aggregated statistics.
To opt-out of this data collection for packages downloaded through the Scarf Gateway by your docker
compose based installation, edit the `x-superset-image:` line in your `docker-compose.yml` and
`docker-compose-non-dev.yml` files, replacing `apachesuperset.docker.scarf.sh/apache/superset` with
`apache/superset` to pull the image directly from Docker Hub.
To disable the Scarf telemetry pixel, set the `SCARF_ANALYTICS` environment variable to `False` in
your terminal and/or in your `docker/.env` file.
:::
## 3. Log in to Superset
Your local Superset instance also includes a Postgres server to store your data and is already
pre-loaded with some example datasets that ship with Superset. You can access Superset now via your
web browser by visiting `http://localhost:8088`. Note that many browsers now default to `https` - if
yours is one of them, please make sure it uses `http`.
Log in with the default username and password:
```bash
username: admin
```
```bash
password: admin
```
## 4. Connecting Superset to your local database instance
When running Superset using `docker` or `docker compose` it runs in its own docker container, as if
the Superset was running in a separate machine entirely. Therefore attempts to connect to your local
database with the hostname `localhost` won't work as `localhost` refers to the docker container
Superset is running in, and not your actual host machine. Fortunately, docker provides an easy way
to access network resources in the host machine from inside a container, and we will leverage this
capability to connect to our local database instance.
Here the instructions are for connecting to postgresql (which is running on your host machine) from
Superset (which is running in its docker container). Other databases may have slightly different
configurations but gist would be same and boils down to 2 steps -
1. **(Mac users may skip this step)** Configuring the local postgresql/database instance to accept
public incoming connections. By default, postgresql only allows incoming connections from
`localhost` and under Docker, unless you use `--network=host`, `localhost` will refer to different
endpoints on the host machine and in a docker container respectively. Allowing postgresql to accept
connections from the Docker involves making one-line changes to the files `postgresql.conf` and
`pg_hba.conf`; you can find helpful links tailored to your OS / PG version on the web easily for
this task. For Docker it suffices to only whitelist IPs `172.0.0.0/8` instead of `*`, but in any
case you are _warned_ that doing this in a production database _may_ have disastrous consequences as
you are opening your database to the public internet.
1. Instead of `localhost`, try using `host.docker.internal` (Mac users, Ubuntu) or `172.18.0.1`
(Linux users) as the hostname when attempting to connect to the database. This is a Docker internal
detail -- what is happening is that, in Mac systems, Docker Desktop creates a dns entry for the
hostname `host.docker.internal` which resolves to the correct address for the host machine, whereas
in Linux this is not the case (at least by default). If neither of these 2 hostnames work then you
may want to find the exact hostname you want to use, for that you can do `ifconfig` or
`ip addr show` and look at the IP address of `docker0` interface that must have been created by
Docker for you. Alternately if you don't even see the `docker0` interface try (if needed with sudo)
`docker network inspect bridge` and see if there is an entry for `"Gateway"` and note the IP
address.
## 4. To build or not to build
When running `docker compose up`, docker will build what is required behind the scene, but
may use the docker cache if assets already exist. Running `docker compose build` prior to
`docker compose up` or the equivalent shortcut `docker compose up --build` ensures that your
docker images match the definition in the repository. This should only apply to the main
docker-compose.yml file (default) and not to the alternative methods defined above.

View File

@@ -0,0 +1,56 @@
---
title: Installation Methods
hide_title: true
sidebar_position: 2
version: 1
---
import useBaseUrl from "@docusaurus/useBaseUrl";
# Installation Methods
How should you install Superset? Here's a comparison of the different options. It will help if you've first read the [Architecture](/admin-docs/installation/architecture) page to understand Superset's different components.
The fundamental trade-off is between you needing to do more of the detail work yourself vs. using a more complex deployment route that handles those details.
## [Docker Compose](/admin-docs/installation/docker-compose)
**Summary:** This takes advantage of containerization while remaining simpler than Kubernetes. This is the best way to try out Superset; it's also useful for developing & contributing back to Superset.
If you're not just demoing the software, you'll need a moderate understanding of Docker to customize your deployment and avoid a few risks. Even when fully-optimized this is not as robust a method as Kubernetes when it comes to large-scale production deployments.
You manage a superset-config.py file and a docker-compose.yml file. Docker Compose brings up all the needed services - the Superset application, a Postgres metadata DB, Redis cache, Celery worker and beat. They are automatically connected to each other.
**Responsibilities**
You will need to back up your metadata DB. That could mean backing up the service running as a Docker container and its volume; ideally you are running Postgres as a service outside of that container and backing up that service.
You will also need to extend the Superset docker image. The default `lean` images do not contain drivers needed to access your metadata database (Postgres or MySQL), nor to access your data warehouse, nor the headless browser needed for Alerts & Reports. You could run a `-dev` image while demoing Superset, which has some of this, but you'll still need to install the driver for your data warehouse. The `-dev` images run as root, which is not recommended for production.
Ideally you will build your own image of Superset that extends `lean`, adding what your deployment needs. See [Building your own production Docker image](/admin-docs/installation/docker-builds/#building-your-own-production-docker-image).
## [Kubernetes (K8s)](/admin-docs/installation/kubernetes)
**Summary:** This is the best-practice way to deploy a production instance of Superset, but has the steepest skill requirement - someone who knows Kubernetes.
You will deploy Superset into a K8s cluster. The most common method is using the community-maintained Helm chart, though work is now underway to implement [SIP-149 - a Kubernetes Operator for Superset](https://github.com/apache/superset/issues/31408).
A K8s deployment can scale up and down based on usage and deploy rolling updates with zero downtime - features that big deployments appreciate.
**Responsibilities**
You will need to build your own Docker image, and back up your metadata DB, both as described in Docker Compose above. You'll also need to customize your Helm chart values and deploy and maintain your Kubernetes cluster.
## [PyPI (Python)](/admin-docs/installation/pypi)
**Summary:** This is the only method that requires no knowledge of containers. It requires the most hands-on work to deploy, connect, and maintain each component.
You install Superset as a Python package and run it that way, providing your own metadata database. Superset has documentation on how to install this way, but it is updated infrequently.
If you want caching, you'll set up Redis or RabbitMQ. If you want Alerts & Reports, you'll set up Celery.
**Responsibilities**
You will need to get the component services running and communicating with each other. You'll need to arrange backups of your metadata database.
When upgrading, you'll need to manage the system environment and packages and ensure all components have functional dependencies.

View File

@@ -0,0 +1,451 @@
---
title: Kubernetes
hide_title: true
sidebar_position: 3
version: 1
---
import useBaseUrl from "@docusaurus/useBaseUrl";
# Installing on Kubernetes
<img src={useBaseUrl("/img/k8s.png" )} width="150" />
<br /><br />
Running Superset on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart
found in the official [Superset helm repository](https://apache.github.io/superset/index.yaml).
## Prerequisites
- A Kubernetes cluster
- Helm installed
:::note
For simpler, single host environments, we recommend using
[minikube](https://minikube.sigs.k8s.io/docs/start/) which is easy to setup on many platforms
and works fantastically well with the Helm chart referenced here.
:::
## Running
1. Add the Superset helm repository
```sh
helm repo add superset https://apache.github.io/superset
"superset" has been added to your repositories
```
2. View charts in repo
```sh
helm search repo superset
NAME CHART VERSION APP VERSION DESCRIPTION
superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b...
```
3. Configure your setting overrides
Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on:
- [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis)
- [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql)
More info down below on some important overrides you might need.
4. Install and run
```sh
helm upgrade --install --values my-values.yaml superset superset/superset
```
You should see various pods popping up, such as:
```sh
kubectl get pods
NAME READY STATUS RESTARTS AGE
superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s
superset-f5c9c667-dw9lp 1/1 Running 0 4m7s
superset-f5c9c667-fk8bk 1/1 Running 0 4m11s
superset-init-db-zlm9z 0/1 Completed 0 111s
superset-postgresql-0 1/1 Running 0 6d20h
superset-redis-master-0 1/1 Running 0 6d20h
superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s
superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s
```
The exact list will depend on some of your specific configuration overrides but you should generally expect:
- N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `supersetNode.replicaCount` and `supersetWorker.replicaCount` values)
- 1 `superset-postgresql-0` depending on your postgres settings
- 1 `superset-redis-master-0` depending on your redis settings
- 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides
1. Access it
The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:
- Configure the Service as a `LoadBalancer` or `NodePort`
- Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...)
- Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost
Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with:
- user: `admin`
- password: `admin`
## Important settings
### Security settings
Default security settings and passwords are included but you **MUST** update them to run `prod` instances, in particular:
```yaml
postgresql:
postgresqlPassword: superset
```
Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate
a sufficiently random sequence.
- To generate a good key you can run, `openssl rand -base64 42`
```yaml
configOverrides:
secret: |
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
```
If you want to change the previous secret key then you should rotate the keys.
Default secret key for kubernetes deployment is `thisISaSECRET_1234`
```yaml
configOverrides:
my_override: |
PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY'
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
init:
command:
- /bin/sh
- -c
- |
. {{ .Values.configMountPath }}/superset_bootstrap.sh
superset re-encrypt-secrets
. {{ .Values.configMountPath }}/superset_init.sh
```
:::note
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data. Knowing the installation counts for different Superset versions informs the project's decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.
To opt-out of this data collection in your Helm-based installation, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub.
:::
### Dependencies
Install additional packages and do any other bootstrap configuration in the bootstrap script.
For production clusters it's recommended to build own image with this step done in CI.
:::note
Superset requires a Python DB-API database driver and a SQLAlchemy
dialect to be installed for each datastore you want to connect to.
See [Install Database Drivers](/admin-docs/databases#installing-database-drivers) for more information.
It is recommended that you refer to versions listed in
[pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml)
instead of hard-coding them in your bootstrap script, as seen below.
:::
The following example installs the drivers for BigQuery and Elasticsearch, allowing you to connect to these data sources within your Superset setup:
```yaml
bootstrapScript: |
#!/bin/bash
uv pip install .[postgres] \
.[bigquery] \
.[elasticsearch] &&\
if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
```
### superset_config.py
The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.:
```yaml
configOverrides:
my_override: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
FEATURE_FLAGS = {
"DYNAMIC_PLUGINS": True
}
```
Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain.
The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that.
Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py`
### Environment Variables
Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`.
```yaml
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SMTP_PASSWORD: xxxx
configOverrides:
smtp: |
import ast
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
```
### System packages
If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.:
```yaml
supersetWorker:
command:
- /bin/sh
- -c
- |
apt update
apt install -y somepackage
apt autoremove -yqq --purge
apt clean
# Run celery worker
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```
### Data sources
Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`:
```yaml
extraConfigs:
import_datasources.yaml: |
databases:
- allow_file_upload: true
allow_ctas: true
allow_cvas: true
database_name: example-db
extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\
metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\
}"
sqlalchemy_uri: example://example-db.local
tables: []
```
Those will also be mounted as secrets and can include sensitive parameters.
## Configuration Examples
### Setting up OAuth
:::note
OAuth setup requires that the [authlib](https://authlib.org/) Python library is installed. This can
be done using `pip` by updating the `bootstrapScript`. See the [Dependencies](#dependencies) section
for more information.
:::
```yaml
extraEnv:
AUTH_DOMAIN: example.com
extraSecretEnv:
GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx
configOverrides:
enable_oauth: |
# This will make sure the redirect_uri is properly computed, even with SSL offloading
ENABLE_PROXY_FIX = True
from flask_appbuilder.security.manager import AUTH_OAUTH
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
{
"name": "google",
"icon": "fa-google",
"token_key": "access_token",
"remote_app": {
"client_id": os.getenv("GOOGLE_KEY"),
"client_secret": os.getenv("GOOGLE_SECRET"),
"api_base_url": "https://www.googleapis.com/oauth2/v2/",
"client_kwargs": {"scope": "email profile"},
"request_token_url": None,
"access_token_url": "https://accounts.google.com/o/oauth2/token",
"authorize_url": "https://accounts.google.com/o/oauth2/auth",
"authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
},
}
]
# Map Authlib roles to superset roles
AUTH_ROLE_ADMIN = 'Admin'
AUTH_ROLE_PUBLIC = 'Public'
# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True
# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Admin"
```
### Enable Alerts and Reports
For this, as per the [Alerts and Reports doc](/admin-docs/configuration/alerts-reports), you will need to:
#### Install a supported webdriver in the Celery worker
This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`:
```yaml
supersetWorker:
command:
- /bin/sh
- -c
- |
# Install chrome webdriver
# See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/admin-docs/installation/email_reports.mdx
apt-get update
apt-get install -y wget
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
apt-get install -y zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
mv chromedriver /usr/bin
apt-get autoremove -yqq --purge
apt-get clean
rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
# Run
. {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```
#### Run the Celery beat
This pod will trigger the scheduled tasks configured in the alerts and reports UI section:
```yaml
supersetCeleryBeat:
enabled: true
```
#### Configure the appropriate Celery jobs and SMTP/Slack settings
```yaml
extraEnv:
SMTP_HOST: smtp.gmail.com
SMTP_USER: user@gmail.com
SMTP_PORT: "587"
SMTP_MAIL_FROM: user@gmail.com
extraSecretEnv:
SLACK_API_TOKEN: xoxb-xxxx-yyyy
SMTP_PASSWORD: xxxx-yyyy
configOverrides:
feature_flags: |
import ast
FEATURE_FLAGS = {
"ALERT_REPORTS": True
}
SMTP_HOST = os.getenv("SMTP_HOST","localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_USER = os.getenv("SMTP_USER","superset")
SMTP_PORT = os.getenv("SMTP_PORT",25)
SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset")
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com")
SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None)
celery_conf: |
from celery.schedules import crontab
class CeleryConfig:
broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
imports = (
"superset.sql_lab",
"superset.tasks.cache",
"superset.tasks.scheduler",
)
result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
task_annotations = {
"sql_lab.get_sql_results": {
"rate_limit": "100/s",
},
}
beat_schedule = {
"reports.scheduler": {
"task": "reports.scheduler",
"schedule": crontab(minute="*", hour="*"),
},
"reports.prune_log": {
"task": "reports.prune_log",
'schedule': crontab(minute=0, hour=0),
},
'cache-warmup-hourly': {
"task": "cache-warmup",
"schedule": crontab(minute="*/30", hour="*"),
"kwargs": {
"strategy_name": "top_n_dashboards",
"top_n": 10,
"since": "7 days ago",
},
}
}
CELERY_CONFIG = CeleryConfig
reports: |
EMAIL_PAGE_RENDER_WAIT = 60
WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
WEBDRIVER_TYPE= "chrome"
WEBDRIVER_OPTION_ARGS = [
"--force-device-scale-factor=2.0",
"--high-dpi-support=2.0",
"--headless",
"--disable-gpu",
"--disable-dev-shm-usage",
# This is required because our process runs as root (in order to install pip packages)
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-extensions",
]
```
### Load the Examples data and dashboards
If you are trying Superset out and want some data and dashboards to explore, you can load some examples by creating a `my_values.yaml` and deploying it as described above in the **Configure your setting overrides** step of the **Running** section.
To load the examples, add the following to the `my_values.yaml` file:
```yaml
init:
loadExamples: true
```
:::resources
- [Tutorial: Mastering Data Visualization — Installing Superset on Kubernetes with Helm Chart](https://mahira-technology.medium.com/mastering-data-visualization-installing-superset-on-kubernetes-cluster-using-helm-chart-e4ec99199e1e)
- [Tutorial: Installing Apache Superset in Kubernetes](https://aws.plainenglish.io/installing-apache-superset-in-kubernetes-1aec192ac495)
:::

View File

@@ -0,0 +1,164 @@
---
title: PyPI
hide_title: true
sidebar_position: 4
version: 1
---
import useBaseUrl from "@docusaurus/useBaseUrl";
# Installing Superset from PyPI
<img src={useBaseUrl("/img/pypi.png" )} width="150" />
<br /><br />
This page describes how to install Superset using the `apache_superset` package [published on PyPI](https://pypi.org/project/apache_superset/).
## OS Dependencies
Superset stores database connection information in its metadata database. For that purpose, we use
the cryptography Python library to encrypt connection passwords. Unfortunately, this library has OS
level dependencies.
**Debian and Ubuntu**
Ubuntu **24.04** uses python 3.12 per default, which currently is not supported by Superset. You need to add a second python installation of 3.11 and install the required additional dependencies.
```bash
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 python3.11-dev python3.11-venv build-essential libssl-dev libffi-dev libsasl2-dev libldap2-dev default-libmysqlclient-dev
```
In Ubuntu **20.04 and 22.04** the following command will ensure that the required dependencies are installed:
```bash
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev
```
In Ubuntu **before 20.04** the following command will ensure that the required dependencies are installed:
```bash
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev
```
**Fedora and RHEL-derivative Linux distributions**
Install the following packages using the `yum` package manager:
```bash
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
```
In more recent versions of CentOS and Fedora, you may need to install a slightly different set of packages using `dnf`:
```bash
sudo dnf install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel
```
Also, on CentOS, you may need to upgrade pip for the install to work:
```bash
pip3 install --upgrade pip
```
**Mac OS X**
If you're not on the latest version of OS X, we recommend upgrading because we've found that many
issues people have run into are linked to older versions of Mac OS X. After updating, install the
latest version of XCode command line tools:
```bash
xcode-select --install
```
We don't recommend using the system installed Python. Instead, first install the
[homebrew](https://brew.sh/) manager and then run the following commands:
```bash
brew install readline pkg-config libffi openssl mysql postgresql@14
```
You should install a recent version of Python. Refer to the
[pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml) file for a list of Python
versions officially supported by Superset. We'd recommend using a Python version manager
like [pyenv](https://github.com/pyenv/pyenv)
(and also [pyenv-virtualenv](https://github.com/pyenv/pyenv-virtualenv)).
Let's also make sure we have the latest version of `pip` and `setuptools`:
```bash
pip install --upgrade setuptools pip
```
Lastly, you may need to set LDFLAGS and CFLAGS for certain Python packages to properly build. You can export these variables with:
```bash
export LDFLAGS="-L$(brew --prefix openssl)/lib"
export CFLAGS="-I$(brew --prefix openssl)/include"
```
These will now be available when pip installing requirements.
## Python Virtual Environment
We highly recommend installing Superset inside of a virtual environment.
You can create and activate a virtual environment using the following commands. Ensure you are using a compatible version of python. You might have to explicitly use for example `python3.11` instead of `python3`.
```bash
# virtualenv is shipped in Python 3.6+ as venv instead of pyvenv.
# See https://docs.python.org/3.6/library/venv.html
python3 -m venv venv
. venv/bin/activate
```
Or with pyenv-virtualenv:
```bash
# Here we name the virtual env 'superset'
pyenv virtualenv superset
pyenv activate superset
```
Once you activated your virtual environment, all of the Python packages you install or uninstall
will be confined to this environment. You can exit the environment by running `deactivate` on the
command line.
### Installing and Initializing Superset
First, start by installing `apache_superset`:
```bash
pip install apache_superset
```
Then, define mandatory configurations, SECRET_KEY and FLASK_APP:
```bash
export SUPERSET_SECRET_KEY=YOUR-SECRET-KEY # For production use, make sure this is a strong key, for example generated using `openssl rand -base64 42`. See https://superset.apache.org/admin-docs/configuration/configuring-superset#specifying-a-secret_key
export FLASK_APP=superset
```
Then, you need to initialize the database:
```bash
superset db upgrade
```
Finish installing by running through the following commands:
```bash
# Create an admin user in your metadata database (use `admin` as username to be able to load the examples)
superset fab create-admin
# Load some data to play with
superset load_examples
# Create default roles and permissions
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset run -p 8088 --with-threads --reload --debugger
```
If everything worked, you should be able to navigate to `hostname:port` in your browser (e.g.
locally by default at `localhost:8088`) and login using the username and password you created.

View File

@@ -0,0 +1,61 @@
---
title: Upgrading Superset
hide_title: true
sidebar_position: 6
version: 1
---
# Upgrading Superset
## Docker Compose
First, make sure to shut down the running containers in Docker Compose:
```bash
docker compose down
```
Next, update the folder that mirrors the `superset` repo through git:
```bash
git pull origin master
```
Then, restart the containers and any changed Docker images will be automatically pulled down:
```bash
docker compose up
```
## Updating Superset Manually
To upgrade superset in a native installation, run the following commands:
```bash
pip install apache_superset --upgrade
```
## Upgrading the Metadata Database
Migrate the metadata database by running:
```bash
superset db upgrade
superset init
```
While upgrading superset should not delete your charts and dashboards, we recommend following best
practices and to backup your metadata database before upgrading. Before upgrading production, we
recommend upgrading in a staging environment and upgrading production finally during off-peak usage.
## Breaking Changes
For a detailed list of breaking changes and migration notes for each version, see
[UPDATING.md](https://github.com/apache/superset/blob/master/UPDATING.md).
This file documents backwards-incompatible changes and provides guidance for migrating between
major versions, including:
- Configuration changes
- API changes
- Database migrations
- Deprecated features