---
title: Docker Builds
hide_title: true
sidebar_position: 7
version: 1
---

# Docker builds, images and tags

The Apache Superset community makes extensive use of Docker for developing, releasing,
and running Superset in production. This page details our Docker builds and tag naming
schemes to help users navigate our offerings.

Images are built and pushed to the [Superset Docker Hub repository](https://hub.docker.com/r/apache/superset) using GitHub Actions.
Different sets of images are built and/or published at different times:

- **Published releases** (`release`): published using
  tags like `5.0.0` and the `latest` tag.
- **Pull request iterations** (`pull_request`): for each pull request, we
  build the image to validate the build but, for security reasons, do
  not publish it; we simply run `docker build --load`.
- **Merges to the main branch** (`push`): resulting in new SHAs, with tags
  prefixed with `master` for the latest `master` version.

## Build presets

We have a set of build "presets", each representing a combination of
build parameters, mostly pointing to a different target layer
and/or base image.

Here are the build presets that are exposed through the `supersetbot docker` utility:

- `lean`: The default Docker image, including both frontend and backend. Tags
  without a build preset are `lean` builds (e.g. `latest`, `5.0.0`, `4.1.2`, ...). `lean`
  builds do not contain database
  drivers, meaning you need to install your own. That applies to analytics databases **AND
  the metadata database**. You'll likely want to layer either `mysqlclient` or `psycopg2-binary`
  depending on the metadata database you choose for your installation, plus the required
  drivers to connect to your analytics database(s).
- `dev`: For development, with a headless browser, dev-related utilities and root access. This
  includes some commonly used database drivers like `mysqlclient`, `psycopg2-binary` and
  some others used for development and CI.
- `py311` (as an example): Similar to `lean` but with a different Python version (in this example, 3.11).
- `ci`: For certain CI workloads.
- `websocket`: For Superset clusters supporting advanced features.
- `dockerize`: Used by Helm in initContainers to wait for database dependencies to be available.

## Key tag examples

- `latest`: The latest official release build.
- `latest-dev`: The `-dev` image of the latest official release build, with a
  headless browser and root access.
- `master`: The latest build from the `master` branch, implicitly the `lean` build
  preset.
- `master-dev`: Similar to `master` but includes a headless browser and root access.
- `pr-5252`: The latest commit in PR 5252.
- `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: A build for
  this specific SHA, which could be from a `master` merge or a release.
- `websocket-latest`: The WebSocket image for use in a Superset cluster.

For insights into, or modifications to, the build matrix and tagging conventions,
check the [supersetbot docker](https://github.com/apache-superset/supersetbot)
subcommand and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml)
GitHub action.

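As a rough illustration of how the tag examples above compose, here is a minimal sketch of a shell helper that builds an image reference from a base tag and an optional preset suffix. The function name `superset_image` is hypothetical, and the `<tag>-<preset>` pattern is inferred only from `latest-dev` and `master-dev`; it does not cover tags such as `websocket-latest`, so check Docker Hub before relying on a composed tag.

```bash
# Hypothetical helper (not part of Superset or supersetbot): compose an image
# reference following the `<tag>-<preset>` pattern seen in `latest-dev` and
# `master-dev`. Tags without a preset suffix are the implicit `lean` builds.
superset_image() {
  base_tag="$1"          # e.g. latest, master, 5.0.0
  preset="${2:-lean}"    # lean is the default preset and adds no suffix
  if [ "$preset" = "lean" ]; then
    printf 'apache/superset:%s\n' "$base_tag"
  else
    printf 'apache/superset:%s-%s\n' "$base_tag" "$preset"
  fi
}

# Usage (requires Docker and network access):
#   docker pull "$(superset_image latest)"
#   docker pull "$(superset_image master dev)"
```

Keeping the helper free of any `docker` call means it can be reused in scripts that only need the image name, for example when templating a compose file.
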
## Building your own production Docker image

Every Superset deployment requires its own set of drivers depending on the data warehouse(s)
it connects to, so we recommend that users build their own Docker image by extending the `lean` image.

Here's an example Dockerfile that does this. Follow the in-line comments to customize it for
your desired Superset version and database drivers. The comments also note that a certain feature flag will
have to be enabled in your config file.

You would build the image with `docker build -t mysuperset:latest .` or `docker build -t ourcompanysuperset:5.0.0 .`.

```Dockerfile
# change this to apache/superset:5.0.0 or whatever version you want to build from;
# otherwise the default is the latest commit on GitHub master branch
FROM apache/superset:master

USER root

# Set environment variable for Playwright
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers

# Install packages using uv into the virtual environment
RUN . /app/.venv/bin/activate && \
    uv pip install \
    # install psycopg2 for using PostgreSQL metadata store - could be a MySQL package if using that backend:
    psycopg2-binary \
    # add the driver(s) for your data warehouse(s), in this example we're showing for Microsoft SQL Server:
    pymssql \
    # package needed for using single-sign on authentication:
    Authlib \
    # openpyxl to be able to upload Excel files
    openpyxl \
    # Pillow for Alerts & Reports to generate PDFs of dashboards
    Pillow \
    # install Playwright for taking screenshots for Alerts & Reports. This assumes the feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled
    # That feature flag will default to True starting in 6.0.0
    # Playwright works only with Chrome.
    # If you are still using Selenium instead of Playwright, you would instead install here the selenium package and a headless browser & webdriver
    playwright \
    && playwright install-deps \
    && PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers playwright install chromium

# Switch back to the superset user
USER superset

CMD ["/app/docker/entrypoints/run-server.sh"]
```

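After building, it can be useful to confirm that the packages added in the Dockerfile above are importable inside the image. The following is a minimal sketch, not an official procedure: the function names and the `mysuperset:latest` default are illustrative, and it assumes the image defines no `ENTRYPOINT`, so that the arguments to `docker run` replace the `CMD`. It calls the interpreter at `/app/.venv/bin/python`, the virtual environment the Dockerfile activates.

```bash
IMAGE="${IMAGE:-mysuperset:latest}"   # hypothetical local tag; override as needed

# Print one Python import statement per package installed in the Dockerfile.
# Import names differ from some package names:
# psycopg2-binary -> psycopg2, Authlib -> authlib, Pillow -> PIL.
driver_check_script() {
  printf 'import %s\n' psycopg2 pymssql authlib openpyxl PIL playwright
}

# Build the image from the current directory, then run the imports inside it.
# Assumes no ENTRYPOINT is set, so the command below overrides the CMD.
build_and_check() {
  docker build -t "$IMAGE" . &&
    docker run --rm "$IMAGE" /app/.venv/bin/python -c "$(driver_check_script)"
}

# Usage (requires Docker):
#   build_and_check && echo "all packages import cleanly"
```

If an import fails, the container exits with a non-zero status and Python's traceback names the missing module, which points back to the `uv pip install` line to adjust.
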
## Key ARGs in Dockerfile

- `BUILD_TRANSLATIONS`: whether to build the translations into the image. When translations
  are not built, the frontend build tells webpack to strip out all locales other than `en` from
  the `moment-timezone` library, and the backend build skips compiling the
  `*.po` translation files.
- `DEV_MODE`: whether to skip the frontend build. This is used by our `docker-compose` dev setup,
  where we mount the local volume and build using `webpack` in `--watch` mode: as you
  alter the code in the local file system, webpack, running in a Docker image used for this
  purpose, continuously rebuilds the frontend. This ARG lets the initial
  `docker-compose` build take much less time and fewer resources.
- `INCLUDE_CHROMIUM`: whether to include Chromium in the backend build so that it can be
  used as a headless browser for workloads related to "Alerts & Reports" and thumbnail generation.
- `INCLUDE_FIREFOX`: same as above, but for Firefox.
- `PY_VER`: specifies the base image for the Python backend. We don't recommend altering
  this setting unless you're working on forwards or backwards compatibility.

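The ARGs listed above are passed to a build with `--build-arg`. Here is a minimal sketch of a wrapper that does so from a checkout of the Superset repository. Several things in it are assumptions rather than documented behavior: the `true`/`false` values, the use of `--target` with a stage named after a preset, and the `superset-local` image name. Check the Dockerfile for the accepted values and stage names before relying on it.

```bash
# Set DOCKER=echo for a dry run that prints the arguments instead of building.
DOCKER="${DOCKER:-docker}"

# Hypothetical wrapper: build a local image with translations enabled and
# no bundled browsers. ARG values and the --target stage name are assumptions.
build_superset() {
  target="${1:-lean}"
  "$DOCKER" build \
    --target "$target" \
    --build-arg BUILD_TRANSLATIONS=true \
    --build-arg INCLUDE_CHROMIUM=false \
    --build-arg INCLUDE_FIREFOX=false \
    -t "superset-local:$target" .
}

# Usage, from the root of a Superset checkout:
#   DOCKER=echo build_superset dev   # dry run: print the build arguments
#   build_superset dev               # real build (requires Docker)
```

Routing the call through the `DOCKER` variable makes it possible to review the exact command line before starting a long build.
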
## Caching

To accelerate builds, we follow Docker best practices and use `apache/superset-cache`.

## About database drivers

Our Docker images come with little to no database driver support, since
each environment requires different drivers, and maintaining a build with
wide database support would be both challenging (dozens of databases,
Python drivers, and OS dependencies) and inefficient (longer
build times, larger images, lower layer cache hit rate, ...).

For production use cases, we recommend that you derive from our `lean` image(s) and
add support for the databases you need.

## On supporting different platforms (namely arm64 AND amd64)

Currently all automated builds are multi-platform, supporting both `linux/arm64`
and `linux/amd64`. This enables higher-level constructs like `helm` and
`docker compose` to point to these images and effectively be multi-platform
as well.

Pull request and `master` builds
are one-image-per-platform so that they can be parallelized. The
build matrix for those is sparser, since we don't need to build every
build preset on every platform and can generally be more selective.
For those builds, we suffix tags with `-arm` where it applies.

### Working with Apple silicon

Apple's current generation of computers uses ARM-based CPUs, and Docker
running on Macs seems to require `linux/arm64/v8` (at least one user's M2 was
configured that way). Setting the environment
variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to work for
using, and building upon, the Superset builds provided here.

```bash
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```

Presumably, `linux/arm64/v8` would be more optimized for this generation
of chips, but less compatible across the ARM ecosystem.

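To see which of the two published platforms matches a given machine before overriding the default, the mapping can be sketched as a small shell helper. The function name `docker_platform` is hypothetical; it only translates the architecture reported by `uname -m` into the `linux/amd64` and `linux/arm64` strings named above and does not attempt to handle variants such as `linux/arm64/v8`.

```bash
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to one of the two platforms the automated builds support.
docker_platform() {
  arch="${1:-$(uname -m)}"   # default to the architecture of this machine
  case "$arch" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}

# Usage (requires Docker):
#   docker pull --platform "$(docker_platform)" apache/superset:latest
#   docker buildx imagetools inspect apache/superset:latest
```

The second usage line lists the platforms recorded in the image manifest, which is a quick way to confirm that a given tag really is multi-platform.
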