--- title: Kubernetes hide_title: true sidebar_position: 3 version: 1 --- import useBaseUrl from "@docusaurus/useBaseUrl"; # Installing on Kubernetes

Running Superset on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart found in the official [Superset helm repository](https://apache.github.io/superset/index.yaml). ## Prerequisites - A Kubernetes cluster - Helm installed :::note For simpler, single host environments, we recommend using [minikube](https://minikube.sigs.k8s.io/docs/start/) which is easy to setup on many platforms and works fantastically well with the Helm chart referenced here. ::: ## Running 1. Add the Superset helm repository ```sh helm repo add superset https://apache.github.io/superset "superset" has been added to your repositories ``` 2. View charts in repo ```sh helm search repo superset NAME CHART VERSION APP VERSION DESCRIPTION superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b... ``` 3. Configure your setting overrides Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on: - [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis) - [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql) More info down below on some important overrides you might need. 4. Install and run ```sh helm upgrade --install --values my-values.yaml superset superset/superset ``` You should see various pods popping up, such as: ```sh kubectl get pods NAME READY STATUS RESTARTS AGE superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s superset-f5c9c667-dw9lp 1/1 Running 0 4m7s superset-f5c9c667-fk8bk 1/1 Running 0 4m11s superset-init-db-zlm9z 0/1 Completed 0 111s superset-postgresql-0 1/1 Running 0 6d20h superset-redis-master-0 1/1 Running 0 6d20h superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s ``` The exact list will depend on some of your specific configuration overrides but you should generally expect: - N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `supersetNode.replicaCount` and `supersetWorker.replicaCount` values) - 1 `superset-postgresql-0` depending on your postgres settings - 1 `superset-redis-master-0` depending on your redis settings - 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides 1. Access it The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either: - Configure the Service as a `LoadBalancer` or `NodePort` - Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...) - Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with: - user: `admin` - password: `admin` ## Important settings ### Security settings Default security settings and passwords are included but you **MUST** update them to run `prod` instances, in particular: ```yaml postgresql: postgresqlPassword: superset ``` Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate a sufficiently random sequence. - To generate a good key you can run, `openssl rand -base64 42` ```yaml configOverrides: secret: | SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY' ``` If you want to change the previous secret key then you should rotate the keys. Default secret key for kubernetes deployment is `thisISaSECRET_1234` ```yaml configOverrides: my_override: | PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY' SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY' init: command: - /bin/sh - -c - | . {{ .Values.configMountPath }}/superset_bootstrap.sh superset re-encrypt-secrets . {{ .Values.configMountPath }}/superset_init.sh ``` :::note Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data. Knowing the installation counts for different Superset versions informs the project's decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics. To opt-out of this data collection in your Helm-based installation, edit the `repository:` line in your `helm/superset/values.yaml` file, replacing `apachesuperset.docker.scarf.sh/apache/superset` with `apache/superset` to pull the image directly from Docker Hub. ::: ### Dependencies Install additional packages and do any other bootstrap configuration in the bootstrap script. For production clusters it's recommended to build own image with this step done in CI. :::note Superset requires a Python DB-API database driver and a SQLAlchemy dialect to be installed for each datastore you want to connect to. See [Install Database Drivers](/docs/configuration/databases) for more information. It is recommended that you refer to versions listed in [pyproject.toml](https://github.com/apache/superset/blob/master/pyproject.toml) instead of hard-coding them in your bootstrap script, as seen below. ::: The following example installs the drivers for BigQuery and Elasticsearch, allowing you to connect to these data sources within your Superset setup: ```yaml bootstrapScript: | #!/bin/bash uv pip install .[postgres] \ .[bigquery] \ .[elasticsearch] &&\ if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi ``` ### superset_config.py The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.: ```yaml configOverrides: my_override: | # This will make sure the redirect_uri is properly computed, even with SSL offloading ENABLE_PROXY_FIX = True FEATURE_FLAGS = { "DYNAMIC_PLUGINS": True } ``` Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain. The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that. Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py` ### Environment Variables Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`. ```yaml extraEnv: SMTP_HOST: smtp.gmail.com SMTP_USER: user@gmail.com SMTP_PORT: "587" SMTP_MAIL_FROM: user@gmail.com extraSecretEnv: SMTP_PASSWORD: xxxx configOverrides: smtp: | import ast SMTP_HOST = os.getenv("SMTP_HOST","localhost") SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) SMTP_USER = os.getenv("SMTP_USER","superset") SMTP_PORT = os.getenv("SMTP_PORT",25) SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") ``` ### System packages If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.: ```yaml supersetWorker: command: - /bin/sh - -c - | apt update apt install -y somepackage apt autoremove -yqq --purge apt clean # Run celery worker . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker ``` ### Data sources Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`: ```yaml extraConfigs: import_datasources.yaml: | databases: - allow_file_upload: true allow_ctas: true allow_cvas: true database_name: example-db extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\ metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\ }" sqlalchemy_uri: example://example-db.local tables: [] ``` Those will also be mounted as secrets and can include sensitive parameters. ## Configuration Examples ### Setting up OAuth :::note OAuth setup requires that the [authlib](https://authlib.org/) Python library is installed. This can be done using `pip` by updating the `bootstrapScript`. See the [Dependencies](#dependencies) section for more information. ::: ```yaml extraEnv: AUTH_DOMAIN: example.com extraSecretEnv: GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx configOverrides: enable_oauth: | # This will make sure the redirect_uri is properly computed, even with SSL offloading ENABLE_PROXY_FIX = True from flask_appbuilder.security.manager import AUTH_OAUTH AUTH_TYPE = AUTH_OAUTH OAUTH_PROVIDERS = [ { "name": "google", "icon": "fa-google", "token_key": "access_token", "remote_app": { "client_id": os.getenv("GOOGLE_KEY"), "client_secret": os.getenv("GOOGLE_SECRET"), "api_base_url": "https://www.googleapis.com/oauth2/v2/", "client_kwargs": {"scope": "email profile"}, "request_token_url": None, "access_token_url": "https://accounts.google.com/o/oauth2/token", "authorize_url": "https://accounts.google.com/o/oauth2/auth", "authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")} }, } ] # Map Authlib roles to superset roles AUTH_ROLE_ADMIN = 'Admin' AUTH_ROLE_PUBLIC = 'Public' # Will allow user self registration, allowing to create Flask users from Authorized User AUTH_USER_REGISTRATION = True # The default user self registration role AUTH_USER_REGISTRATION_ROLE = "Admin" ``` ### Enable Alerts and Reports For this, as per the [Alerts and Reports doc](/docs/configuration/alerts-reports), you will need to: #### Install a supported webdriver in the Celery worker This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`: ```yaml supersetWorker: command: - /bin/sh - -c - | # Install chrome webdriver # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx apt-get update apt-get install -y wget wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip apt-get install -y zip unzip chromedriver_linux64.zip chmod +x chromedriver mv chromedriver /usr/bin apt-get autoremove -yqq --purge apt-get clean rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip # Run . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker ``` #### Run the Celery beat This pod will trigger the scheduled tasks configured in the alerts and reports UI section: ```yaml supersetCeleryBeat: enabled: true ``` #### Configure the appropriate Celery jobs and SMTP/Slack settings ```yaml extraEnv: SMTP_HOST: smtp.gmail.com SMTP_USER: user@gmail.com SMTP_PORT: "587" SMTP_MAIL_FROM: user@gmail.com extraSecretEnv: SLACK_API_TOKEN: xoxb-xxxx-yyyy SMTP_PASSWORD: xxxx-yyyy configOverrides: feature_flags: | import ast FEATURE_FLAGS = { "ALERT_REPORTS": True } SMTP_HOST = os.getenv("SMTP_HOST","localhost") SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) SMTP_USER = os.getenv("SMTP_USER","superset") SMTP_PORT = os.getenv("SMTP_PORT",25) SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com") SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None) celery_conf: | from celery.schedules import crontab class CeleryConfig: broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" imports = ( "superset.sql_lab", "superset.tasks.cache", "superset.tasks.scheduler", ) result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" task_annotations = { "sql_lab.get_sql_results": { "rate_limit": "100/s", }, } beat_schedule = { "reports.scheduler": { "task": "reports.scheduler", "schedule": crontab(minute="*", hour="*"), }, "reports.prune_log": { "task": "reports.prune_log", 'schedule': crontab(minute=0, hour=0), }, 'cache-warmup-hourly': { "task": "cache-warmup", "schedule": crontab(minute="*/30", hour="*"), "kwargs": { "strategy_name": "top_n_dashboards", "top_n": 10, "since": "7 days ago", }, } } CELERY_CONFIG = CeleryConfig reports: | EMAIL_PAGE_RENDER_WAIT = 60 WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/" WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/" WEBDRIVER_TYPE= "chrome" WEBDRIVER_OPTION_ARGS = [ "--force-device-scale-factor=2.0", "--high-dpi-support=2.0", "--headless", "--disable-gpu", "--disable-dev-shm-usage", # This is required because our process runs as root (in order to install pip packages) "--no-sandbox", "--disable-setuid-sandbox", "--disable-extensions", ] ``` ### Load the Examples data and dashboards If you are trying Superset out and want some data and dashboards to explore, you can load some examples by creating a `my_values.yaml` and deploying it as described above in the **Configure your setting overrides** step of the **Running** section. To load the examples, add the following to the `my_values.yaml` file: ```yaml init: loadExamples: true ```