feat: migrate examples from Python to YAML format with enhanced CLI

Migrates Superset's example data system from Python-based scripts to YAML configuration files, providing a cleaner, more maintainable approach to managing example datasets, charts, and dashboards.

- Converted 9 Python example modules to YAML configurations
- Exported existing examples from database and added as YAML files:
  - 11 dashboards (USA Births Names, World Bank's Data, etc.)
  - 115 charts
  - 25 datasets
- Moved test-specific fixtures to `tests/fixtures/examples/`
- Removed `theme_id` from dashboard exports for compatibility

- **New command group**: `superset examples` with subcommands:
  - `load` - Load example data (replaces `load-examples`)
  - `clear-old` - Remove old Python-based examples
  - `clear` - Placeholder for future YAML clearing
  - `reload` - Clear and reload in one command
- **Backwards compatibility**: `superset load-examples` still works but emits a deprecation warning
- **Safety mechanism**: Detects old examples and preserves them to avoid data loss
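
The command layout above can be sketched as follows. This is a minimal illustration using the standard library's `argparse`, not the Click-based code in Superset's CLI. Only the command names and the `--confirm` flag come from this commit message; `build_parser` and `resolve` are hypothetical helpers.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical `superset`-style parser with an `examples` group."""
    parser = argparse.ArgumentParser(prog="superset")
    top = parser.add_subparsers(dest="command", required=True)

    # New command group: `superset examples <subcommand>`
    examples = top.add_parser("examples", help="Manage example data")
    sub = examples.add_subparsers(dest="subcommand", required=True)
    sub.add_parser("load", help="Load example data")
    clear_old = sub.add_parser("clear-old", help="Remove old Python-based examples")
    clear_old.add_argument("--confirm", action="store_true")
    sub.add_parser("clear", help="Placeholder for future YAML clearing")
    sub.add_parser("reload", help="Clear and reload in one command")

    # Deprecated top-level alias kept for backwards compatibility
    top.add_parser("load-examples", help="Deprecated: use `examples load`")
    return parser


def resolve(argv: list[str]) -> str:
    """Return the subcommand to run, warning when the deprecated alias is used."""
    args = build_parser().parse_args(argv)
    if args.command == "load-examples":
        warnings.warn(
            "`load-examples` is deprecated; use `examples load` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "load"
    return args.subcommand
```

Routing the deprecated alias to the same action as `examples load` keeps existing scripts working while the warning points users at the new command.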

- Fixed JSON data loading - examples can now load `.json.gz` files from CDN
- Fixed Docker compose configuration for isolated development
- Fixed webpack WebSocket configuration for different ports

- Import operations now log what's being created vs updated:
  - "Creating new dashboard: Sales Dashboard"
  - "Updating existing chart: World's Population"
- Provides clear visibility into the import process
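
A rough sketch of the create-versus-update logging described above, assuming objects are matched by UUID. The function `import_object` and the in-memory `store` are hypothetical stand-ins for illustration, not Superset's import functions or database session.

```python
import logging

logger = logging.getLogger(__name__)


def import_object(store: dict, kind: str, uuid: str, config: dict) -> dict:
    """Upsert `config` into `store` keyed by UUID, logging create vs update."""
    name = config["name"]
    if uuid in store:
        # An object with this UUID already exists, so this is an update
        logger.info("Updating existing %s: %s", kind, name)
        store[uuid].update(config)
    else:
        # First time this UUID is seen, so this is a create
        logger.info("Creating new %s: %s", kind, name)
        store[uuid] = dict(config)
    return store[uuid]
```

Keeping the log call inside the import function itself means every caller gets the same messages without repeating the logic.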

- Moved import logging to individual import functions (DRY principle)
- Non-destructive migration approach - no user data is deleted
- Deterministic UUID generation for consistent example data
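
The commit message does not say how the deterministic UUIDs are derived. One common approach is name-based UUIDv5, sketched below. The namespace value and the `example_uuid` helper are assumptions for illustration only.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EXAMPLES_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "examples.superset.invalid")


def example_uuid(kind: str, name: str) -> uuid.UUID:
    """Derive a stable UUID from an object's kind and name."""
    return uuid.uuid5(EXAMPLES_NAMESPACE, f"{kind}:{name}")
```

Because the UUID depends only on its inputs, reloading the same example on another machine yields the same identifier, which lets an import match existing rows instead of creating duplicates.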

- Tested migration from old Python examples to new YAML format
- Verified safety mechanism prevents accidental data overwrites
- Confirmed backwards compatibility with deprecated command
- All pre-commit checks pass

- Updated installation docs to use new CLI commands
- Added deprecation notice to UPDATING.md
- Updated development documentation

Breaking changes: none. The old `load-examples` command continues to work with a deprecation warning.

For users with existing Python-based examples:
1. Run `superset examples clear-old --confirm` to remove old examples
2. Run `superset examples load` to load new YAML-based examples
Maxime Beauchemin
2025-07-27 13:34:19 -07:00
parent 6006a21378
commit 48d8c91b19
92 changed files with 8297 additions and 2831 deletions

@@ -20,14 +20,15 @@ import pytest

 from superset import app, db  # noqa: F401
 from superset.common.db_query_status import QueryStatus
+from superset.connectors.sqla.models import SqlaTable
 from superset.extensions import cache_manager
-from superset.utils import json
 from tests.integration_tests.base_tests import SupersetTestCase
 from tests.integration_tests.constants import ADMIN_USERNAME
 from tests.integration_tests.fixtures.birth_names_dashboard import (
     load_birth_names_dashboard_with_slices,  # noqa: F401
     load_birth_names_data,  # noqa: F401
 )
+from tests.integration_tests.fixtures.query_context import get_query_context


 class TestCache(SupersetTestCase):
@@ -47,11 +48,10 @@ class TestCache(SupersetTestCase):
         app.config["DATA_CACHE_CONFIG"] = {"CACHE_TYPE": "NullCache"}
         cache_manager.init_app(app)

-        slc = self.get_slice("Pivot Table v2")
+        slc = self.get_slice("Genders")

-        # Get chart metadata
-        metadata = self.get_json_resp(f"api/v1/chart/{slc.id}")
-        query_context = json.loads(metadata.get("result").get("query_context"))
+        # Get query context using the fixture
+        query_context = get_query_context("birth_names")
         query_context["form_data"] = slc.form_data

         # Request chart for the first time
@@ -83,11 +83,16 @@ class TestCache(SupersetTestCase):
         }
         cache_manager.init_app(app)

-        slc = self.get_slice("Pivot Table v2")
+        slc = self.get_slice("Genders")

-        # Get chart metadata
-        metadata = self.get_json_resp(f"api/v1/chart/{slc.id}")
-        query_context = json.loads(metadata.get("result").get("query_context"))
+        # Clear the datasource cache timeout to test fallback to DATA_CACHE_CONFIG
+        datasource = db.session.query(SqlaTable).filter_by(id=slc.datasource_id).one()
+        original_cache_timeout = datasource.cache_timeout
+        datasource.cache_timeout = None
+        db.session.commit()
+
+        # Get query context using the fixture
+        query_context = get_query_context("birth_names")
         query_context["form_data"] = slc.form_data

         # Request chart for the first time
@@ -123,6 +128,10 @@ class TestCache(SupersetTestCase):
         # should not exists in `cache`
         assert cache_manager.cache.get(cached_result["cache_key"]) is None

+        # reset datasource cache timeout
+        datasource.cache_timeout = original_cache_timeout
+        db.session.commit()
+
         # reset cache config
         app.config["DATA_CACHE_CONFIG"] = data_cache_config
         app.config["CACHE_DEFAULT_TIMEOUT"] = cache_default_timeout