feat(examples): Modernize example data loading with Parquet and YAML configs (#36538)

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Evan Rusackas
2026-01-21 12:42:15 -08:00
committed by GitHub
parent ec36791551
commit dee063a4c5
271 changed files with 23340 additions and 12971 deletions

View File

@@ -24,6 +24,41 @@ assists people when migrating to a new version.
## Next
### Example Data Loading Improvements
#### New Directory Structure
Examples are now organized by name with data and configs co-located:
```
superset/examples/
├── _shared/ # Shared database & metadata configs
├── birth_names/ # Each example is self-contained
│ ├── data.parquet # Dataset (Parquet format)
│ ├── dataset.yaml # Dataset metadata
│ ├── dashboard.yaml # Dashboard config (optional)
│ └── charts/ # Chart configs (optional)
└── ...
```
#### Simplified Parquet-based Loading
- Auto-discovery: create `superset/examples/my_dataset/data.parquet` to add a new example
- Parquet is an Apache project format: compressed (~27% smaller), self-describing schema
- YAML configs define datasets, charts, and dashboards declaratively
- Removed Python-based data generation from individual example files
#### Test Data Reorganization
- Moved `big_data.py` to `superset/cli/test_loaders.py` - better reflects its purpose as a test utility
- Fixed inverted logic for `--load-test-data` flag (now correctly includes .test.yaml files when flag is set)
- Clarified CLI flags:
- `--force` / `-f`: Force reload even if tables exist
- `--only-metadata` / `-m`: Create table metadata without loading data
- `--load-test-data` / `-t`: Include test dashboards and .test.yaml configs
- `--load-big-data` / `-b`: Generate synthetic stress-test data
#### Bug Fixes
- Fixed numpy array serialization for PostgreSQL (converts complex types to JSON strings)
- Fixed KeyError for `allow_csv_upload` field in database configs (now optional with default)
- Fixed test data loading logic that was incorrectly filtering files
### MCP Service
The MCP (Model Context Protocol) service enables AI assistants and automation tools to interact programmatically with Superset.