mirror of
https://github.com/apache/superset.git
synced 2026-04-17 23:25:05 +00:00
feat(examples): Modernize example data loading with Parquet and YAML configs (#36538)
Co-authored-by: Claude <noreply@anthropic.com>
@@ -350,6 +350,12 @@ superset init
# Note: you MUST have previously created an admin user with the username `admin` for this command to work.
superset load-examples

# The load-examples command supports various options:
#   --force / -f            Force reload data even if tables exist
#   --only-metadata / -m    Only create table metadata without loading data (fast setup)
#   --load-test-data / -t   Load additional test dashboards and datasets
#   --load-big-data / -b    Generate synthetic data for stress testing (wide tables, many tables)

# Start the Flask dev web server from inside your virtualenv.
# Note that your page may not have CSS at this point.
# See instructions below on how to build the front-end assets.
@@ -692,6 +698,97 @@ secrets.

---

## Example Data and Test Loaders

### Example Datasets

Superset includes example datasets stored as Parquet files, organized by example name in the `superset/examples/` directory. Each example is self-contained:

```
superset/examples/
├── _shared/                    # Shared configuration
│   ├── database.yaml           # Database connection config
│   └── metadata.yaml           # Import metadata
├── birth_names/                # Example: US Birth Names
│   ├── data.parquet            # Dataset (compressed columnar)
│   ├── dataset.yaml            # Dataset metadata
│   ├── dashboard.yaml          # Dashboard configuration (optional)
│   └── charts/                 # Chart configurations (optional)
│       ├── Boys.yaml
│       ├── Girls.yaml
│       └── ...
├── energy_usage/               # Example: Energy Sankey
│   ├── data.parquet
│   ├── dataset.yaml
│   └── charts/
└── ... (27 example directories)
```
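
To make the layout above concrete, here is a minimal sketch of how such a tree could be scanned. It is not the loader Superset ships: the function name `discover_examples` is hypothetical, and two rules are assumptions for illustration only (underscore-prefixed folders such as `_shared/` are skipped, and a folder counts as an example only when it contains `data.parquet`).

```python
from pathlib import Path


def discover_examples(examples_root: Path) -> dict[str, dict[str, bool]]:
    """Map each example directory name to the optional pieces it contains."""
    found: dict[str, dict[str, bool]] = {}
    for entry in sorted(examples_root.iterdir()):
        # Assumption: underscore-prefixed folders (e.g. `_shared/`) hold
        # shared configuration and are not examples themselves.
        if not entry.is_dir() or entry.name.startswith("_"):
            continue
        # Assumption: an example must at least ship a data.parquet file.
        if not (entry / "data.parquet").is_file():
            continue
        charts_dir = entry / "charts"
        found[entry.name] = {
            "dataset.yaml": (entry / "dataset.yaml").is_file(),
            "dashboard.yaml": (entry / "dashboard.yaml").is_file(),
            "charts": charts_dir.is_dir() and any(charts_dir.glob("*.yaml")),
        }
    return found
```

Calling `discover_examples(Path("superset/examples"))` from the repository root would then report, per example, which of the optional files are present.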

#### Adding a New Example Dataset

**Simple dataset (data only):**

1. Create a directory: `superset/examples/my_dataset/`
2. Add your data as `data.parquet`:
   ```python
   import pandas as pd

   # Read the source CSV and write it out as Snappy-compressed Parquet.
   # to_parquet needs a Parquet engine such as pyarrow to be installed.
   df = pd.read_csv("your_data.csv")
   df.to_parquet("superset/examples/my_dataset/data.parquet", compression="snappy")
   ```
3. The dataset will be auto-discovered when running `superset load-examples`.

**Complete example with dashboard:**

1. Create your dataset directory with `data.parquet`.
2. Add `dataset.yaml` with the dataset metadata (columns, metrics, etc.).
3. Add `dashboard.yaml` with the dashboard layout.
4. Add chart configs in the `charts/` directory.
5. See existing examples such as `birth_names/` for reference.

#### Exporting an Existing Dashboard

To export a dashboard and its charts as YAML configs:

1. In Superset, open the dashboard you want to export.
2. Open the "..." menu and choose "Export".
3. Unzip the exported file.
4. Copy the YAML files to your example directory.
5. Add the `data.parquet` file.
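
Steps 3 and 4 above can also be scripted. The sketch below uses only the standard library; the helper name `extract_yaml_files` is hypothetical. It assumes the export ZIP wraps its contents in a single top-level folder, which it drops. It copies the YAML files as they are and does not rename or rearrange them, so matching the result to the layout of an existing example such as `birth_names/` remains a manual step.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_yaml_files(export_zip: Path, example_dir: Path) -> list[Path]:
    """Copy every .yaml member of an export ZIP into example_dir."""
    written: list[Path] = []
    with zipfile.ZipFile(export_zip) as archive:
        for member in archive.infolist():
            if member.is_dir() or not member.filename.endswith(".yaml"):
                continue
            parts = PurePosixPath(member.filename).parts
            # Assumption: everything sits under one top-level folder, which
            # is dropped; a file at the archive root keeps its own name.
            relative = Path(*parts[1:]) if len(parts) > 1 else Path(parts[0])
            # Skip entries that would escape the target directory.
            if ".." in relative.parts or relative.is_absolute():
                continue
            target = example_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(member))
            written.append(target)
    return written
```

For instance, `extract_yaml_files(Path("export.zip"), Path("superset/examples/my_dataset"))` would place the exported YAML files under the new example directory, where `export.zip` stands in for whatever file the UI downloaded.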

#### Why Parquet?

- **Apache-friendly**: Parquet is an Apache project, ideal for ASF codebases
- **Compressed**: Built-in Snappy compression (~27% smaller than CSV)
- **Self-describing**: Schema is embedded in the file
- **Widely supported**: Works with pandas, pyarrow, DuckDB, Spark, etc.

### Test Data Generation

For stress testing and development, Superset includes special test data generators that create synthetic data:

#### Big Data Loader (`--load-big-data`)

The loader, located in `superset/cli/test_loaders.py`, generates:

- **Wide Table** (`wide_table`): 100 columns of mixed types, 1000 rows
- **Many Small Tables** (`small_table_0` through `small_table_999`): 1000 tables for testing catalog performance
- **Long Name Table**: a table with a random 60-character name, for testing UI edge cases

This is primarily used for:

- Performance testing with extreme data shapes
- UI edge case validation
- Database catalog stress testing
- CI/CD pipeline validation
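
To give a feel for the data shapes listed above, here is a standard-library sketch that produces comparable synthetic data. It is not the code in `superset/cli/test_loaders.py`: the function names, the `col_<i>` column names, and the int/float/string type cycle are all assumptions made for illustration. Only the sizes (100 columns, 1000 rows, 1000 small tables, a 60-character name) come from the description above.

```python
import random
import string


def wide_table_rows(n_columns: int = 100, n_rows: int = 1000, seed: int = 0):
    """Return (column_names, rows); column types cycle int, float, str."""
    rng = random.Random(seed)
    makers = [
        lambda: rng.randint(0, 1_000_000),
        lambda: rng.random(),
        lambda: "".join(rng.choices(string.ascii_lowercase, k=8)),
    ]
    # Hypothetical column naming scheme: col_0, col_1, ...
    columns = [f"col_{i}" for i in range(n_columns)]
    rows = [
        tuple(makers[i % len(makers)]() for i in range(n_columns))
        for _ in range(n_rows)
    ]
    return columns, rows


def small_table_names(count: int = 1000) -> list[str]:
    """Names small_table_0 through small_table_<count - 1>."""
    return [f"small_table_{i}" for i in range(count)]


def long_table_name(length: int = 60, seed: int = 0) -> str:
    """A random lowercase table name of the requested length."""
    rng = random.Random(seed)
    return "".join(rng.choices(string.ascii_lowercase, k=length))
```

Fixing the seed keeps the generated data reproducible between runs, which is convenient when the output feeds a CI check.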

#### Test Dashboards (`--load-test-data`)

Loads additional test-specific content:

- Tabbed dashboard example
- Supported charts dashboard
- Test configuration files (`*.test.yaml`)
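
The `*.test.yaml` naming convention makes test-only configs easy to separate from regular ones. The sketch below shows one way to do that split. The helper name `split_test_configs` is hypothetical, and it assumes the file suffix alone marks a config as test-specific; how the real loader selects these files is not described here.

```python
from pathlib import Path


def split_test_configs(root: Path) -> tuple[list[Path], list[Path]]:
    """Return (regular, test) YAML paths found anywhere under root."""
    regular: list[Path] = []
    test: list[Path] = []
    for path in sorted(root.rglob("*.yaml")):
        # Assumption: the `.test.yaml` suffix is the only marker of a
        # test-specific configuration file.
        if path.name.endswith(".test.yaml"):
            test.append(path)
        else:
            regular.append(path)
    return regular, test
```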

---

## Testing

### Python Testing
