mirror of
https://github.com/apache/superset.git
synced 2026-05-22 00:05:15 +00:00
First-pass schemas for the build pipeline's declarative config layer.
Each schema is documented inline + populated with concrete entries
ported from the legacy notebook's audited touchups (those that the
obsolescence check determined still need to ship).
scripts/
├── README.md - pipeline overview, layout, workflow
├── config/
│   ├── name_overrides.yaml - France typos, ISO codes; PHL renames
│   ├── flying_islands.yaml - USA/NOR/PRT/ESP/FRA repositions; NLD/GBR drops
│   ├── territory_assignments.yaml - China + SARs; Finland + Åland
│   ├── regional_aggregations.yaml - Turkey NUTS-1; FRA/ITA/PHL regions
│   └── composite_maps.yaml - France-with-Overseas
└── procedural/
    └── README.md - escape-hatch rules + skeleton (currently empty)
All five YAML files parse cleanly (validated with PyYAML).
Schema design choices:
- Every entry has a `description:` field. Forces honest documentation
of why each fix exists; reviewers can scan rationale at a glance.
- Match semantics: simple AND-of-conditions; supports `{ in: [...] }`
for value-set matching.
- composite_maps and territory_assignments share the "pull feature
from sibling Admin 0" primitive; build script can implement once.
- composite_maps.yaml has a TODO marker for SPM offsets — notebook
cell 63 was truncated in the audit; will backfill during build
script implementation.
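
The match semantics described above could be sketched roughly as follows. This is only an illustration, not the build script's actual code: the function name `matches` and the property names `adm0_a3` and `name` are assumptions chosen for the example, and the real schema keys may differ.

```python
from typing import Any, Mapping


def matches(properties: Mapping[str, Any], match: Mapping[str, Any]) -> bool:
    """Return True only when every condition in `match` holds (AND semantics)."""
    for key, expected in match.items():
        actual = properties.get(key)
        if isinstance(expected, Mapping) and "in" in expected:
            # Value-set form, written in YAML as { in: [...] }.
            if actual not in expected["in"]:
                return False
        elif actual != expected:
            # Plain form: the property must equal the expected value.
            return False
    return True


# Hypothetical feature properties, for illustration only.
props = {"adm0_a3": "FRA", "name": "Guadeloupe"}
print(matches(props, {"adm0_a3": "FRA"}))
print(matches(props, {"adm0_a3": {"in": ["FRA", "ESP"]}, "name": "Guadeloupe"}))
print(matches(props, {"adm0_a3": "FRA", "name": "Martinique"}))
```

Under this sketch an empty match block matches every feature, so a real implementation would probably want to reject empty matches during validation.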
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2.8 KiB
Procedural escape hatch
Small, named, single-purpose Python scripts for the rare cases where declarative YAML in ../config/ can't cleanly express a fix.
When to put a script here
Use this directory when all of the following are true:
- You've tried to express the fix in YAML and the resulting schema is awkward, ambiguous, or requires a one-off type to be added
- The fix is small (typically <50 lines of code, single conceptual operation)
- The fix is tied to a specific feature in the data (not a generalizable transform)
When NOT to put a script here
If any of the following apply, the fix belongs in ../config/ instead:
- It's a typo, rename, or attribute correction → name_overrides.yaml
- It's a reposition or bbox drop of a known territory → flying_islands.yaml
- It's adding a feature from another country → territory_assignments.yaml
- It's dissolving Admin 1 into a coarser admin level → regional_aggregations.yaml
- It's a multi-country composite → composite_maps.yaml
If the same kind of operation surfaces here twice, that's a signal to extend a YAML schema rather than ship a third script.
Script conventions
- Filename: NN_<descriptive_snake_case>.py. The numeric prefix sets execution order; the name documents intent.
- Header comment: required. Must explain what the script does AND why this couldn't be expressed in YAML. If the "why" is weak, push it back into YAML.
- Interface: each script defines def apply(geo: dict) -> dict, taking a parsed GeoJSON FeatureCollection and returning the modified one. The build orchestrator handles I/O.
- No side effects other than the returned data: no network calls, no file writes, no print other than logging via sys.stderr.
- Pure function over GeoJSON. Don't import shapely/geopandas unless the operation truly needs polygon math; many fixes are just attribute mutations.
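
To make the filename and interface conventions concrete, here is a minimal sketch of how an orchestrator might discover scripts, order them by numeric prefix, and chain their apply functions. The orchestrator itself is not shown in this directory, so every name here (SCRIPT_PATTERN, discover_scripts, load_apply, run_procedural) is an assumption for illustration, not the pipeline's real API.

```python
import importlib.util
import re
import sys
from pathlib import Path

# Assumed filename shape: digits, underscore, snake_case name, ".py".
SCRIPT_PATTERN = re.compile(r"^(\d+)_[a-z0-9_]+\.py$")


def discover_scripts(directory: Path) -> list[Path]:
    """Return matching scripts sorted by numeric prefix, then by name."""
    found = []
    for path in directory.iterdir():
        m = SCRIPT_PATTERN.match(path.name)
        if m:
            found.append((int(m.group(1)), path.name, path))
    return [path for _, _, path in sorted(found)]


def load_apply(path: Path):
    """Import a script from its file path and return its apply() function."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.apply


def run_procedural(geo: dict, directory: Path) -> dict:
    """Feed the FeatureCollection through every script in prefix order."""
    for path in discover_scripts(directory):
        # Progress goes to stderr so stdout stays free for data.
        print(f"applying {path.name}", file=sys.stderr)
        geo = load_apply(path)(geo)
    return geo
```

Sorting on the parsed integer rather than the raw filename means 2_foo.py would still run before 10_bar.py even if someone forgets to zero-pad the prefix.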
Skeleton
"""
NN_descriptive_name.py
======================
WHAT: One-sentence summary of what this script does to the data.
WHY: One-paragraph explanation of why this couldn't be expressed in
../config/<some_yaml>.yaml. If you find yourself writing
"because I didn't want to add a field to the schema", push the
fix into the YAML schema instead.
UPSTREAM TRACKING: link to NE issue / community discussion / blog post
explaining the underlying source of the problem, so future
maintainers can re-evaluate when upstream catches up.
"""
import sys


def apply(geo: dict) -> dict:
    # ... mutate features ...
    return geo
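
For a sense of what a filled-in script could look like, the sketch below follows the skeleton and conventions above. The fix itself (dropping features that repeat an iso_3166_2 code) is invented purely to show the interface and stderr logging; the property name is an assumption about the data, and a real candidate would first need to be checked against the YAML schemas in ../config/.

```python
"""
01_drop_duplicate_features.py  (hypothetical example, not a shipped script)
===========================================================================
WHAT: Keeps the first feature for each iso_3166_2 code and drops repeats.
WHY:  Illustrative only. A real script must explain here why the fix
      could not be expressed in one of the ../config/ YAML files.
UPSTREAM TRACKING: none (invented example).
"""
import sys


def apply(geo: dict) -> dict:
    seen = set()
    kept = []
    for feature in geo.get("features", []):
        # "iso_3166_2" is an assumed property name; features without it
        # are always kept.
        code = (feature.get("properties") or {}).get("iso_3166_2")
        if code is not None and code in seen:
            print(f"dropping duplicate feature {code}", file=sys.stderr)
            continue
        if code is not None:
            seen.add(code)
        kept.append(feature)
    geo["features"] = kept
    return geo
```

The function touches only the dictionary it was given and writes only to sys.stderr, so it needs neither shapely nor geopandas.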
Currently empty
There are no procedural scripts yet. The audit suggested the France-with-Overseas Windward Islands sub-polygon drop might warrant one, but composite_maps.yaml already has a drop_parts field that covers it. We'll add scripts here only if/when a genuine edge case proves YAML can't express it.