mirror of
https://github.com/apache/superset.git
synced 2026-05-22 00:05:15 +00:00
Implements the fifth and final transform from the notebook audit.
A composite combines a base country's Admin 1 features with:
- base_repositions (with optional `group: true` for grouped transforms
like Paris + petite couronne treated as one body)
- additions (features pulled from sibling countries' Admin 1, with
optional dissolve, drop_parts, reposition, and attribute set)
Verified on France-with-Overseas:
france_overseas: 108 features → composite_france_overseas_ukr.geo.json
(322,058 bytes)
108 = 101 FRA admin1 departments + 7 additions (Polynésie française,
Terres australes et antarctiques françaises, Wallis-et-Futuna,
Nouvelle-Calédonie, Saint-Pierre-et-Miquelon, Saint-Martin,
Saint-Barthélémy).
Bug fix during implementation: composite additions must come from the
Admin 1 subdivisions of sibling countries, not from Admin 0 (Windward
Islands, for example, is a PYF Admin 1 subdivision, not an Admin 0
country). The initial implementation sourced them from Admin 0, matched
0 features, and logged a warning. Fixed by sourcing from base_admin1
(the global Admin 1 dataset, which contains all countries'
subdivisions).
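A minimal sketch of the corrected lookup, for illustration only. It assumes features are plain GeoJSON dicts and that each Admin 1 feature carries its parent country code in an `adm0_a3` property; the function name `select_admin1` and the toy data are hypothetical, not the pipeline's actual code.

```python
def select_admin1(features, adm0_a3, names=None):
    """Return the Admin 1 features of one country, optionally filtered by name.

    `features` is a list of GeoJSON Feature dicts from the global Admin 1
    dataset. The `adm0_a3` and `name` property keys are assumptions here.
    """
    selected = []
    for feat in features:
        props = feat.get("properties") or {}
        if props.get("adm0_a3") != adm0_a3:
            continue
        if names is not None and props.get("name") not in names:
            continue
        selected.append(feat)
    return selected


# Toy stand-in for the global Admin 1 dataset (geometry omitted).
base_admin1 = [
    {"type": "Feature", "geometry": None,
     "properties": {"adm0_a3": "PYF", "name": "Windward Islands"}},
    {"type": "Feature", "geometry": None,
     "properties": {"adm0_a3": "PYF", "name": "Example subdivision"}},
    {"type": "Feature", "geometry": None,
     "properties": {"adm0_a3": "FRA", "name": "Paris"}},
]

additions = select_admin1(base_admin1, "PYF")
if not additions:
    # An empty match is the symptom described above: wrong source dataset.
    print("warning: 0 features matched for PYF")
```

Searching an Admin 0 dataset for "Windward Islands" would return nothing because that layer holds one record per country; filtering the Admin 1 layer by parent country code avoids the mismatch.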
New helpers:
- _drop_parts(geom, indices) — drop sub-polygon indices from MultiPolygon
- _translate_and_scale_with_pivot — explicit pivot (vs feature centroid),
used for `group: true` transforms
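The two helpers might look roughly like the sketch below. This is a pure-Python illustration on GeoJSON geometry dicts, not the commit's implementation: the names lack the leading underscore on purpose, and the `pivot`, `scale`, and `offset` parameters of the second function are assumptions, since the original signature is not shown.

```python
def drop_parts(geom, indices):
    """Return a copy of a MultiPolygon without the polygons at `indices`."""
    if geom["type"] != "MultiPolygon":
        raise ValueError("drop_parts expects a MultiPolygon")
    drop = set(indices)
    kept = [poly for i, poly in enumerate(geom["coordinates"]) if i not in drop]
    return {"type": "MultiPolygon", "coordinates": kept}


def translate_and_scale_about_pivot(geom, pivot, scale, offset):
    """Scale every position about `pivot`, then shift it by `offset`.

    Using one shared pivot for several features keeps their relative
    layout intact, which is the point of a grouped transform.
    """
    px, py = pivot
    dx, dy = offset

    def move(pt):
        x, y = pt[0], pt[1]
        return [px + (x - px) * scale + dx, py + (y - py) * scale + dy]

    def walk(coords):
        # A position is a list of numbers; anything else is a nested list.
        if coords and isinstance(coords[0], (int, float)):
            return move(coords)
        return [walk(c) for c in coords]

    return {"type": geom["type"], "coordinates": walk(geom["coordinates"])}


# Two toy squares treated as one group: same pivot, same scale, same offset.
square_a = {"type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
square_b = {"type": "Polygon",
            "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 1], [2, 0]]]}
group_pivot = (1.5, 0.5)  # hypothetical centre of the whole group
moved = [translate_and_scale_about_pivot(g, group_pivot, 2, (10, 0))
         for g in (square_a, square_b)]
```

Scaling each square about its own centroid would instead pull the two apart or push them into each other, which is why a grouped transform needs an explicit pivot.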
==== Build pipeline status ====
All 5 declarative transforms implemented and verified:
✓ name_overrides (19 updates per Admin 1 build)
✓ flying_islands (12 reposition + 5 bbox drop)
✓ territory_assignments (4 features added: TWN/HKG/MAC/ALD)
✓ regional_aggregations (4 region sets: TUR/FRA/ITA/PHL)
✓ composite_maps (1 composite: france_overseas)
Current outputs (UA worldview):
  ukr_admin0.geo.json                       2.1 MB   249 features
  ukr_admin1.geo.json                       15 MB    4595 features
  regional_TUR_nuts_1_ukr.geo.json          23 KB    12 regions
  regional_FRA_regions_ukr.geo.json         32 KB    18 regions
  regional_ITA_regions_ukr.geo.json         32 KB    20 regions
  regional_PHL_regions_ukr.geo.json         32 KB    17 regions
  composite_france_overseas_ukr.geo.json    322 KB   108 features
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Country Map data pipeline
This directory contains the build pipeline that turns upstream Natural Earth data into the GeoJSON files consumed by @superset-ui/plugin-chart-country-map.
It replaces the legacy scripts/Country Map GeoJSON Generator.ipynb notebook. See SIP_DRAFT.md in the parent directory for the full design rationale.
Layout
scripts/
  build.sh                        # one-shot reproducible build
  README.md                       # this file
  config/                         # declarative YAML; handles ~95% of fixes
    name_overrides.yaml           # typos, deprecated ISO codes, admin renames
    flying_islands.yaml           # repositioning + bbox drops for far-flung territories
    territory_assignments.yaml    # add features from sibling Admin 0 records
    regional_aggregations.yaml    # dissolve Admin 1 into administrative regions
    composite_maps.yaml           # multi-country composites (e.g. France-with-Overseas)
  procedural/                     # escape hatch; handles the rare 5%
    README.md                     # when to use, when not
    NN_<descriptive_name>.py      # one focused script per genuine edge case
  output/                         # gitignored; build artifacts
Operating principles
- Default tool: declarative YAML. Most touchups are renames, repositions, dissolves, or filters, all expressible in YAML. Diffs are small, conflicts localize cleanly to one entry, and contributors can submit "fix typo X" as a one-line PR.
- Escape hatch: procedural/ directory of small, named, single-purpose Python scripts for the rare cases YAML can't express cleanly. Each script has a header comment explaining why it's not in YAML. See procedural/README.md for the bar.
- Build is reproducible from a pinned NE version. build.sh records the NE git SHA it consumed; outputs are deterministic given inputs.
- CI regenerates on schema change and opens a PR if outputs differ. Maintainers review the cartographic diff in legible GeoJSON, not opaque notebook JSON.
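One way the "outputs differ" check could work is to hash a canonical serialization of each output and compare digests between builds. The sketch below is an assumption about how such a check might be written, not a description of what build.sh or CI actually does; all function names are hypothetical.

```python
import hashlib
import json


def canonical_geojson_bytes(obj):
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(obj):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_geojson_bytes(obj)).hexdigest()


def changed_outputs(previous, current):
    """Compare two {filename: digest} manifests.

    Returns the sorted names that were added, removed, or whose digest changed.
    """
    names = set(previous) | set(current)
    return sorted(n for n in names if previous.get(n) != current.get(n))


# Toy feature collections standing in for two builds of the same file.
old_build = {"type": "FeatureCollection", "features": []}
new_build = {"features": [], "type": "FeatureCollection"}  # same data, different key order

previous = {"example.geo.json": digest(old_build)}
current = {"example.geo.json": digest(new_build)}
if changed_outputs(previous, current):
    print("outputs differ: open a PR with the regenerated files")
```

Sorting keys before hashing means a change in dict ordering alone does not register as a cartographic change.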
Workflow for adding a fix
- Identify the upstream NE issue (wrong name, missing territory, etc.).
- Try YAML first. Add the smallest possible entry to the appropriate config file with a description field explaining the fix.
- If YAML can't express it cleanly, add a numbered script in procedural/ with a header comment explaining why YAML didn't fit.
- Run build.sh locally and verify that the output GeoJSON looks right.
- Open a PR. The reviewer sees the YAML diff (or new procedural script) plus the regenerated GeoJSON.
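To make the "Try YAML first" step concrete, the sketch below shows how a loaded name-override entry might be applied to features. The entry shape (`match`, `set`, `description`) is a hypothetical schema chosen for illustration; only the description field is mentioned in this README, and the real config format may differ.

```python
def apply_name_overrides(features, overrides):
    """Apply override entries in place and return the number of updates.

    Each entry is a dict with hypothetical keys:
      match:       property values a feature must have to be selected
      set:         property values to write onto matching features
      description: why the fix exists (required)
    """
    updates = 0
    for entry in overrides:
        if not entry.get("description"):
            raise ValueError("every override entry needs a description")
        for feat in features:
            props = feat["properties"]
            if all(props.get(k) == v for k, v in entry["match"].items()):
                props.update(entry["set"])
                updates += 1
    return updates


# Toy data: a made-up typo, not a real Natural Earth issue.
features = [
    {"type": "Feature", "geometry": None,
     "properties": {"adm0_a3": "XXX", "name": "Exmaple"}},
    {"type": "Feature", "geometry": None,
     "properties": {"adm0_a3": "XXX", "name": "Other"}},
]
overrides = [
    {"match": {"adm0_a3": "XXX", "name": "Exmaple"},
     "set": {"name": "Example"},
     "description": "Fix transposed letters in the upstream name."},
]
updated = apply_name_overrides(features, overrides)
```

Returning the update count gives the build a cheap sanity check: an entry that matches nothing usually means the upstream data changed or the match keys are wrong.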
See also
- SIP_DRAFT.md (parent dir): design rationale, notebook audit, obsolescence check
- procedural/README.md: when to use the escape hatch