superset2/superset-frontend/plugins/plugin-chart-country-map/scripts/README.md
Evan Rusackas 1eb48e94fc feat(country-map): scaffold scripts/ dir with YAML config schemas
First-pass schemas for the build pipeline's declarative config layer.
Each schema is documented inline + populated with concrete entries
ported from the legacy notebook's audited touchups (those that the
obsolescence check determined still need to ship).

scripts/
├── README.md                 — pipeline overview, layout, workflow
├── config/
│   ├── name_overrides.yaml         — France typos, ISO codes; PHL renames
│   ├── flying_islands.yaml         — USA/NOR/PRT/ESP/FRA repositions; NLD/GBR drops
│   ├── territory_assignments.yaml  — China + SARs; Finland + Åland
│   ├── regional_aggregations.yaml  — Turkey NUTS-1; FRA/ITA/PHL regions
│   └── composite_maps.yaml         — France-with-Overseas
└── procedural/
    └── README.md             — escape-hatch rules + skeleton (currently empty)

All five YAML files parse cleanly (validated with PyYAML).

Schema design choices:
- Every entry has a `description:` field. Forces honest documentation
  of why each fix exists; reviewers can scan rationale at a glance.
- Match semantics: simple AND-of-conditions; supports `{ in: [...] }`
  for value-set matching.
- composite_maps and territory_assignments share the "pull feature
  from sibling Admin 0" primitive; build script can implement once.
- composite_maps.yaml has a TODO marker for SPM offsets — notebook
  cell 63 was truncated in the audit; will backfill during build
  script implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 15:56:04 -07:00


# Country Map data pipeline
This directory contains the build pipeline that turns upstream Natural Earth data into the GeoJSON files consumed by `@superset-ui/plugin-chart-country-map`.
It replaces the legacy `scripts/Country Map GeoJSON Generator.ipynb` notebook. See `SIP_DRAFT.md` in the parent directory for the full design rationale.
## Layout
```
scripts/
  build.sh                        # one-shot reproducible build
  README.md                       # this file
  config/                         # declarative YAML; handles ~95% of fixes
    name_overrides.yaml           # typos, deprecated ISO codes, admin renames
    flying_islands.yaml           # repositioning + bbox drops for far-flung territories
    territory_assignments.yaml    # add features from sibling Admin 0 records
    regional_aggregations.yaml    # dissolve Admin 1 into administrative regions
    composite_maps.yaml           # multi-country composites (e.g. France-with-Overseas)
  procedural/                     # escape hatch; handles the rare 5%
    README.md                     # when to use, when not
    NN_<descriptive_name>.py      # one focused script per genuine edge case
  output/                         # gitignored build artifacts
```
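The `NN_<descriptive_name>.py` convention suggests a numeric ordering for procedural scripts. Below is a minimal sketch of how a build step might pick them up. It assumes a two-digit prefix, snake_case names, and ascending execution order, none of which this README confirms. `ordered_procedural_scripts` is a hypothetical helper, not part of the pipeline.

```python
import re

# Assumed naming pattern: two digits, underscore, snake_case name, ".py".
SCRIPT_NAME = re.compile(r"^(\d{2})_([a-z0-9_]+)\.py$")


def ordered_procedural_scripts(filenames):
    """Return the filenames that follow NN_<name>.py, sorted by NN."""
    matched = []
    for name in filenames:
        m = SCRIPT_NAME.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    # Sort by numeric prefix, then by name so ties are ordered predictably.
    return [name for _, name in sorted(matched)]


if __name__ == "__main__":
    # Placeholder listing; real script names are not given in this README.
    listing = ["README.md", "02_example_b.py", "01_example_a.py", "notes.txt"]
    print(ordered_procedural_scripts(listing))
```

Files that do not follow the pattern (such as `README.md`) are ignored rather than treated as errors, which keeps the directory free to hold documentation alongside the scripts.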
## Operating principles
- **Default tool: declarative YAML.** Most touchups are renames, repositions, dissolves, or filters, all of which are expressible in YAML. Diffs stay small, conflicts localize cleanly to one entry, and contributors can submit "fix typo X" as a one-line PR.
- **Escape hatch: `procedural/` directory** of small, named, single-purpose Python scripts for the rare cases YAML can't express cleanly. Each script has a header comment explaining *why* it's not in YAML. See `procedural/README.md` for the bar.
- **Build is reproducible from a pinned NE version.** `build.sh` records the NE git SHA it consumed; outputs are deterministic given inputs.
- **CI regenerates on schema change** and opens a PR if outputs differ. Maintainers review the cartographic diff in legible GeoJSON, not opaque notebook JSON.
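One way to make "deterministic given inputs" and "opens a PR if outputs differ" concrete is to serialize each output in a canonical form and compare digests. The sketch below uses only the Python standard library. It is an assumption about how such a check could look, not a description of what `build.sh` or CI actually does, and `canonical_geojson` and `digest` are hypothetical names.

```python
import hashlib
import json


def canonical_geojson(feature_collection):
    """Serialize with sorted keys and fixed indentation.

    Equal inputs then produce byte-identical, human-readable text.
    """
    return json.dumps(
        feature_collection,
        sort_keys=True,      # key order in the input dict no longer matters
        indent=2,            # keeps diffs legible for reviewers
        ensure_ascii=False,  # keep names such as "Åland" readable
    )


def digest(feature_collection):
    """SHA-256 of the canonical text: a cheap 'did the output change?' check."""
    text = canonical_geojson(feature_collection)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"type": "FeatureCollection", "features": []}
    b = {"features": [], "type": "FeatureCollection"}
    # Same content, different key order: the digests agree.
    print(digest(a) == digest(b))
```

Note that list order (for example, the order of features) still affects the digest, so a build aiming for stable output would also need to emit features in a fixed order.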
## Workflow for adding a fix
1. Identify the upstream NE issue (wrong name, missing territory, etc.).
2. **Try YAML first.** Add the smallest possible entry to the appropriate config file with a `description` field explaining the fix.
3. If YAML can't express it cleanly, add a numbered script in `procedural/` with a header comment explaining why YAML didn't fit.
4. Run `build.sh` locally and verify that the output GeoJSON looks right.
5. Open a PR. The reviewer sees the YAML diff (or the new procedural script) plus the regenerated GeoJSON.
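To illustrate step 2, here is a rough sketch of what one parsed config entry and its application might look like. The commit notes describe matching as a simple AND of conditions with `{ in: [...] }` for value sets, and require a `description` on every entry. Everything else here is an assumption made for illustration: the `match` and `set` field names, the property keys, the placeholder values, and the helper functions. The entry is written as a Python dict standing in for what a YAML loader would return.

```python
# Hypothetical parsed form of one config entry.
# Only `description` and the `in` operator come from the commit notes;
# `match`, `set`, and the property names are illustrative assumptions.
ENTRY = {
    "description": "Example: fix a misspelled upstream feature name.",
    "match": {"adm0_a3": {"in": ["AAA", "BBB"]}, "name": "Exmaple"},
    "set": {"name": "Example"},
}


def check_description(entry):
    """Reject entries that lack a non-empty `description`."""
    text = entry.get("description")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("config entry is missing a `description`")


def matches(properties, conditions):
    """AND-of-conditions: every key must match.

    A condition of the form {"in": [...]} means value-set membership;
    any other value is compared for equality.
    """
    for key, expected in conditions.items():
        actual = properties.get(key)
        if isinstance(expected, dict) and "in" in expected:
            if actual not in expected["in"]:
                return False
        elif actual != expected:
            return False
    return True


def apply_entry(properties, entry):
    """Return updated properties if the entry matches, else the input unchanged."""
    check_description(entry)
    if matches(properties, entry["match"]):
        return {**properties, **entry["set"]}
    return properties


if __name__ == "__main__":
    feature_props = {"adm0_a3": "AAA", "name": "Exmaple"}
    print(apply_entry(feature_props, ENTRY))
```

Keeping the matcher this small is what makes a one-line "fix typo X" PR plausible: the reviewer only needs to read the `description`, the conditions, and the values being set.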
## See also
- `SIP_DRAFT.md` (parent dir) — design rationale, notebook audit, obsolescence check
- `procedural/README.md` — when to use the escape hatch