mirror of
https://github.com/apache/superset.git
synced 2026-05-22 00:05:15 +00:00
First-pass schemas for the build pipeline's declarative config layer.
Each schema is documented inline + populated with concrete entries
ported from the legacy notebook's audited touchups (those that the
obsolescence check determined still need to ship).
scripts/
├── README.md                        — pipeline overview, layout, workflow
├── config/
│   ├── name_overrides.yaml          — France typos, ISO codes; PHL renames
│   ├── flying_islands.yaml          — USA/NOR/PRT/ESP/FRA repositions; NLD/GBR drops
│   ├── territory_assignments.yaml   — China + SARs; Finland + Åland
│   ├── regional_aggregations.yaml   — Turkey NUTS-1; FRA/ITA/PHL regions
│   └── composite_maps.yaml          — France-with-Overseas
└── procedural/
    └── README.md                    — escape-hatch rules + skeleton (currently empty)
All five YAML files parse cleanly (validated with PyYAML).
Schema design choices:
- Every entry has a `description:` field. Forces honest documentation
  of why each fix exists; reviewers can scan rationale at a glance.
- Match semantics: simple AND-of-conditions; supports `{ in: [...] }`
  for value-set matching.
- composite_maps and territory_assignments share the "pull feature
  from sibling Admin 0" primitive; build script can implement once.
- composite_maps.yaml has a TODO marker for SPM offsets — notebook
  cell 63 was truncated in the audit; will backfill during build
  script implementation.
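The match semantics described in the list above could be sketched as follows. This is a minimal illustration, not the build script itself (which this commit does not include): the function names `condition_matches` and `feature_matches` are hypothetical, and features are assumed to be plain dicts of attribute values.

```python
from typing import Any, Mapping


def condition_matches(actual: Any, expected: Any) -> bool:
    """One condition: `{in: [...]}` means set membership, anything else equality."""
    if isinstance(expected, Mapping) and set(expected) == {"in"}:
        return actual in expected["in"]
    return actual == expected


def feature_matches(properties: Mapping[str, Any], match: Mapping[str, Any]) -> bool:
    """All conditions must hold (logical AND); a missing field never matches."""
    missing = object()  # sentinel that compares unequal to every config value
    return all(
        condition_matches(properties.get(field, missing), expected)
        for field, expected in match.items()
    )


# Hypothetical feature record, shaped like one row of the NE attribute table.
feature = {"adm0_a3": "FRA", "name": "Seien-et-Marne"}

print(feature_matches(feature, {"adm0_a3": "FRA", "name": "Seien-et-Marne"}))
print(feature_matches(feature, {"adm0_a3": {"in": ["FRA", "ITA"]}}))
print(feature_matches(feature, {"adm0_a3": "FRA", "name": "Haute-Rhin"}))
```

Treating a missing field as a non-match (rather than raising) is a design assumption here; it keeps one matcher usable across Admin 0 and Admin 1 layers whose attribute tables differ.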
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
82 lines
3.7 KiB
YAML
# Per-feature attribute corrections to Natural Earth data.
#
# Use when NE has a wrong value for a specific feature: typos, outdated
# administrative names, deprecated ISO codes, etc.
# For one-off geometry fixes, use procedural/ scripts instead.
#
# Schema:
#   overrides:
#     - description: Human-readable why this override exists (REQUIRED)
#       match:
#         adm0_a3: <ISO3 country code>   # required: scope to one country
#         <field>: <value>               # one or more match conditions
#       set:
#         <field>: <value>               # one or more fields to set
#     [...]
#
# Match semantics: ALL conditions must match (logical AND). Apply to
# both Admin 0 and Admin 1 features unless scope is restricted further.
#
# Tracking: each override should be revisited periodically against
# upstream NE — many of these become obsolete when NE catches up.

overrides:
  # -------------------------------------------------------------------
  # France — typos in NE attribute table (NE 5.x still ships these)
  # -------------------------------------------------------------------
  - description: Fix typo "Seien-et-Marne" → "Seine-et-Marne"
    match: { adm0_a3: FRA, name: "Seien-et-Marne" }
    set: { name: "Seine-et-Marne" }

  - description: Fix typo "Haute-Rhin" → "Haut-Rhin"
    match: { adm0_a3: FRA, name: "Haute-Rhin" }
    set: { name: "Haut-Rhin" }

  # -------------------------------------------------------------------
  # France — update ISO 3166-2 codes to current values
  # NE still uses pre-2016 region codes; map them to current standard.
  # -------------------------------------------------------------------
  - description: Paris uses ISO 3166-2 code FR-75C as of 2016 (NE has FR-75)
    match: { adm0_a3: FRA, iso_3166_2: "FR-75" }
    set: { iso_3166_2: "FR-75C" }

  - description: Guadeloupe is FR-971 in current ISO (NE has FR-GP)
    match: { adm0_a3: FRA, iso_3166_2: "FR-GP" }
    set: { iso_3166_2: "FR-971" }

  - description: Martinique is FR-972 in current ISO (NE has FR-MQ)
    match: { adm0_a3: FRA, iso_3166_2: "FR-MQ" }
    set: { iso_3166_2: "FR-972" }

  - description: French Guiana is FR-973 in current ISO (NE has FR-GF)
    match: { adm0_a3: FRA, iso_3166_2: "FR-GF" }
    set: { iso_3166_2: "FR-973" }

  - description: La Réunion is FR-974 in current ISO (NE has FR-RE)
    match: { adm0_a3: FRA, iso_3166_2: "FR-RE" }
    set: { iso_3166_2: "FR-974" }

  - description: Mayotte is FR-976 in current ISO (NE has FR-YT)
    match: { adm0_a3: FRA, iso_3166_2: "FR-YT" }
    set: { iso_3166_2: "FR-976" }

  # -------------------------------------------------------------------
  # Philippines — administrative renames
  # -------------------------------------------------------------------
  - description: Region XIII renamed to "Caraga" in 2010 (NE still says "Dinagat Islands")
    match: { adm0_a3: PHL, region: "Dinagat Islands (Region XIII)" }
    set: { region: "Caraga Administrative Region (Region XIII)" }

  - description: ARMM reorganized as BARMM under the Bangsamoro Organic Law (2018-2019)
    match: { adm0_a3: PHL, region: "Autonomous Region in Muslim Mindanao (ARMM)" }
    set: { region: "Bangsamoro Autonomous Region in Muslim Mindanao (BARMM)" }

  # -------------------------------------------------------------------
  # NOT included here — handled by other mechanisms:
  #   - Vietnam diacritics     → use NE's NAME_VI field via name_language=vi
  #   - Crimea/Sevastopol      → handled by NE _ukr worldview selection
  #   - China + SARs           → see territory_assignments.yaml
  #   - Finland + Åland        → see territory_assignments.yaml
  #   - France-with-Overseas   → see composite_maps.yaml
  # -------------------------------------------------------------------
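The "Tracking" note in the file header suggests that overrides go stale once upstream NE fixes a value. One way a build step might surface that, sketched under assumptions: `apply_overrides` is a hypothetical helper, features are assumed to be plain dicts, only equality conditions are handled, and the feature records below are hand-written stand-ins rather than real NE rows.

```python
from typing import Any


def apply_overrides(
    features: list[dict[str, Any]], overrides: list[dict[str, Any]]
) -> list[str]:
    """Apply each override in place; return descriptions of overrides that matched nothing."""
    unused = []
    for override in overrides:
        match, updates = override["match"], override["set"]
        # Collect hits before mutating, since `set` may change a matched field.
        hits = [
            f for f in features
            if all(k in f and f[k] == v for k, v in match.items())
        ]
        for f in hits:
            f.update(updates)
        if not hits:
            unused.append(override["description"])
    return unused


# Hypothetical input: the second record already carries the corrected name.
features = [
    {"adm0_a3": "FRA", "name": "Seien-et-Marne"},
    {"adm0_a3": "FRA", "name": "Haut-Rhin"},
]
overrides = [
    {
        "description": 'Fix typo "Seien-et-Marne" → "Seine-et-Marne"',
        "match": {"adm0_a3": "FRA", "name": "Seien-et-Marne"},
        "set": {"name": "Seine-et-Marne"},
    },
    {
        "description": 'Fix typo "Haute-Rhin" → "Haut-Rhin"',
        "match": {"adm0_a3": "FRA", "name": "Haute-Rhin"},
        "set": {"name": "Haut-Rhin"},
    },
]

unused = apply_overrides(features, overrides)
print(features[0]["name"])  # Seine-et-Marne
print(unused)               # only the "Haute-Rhin" override is reported
```

An override that matches no feature is a candidate for removal, though it may also mean the match condition itself is wrong, so a reviewer would still need to check upstream data before deleting it.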