Compare commits


12 Commits

Author SHA1 Message Date
Evan Rusackas
6d2bf6b10c Merge branch 'master' into feat/translation-backfill-tooling 2026-05-11 19:55:42 -07:00
Superset Dev
fe3fa946c4 fix(i18n): handle JSON-list plural responses from the model
A fresh test run on French exposed a real bug in _apply_translation:
when the model returns a JSON list for a plural entry (e.g.
["form0", "form1"], which is a valid representation since plural forms
are ordered), the previous code took the else branch and broadcast
str(list) — Python list-repr like ['form0', 'form1'] — to every plural
form. Both msgstr[0] and msgstr[1] ended up containing the same
literal Python list-repr string, breaking gettext lookups for that
entry. Spanish dodged it by chance (the model returned dicts that
time); the failure mode is reproducible on French.

Changes:
- Extract _apply_plural_translation helper. Handles dict, list,
  scalar, and non-JSON-string responses. List path distributes forms
  by index and repeats the last form if the model returned fewer
  forms than the language requires (better than leaving slots blank,
  which falls back to displaying the raw English msgid).
- The split also drops _apply_translation's cyclomatic complexity
  back below the C901 threshold.
- Adds 4 regression tests covering: list response, list response
  round-tripped through parse_response, list shorter than required
  forms (last-form-repeats), and empty list (falls back to raw-string
  broadcast).

Verified end-to-end on French: the previously-broken plural entry
"Added 1 new column to the virtual dataset" / "Added %s new columns
to the virtual dataset" now writes msgstr[0] and msgstr[1] correctly
on a fresh run.
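The list-distribution rule described above can be sketched standalone (the helper name and signature here are illustrative, not the commit's exact code):

```python
def distribute_plural_forms(forms: list, nplurals: int) -> dict[int, str]:
    """Assign list items to plural-form slots in order; if the model
    returned fewer forms than the language requires, repeat the last
    form rather than leaving slots blank (a blank msgstr[n] falls back
    to displaying the raw English msgid)."""
    strs = [str(f) for f in forms]
    return {i: strs[i] if i < len(strs) else strs[-1] for i in range(nplurals)}
```

For example, a two-form model response applied to a three-form language fills the third slot with the second form instead of leaving it empty.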

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 17:47:24 -07:00
Superset Dev
795d6e67df test(i18n): cover backfill_po + address review feedback
Addresses sadpandajoe's review on #39448:

1. Adds tests/unit_tests/scripts/translations/backfill_po_test.py with
   19 cases covering parse_response (singular/plural/markdown-fence
   stripping/non-ASCII/non-numeric keys/list-and-scalar rejection/JSON
   errors) and _apply_translation (singular path, plural-dict path,
   plural-scalar fallback, plural invalid-JSON fallback, fuzzy flag,
   attribution append/dedup, end-to-end round-trip from parse_response
   into _apply_translation). The script is loaded via importlib since
   it lives outside the package tree.

2. translate_batch now pipes the prompt over stdin instead of passing
   it as argv. With --batch-size 50 and many reference languages a
   single batch can grow into the tens of KB and approach ARG_MAX on
   some platforms; stdin removes that ceiling.

3. _process_batches now saves the catalog after each batch that wrote
   at least one translation (when not in --dry-run). For sparse
   languages with thousands of missing strings, a crash mid-run now
   only loses the in-flight batch rather than every batch translated
   so far. The full save at end of backfill() is removed since the
   per-batch save covers it.

4. Module docstring referenced --fuzzy/--no-fuzzy but argparse only
   registers --no-fuzzy; doc updated to match the actual flag.
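The stdin change in point 2 amounts to moving the payload out of argv. A minimal sketch, with `cat` standing in for the real CLI:

```python
import subprocess

def run_with_stdin(argv: list[str], payload: str) -> str:
    # Passing the payload via stdin instead of as an argv element avoids
    # the OS limit (ARG_MAX) on the combined size of command-line
    # arguments, which a multi-tens-of-KB prompt could approach.
    result = subprocess.run(
        argv, input=payload, capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with {result.returncode}: {result.stderr}")
    return result.stdout
```

Here `run_with_stdin(["cat"], large_prompt)` succeeds for any payload size, whereas putting the same payload in argv can fail with E2BIG on some platforms.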

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 16:59:40 -07:00
Claude Code
73a042ed5c fix(i18n): validate parse_response JSON is a dict before .items()
A non-object JSON response (list, scalar, null) would raise AttributeError
from .items(). _process_batches only catches (ValueError, RuntimeError),
so the crash would abort the entire run instead of being handled per-batch.
Surface the type error as ValueError so it's caught gracefully.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 08:52:54 -07:00
Evan Rusackas
8e899d1497 Merge branch 'master' into feat/translation-backfill-tooling 2026-04-19 15:29:55 -04:00
Claude Code
1ac81f7a31 docs(i18n): add docstrings to backfill() and main()
Addresses two codeant-ai review comments about missing docstrings on
newly added top-level functions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 16:14:04 -07:00
Claude Code
37af05c292 fix(i18n): restore clobbered code, fix plural coercion & fuzzy-as-context
Four "Apply suggestion" commits from codeant-ai replaced real code bodies
with docstring-only lines, leaving the scripts syntactically broken (syntax
errors at build_translation_index.py:119). Restore the bodies while keeping the
suggested docstrings:

- build_translation_index.py: _plural_key, main()
- backfill_po.py: _lang_name, _plural_key

Also addresses two major issues raised in review:

1. parse_response() in backfill_po.py used str(v) on values, which
   converted dict responses (from plural entries) into Python repr
   like "{'0': 'x'}" that json.loads could not later parse in
   _apply_translation. Serialize dict/list values with json.dumps.

2. build_index() wrote fuzzy entries as trusted context in the
   cross-language index, letting AI-generated drafts propagate back
   into future backfill runs as if reviewed. Gate index values via
   _is_translated so fuzzy entries become null.
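Point 1 is easy to reproduce in isolation: `str()` of a dict yields Python repr (single quotes), which `json.loads` rejects, while `json.dumps` round-trips cleanly.

```python
import json

# Illustrative plural-forms payload, like what the model returns for a plural entry
plural_forms = {"0": "colonne", "1": "colonnes"}

assert str(plural_forms) == "{'0': 'colonne', '1': 'colonnes'}"   # repr, not JSON
assert json.dumps(plural_forms) == '{"0": "colonne", "1": "colonnes"}'

# The repr does not survive a round trip; the JSON serialization does.
try:
    json.loads(str(plural_forms))
    raise AssertionError("unreachable")
except json.JSONDecodeError:
    pass  # this failure is the bug the commit fixes
assert json.loads(json.dumps(plural_forms)) == plural_forms
```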

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 16:12:08 -07:00
Evan Rusackas
88c7389ee5 Apply suggestion from @codeant-ai-for-open-source[bot]
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
2026-04-17 18:18:10 -07:00
Evan Rusackas
ee6779c84a Apply suggestion from @codeant-ai-for-open-source[bot]
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
2026-04-17 18:17:53 -07:00
Evan Rusackas
7bc0e385b8 Apply suggestion from @codeant-ai-for-open-source[bot]
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
2026-04-17 18:17:42 -07:00
Evan Rusackas
f684eccd94 Apply suggestion from @codeant-ai-for-open-source[bot]
Co-authored-by: codeant-ai-for-open-source[bot] <244253245+codeant-ai-for-open-source[bot]@users.noreply.github.com>
2026-04-17 18:17:12 -07:00
Claude Code
0a91bce9ea feat(i18n): add AI-assisted translation backfill tooling + Spanish translations
Adds two scripts to help maintainers fill in missing .po translations
using Claude AI, and applies them to backfill all 184 missing Spanish strings.

New scripts:
- scripts/translations/build_translation_index.py — reads every .po file
  and outputs a cross-language JSON index {msgid: {lang: translation}}
  used to provide reference context to the AI
- scripts/translations/backfill_po.py — for a target language, finds all
  untranslated entries, batches them, and calls claude -p with cross-language
  context to generate draft translations marked #, fuzzy for human review

Design highlights:
- Cross-language translations are passed per-string so the AI can disambiguate
  ambiguous English (e.g. "Scale", "Table") from how other translators handled it
- --min-context N skips strings with fewer than N reference translations
- Each generated entry is tagged with a translator comment listing the model
  and which languages provided context (e.g. [refs: fr, ru])
- translation_index.json added to .gitignore (regenerated locally)

Spanish translations:
- Backfilled all 184 previously untranslated strings in es/LC_MESSAGES/messages.po
- All entries marked #, fuzzy pending human review

Docs: added "Backfilling missing translations with AI" section to
docs/developer_docs/contributing/howtos.md

npm shortcuts added to superset-frontend/package.json:
- translations:build-index
- translations:backfill

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:04:22 -07:00
9 changed files with 3517 additions and 1955 deletions

.gitignore

@@ -115,6 +115,8 @@ release.json
superset/translations/**/messages.json
# these mo binary files are generated by `pybabel compile`
superset/translations/**/messages.mo
# cross-language index generated by scripts/translations/build_translation_index.py
superset/translations/translation_index.json
docker/requirements-local.txt

docs/developer_docs/contributing/howtos.md

@@ -335,6 +335,92 @@ npm run build-translation
pybabel compile -d superset/translations
```
### Backfilling missing translations with AI
For languages with many untranslated strings, the repo includes a script that
uses Claude AI to generate draft translations for any missing entries. All
AI-generated strings are marked `#, fuzzy` and tagged with an attribution
comment so that human reviewers know they need to be checked before merging.
#### Prerequisites
```bash
pip install -r superset/translations/requirements.txt
```
Claude Code must be installed and authenticated (`claude --version` should
work). The script calls `claude -p` internally — no separate API key is needed.
#### Step 1 — Build the translation index
The index captures every already-translated string in every language and
serves as cross-language context for the AI. Rebuild it whenever `.po` files
change significantly:
```bash
python scripts/translations/build_translation_index.py
# Writes: superset/translations/translation_index.json
```
#### Step 2 — Preview with a dry run
Check what would be translated without writing anything:
```bash
python scripts/translations/backfill_po.py --lang fr --limit 20 --dry-run
```
Output shows each string, its translation, and a context tag:
- No tag — 3+ reference languages available (high confidence)
- `[ctx:N]` — only N other languages have this string (lower confidence)
- `[ctx:0]` — no other language has this string yet; English alone used
#### Step 3 — Run the backfill
```bash
python scripts/translations/backfill_po.py --lang fr
```
Options:
| Flag | Default | Description |
|------|---------|-------------|
| `--lang LANG` | required | ISO language code (`fr`, `de`, `ja`, …) |
| `--batch-size N` | 50 | Strings per Claude request |
| `--limit N` | unlimited | Stop after N entries |
| `--min-context N` | 0 | Skip entries with fewer than N reference translations |
| `--model MODEL` | `claude-sonnet-4-6` | Claude model to use |
| `--dry-run` | off | Print without writing |
| `--no-fuzzy` | off | Don't mark entries as fuzzy |
Use `--min-context 2` to skip strings that have fewer than 2 reference
translations in other languages. Those strings are more likely to be ambiguous
(short labels, UI fragments) where the correct meaning can't be inferred
without additional context.
#### Step 4 — Review and commit
Open the target `.po` file and search for `fuzzy`. For each generated entry:
1. Verify the translation is correct for the UI context.
2. Remove the `# Machine-translated via backfill_po.py` comment and the
`#, fuzzy` flag line once you are satisfied.
3. If the translation is wrong, correct the `msgstr` before removing the flag.
4. Commit the `.po` file — do **not** commit `translation_index.json` (it is
gitignored and regenerated locally).
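A generated entry awaiting review looks roughly like this (the translation, model tag, and reference languages shown are illustrative):

```
# Machine-translated via backfill_po.py (claude-sonnet-4-6) [refs: fr, pt]
#, fuzzy
msgid "Scale"
msgstr "Échelle"
```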
#### Running via npm
From `superset-frontend/`:
```bash
# Rebuild index
npm run translations:build-index
# Backfill (pass arguments after --)
npm run translations:backfill -- --lang fr --dry-run
```
## Linting
### Python

scripts/translations/backfill_po.py

@@ -0,0 +1,632 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Backfill missing translations in a .po file using Claude AI.
For each untranslated (empty msgstr) entry in the target language, the script
sends the English source string along with all available translations in other
languages to Claude as context, then writes the AI-generated translation back
into the .po file marked as #, fuzzy for human review.
Usage:
# Build the translation index first (one-time or when .po files change)
python scripts/translations/build_translation_index.py
# Backfill French translations
python scripts/translations/backfill_po.py --lang fr
# Dry run (print what would be translated without writing)
python scripts/translations/backfill_po.py --lang de --dry-run
# Limit to 100 entries and use a specific model
python scripts/translations/backfill_po.py --lang es --limit 100 \
--model claude-opus-4-6
Options:
--lang LANG ISO language code to backfill (required)
--batch-size N Number of strings per Claude request (default: 50)
--limit N Stop after translating N entries (default: unlimited)
--model MODEL Claude model ID (default: claude-sonnet-4-6)
--index PATH Path to translation_index.json (default: auto-detect)
--dry-run Print translations without writing to .po file
--no-fuzzy Do not mark generated translations as fuzzy (default: mark fuzzy)
"""
from __future__ import annotations
import argparse
import json
import re
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Any
try:
import polib # type: ignore[import-untyped]
except ImportError:
print("polib is required. Run: pip install polib", file=sys.stderr)
sys.exit(1)
TRANSLATIONS_DIR = Path(__file__).parent.parent.parent / "superset" / "translations"
DEFAULT_INDEX = TRANSLATIONS_DIR / "translation_index.json"
DEFAULT_MODEL = "claude-sonnet-4-6"
DEFAULT_BATCH_SIZE = 50
# Language names for the prompt, keyed by ISO code
LANGUAGE_NAMES: dict[str, str] = {
"ar": "Arabic",
"ca": "Catalan",
"de": "German",
"es": "Spanish",
"fa": "Persian (Farsi)",
"fr": "French",
"it": "Italian",
"ja": "Japanese",
"ko": "Korean",
"mi": "Māori",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"pt_BR": "Brazilian Portuguese",
"ru": "Russian",
"sk": "Slovak",
"sl": "Slovenian",
"tr": "Turkish",
"uk": "Ukrainian",
"zh": "Chinese (Simplified)",
"zh_TW": "Chinese (Traditional)",
}
def _lang_name(code: str) -> str:
"""Return a human-readable language name for an ISO language code."""
return LANGUAGE_NAMES.get(code, code)
def _plural_key(msgid: str, msgid_plural: str) -> str:
"""Build the translation index key used for pluralized entries."""
return f"{msgid}\x00{msgid_plural}"
def _is_missing(entry: polib.POEntry) -> bool:
"""Return True for entries that need a translation."""
if entry.obsolete:
return False
if entry.msgid_plural:
return not any(v for v in entry.msgstr_plural.values())
return not entry.msgstr
def _context_langs(
item: dict[str, Any], index: dict[str, Any], target_lang: str
) -> list[str]:
"""Return sorted list of language codes that have translations for this entry."""
key = item["index_key"]
if key not in index:
return []
return sorted(
lang for lang, val in index[key].items() if lang != target_lang and val
)
def _context_count(
item: dict[str, Any], index: dict[str, Any], target_lang: str
) -> int:
"""Return the number of other-language translations available for this entry."""
return len(_context_langs(item, index, target_lang))
def _render_item(
i: int,
item: dict[str, Any],
index: dict[str, Any],
target_lang: str,
reference_langs_sorted: list[str],
) -> list[str]:
"""Render one batch entry as prompt lines."""
lines: list[str] = []
ctx = _context_count(item, index, target_lang)
if ctx == 0:
lines.append(
f"--- [{i}] (no reference translations — translate conservatively) ---"
)
else:
plural = "s" if ctx != 1 else ""
lines.append(f"--- [{i}] ({ctx} reference translation{plural}) ---")
lines.append(f"English: {json.dumps(item['msgid'], ensure_ascii=False)}")
if item.get("msgid_plural"):
plural_json = json.dumps(item["msgid_plural"], ensure_ascii=False)
lines.append(f"English plural: {plural_json}")
key = item["index_key"]
if key in index and reference_langs_sorted:
for lang in reference_langs_sorted:
val = index[key].get(lang)
if val is None:
continue
if isinstance(val, dict):
forms = "; ".join(
f"[{k}] {json.dumps(v, ensure_ascii=False)}" for k, v in val.items()
)
lines.append(f"{_lang_name(lang)}: {forms}")
else:
lines.append(
f"{_lang_name(lang)}: {json.dumps(val, ensure_ascii=False)}"
)
lines.append("")
return lines
def build_prompt(
target_lang: str,
batch: list[dict[str, Any]],
index: dict[str, Any],
) -> str:
"""Build the Claude prompt for a batch of entries."""
lang_name = _lang_name(target_lang)
# Collect which other languages actually have translations for this batch
reference_langs: set[str] = set()
for item in batch:
key = item["index_key"]
if key in index:
reference_langs.update(
lang for lang, val in index[key].items() if lang != target_lang and val
)
reference_langs_sorted = sorted(reference_langs)
lines: list[str] = [
"You are a professional translator specializing in software UI strings.",
f"Translate the following English strings into {lang_name} ({target_lang}).",
"",
"Rules:",
"- Preserve all format placeholders exactly: %(name)s, {name}, %s, %d, etc.",
"- Preserve HTML tags if present.",
"- Keep the same tone and register as the reference translations.",
"- For plural forms, provide translations for all plural forms"
" required by the language.",
"- Return ONLY a JSON object mapping each numeric index (as a string)"
" to its translation.",
"- Do not add any explanation, preamble, or markdown fences.",
"",
"Important: Many strings are short fragments or single words that are"
" ambiguous in English (e.g. 'Scale' could mean a measurement scale,"
" to scale an image, or fish scales). Use the translations in other"
" languages as your primary signal for which meaning is intended —"
" they collectively disambiguate the intended sense. When no"
" other-language translations are available for an entry, translate"
" conservatively based on the most common meaning in a data"
" visualization UI context.",
"",
]
if reference_langs_sorted:
lines.append(
f"Reference translations are provided per string where available "
f"({', '.join(_lang_name(lc) for lc in reference_langs_sorted)})."
)
lines.append("")
lines.append("Strings to translate:")
lines.append("")
for i, item in enumerate(batch):
lines.extend(_render_item(i, item, index, target_lang, reference_langs_sorted))
if batch and batch[0].get("msgid_plural"):
# Add guidance on plural form counts per language
lines.append(
"Note: provide ALL plural forms required by the target language "
"(e.g. French needs 2, Russian needs 3, Arabic needs 6)."
)
lines.append("")
lines.append(
'Expected output format: {"0": "<translation>", "1": "<translation>", ...}'
)
lines.append("(keys are the numeric indices of the strings above)")
return "\n".join(lines)
def parse_response(text: str, batch_size: int) -> dict[int, str]:
"""Parse the JSON object from Claude's response."""
# Strip any accidental markdown fences
text = re.sub(r"^```[^\n]*\n", "", text.strip())
text = re.sub(r"\n```$", "", text)
try:
raw = json.loads(text)
except json.JSONDecodeError as exc:
raise ValueError(
f"Could not parse response as JSON: {exc}\n\nResponse:\n{text}"
) from exc
# _process_batches only catches ValueError/RuntimeError, so a non-object
# response (list, scalar, null) must surface as ValueError rather than
# bubbling up an AttributeError from .items() and aborting the whole run.
if not isinstance(raw, dict):
raise ValueError(
f"Expected a JSON object mapping indices to translations, "
f"got {type(raw).__name__}.\n\nResponse:\n{text}"
)
# Preserve dict/list values as JSON strings so plural responses (where
# v is a dict of plural forms) can be re-parsed downstream by
# _apply_translation's json.loads. str(v) on a dict produces Python
# repr ({'0': 'x'}) which is not valid JSON.
return {
int(k): (
json.dumps(v, ensure_ascii=False) if isinstance(v, (dict, list)) else str(v)
)
for k, v in raw.items()
if str(k).isdigit()
}
def translate_batch(
model: str,
target_lang: str,
batch: list[dict[str, Any]],
index: dict[str, Any],
) -> dict[int, str]:
"""Send a batch of strings to Claude via `claude -p`.
Returns a dict mapping batch index to translated string.
"""
claude_bin = shutil.which("claude")
if not claude_bin:
raise RuntimeError(
"claude CLI not found. Install Claude Code or add it to PATH."
)
prompt = build_prompt(target_lang, batch, index)
# Pipe the prompt over stdin rather than passing it as argv: a single batch
# with many reference languages can grow into the tens of KB and approach
# ARG_MAX on some platforms.
# claude_bin is resolved via shutil.which — not user-controlled input
result = subprocess.run( # noqa: S603
[claude_bin, "--model", model, "-p"],
input=prompt,
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
raise RuntimeError(
f"claude exited with code {result.returncode}:\n{result.stderr}"
)
return parse_response(result.stdout.strip(), len(batch))
def _apply_plural_translation(entry: polib.POEntry, translation: str) -> None:
"""Distribute a model response across the entry's plural forms.
Model may return a JSON dict ({"0": "form0", "1": "form1"}), a JSON list
(["form0", "form1"], also valid since plural forms are ordered), a JSON
scalar (a single translation that fills every form), or a plain non-JSON
string (older models that ignore the JSON instruction).
"""
try:
plural_value = json.loads(translation)
except (json.JSONDecodeError, ValueError):
for k in entry.msgstr_plural:
entry.msgstr_plural[k] = translation
return
if isinstance(plural_value, dict):
entry.msgstr_plural = {int(k): str(v) for k, v in plural_value.items()}
return
if isinstance(plural_value, list) and plural_value:
# Distribute list items across plural form indices in order; if the
# model returned fewer forms than the language requires, repeat the
# last form rather than leaving slots blank.
forms = [str(v) for v in plural_value]
for k in sorted(entry.msgstr_plural):
entry.msgstr_plural[k] = forms[k] if k < len(forms) else forms[-1]
return
# Scalar (or empty list) — broadcast to every form.
fill = str(plural_value) if plural_value not in (None, []) else translation
for k in entry.msgstr_plural:
entry.msgstr_plural[k] = fill
def _apply_translation(
entry: polib.POEntry,
translation: str,
item: dict[str, Any],
model: str,
mark_fuzzy: bool,
) -> None:
"""Write a translation string into a POEntry and add attribution."""
if entry.msgid_plural:
_apply_plural_translation(entry, translation)
else:
entry.msgstr = translation
if mark_fuzzy and "fuzzy" not in entry.flags:
entry.flags.append("fuzzy")
refs = item["context_langs"]
refs_tag = f" [refs: {', '.join(refs)}]" if refs else " [no refs]"
attribution = f"Machine-translated via backfill_po.py ({model}){refs_tag}"
if entry.tcomment:
if attribution not in entry.tcomment:
entry.tcomment = f"{entry.tcomment}\n{attribution}"
else:
entry.tcomment = attribution
def _build_batch_items(
entries: list[polib.POEntry],
index: dict[str, Any],
lang: str,
) -> list[dict[str, Any]]:
"""Convert a list of POEntries into the dict format used by translate_batch."""
items: list[dict[str, Any]] = []
for entry in entries:
if entry.msgid_plural:
item: dict[str, Any] = {
"msgid": entry.msgid,
"msgid_plural": entry.msgid_plural,
"index_key": _plural_key(entry.msgid, entry.msgid_plural),
"is_plural": True,
}
else:
item = {
"msgid": entry.msgid,
"index_key": entry.msgid,
"is_plural": False,
}
item["context_langs"] = _context_langs(item, index, lang)
item["context_count"] = len(item["context_langs"])
items.append(item)
return items
def _process_batches(
missing: list[polib.POEntry],
index: dict[str, Any],
lang: str,
batch_size: int,
model: str,
dry_run: bool,
mark_fuzzy: bool,
cat: polib.POFile | None = None,
po_path: Path | None = None,
) -> tuple[int, int]:
"""Translate missing entries in batches. Returns (translated, failed) counts.
When ``cat`` and ``po_path`` are provided and ``dry_run`` is False, the
catalog is saved to disk after each batch that produced at least one
successful translation. This means a crash mid-run only loses the in-flight
batch rather than every batch translated so far.
"""
translated_count = 0
failed_count = 0
for batch_start in range(0, len(missing), batch_size):
batch_entries = missing[batch_start : batch_start + batch_size]
batch_items = _build_batch_items(batch_entries, index, lang)
end = min(batch_start + batch_size, len(missing))
print(
f" Translating entries {batch_start + 1}–{end} of {len(missing)}",
file=sys.stderr,
)
try:
translations = translate_batch(model, lang, batch_items, index)
except (ValueError, RuntimeError) as exc:
print(f" ERROR in batch starting at {batch_start}: {exc}", file=sys.stderr)
failed_count += len(batch_entries)
continue
batch_applied = 0
for i, entry in enumerate(batch_entries):
translation = translations.get(i)
if translation is None:
print(
f" WARNING: no translation returned for index {i} "
f"(msgid: {entry.msgid[:60]!r})",
file=sys.stderr,
)
failed_count += 1
continue
if dry_run:
ctx = batch_items[i]["context_count"]
ctx_tag = f" [ctx:{ctx}]" if ctx < 3 else ""
print(
f" [{lang}]{ctx_tag} {entry.msgid[:60]!r} → {translation[:60]!r}"
)
else:
_apply_translation(
entry, translation, batch_items[i], model, mark_fuzzy
)
batch_applied += 1
translated_count += 1
if (
not dry_run
and batch_applied > 0
and cat is not None
and po_path is not None
):
cat.save()
print(
f" Saved {po_path} ({batch_applied} entry(ies) in this batch).",
file=sys.stderr,
)
return translated_count, failed_count
def backfill(
lang: str,
*,
batch_size: int = DEFAULT_BATCH_SIZE,
limit: int | None = None,
min_context: int = 0,
model: str = DEFAULT_MODEL,
index_path: Path = DEFAULT_INDEX,
dry_run: bool = False,
mark_fuzzy: bool = True,
) -> None:
"""Backfill missing translations in the target language's .po file."""
po_path = TRANSLATIONS_DIR / lang / "LC_MESSAGES" / "messages.po"
if not po_path.exists():
print(f"No .po file found for language '{lang}': {po_path}", file=sys.stderr)
sys.exit(1)
if not index_path.exists():
print(
f"Translation index not found at {index_path}.\n"
"Run: python scripts/translations/build_translation_index.py",
file=sys.stderr,
)
sys.exit(1)
print("Loading translation index …", file=sys.stderr)
with open(index_path, encoding="utf-8") as f:
index: dict[str, Any] = json.load(f)
print(f"Loading {po_path}", file=sys.stderr)
cat = polib.pofile(str(po_path))
missing: list[polib.POEntry] = [e for e in cat if e.msgid and _is_missing(e)]
print(f"Found {len(missing)} untranslated entries for '{lang}'.", file=sys.stderr)
if min_context > 0:
before = len(missing)
missing = [
e
for e in missing
if _context_count(
{
"index_key": (
_plural_key(e.msgid, e.msgid_plural)
if e.msgid_plural
else e.msgid
)
},
index,
lang,
)
>= min_context
]
skipped = before - len(missing)
print(
f"Skipping {skipped} entries with fewer than {min_context} reference "
f"translation(s) (use --min-context 0 to include them).",
file=sys.stderr,
)
if limit is not None:
missing = missing[:limit]
print(f"Limiting to {limit} entries.", file=sys.stderr)
if not missing:
print("Nothing to do.", file=sys.stderr)
return
translated_count, failed_count = _process_batches(
missing,
index,
lang,
batch_size,
model,
dry_run,
mark_fuzzy,
cat=cat,
po_path=po_path,
)
print(
f"\nDone. Translated: {translated_count}, Failed/skipped: {failed_count}.",
file=sys.stderr,
)
if not dry_run and translated_count > 0:
print(
f"Translations written to {po_path} (marked #, fuzzy for review).",
file=sys.stderr,
)
def main() -> None:
"""Parse CLI arguments and run translation backfill."""
parser = argparse.ArgumentParser(
description="Backfill missing .po translations using Claude AI",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument(
"--lang", required=True, help="ISO language code (e.g. fr, de, ja)"
)
parser.add_argument(
"--batch-size",
type=int,
default=DEFAULT_BATCH_SIZE,
help=f"Strings per Claude request (default: {DEFAULT_BATCH_SIZE})",
)
parser.add_argument(
"--limit",
type=int,
default=None,
help="Maximum number of entries to translate (default: unlimited)",
)
parser.add_argument(
"--model",
default=DEFAULT_MODEL,
help=f"Claude model ID (default: {DEFAULT_MODEL})",
)
parser.add_argument(
"--index",
type=Path,
default=DEFAULT_INDEX,
help=f"Path to translation_index.json (default: {DEFAULT_INDEX})",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Print translations without modifying the .po file",
)
parser.add_argument(
"--min-context",
type=int,
default=0,
metavar="N",
help=(
"Skip entries with fewer than N reference translations in other languages "
"(default: 0 = translate everything). Strings with low context are more "
"likely to be ambiguous single words or fragments — set to e.g. 2 to only "
"translate strings that have been confirmed in at least 2 other languages."
),
)
parser.add_argument(
"--no-fuzzy",
dest="mark_fuzzy",
action="store_false",
default=True,
help="Do not mark generated translations as #, fuzzy",
)
args = parser.parse_args()
backfill(
lang=args.lang,
batch_size=args.batch_size,
limit=args.limit,
min_context=args.min_context,
model=args.model,
index_path=args.index,
dry_run=args.dry_run,
mark_fuzzy=args.mark_fuzzy,
)
if __name__ == "__main__":
main()

scripts/translations/build_translation_index.py

@@ -0,0 +1,153 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Build a cross-language translation index from all .po files.

Outputs a JSON file structured as:

    {
        "<msgid>": {
            "<lang>": "<translated string or null>",
            ...
        },
        ...
    }

For plural entries the key is "<msgid>\x00<msgid_plural>" and the value
is a dict mapping lang -> {0: "...", 1: "..."} (or null if untranslated).

Usage:
    python scripts/translations/build_translation_index.py
    python scripts/translations/build_translation_index.py \
        --translations-dir superset/translations \
        --output /tmp/translation_index.json
"""

from __future__ import annotations

import argparse
import json
import os
import sys
from pathlib import Path
from typing import Any

try:
    import polib  # type: ignore[import-untyped]
except ImportError:
    print("polib is required. Install with: pip install polib", file=sys.stderr)
    sys.exit(1)

TRANSLATIONS_DIR = Path(__file__).parent.parent.parent / "superset" / "translations"
DEFAULT_OUTPUT = (
    Path(__file__).parent.parent.parent
    / "superset"
    / "translations"
    / "translation_index.json"
)


def _is_translated(entry: polib.POEntry) -> bool:
    """Return True if the entry has a non-empty, non-fuzzy translation."""
    if "fuzzy" in entry.flags:
        return False
    if entry.msgid_plural:
        return any(v for v in entry.msgstr_plural.values())
    return bool(entry.msgstr)


def _plural_key(entry: polib.POEntry) -> str:
    """Build the combined key used for plural translation entries."""
    return f"{entry.msgid}\x00{entry.msgid_plural}"


def build_index(translations_dir: Path) -> dict[str, Any]:
    """Read all .po files and build a combined translation index."""
    index: dict[str, dict[str, Any]] = {}
    langs = sorted(
        d
        for d in os.listdir(translations_dir)
        if (translations_dir / d / "LC_MESSAGES" / "messages.po").exists()
        and d != "en"  # en has empty msgstr by convention (source = target)
    )
    for lang in langs:
        po_path = translations_dir / lang / "LC_MESSAGES" / "messages.po"
        cat = polib.pofile(str(po_path))
        for entry in cat:
            if not entry.msgid:
                continue  # skip header entry
            if entry.msgid_plural:
                key = _plural_key(entry)
                if key not in index:
                    index[key] = {}
                # Fuzzy entries are unreviewed (often machine-generated drafts),
                # so excluding them prevents feeding unverified translations
                # back into the AI backfill prompt as trusted context.
                index[key][lang] = (
                    dict(entry.msgstr_plural) if _is_translated(entry) else None
                )
            else:
                key = entry.msgid
                if key not in index:
                    index[key] = {}
                index[key][lang] = entry.msgstr if _is_translated(entry) else None
    # Ensure every entry has a slot for every language (null if missing)
    for key in index:
        for lang in langs:
            index[key].setdefault(lang, None)
    return index


def main() -> None:
    """Parse arguments, build the translation index, and write it to disk."""
    parser = argparse.ArgumentParser(
        description="Build cross-language translation index"
    )
    parser.add_argument(
        "--translations-dir",
        type=Path,
        default=TRANSLATIONS_DIR,
        help="Path to the translations directory (default: superset/translations)",
    )
    parser.add_argument(
        "--output",
        "-o",
        type=Path,
        default=DEFAULT_OUTPUT,
        help=(
            "Output JSON file path"
            " (default: superset/translations/translation_index.json)"
        ),
    )
    args = parser.parse_args()

    print(f"Reading .po files from {args.translations_dir}", file=sys.stderr)
    index = build_index(args.translations_dir)
    print(f"Indexed {len(index)} message IDs.", file=sys.stderr)

    args.output.parent.mkdir(parents=True, exist_ok=True)
    with open(args.output, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)
    print(f"Written to {args.output}", file=sys.stderr)


if __name__ == "__main__":
    main()
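To make the output shape concrete, here is a minimal sketch using hand-built entries instead of real .po files. Only two conventions are taken from the script above: the `"\x00"` separator joining `msgid` and `msgid_plural` into one key, and the final pass that backfills `None` for every language missing from an entry; the sample msgids and translations are illustrative.

```python
# Sketch of the index shape build_index emits, with hand-built entries.
PLURAL_SEP = "\x00"

index = {
    "Dashboard": {"es": "Tablero"},
    f"%(n)s chart{PLURAL_SEP}%(n)s charts": {
        "es": {"0": "%(n)s gráfico", "1": "%(n)s gráficos"},
        "fr": None,  # untranslated or fuzzy -> null in the JSON output
    },
}

# Every entry gets a slot for every language, mirroring the
# setdefault(lang, None) pass at the end of build_index.
langs = ["es", "fr"]
for key in index:
    for lang in langs:
        index[key].setdefault(lang, None)

assert all(set(entry) == set(langs) for entry in index.values())
```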


@@ -43,6 +43,8 @@
"build-instrumented": "cross-env NODE_ENV=production BABEL_ENV=instrumented webpack --mode=production --color",
"build-storybook": "storybook build",
"build-translation": "scripts/po2json.sh",
"translations:build-index": "python3 ../scripts/translations/build_translation_index.py",
"translations:backfill": "python3 ../scripts/translations/backfill_po.py",
"bundle-stats": "cross-env BUNDLE_ANALYZER=true npm run build && npx open-cli ../superset/static/stats/statistics.html",
"clear-npm": "mkdir -p /tmp/empty && rsync -a --delete /tmp/empty/ node_modules/ && rmdir node_modules /tmp/empty",
"core:cover": "cross-env NODE_ENV=test NODE_OPTIONS=\"--max-old-space-size=4096\" jest --coverage --coverageThreshold='{\"global\":{\"statements\":100,\"branches\":100,\"functions\":100,\"lines\":100}}' --collectCoverageFrom='[\"packages/**/src/**/*.{js,ts}\", \"!packages/superset-core/**/*\"]' packages",

File diff suppressed because it is too large.


@@ -16,3 +16,4 @@
# under the License.
Babel==2.9.1
jinja2==3.1.6
polib>=1.2.0


@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.


@@ -0,0 +1,290 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Tests for ``scripts/translations/backfill_po.py``.

The script is not installed as a package, so it is loaded via importlib from
its filesystem path. The two units exercised here — ``parse_response`` and
``_apply_translation`` — have enough edge cases (dict/list/scalar responses,
plural vs singular entries, fuzzy flag, attribution comments) to be worth
pinning against regressions.
"""

import importlib.util
import json  # noqa: TID251 - testing a standalone script that uses stdlib json
from pathlib import Path

import polib  # type: ignore[import-untyped]
import pytest

_SCRIPT_PATH = (
    Path(__file__).resolve().parents[4] / "scripts" / "translations" / "backfill_po.py"
)
_spec = importlib.util.spec_from_file_location("backfill_po", _SCRIPT_PATH)
assert _spec is not None, f"Could not load {_SCRIPT_PATH}"
assert _spec.loader is not None, f"No loader on spec for {_SCRIPT_PATH}"
backfill_po = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(backfill_po)


def test_parse_response_singular_strings() -> None:
    """A flat object of int-keyed strings is returned as-is."""
    text = '{"0": "hola", "1": "mundo"}'
    assert backfill_po.parse_response(text, batch_size=2) == {
        0: "hola",
        1: "mundo",
    }


def test_parse_response_strips_markdown_fences() -> None:
    """Models sometimes wrap JSON in ```json fences; those must be stripped."""
    text = '```json\n{"0": "hola"}\n```'
    assert backfill_po.parse_response(text, batch_size=1) == {0: "hola"}


def test_parse_response_preserves_plural_dict_as_json() -> None:
    """
    Plural entries arrive as nested dicts and must round-trip through
    json.loads downstream — str(dict) would emit Python repr (single quotes)
    and break parsing in _apply_translation. The serialized form must be
    valid JSON.
    """
    text = '{"0": {"0": "manzana", "1": "manzanas"}}'
    parsed = backfill_po.parse_response(text, batch_size=1)
    assert set(parsed.keys()) == {0}
    # Must be valid JSON (double-quoted), not Python repr (single-quoted).
    assert json.loads(parsed[0]) == {"0": "manzana", "1": "manzanas"}


def test_parse_response_preserves_non_ascii() -> None:
    """ensure_ascii=False keeps non-ASCII characters readable in the .po file."""
    text = '{"0": {"0": "日本語", "1": "日本語s"}}'
    parsed = backfill_po.parse_response(text, batch_size=1)
    assert "日本語" in parsed[0]


def test_parse_response_skips_non_numeric_keys() -> None:
    """Keys that are not numeric strings are silently skipped."""
    text = '{"0": "ok", "comment": "ignored", "2": "kept"}'
    assert backfill_po.parse_response(text, batch_size=3) == {
        0: "ok",
        2: "kept",
    }


@pytest.mark.parametrize(
    "raw",
    ['["hola", "mundo"]', '"just a string"', "null", "42"],
)
def test_parse_response_rejects_non_object(raw: str) -> None:
    """
    Non-object JSON (list, string, null, number) must raise ValueError so
    _process_batches catches it instead of crashing on AttributeError from
    .items().
    """
    with pytest.raises(ValueError, match="Expected a JSON object"):
        backfill_po.parse_response(raw, batch_size=1)


def test_parse_response_rejects_invalid_json() -> None:
    """Garbage input surfaces as ValueError, not the underlying JSONDecodeError."""
    with pytest.raises(ValueError, match="Could not parse response as JSON"):
        backfill_po.parse_response("not even close to json", batch_size=1)


# ---------------------------------------------------------------------------
# _apply_translation
# ---------------------------------------------------------------------------


def _make_singular_entry(msgid: str = "Hello") -> polib.POEntry:
    return polib.POEntry(msgid=msgid, msgstr="")


def _make_plural_entry(
    msgid: str = "%(n)s apple",
    msgid_plural: str = "%(n)s apples",
) -> polib.POEntry:
    entry = polib.POEntry(msgid=msgid, msgid_plural=msgid_plural)
    entry.msgstr_plural = {0: "", 1: ""}
    return entry


def _item(refs: list[str] | None = None) -> dict[str, list[str]]:
    return {"context_langs": refs if refs is not None else ["fr", "de"]}


def test_apply_translation_singular_writes_msgstr_and_marks_fuzzy() -> None:
    entry = _make_singular_entry()
    backfill_po._apply_translation(
        entry, "Hola", _item(["fr", "de"]), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr == "Hola"
    assert "fuzzy" in entry.flags


def test_apply_translation_singular_no_fuzzy_when_disabled() -> None:
    entry = _make_singular_entry()
    backfill_po._apply_translation(
        entry, "Hola", _item(), model="claude-test", mark_fuzzy=False
    )
    assert "fuzzy" not in entry.flags


def test_apply_translation_attribution_includes_refs() -> None:
    entry = _make_singular_entry()
    backfill_po._apply_translation(
        entry, "Hola", _item(["fr", "de"]), model="claude-test", mark_fuzzy=True
    )
    assert "Machine-translated via backfill_po.py (claude-test)" in entry.tcomment
    assert "[refs: fr, de]" in entry.tcomment


def test_apply_translation_attribution_marks_no_refs() -> None:
    entry = _make_singular_entry()
    backfill_po._apply_translation(
        entry, "Hola", _item([]), model="claude-test", mark_fuzzy=True
    )
    assert "[no refs]" in entry.tcomment


def test_apply_translation_attribution_appended_not_duplicated() -> None:
    """Re-running on an already-translated entry must not duplicate attribution."""
    entry = _make_singular_entry()
    entry.tcomment = "Existing maintainer note"
    backfill_po._apply_translation(
        entry, "Hola", _item(["fr"]), model="claude-test", mark_fuzzy=True
    )
    # Existing comment preserved, attribution appended.
    assert entry.tcomment.startswith("Existing maintainer note\n")
    assert "Machine-translated via backfill_po.py" in entry.tcomment
    # Apply again — attribution must not duplicate.
    backfill_po._apply_translation(
        entry, "Hola", _item(["fr"]), model="claude-test", mark_fuzzy=True
    )
    assert entry.tcomment.count("Machine-translated via backfill_po.py") == 1


def test_apply_translation_plural_dict_response() -> None:
    """A JSON-dict response writes each plural form to msgstr_plural."""
    entry = _make_plural_entry()
    translation = json.dumps({"0": "manzana", "1": "manzanas"})
    backfill_po._apply_translation(
        entry, translation, _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzanas"}
    assert "fuzzy" in entry.flags


def test_apply_translation_plural_scalar_json_fills_all_forms() -> None:
    """
    A JSON-scalar response (e.g. ``"hola"``) is broadcast to every plural form.
    This is the documented fallback when the model returns a single string for
    a plural entry.
    """
    entry = _make_plural_entry()
    backfill_po._apply_translation(
        entry, '"manzana"', _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzana"}


def test_apply_translation_plural_invalid_json_fills_all_forms() -> None:
    """
    A non-JSON string also broadcasts to every plural form (rather than
    crashing). This handles older models that ignore the JSON instruction.
    """
    entry = _make_plural_entry()
    backfill_po._apply_translation(
        entry, "manzana", _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzana"}


def test_apply_translation_plural_round_trip_from_parse_response() -> None:
    """
    End-to-end guard: the JSON string produced by parse_response for a plural
    entry must be consumable by _apply_translation without losing forms. This
    is the regression that #39448 fixed (str(dict) → Python repr broke the
    round-trip).
    """
    raw = '{"0": {"0": "manzana", "1": "manzanas"}}'
    parsed = backfill_po.parse_response(raw, batch_size=1)
    entry = _make_plural_entry()
    backfill_po._apply_translation(
        entry, parsed[0], _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzanas"}


def test_apply_translation_plural_list_response() -> None:
    """
    Models sometimes return a JSON array for plural forms (forms are ordered,
    so a list is a valid representation). Each element must map to the
    corresponding plural index. Without this branch, ``str(list)`` would emit
    Python list-repr and broadcast it to every form — observed in the wild
    on a fresh run for French.
    """
    entry = _make_plural_entry()
    translation = json.dumps(["manzana", "manzanas"])
    backfill_po._apply_translation(
        entry, translation, _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzanas"}


def test_apply_translation_plural_list_round_trip_from_parse_response() -> None:
    """
    The list-of-forms response must also survive parse_response → _apply
    round-trip. parse_response JSON-serializes lists; _apply_translation
    must json.loads them back into a list and distribute across forms.
    """
    raw = '{"0": ["manzana", "manzanas"]}'
    parsed = backfill_po.parse_response(raw, batch_size=1)
    entry = _make_plural_entry()
    backfill_po._apply_translation(
        entry, parsed[0], _item(), model="claude-test", mark_fuzzy=True
    )
    assert entry.msgstr_plural == {0: "manzana", 1: "manzanas"}


def test_apply_translation_plural_list_shorter_repeats_last_form() -> None:
    """
    If the model returns fewer forms than the language requires, repeat the
    last form rather than leaving slots empty (which would render as the
    literal English msgid via gettext fallback).
    """
    entry = polib.POEntry(msgid="apple", msgid_plural="apples")
    entry.msgstr_plural = {0: "", 1: "", 2: ""}
    backfill_po._apply_translation(
        entry,
        json.dumps(["uno", "dos"]),
        _item(),
        model="claude-test",
        mark_fuzzy=True,
    )
    assert entry.msgstr_plural == {0: "uno", 1: "dos", 2: "dos"}


def test_apply_translation_plural_empty_list_falls_back_to_string_broadcast() -> None:
    """An empty JSON list isn't usable; fall back to writing the raw string."""
    entry = _make_plural_entry()
    backfill_po._apply_translation(
        entry, "[]", _item(), model="claude-test", mark_fuzzy=True
    )
    # Falls into the JSONDecodeError/ValueError branch → broadcast raw string.
    assert entry.msgstr_plural == {0: "[]", 1: "[]"}
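The dict/list/scalar handling these tests pin down can be sketched as a standalone function. This is not the script's actual `_apply_plural_translation` helper (whose diff is suppressed above) — just an illustrative reimplementation of the behavior the commit message describes: dict responses map numeric keys to slots, list responses distribute by index and repeat the last form when short, scalar strings broadcast, and anything unusable falls back to broadcasting the raw string.

```python
import json


def apply_plural_sketch(raw: str, nplurals: int) -> dict[int, str]:
    """Distribute a model response across nplurals plural slots
    (illustrative sketch, not the script's actual helper)."""
    try:
        value = json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        value = None
    if isinstance(value, dict) and value:
        # {"0": "...", "1": "..."} -> numeric keys map to plural indices.
        return {int(k): str(v) for k, v in value.items()}
    if isinstance(value, list) and value:
        # ["form0", "form1"] -> distribute by index; repeat the last
        # form if the model returned fewer forms than required.
        return {i: str(value[min(i, len(value) - 1)]) for i in range(nplurals)}
    if isinstance(value, str):
        # JSON scalar string -> broadcast to every form.
        return {i: value for i in range(nplurals)}
    # Non-JSON or unusable (null, number, empty list) -> raw broadcast.
    return {i: raw for i in range(nplurals)}


assert apply_plural_sketch('["uno", "dos"]', 3) == {0: "uno", 1: "dos", 2: "dos"}
assert apply_plural_sketch("[]", 2) == {0: "[]", 1: "[]"}
```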