Addresses sadpandajoe's review on #39448:
1. Adds tests/unit_tests/scripts/translations/backfill_po_test.py with
19 cases covering parse_response (singular/plural/markdown-fence
stripping/non-ASCII/non-numeric keys/list-and-scalar rejection/JSON
errors) and _apply_translation (singular path, plural-dict path,
plural-scalar fallback, plural invalid-JSON fallback, fuzzy flag,
attribution append/dedup, end-to-end round-trip from parse_response
into _apply_translation). The script is loaded via importlib since
it lives outside the package tree.
2. translate_batch now pipes the prompt over stdin instead of passing
it as argv. With --batch-size 50 and many reference languages a
single batch can grow into the tens of KB and approach ARG_MAX on
some platforms; stdin removes that ceiling.
3. _process_batches now saves the catalog after each batch that wrote
at least one translation (unless --dry-run is set). For sparse
languages with thousands of missing strings, a crash mid-run
loses only the in-flight batch rather than every batch translated
so far. The full save at the end of backfill() is removed since the
per-batch save covers it.
4. The module docstring referenced --fuzzy/--no-fuzzy, but argparse only
registers --no-fuzzy; the docstring now matches the actual flag.
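As a rough illustration of item 2, the sketch below feeds a large prompt to a child process over stdin instead of argv. The helper name run_with_stdin and the stand-in child command are hypothetical; the real script would invoke the claude CLI, whose exact invocation is not shown here.

```python
import subprocess
import sys


def run_with_stdin(cmd: list[str], prompt: str) -> str:
    """Run cmd, passing prompt on stdin rather than as a command-line argument."""
    result = subprocess.run(
        cmd,
        input=prompt,          # written to the child's stdin
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(f"command failed: {result.stderr.strip()}")
    return result.stdout


# Stand-in child (assumption): reports how many characters arrived on stdin.
ECHO_LEN = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

# A 500,000-character prompt passes through a pipe without touching argv limits.
received = run_with_stdin(ECHO_LEN, "x" * 500_000).strip()
```

Because the prompt never appears in argv, its size is bounded by pipe throughput rather than by the platform's argument-length limit.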
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A non-object JSON response (list, scalar, null) would raise AttributeError
from .items(). _process_batches only catches (ValueError, RuntimeError),
so the crash would abort the entire run instead of being handled per-batch.
Raise ValueError for non-object responses so the existing handler catches
the failure and only that batch is skipped.
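A minimal sketch of the idea, not the script's actual code: parse_object_response and process_batches below are simplified, hypothetical stand-ins for parse_response and _process_batches, showing how a type check turns the AttributeError case into a ValueError that the per-batch handler already catches.

```python
import json


def parse_object_response(raw: str) -> dict:
    """Parse a JSON object; reject lists, scalars, and null with ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


def process_batches(responses: list[str]) -> tuple[int, int]:
    """Return (succeeded, failed); a bad batch is counted, not fatal."""
    ok = failed = 0
    for raw in responses:
        try:
            parse_object_response(raw)
            ok += 1
        except (ValueError, RuntimeError):
            failed += 1
    return ok, failed


counts = process_batches(['{"1": "hola"}', "[1, 2]", "null", "42", "not json"])
```

Only the first response is a JSON object, so the other four are counted as failed batches and the loop still runs to completion.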
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses two codeant-ai review comments about missing docstrings on
newly added top-level functions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four "Apply suggestion" commits from codeant-ai replaced real function
bodies with docstring-only lines, causing syntax errors (at
build_translation_index.py:119). Restore the bodies while keeping the
suggested docstrings:
- build_translation_index.py: _plural_key, main()
- backfill_po.py: _lang_name, _plural_key
Also addresses two major issues raised in review:
1. parse_response() in backfill_po.py used str(v) on values, which
converted dict responses (from plural entries) into a Python repr
such as "{'0': 'x'}" that json.loads could not later parse in
_apply_translation. Serialize dict/list values with json.dumps.
2. build_index() wrote fuzzy entries as trusted context in the
cross-language index, letting AI-generated drafts propagate back
into future backfill runs as if reviewed. Gate index values via
_is_translated so fuzzy entries become null.
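To illustrate issue 1, here is a small sketch of why str() breaks the round trip and json.dumps does not. The helper serialize_value is a hypothetical name for illustration, not the function in backfill_po.py.

```python
import json


def serialize_value(value: object) -> str:
    """Keep strings as-is; encode dicts and lists as JSON so they re-parse later."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False)
    return str(value)


plural = {"0": "archivo", "1": "archivos"}

# str() yields a Python repr with single quotes, which is not valid JSON.
try:
    json.loads(str(plural))
    repr_parses = True
except json.JSONDecodeError:
    repr_parses = False

# json.dumps output survives the later json.loads call unchanged.
round_tripped = json.loads(serialize_value(plural))
```

With ensure_ascii=False, non-ASCII translations stay readable in the serialized string instead of being escaped.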
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two scripts to help maintainers fill in missing .po translations
using Claude AI, and applies them to backfill all 184 missing Spanish strings.
New scripts:
- scripts/translations/build_translation_index.py — reads every .po file
and outputs a cross-language JSON index {msgid: {lang: translation}}
used to provide reference context to the AI
- scripts/translations/backfill_po.py — for a target language, finds all
untranslated entries, batches them, and calls claude -p with cross-language
context to generate draft translations marked #, fuzzy for human review
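The index shape can be sketched as follows. This is an assumption-laden illustration: Entry is a hypothetical stand-in for a parsed .po entry (the real script reads .po files), and the fuzzy gating mirrors the _is_translated fix described above rather than reproducing it.

```python
import json
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one parsed .po entry."""
    msgid: str
    msgstr: str
    fuzzy: bool = False


def build_index(
    catalogs: dict[str, list[Entry]],
) -> dict[str, dict[str, Optional[str]]]:
    """Map msgid -> {lang: translation}; empty or fuzzy entries become None."""
    index: dict[str, dict[str, Optional[str]]] = {}
    for lang, entries in sorted(catalogs.items()):
        for e in entries:
            trusted = bool(e.msgstr) and not e.fuzzy
            index.setdefault(e.msgid, {})[lang] = e.msgstr if trusted else None
    return index


catalogs = {
    "fr": [Entry("Scale", "Échelle"), Entry("Table", "Tableau")],
    "es": [Entry("Scale", ""), Entry("Table", "Tabla", fuzzy=True)],
}
index = build_index(catalogs)
index_json = json.dumps(index, ensure_ascii=False, sort_keys=True)
```

None serializes to JSON null, so untranslated and unreviewed entries are visible in the index without being offered as reference context.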
Design highlights:
- Cross-language translations are passed per-string so the AI can disambiguate
ambiguous English (e.g. "Scale", "Table") from how other translators handled it
- --min-context N skips strings with fewer than N reference translations
- Each generated entry is tagged with a translator comment listing the model
and which languages provided context (e.g. [refs: fr, ru])
- translation_index.json added to .gitignore (regenerated locally)
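The --min-context filter and the reference tag could look roughly like the sketch below. All three function names are hypothetical, and the comment layout beyond the "[refs: fr, ru]" fragment is an assumption.

```python
from typing import Optional


def usable_refs(translations: dict[str, Optional[str]], target: str) -> list[str]:
    """Languages other than the target that have a non-empty translation."""
    return sorted(
        lang for lang, text in translations.items() if text and lang != target
    )


def has_enough_context(
    translations: dict[str, Optional[str]], target: str, min_context: int
) -> bool:
    """True when at least min_context reference translations exist."""
    return len(usable_refs(translations, target)) >= min_context


def attribution_comment(model: str, refs: list[str]) -> str:
    """Translator comment naming the model and the reference languages (assumed layout)."""
    return f"{model} [refs: {', '.join(refs)}]"


scale = {"fr": "Échelle", "ru": "Масштаб", "es": None, "de": ""}
refs = usable_refs(scale, target="es")
comment = attribution_comment("model-x", refs)
```

Here "model-x" is a placeholder model name; a string with only two usable references would be skipped under --min-context 3.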
Spanish translations:
- Backfilled all 184 previously untranslated strings in es/LC_MESSAGES/messages.po
- All entries marked #, fuzzy pending human review
Docs: added "Backfilling missing translations with AI" section to
docs/developer_docs/contributing/howtos.md
npm shortcuts added to superset-frontend/package.json:
- translations:build-index
- translations:backfill
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>