mirror of
https://github.com/apache/superset.git
synced 2026-05-22 00:05:15 +00:00
feat(i18n): add AI-assisted translation backfill tooling + Spanish translations

Adds two scripts to help maintainers fill in missing .po translations
using Claude AI, and applies them to backfill all 184 missing Spanish strings.

New scripts:
- scripts/translations/build_translation_index.py: reads every .po file
  and outputs a cross-language JSON index {msgid: {lang: translation}}
  used to provide reference context to the AI
- scripts/translations/backfill_po.py: for a target language, finds all
  untranslated entries, batches them, and calls claude -p with cross-language
  context to generate draft translations marked #, fuzzy for human review

Design highlights:
- Cross-language translations are passed per-string so the AI can disambiguate
  ambiguous English (e.g. "Scale", "Table") from how other translators handled it
- --min-context N skips strings with fewer than N reference translations
- Each generated entry is tagged with a translator comment listing the model
  and which languages provided context (e.g. [refs: fr, ru])
- translation_index.json added to .gitignore (regenerated locally)

Spanish translations:
- Backfilled all 184 previously untranslated strings in es/LC_MESSAGES/messages.po
- All entries marked #, fuzzy pending human review

Docs: added "Backfilling missing translations with AI" section to
docs/developer_docs/contributing/howtos.md

npm shortcuts added to superset-frontend/package.json:
- translations:build-index
- translations:backfill

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -335,6 +335,92 @@ npm run build-translation
pybabel compile -d superset/translations
```

### Backfilling missing translations with AI

For languages with many untranslated strings, the repo includes a script that
uses Claude AI to generate draft translations for any missing entries. All
AI-generated strings are marked `#, fuzzy` and tagged with an attribution
comment so that human reviewers know they need to be checked before merging.

#### Prerequisites

```bash
pip install -r superset/translations/requirements.txt
```

Claude Code must be installed and authenticated (`claude --version` should
work). The script calls `claude -p` internally, so no separate API key is needed.

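The `claude --version` check above can also be scripted before a long run. The following is a minimal sketch, not part of the repository's scripts: `claude_cli_ready` is a hypothetical helper name, and it only confirms that the executable is on `PATH` and exits cleanly. It does not verify that the CLI is authenticated.

```python
import shutil
import subprocess


def claude_cli_ready(executable: str = "claude") -> bool:
    """Return True if `<executable> --version` runs and exits with status 0.

    Hypothetical helper for illustration; it does not check authentication.
    """
    if shutil.which(executable) is None:
        return False  # executable is not on PATH
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("claude CLI found" if claude_cli_ready() else "claude CLI not found")
```
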
#### Step 1: Build the translation index

The index captures every already-translated string in every language and
serves as cross-language context for the AI. Rebuild it whenever `.po` files
change significantly:

```bash
python scripts/translations/build_translation_index.py
# Writes: superset/translations/translation_index.json
```

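To make the index shape concrete, here is a minimal sketch of the `{msgid: {lang: translation}}` structure. It assumes the `.po` files have already been parsed into per-language dictionaries. `build_index` and the sample catalogs are illustrative only and may differ from what `build_translation_index.py` actually does.

```python
from collections import defaultdict


def build_index(catalogs: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Invert {lang: {msgid: msgstr}} into {msgid: {lang: msgstr}}.

    Entries with an empty msgstr are treated as untranslated and skipped.
    """
    index: dict[str, dict[str, str]] = defaultdict(dict)
    for lang, entries in catalogs.items():
        for msgid, msgstr in entries.items():
            if msgstr:
                index[msgid][lang] = msgstr
    return dict(index)


# Illustrative data only (not taken from the real catalogs).
catalogs = {
    "fr": {"Table": "Tableau", "Scale": "Échelle"},
    "ru": {"Table": "Таблица", "Scale": ""},  # "Scale" not yet translated in ru
}
index = build_index(catalogs)
print(index["Table"])  # both languages contribute a reference
print(index["Scale"])  # only fr contributes a reference
```
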
#### Step 2: Preview with a dry run

Check what would be translated without writing anything:

```bash
python scripts/translations/backfill_po.py --lang fr --limit 20 --dry-run
```

Output shows each string, its translation, and a context tag:

- No tag: 3+ reference languages available (high confidence)
- `[ctx:N]`: only N other languages have this string (lower confidence)
- `[ctx:0]`: no other language has this string yet; English alone used

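The tagging rule in the list above can be written as a small function. This is a sketch based only on the thresholds described here. `reference_count` and `context_tag` are hypothetical names, and the index is assumed to have the `{msgid: {lang: translation}}` shape.

```python
def reference_count(index: dict[str, dict[str, str]], msgid: str, target_lang: str) -> int:
    """Count languages other than the target that already translate msgid."""
    return sum(1 for lang in index.get(msgid, {}) if lang != target_lang)


def context_tag(count: int) -> str:
    """Return the dry-run tag: empty for 3+ references, else "[ctx:N]"."""
    if count >= 3:
        return ""
    return f"[ctx:{count}]"


# Illustrative index: "Table" has three references, "Scale" has one.
index = {
    "Table": {"de": "Tabelle", "fr": "Tableau", "ru": "Таблица", "es": "Tabla"},
    "Scale": {"fr": "Échelle"},
}
for msgid in ("Table", "Scale", "Brand new string"):
    tag = context_tag(reference_count(index, msgid, target_lang="es"))
    print(msgid, tag)
```
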
#### Step 3: Run the backfill

```bash
python scripts/translations/backfill_po.py --lang fr
```

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--lang LANG` | required | ISO language code (`fr`, `de`, `ja`, …) |
| `--batch-size N` | 50 | Strings per Claude request |
| `--limit N` | unlimited | Stop after N entries |
| `--min-context N` | 0 | Skip entries with fewer than N reference translations |
| `--model MODEL` | `claude-sonnet-4-6` | Claude model to use |
| `--dry-run` | off | Print without writing |
| `--no-fuzzy` | off | Don't mark entries as fuzzy |

Use `--min-context 2` to skip strings that have fewer than 2 reference
translations in other languages. Those strings are more likely to be ambiguous
(short labels, UI fragments) where the correct meaning can't be inferred
without additional context.

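As a rough illustration of how `--min-context`, `--limit`, and `--batch-size` interact, the sketch below filters untranslated strings by reference count and then splits them into batches. The function names are hypothetical, and the order in which the real script applies the limit and the filter is an assumption.

```python
from typing import Iterator, Optional


def select_entries(
    msgids: list[str],
    index: dict[str, dict[str, str]],
    target_lang: str,
    min_context: int = 0,
    limit: Optional[int] = None,
) -> list[str]:
    """Keep msgids with at least min_context reference translations.

    Assumption: the limit counts entries that pass the filter.
    """
    selected: list[str] = []
    for msgid in msgids:
        refs = [lang for lang in index.get(msgid, {}) if lang != target_lang]
        if len(refs) < min_context:
            continue  # too little cross-language context; skip
        selected.append(msgid)
        if limit is not None and len(selected) >= limit:
            break
    return selected


def batches(items: list[str], batch_size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Illustrative data only.
index = {
    "Table": {"de": "Tabelle", "ru": "Таблица"},
    "Scale": {"de": "Skala"},
}
todo = select_entries(["Table", "Scale", "OK"], index, "fr", min_context=2)
print(todo)
print(list(batches(["a", "b", "c", "d", "e"], batch_size=2)))
```
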
#### Step 4: Review and commit

Open the target `.po` file and search for `fuzzy`. For each generated entry:

1. Verify the translation is correct for the UI context.
2. Remove the `# Machine-translated via backfill_po.py` comment and the
   `#, fuzzy` flag line once you are satisfied.
3. If the translation is wrong, correct the `msgstr` before removing the flag.
4. Commit the `.po` file, but do **not** commit `translation_index.json` (it is
   gitignored and regenerated locally).

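To get a quick list of entries still awaiting review, a simple scan for the `#, fuzzy` flag can help. The following is a minimal standard-library sketch, not a full PO parser: `fuzzy_msgids` is a hypothetical helper, it reads only the first line of each `msgid`, and the attribution comment in the sample is abbreviated.

```python
def fuzzy_msgids(po_text: str) -> list[str]:
    """Return the first-line msgid of every entry flagged fuzzy.

    Simplification: multi-line msgids are reported by their first line only.
    """
    found: list[str] = []
    pending_fuzzy = False
    for line in po_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#,") and "fuzzy" in stripped:
            pending_fuzzy = True  # flag applies to the next msgid
        elif stripped.startswith("msgid "):
            if pending_fuzzy:
                raw = stripped[len("msgid "):].strip()
                if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
                    raw = raw[1:-1]  # drop the surrounding quotes
                found.append(raw)
            pending_fuzzy = False
    return found


# Illustrative .po fragment.
sample = '''\
# Machine-translated via backfill_po.py
#, fuzzy
msgid "Scale"
msgstr "Échelle"

msgid "Table"
msgstr "Tableau"
'''
print(fuzzy_msgids(sample))
```

In practice you would pass the contents of the target `messages.po` file, read with UTF-8 encoding, instead of the inline sample.
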
#### Running via npm

From `superset-frontend/`:

```bash
# Rebuild index
npm run translations:build-index

# Backfill (pass arguments after --)
npm run translations:backfill -- --lang fr --dry-run
```

## Linting

### Python
