Love the agentic reflection loop—this mirrors human LQA. You might cut wall‑clock while preserving quality by making the reflection step multi‑agent (terminology/glossary, style/fluency, schema/placeholder safety) and running them in parallel, then letting an agentic LLM reconcile fixes—a lightweight distributed/parallel agentic AI pattern. Also consider constrained decoding with glossaries + JSON Schema, placeholder validation, and aggressive dedup/caching of repeated strings; for tracking gains, COMET or MQM-style scoring tends to beat BLEU.
## Technical Details
*Why Go?*
I chose Go over Python for several reasons:
- *Performance*: Concurrent batch processing with goroutines
- *Reliability*: Static typing catches errors at compile time
- *Distribution*: Single binary, no runtime dependencies
- *Production-ready*: Built-in error handling, testing, and logging
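To make the concurrency point concrete, here is a minimal sketch of fanning translation batches out to a bounded pool of goroutines. It is illustrative only: `translateBatch` is a hypothetical stand-in for a real provider call, not Jta's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// translateBatch is a hypothetical stand-in for one provider API call.
func translateBatch(ctx context.Context, batch []string) ([]string, error) {
	out := make([]string, len(batch))
	copy(out, batch) // a real implementation would return translations
	return out, nil
}

// translateAll processes batches concurrently, bounded by `workers`,
// and collects results in their original order.
func translateAll(ctx context.Context, batches [][]string, workers int) ([][]string, []error) {
	results := make([][]string, len(batches))
	errs := make([]error, len(batches))
	sem := make(chan struct{}, workers) // limits in-flight API calls
	var wg sync.WaitGroup

	for i, b := range batches {
		wg.Add(1)
		go func(i int, b []string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done
			results[i], errs[i] = translateBatch(ctx, b)
		}(i, b)
	}
	wg.Wait()
	return results, errs
}

func main() {
	batches := [][]string{{"Welcome", "Sign in"}, {"Log out"}}
	results, _ := translateAll(context.Background(), batches, 2)
	fmt.Println(results)
}
```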
*Architecture*
Clean architecture with domain-driven design:
- *Presentation Layer*: CLI (Cobra) + Terminal UI (Lipgloss)
- *Application Layer*: Workflow orchestration
- *Domain Layer*: Translation engine, reflection engine, terminology manager
- *Infrastructure Layer*: AI provider adapters, JSON repository
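As a sketch of what that layering implies (type and package names here are hypothetical, not Jta's actual API): the domain layer defines a provider port, and each vendor adapter in the infrastructure layer satisfies it, so the translation engine never imports vendor SDKs directly.

```go
// Package domain sketches the port the translation and reflection engines
// depend on. All names below are illustrative; the adapter would normally
// live in its own infrastructure package but is shown here for brevity.
package domain

import "context"

type Batch map[string]string // key -> source text

type Provider interface {
	Translate(ctx context.Context, b Batch, targetLang string) (Batch, error)
	Reflect(ctx context.Context, source, draft Batch, targetLang string) (critique string, err error)
}

// StubProvider plays the role of an infrastructure-layer adapter (one per vendor).
type StubProvider struct{}

func (StubProvider) Translate(ctx context.Context, b Batch, targetLang string) (Batch, error) {
	out := make(Batch, len(b))
	for k, v := range b {
		out[k] = v // a real adapter would call the vendor API here
	}
	return out, nil
}

func (StubProvider) Reflect(ctx context.Context, source, draft Batch, targetLang string) (string, error) {
	return "no issues found", nil // a real adapter would ask the model to critique the draft
}
```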
*Reflection Implementation*
The agentic reflection adds overhead (3x API calls per batch):
- Batch size: 20 keys (configurable)
- Example: 100 keys = 5 batches = 15 API calls (5 translate + 5 reflect + 5 improve)
- Trade-off: 3x cost for significantly higher quality
You can adjust `--batch-size` based on your needs:
- Smaller batches (10): More reliable, better quality, higher cost
- Larger batches (50): More efficient, lower cost, slightly lower quality
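A sketch of the per-batch flow those numbers describe: three sequential model calls per batch, plus the call-count arithmetic from the example above. Prompts and function names are placeholders, not Jta's internals.

```go
package reflection

import "context"

// llmCall is a placeholder for a single provider request.
type llmCall func(ctx context.Context, prompt string) (string, error)

// refineBatch runs the translate -> reflect -> improve sequence:
// exactly three API calls for one batch of keys.
func refineBatch(ctx context.Context, llm llmCall, batch string) (string, error) {
	draft, err := llm(ctx, "Translate these strings:\n"+batch)
	if err != nil {
		return "", err
	}
	critique, err := llm(ctx, "Review this translation as an expert and list concrete issues:\n"+draft)
	if err != nil {
		return "", err
	}
	return llm(ctx, "Improve the translation by applying this critique:\n"+critique+"\n\nTranslation:\n"+draft)
}

// apiCalls reproduces the arithmetic above: ceil(keys/batchSize) batches, 3 calls each.
// 100 keys with --batch-size 20 -> 5 batches -> 15 calls.
func apiCalls(keys, batchSize int) int {
	batches := (keys + batchSize - 1) / batchSize
	return batches * 3
}
```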
*Supported AI Providers*
- *OpenAI*: GPT-5, GPT-5 mini, GPT-5 nano, GPT-4o, etc.
- *Anthropic*: Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.1, etc.
- *Gemini*: Gemini 2.5 Flash, Gemini 2.5 Pro, etc.
You can use any model from these providers. Generally:
- Larger models → better reflection insights → higher quality
- Faster models → quicker processing → lower cost
- Balance: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro
*Format Protection*
Regex-based preservation of:
- Placeholders: `{var}`, `{{var}}`, `%s`, `%(name)d`
- HTML tags: `<b>`, `<span class="highlight">`
- URLs: `https://example.com`
- Markdown: `*bold*`, `italic`, `[link](url)`
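One common way to implement this kind of protection is to mask protected spans with sentinels before the text goes to the model and restore them afterwards. The sketch below shows that idea for placeholders only, with deliberately simplified regexes; Jta's actual patterns and strategy may differ.

```go
package format

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified pattern covering {var}, {{var}}, %s, %d and %(name)d.
// Real coverage (HTML tags, URLs, Markdown) needs additional rules.
var placeholder = regexp.MustCompile(`\{\{?\w+\}?\}|%\([A-Za-z_]+\)[sd]|%[sd]`)

// protect swaps placeholders for numbered sentinels the model is told to leave alone.
func protect(s string) (masked string, originals []string) {
	masked = placeholder.ReplaceAllStringFunc(s, func(m string) string {
		originals = append(originals, m)
		return fmt.Sprintf("\u27e6%d\u27e7", len(originals)-1) // ⟦0⟧, ⟦1⟧, ...
	})
	return masked, originals
}

// restore puts the original placeholders back and doubles as validation:
// if a sentinel is missing from the translation, the model broke the format.
func restore(s string, originals []string) string {
	for i, orig := range originals {
		s = strings.Replace(s, fmt.Sprintf("\u27e6%d\u27e7", i), orig, 1)
	}
	return s
}
```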
*RTL Support*
Proper bidirectional text handling for Arabic, Hebrew, Persian, Urdu:
- Automatic direction markers for LTR content in RTL context
- Smart punctuation conversion for Arabic-script languages
- Supports 27 languages total
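For context on what "direction markers" means here: a common approach is to wrap embedded LTR runs (URLs, placeholders, Latin names) in Unicode directional isolates so they render correctly inside an RTL sentence. The sketch below shows that general technique; it is not necessarily Jta's exact rule set.

```go
package rtl

import "regexp"

const (
	lri = "\u2066" // LEFT-TO-RIGHT ISOLATE
	pdi = "\u2069" // POP DIRECTIONAL ISOLATE
)

// Simplified: treat URLs, {placeholders} and Latin/digit runs as LTR content.
var ltrRun = regexp.MustCompile(`https?://\S+|\{\w+\}|[A-Za-z0-9][A-Za-z0-9._ /-]*[A-Za-z0-9]|[A-Za-z0-9]`)

// isolateLTR wraps each embedded LTR run so punctuation and numbers keep
// their place when the string is rendered in an RTL paragraph.
func isolateLTR(s string) string {
	return ltrRun.ReplaceAllStringFunc(s, func(m string) string {
		return lri + m + pdi
	})
}
```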
*Testing*
- 51.9% test coverage (actively improving)
- Integration tests with mock AI providers
- Format protection validation
- Incremental diff logic tests
## The Key Innovation: Agentic Reflection
Traditional translation: Source → LLM → Done
Jta's approach: Source → LLM → LLM self-critique → LLM improvement
Here's a real example:
*Step 1: Initial Translation*
```
Source: "Welcome to {app_name}"
Translation: "欢迎使用 {app_name}"
```
*Step 2: AI Self-Critique*
The AI analyzes its own work as an expert reviewer:
```
"The translation '欢迎使用' is accurate but could be more natural.
Consider '欢迎来到' which conveys a warmer, more inviting tone that
better matches the welcoming nature of 'Welcome to'."
```
*Step 3: AI Self-Improvement*
```
Improved: "欢迎来到 {app_name}"
```
This isn't post-processing with static rules—it's the AI acting as its own expert reviewer. The same model that translated understands the context, nuances, and challenges, making it uniquely qualified to critique and improve its own work.
*Why it works:*
- *Context awareness*: The AI knows what it was trying to achieve
- *Dynamic analysis*: Identifies issues specific to each translation's context
- *Actionable feedback*: Generates specific improvements, not generic fixes
- *Iterative quality*: Every translation gets a complete review-and-refine cycle
## Automatic Terminology Detection
One of the biggest pain points in i18n: inconsistent terminology. Jta solves this by using the LLM to analyze your source file and automatically identify:
*Preserve Terms* (never translate):
- Brand names: "GitHub", "OAuth", "MyApp"
- Technical terms: "API", "JSON", "HTTP"
*Consistent Terms* (always translate the same way):
- Domain terms: "workspace" → "工作空间" (always)
- Feature names: "credits" → "积分" (consistent across all strings)
The AI saves these to `.jta/terminology.json`:
```json
{
  "version": "1.0",
  "sourceLanguage": "en",
  "preserveTerms": ["GitHub", "API", "OAuth"],
  "consistentTerms": ["repository", "commit", "pull request"]
}
```
Then creates language-specific translation files:
```json
// .jta/terminology.zh.json
{
  "translations": {
    "repository": "仓库",
    "commit": "提交",
    "pull request": "拉取请求"
  }
}
```
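If you want to read these files from your own tooling, the shapes above map onto Go structs like the following (field names are inferred from the JSON shown; this is not an official Jta API):

```go
package terminology

import (
	"encoding/json"
	"os"
)

// Terminology mirrors .jta/terminology.json as shown above.
type Terminology struct {
	Version         string   `json:"version"`
	SourceLanguage  string   `json:"sourceLanguage"`
	PreserveTerms   []string `json:"preserveTerms"`
	ConsistentTerms []string `json:"consistentTerms"`
}

// LanguageTerms mirrors .jta/terminology.<lang>.json as shown above.
type LanguageTerms struct {
	Translations map[string]string `json:"translations"`
}

// load reads and decodes either file into the given struct.
func load(path string, v any) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	return json.Unmarshal(data, v)
}
```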
All future translations automatically use this dictionary. You can manually refine it, and the AI will respect your choices. This ensures 100% consistency across thousands of strings.
## Incremental Translation (80-90% Cost Savings)
Real-world i18n workflow:
1. Release 1.0: Translate 500 strings
2. Update 1.1: Add 10 new strings, modify 5 strings
3. Problem: Most tools re-translate all 500 strings
Jta's incremental mode:
- Detects new keys (10 strings)
- Identifies modified content (5 strings)
- Preserves unchanged translations (495 strings)
- Only translates 15 strings
*Result: 80-90% API cost reduction on updates*
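The core of such a diff is simple if the tool keeps a snapshot of the source text each translation was produced from. Here is a hedged sketch of that idea; Jta's stored format and detection logic may differ.

```go
package incremental

// diff classifies each source key as needing translation (new or modified)
// or as reusable, by comparing against the source snapshot stored with the
// existing translations.
func diff(source, previousSource, translations map[string]string) (toTranslate, reusable map[string]string) {
	toTranslate = make(map[string]string)
	reusable = make(map[string]string)
	for key, text := range source {
		prev, seen := previousSource[key]
		if tr, ok := translations[key]; ok && seen && prev == text {
			reusable[key] = tr // unchanged: keep the existing translation
			continue
		}
		toTranslate[key] = text // new key, or source text changed
	}
	return toTranslate, reusable
}
```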
Usage is dead simple:
```bash
# First time: Full translation
jta en.json --to zh

# After updates: Incremental (saves cost)
jta en.json --to zh --incremental

# Re-translate everything if needed (quality refresh)
jta en.json --to zh
```
The tool intelligently diffs the source file against existing translations, keeping them perfectly in sync while minimizing API calls.
*Best practices:*
- Development: Use `--incremental` for frequent updates
- Production release: Use full translation for maximum quality
- CI/CD: Use `--incremental -y` for automated updates
This makes Jta practical for continuous i18n workflows where you're updating translations multiple times per day.