Automated QA Checks in Translation: Tags, Numbers, Terminology

What automated QA checks catch, how Xbench, Verifika, and built-in CAT QA tools work, and why even experienced translators miss errors without them.


Picture this. You deliver a 200-page technical manual - valves, pressure gauges, safety warnings. You proofread it twice, feel good about it, hit send. Two days later the client comes back with a list: 47 broken XML tags that crashed their help system, decimal separators wrong in 23 places (periods where commas should be), and the word “valve” translated three different ways across chapters. Two full days of rework - on a project you thought was done.

The kicker? An automated QA check would have caught every single one of those errors in about 30 seconds. Not exaggerating. You click “Run QA,” wait for the progress bar, and get a list of every tag mismatch, every number inconsistency, every terminology deviation. Before the file ever leaves your machine.

This article covers what automated QA checks actually do, what they catch (and what they don’t), how the main tools compare, and how to build a QA workflow that saves you from exactly this kind of nightmare.

What Automated QA Checks Are and Why You Need Them

Automated QA checks are scripts or modules - built into CAT tools or run as standalone software - that scan your source and target segments looking for specific error patterns. They don’t read for meaning. They don’t evaluate style. They compare structural and formal elements between source and target and flag anything that doesn’t match up.

Here’s what they typically catch:

  • Tag errors - missing, extra, or mismatched XML/HTML tags and placeholders
  • Number mismatches - a number in the source that’s different (or absent) in the target
  • Terminology violations - terms translated differently from what your glossary dictates
  • Consistency issues - same source segment translated two different ways, or same target used for different sources
  • Formatting problems - double spaces, wrong punctuation, missing capitalization
  • Punctuation mismatches - period vs. no period, extra commas, wrong quotation marks

Here’s what they don’t catch:

  • Semantic errors - you wrote “open the valve” when the source said “close the valve.” The words are all real, the tags are fine, the numbers match. Automated QA has no idea it’s wrong.
  • Style problems - the text is grammatically correct but reads like a robot wrote it. Not a QA tool’s job.
  • Wrong meaning, correct form - “the patient received 500 mg” is formally perfect, but the source said the patient will receive 500 mg at a later date. Tense errors, subtle meaning shifts - humans catch these.

The point isn’t that automated QA replaces human review. It doesn’t. The point is that it handles the tedious, pattern-based checks that humans are terrible at doing consistently across 200 pages. Your brain glazes over after checking the 50th tag pair. Software doesn’t.

“memoQ’s QA module provides over 30 categories of checks covering more than 100 individual verification types - from tag and number consistency to terminology adherence and formatting rules.” - memoQ Documentation

Think of it this way: automated QA is the spell-checker of the translation world. Nobody argues that spell-check replaces editing. But nobody serious ships a document without running it first.

Tag and Formatting Checks

Tags are probably the single most common source of QA failures - and the most dangerous. A typo in a word is embarrassing. A broken tag can crash an application.

What counts as a “tag”

In translation, “tags” means any inline code that controls formatting or represents a variable. This includes:

  • XML/HTML tags - <b>, </b>, <a href="...">, <br/>, <span class="warning">
  • Placeholders - {0}, {1}, %s, %d, %1$s, {{userName}}, ${count}
  • Formatting markers - bold, italic, subscript, superscript markup that CAT tools display as colored tags
  • Self-closing tags - <br/>, <img src="..." />

What goes wrong

A tag check verifies that every tag in the source also appears in the target - in the right order, properly nested, nothing missing, nothing added. The most common errors:

Missing closing tags. You translate a segment and accidentally delete </b>. In the exported file, everything after that point renders in bold - the entire rest of the menu, the next three screens, everything until the parser hits another closing tag or gives up.

Swapped tag order. Source has <b><i>warning</i></b>, you produce <i><b>warning</b></i>. Some parsers handle this fine. Others don’t - and you get rendering errors that only show up in specific browsers or on specific devices.

Deleted placeholders. The source says “Hello, {userName}! You have {count} messages.” You translate it and forget {count}. In the running app, the user sees “You have messages.” - with a blank space where the number should be. Or worse, the app crashes because it expected a variable that isn’t there.

Extra tags. You accidentally paste a tag that wasn’t in the source. The parser encounters an unexpected element, and depending on how forgiving it is, you get anything from a visual glitch to a full crash.
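The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAT tool implements it: the INLINE_CODE pattern and the check_tags function are hypothetical names, and the pattern only covers the tag and placeholder styles listed earlier.

```python
import re
from collections import Counter

# Hypothetical pattern for the inline codes listed earlier in this section.
INLINE_CODE = re.compile(
    r"</?[A-Za-z][^<>]*>"   # XML/HTML tags, including self-closing ones
    r"|\{\{\w+\}\}"         # {{userName}}
    r"|\$\{\w+\}"           # ${count}
    r"|\{\w+\}"             # {0}, {userName}
    r"|%(?:\d+\$)?[sd]"     # %s, %d, %1$s
)

def check_tags(source: str, target: str) -> list[str]:
    """Report inline codes that are missing, extra, or reordered in the target."""
    src = INLINE_CODE.findall(source)
    tgt = INLINE_CODE.findall(target)
    issues = []
    # Counter subtraction keeps only the codes that one side has more of.
    for code, n in (Counter(src) - Counter(tgt)).items():
        issues.append(f"missing in target: {code} (x{n})")
    for code, n in (Counter(tgt) - Counter(src)).items():
        issues.append(f"extra in target: {code} (x{n})")
    # Same codes on both sides, but in a different sequence.
    if not issues and src != tgt:
        issues.append("same inline codes, different order")
    return issues

missing_placeholder = check_tags(
    "Hello, {userName}! You have {count} messages.",
    "Bonjour, {userName} ! Vous avez des messages.",
)
swapped_order = check_tags("<b><i>warning</i></b>", "<i><b>Warnung</b></i>")
```

A reordered pair is reported separately from a missing one because, in many language pairs, moving tags around is legitimate and only needs a quick look, while a missing code almost never is.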

“QA checks scan for tag pair consistency, placeholder integrity, and inline formatting compliance across source and target segments. A single missing or mismatched tag can break file rendering or cause application errors downstream.” - Phrase TMS Documentation

Why this matters more than you think

If you’re translating marketing copy or a blog post, a broken tag means a formatting glitch. Annoying but fixable. If you’re translating a software UI, a medical device interface, or an industrial control system - a broken tag can mean a non-functional product. In regulated industries, that’s not just embarrassing, it’s a compliance issue.

Most CAT tools show tags visually (colored markers in the segment) and warn you in real time if you delete one. But real-time warnings are easy to dismiss when you’re in flow. The full QA scan at the end catches what you clicked past.

Numbers, Dates, and Units

Number errors are insidious because they look right at a glance. “1.500” means one and a half in English and fifteen hundred in German. The digits are identical. The meaning is completely different. And no human reviewer will catch all 200 number instances in a technical manual by eyeballing them.

Number format conventions

Every locale has its own rules for decimal separators and thousands separators:

Locale | One thousand five hundred and a half | Convention
English (US/UK) | 1,500.50 | comma = thousands, period = decimal
German, Italian, Portuguese (BR) | 1.500,50 | period = thousands, comma = decimal
French, Russian | 1 500,50 | space = thousands, comma = decimal
Swiss German | 1’500.50 | apostrophe = thousands, period = decimal

An automated QA check compares numbers in the source with numbers in the target. If the source says “1,500.50” and the target says “1.500,50” - that’s a correct localization, and a good QA tool recognizes it. If the target says “1500.50” or “15,000.50” - that’s a flag.
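A locale-aware number comparison might look roughly like the sketch below. It assumes three of the conventions from the table, uses made-up locale labels and function names, and treats the French thousands separator as a plain space (real files often use a no-break space, which would need its own entry).

```python
import re
from decimal import Decimal

# (thousands separator, decimal separator); labels are illustrative only.
SEPARATORS = {
    "en": (",", "."),
    "de": (".", ","),
    "fr": (" ", ","),
}

def extract_numbers(text: str, locale: str) -> list[Decimal]:
    """Read every number in the text according to the locale's separators."""
    thousands, decimal = SEPARATORS[locale]
    t, d = re.escape(thousands), re.escape(decimal)
    # Either correctly grouped digits (1,500) or a plain digit run (1500),
    # each optionally followed by a decimal part.
    pattern = rf"\d{{1,3}}(?:{t}\d{{3}})+(?:{d}\d+)?|\d+(?:{d}\d+)?"
    values = []
    for match in re.findall(pattern, text):
        plain = match.replace(thousands, "").replace(decimal, ".")
        values.append(Decimal(plain))
    return values

def numbers_match(source: str, target: str, src_locale: str, tgt_locale: str) -> bool:
    """True when both sides contain the same numeric values, whatever the format."""
    return sorted(extract_numbers(source, src_locale)) == sorted(
        extract_numbers(target, tgt_locale)
    )

localized_ok = numbers_match("Max load: 1,500.50 kg", "Maximale Last: 1.500,50 kg", "en", "de")
not_localized = numbers_match("Max load: 1,500.50 kg", "Maximale Last: 1500.50 kg", "en", "de")
```

Comparing parsed values rather than raw strings is what lets “1,500.50” and “1.500,50” pass while “1500.50” in a German target gets flagged.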

Dates

Date formats are a classic source of rejection. The US uses MM/DD/YYYY, most of Europe uses DD/MM/YYYY, and Japan uses YYYY/MM/DD. The date 03/04/2026 is March 4th in America and April 3rd in most of Europe. If you’re translating a contract and the effective date is ambiguous, that’s not a minor issue.

QA tools check that dates present in the source also appear in the target, and some can verify format conversion. But date checks are tricky - the tool needs to know both the source and target locale conventions. Most handle the common patterns; unusual formats might need custom regex rules.
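To illustrate what “knowing both locale conventions” means in practice, here is a small sketch built on the standard datetime module. The DATE_FORMATS table and the same_date function are assumptions made for this example and cover only the three numeric formats mentioned above.

```python
from datetime import datetime

# Illustrative strptime formats per locale; real projects need more patterns.
DATE_FORMATS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "ja-JP": "%Y/%m/%d",
}

def same_date(source_date: str, target_date: str, src_locale: str, tgt_locale: str) -> bool:
    """Parse each date with its own locale format and compare the calendar day."""
    src = datetime.strptime(source_date, DATE_FORMATS[src_locale]).date()
    tgt = datetime.strptime(target_date, DATE_FORMATS[tgt_locale]).date()
    return src == tgt

converted = same_date("03/04/2026", "04/03/2026", "en-US", "en-GB")
copied_unchanged = same_date("03/04/2026", "03/04/2026", "en-US", "en-GB")
```

The second call is the dangerous case: the digits were copied verbatim, a naive string comparison would pass, and the date silently moved from March 4th to April 3rd.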

Measurement units

Technical translations frequently involve unit conversions. Centimeters to inches, kilograms to pounds, Celsius to Fahrenheit. The QA tool won’t do the conversion for you - it’s not a calculator. But it flags cases where a number in the source doesn’t appear in the target, which prompts you to check whether you actually converted it.

Common pitfalls:

  • Forgetting to convert when you should (European manual localized for US, but still says “tighten to 25 Nm” instead of “18.4 lb-ft”)
  • Converting when you shouldn’t (the product specification is metric worldwide, but you reflexively changed centimeters to inches)
  • Converting the number but forgetting to change the unit label
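Once a missing number has prompted you to look, verifying the conversion itself is simple arithmetic. The sketch below checks the torque example from the list; the function name and the rounding tolerance are assumptions chosen for illustration.

```python
NM_TO_LBFT = 0.7375621493  # one newton-metre expressed in pound-force feet

def torque_converted_correctly(nm: float, lbft: float, tolerance: float = 0.05) -> bool:
    """True when the lb-ft value equals the Nm value converted, within rounding."""
    return abs(nm * NM_TO_LBFT - lbft) <= tolerance

converted = torque_converted_correctly(25, 18.4)   # 25 Nm is about 18.44 lb-ft
label_only = torque_converted_correctly(25, 25)    # unit label changed, number not
```

The default tolerance of 0.05 allows for a value rounded to one decimal place, which is how the example in the list is written.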

Currency and phone numbers

Currency checks verify that monetary amounts are present and that the currency symbol is placed the way the target locale expects. A price that reads “$500” in English is normally written “500 $” in Canadian French, with the symbol after the amount and a space in between, so an untouched “$500” or a cramped “500$” in the target deserves a second look. Phone number formats vary wildly by country - +1 (555) 123-4567 vs +33 1 23 45 67 89 vs +49 30 12345678. Some QA tools check these; many don’t, and you’ll need regex rules for thorough coverage.

“Automated QA reports catch issues that are practically invisible to the human eye during manual review - transposed digits, decimal separator confusion, missing units. These are the errors that slip through three rounds of human proofreading.” - Mitra Translations

Terminology and Consistency

Here’s the “valve” problem from the opening. A 200-page manual about industrial equipment, and the translator used “valve,” “gate,” and “shutoff device” for the same source term across different chapters. Each individual usage might even be defensible in isolation. But the client has a glossary, and it says the term is “valve.” End of discussion.

Glossary enforcement

Terminology QA checks work by comparing your translations against a term base (TB) or glossary. For every segment, the tool checks: does the source contain a glossary term? If yes, does the target contain the approved translation? If not - flag.

This sounds simple, but it catches a surprising amount of inconsistency. Even experienced translators drift on terminology over a long project. You start fresh on Monday morning and use “pressure relief valve.” By Thursday afternoon you’re writing “pressure release valve” without even noticing the switch. The glossary check catches it.
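Stripped to its core, the glossary check is a loop over segments and terms. The sketch below is a deliberately naive version using case-insensitive substring matching; real term base checks also deal with inflected forms and word boundaries. The function name and the German sample entry are illustrative.

```python
def check_glossary(segments, glossary):
    """segments: list of (source, target) pairs; glossary: {source_term: approved_target}.

    Returns (segment_number, source_term, approved_target) for every violation.
    """
    issues = []
    for index, (source, target) in enumerate(segments, start=1):
        for term, approved in glossary.items():
            term_in_source = term.lower() in source.lower()
            approved_in_target = approved.lower() in target.lower()
            if term_in_source and not approved_in_target:
                issues.append((index, term, approved))
    return issues

glossary = {"Überdruckventil": "pressure relief valve"}
segments = [
    ("Überdruckventil prüfen.", "Check the pressure relief valve."),
    ("Überdruckventil ersetzen.", "Replace the pressure release valve."),
]
violations = check_glossary(segments, glossary)
```

The second segment is the Thursday-afternoon drift described above: a plausible translation that simply is not the approved one.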

Source consistency

Source consistency checks look for cases where the same source text appears in multiple segments but has different translations. If segment 45 says “Open the main menu” and segment 312 says “Open the main menu” - they should have the same target. If they don’t, the QA tool flags it.

This is especially valuable for software localization, where the same UI string might appear in multiple contexts. The user expects “Save” to always say “Save” - not “Save” in one dialog and “Store” in another.

Target consistency

The reverse check: same target text for different sources. If you translated both “annual report” and “yearly report” as the same thing in the target language - is that intentional? Maybe. But the QA tool flags it so you can decide. Sometimes collapsing two source terms into one target term is correct; sometimes it means you missed a distinction.
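Both consistency checks come down to grouping segments and looking for groups with more than one member. A minimal sketch, assuming exact string matches and a hypothetical find_inconsistencies function (real tools usually normalize case, whitespace, and tags first):

```python
from collections import defaultdict

def find_inconsistencies(segments):
    """segments: list of (source, target) pairs.

    Returns two dicts: sources with several targets, targets with several sources.
    """
    by_source = defaultdict(set)
    by_target = defaultdict(set)
    for source, target in segments:
        by_source[source].add(target)
        by_target[target].add(source)
    source_issues = {s: t for s, t in by_source.items() if len(t) > 1}
    target_issues = {t: s for t, s in by_target.items() if len(s) > 1}
    return source_issues, target_issues

segments = [
    ("Open the main menu", "Hauptmenü öffnen"),
    ("Open the main menu", "Das Hauptmenü öffnen"),
    ("annual report", "Jahresbericht"),
    ("yearly report", "Jahresbericht"),
]
source_issues, target_issues = find_inconsistencies(segments)
```

The check only reports; whether “annual report” and “yearly report” should share one target remains the translator’s call.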

Forbidden terms

Some clients maintain “do not use” lists. The marketing department decided that “cheap” is banned in favor of “affordable.” Legal said never use “guarantee” - always “limited warranty.” Terminology QA can check for forbidden terms in the target and flag every occurrence.
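A forbidden-term check is the glossary check turned inside out: it searches the target for words that must not be there. The sketch below uses whole-word, case-insensitive matching; the function name and the mapping format are assumptions for this example.

```python
import re

def find_forbidden(target: str, forbidden: dict[str, str]) -> list[tuple[str, str]]:
    """Return (banned_term, preferred_term) for every banned term found in the target."""
    hits = []
    for banned, preferred in forbidden.items():
        # \b keeps "cheap" from matching inside longer words such as "cheaper".
        if re.search(rf"\b{re.escape(banned)}\b", target, flags=re.IGNORECASE):
            hits.append((banned, preferred))
    return hits

do_not_use = {"cheap": "affordable", "guarantee": "limited warranty"}
hits = find_forbidden("A cheap plan with a two-year guarantee.", do_not_use)
```

Whether inflected forms such as “cheaper” should also be flagged is a policy decision; this version leaves them alone.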

The MQM framework

If you want to go beyond simple glossary checks and actually measure terminology quality, the Multidimensional Quality Metrics (MQM) framework gives you a structured way to do it. MQM categorizes errors into dimensions - accuracy, fluency, terminology, style, and more - each with sub-categories and severity levels. It’s the framework behind most serious quality evaluation programs in the industry.

MQM isn’t a tool - it’s a scoring system. You use it alongside your QA tools to quantify how bad the errors are. A wrong term in a marketing tagline (critical) vs. a wrong term in a footnote (minor) - MQM gives you a way to weight those differently.
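As a rough illustration of severity weighting, the sketch below turns error counts into a penalty per 1,000 words. The weights (1, 5, 25) and the error counts are assumptions picked for the example, not values prescribed here; in practice the weights are agreed per project or client.

```python
# Assumed severity weights for illustration only.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}

def penalty_per_1000_words(errors, word_count):
    """errors: {category: {severity: count}}; returns weighted penalty per 1,000 words."""
    total = sum(
        SEVERITY_WEIGHTS[severity] * count
        for severities in errors.values()
        for severity, count in severities.items()
    )
    return total * 1000 / word_count

# Hypothetical review of a 5,000-word sample.
sample = {
    "terminology": {"major": 3, "minor": 9},
    "accuracy": {"critical": 1, "major": 3},
}
score = penalty_per_1000_words(sample, word_count=5000)  # (15 + 9 + 25 + 15) / 5 = 12.8
```

With these weights a single critical error costs as much as 25 minor ones, which is the kind of distinction a raw error count hides.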

For more on how terminology fits into the broader quality picture, see the ISO 17100 requirements - the standard that formalizes much of this.

Tools: Built-In CAT QA vs Standalone

You’ve got two categories of QA tools: the ones built into your CAT tool and standalone tools that work across different file formats and CAT environments. Both have their place.

Built-in CAT QA

Every major CAT tool includes QA checking. You run it inside the same environment where you translate - no exporting, no switching tools. Errors link directly to the segment, and you can fix them in place.

Standalone QA tools

Standalone tools like Xbench and Verifika work independently. You feed them bilingual files (XLIFF, SDLXLIFF, MQXLIFF, TMX, etc.) and they run their checks. The advantage: they work with files from any CAT tool, they’re often faster for large batches, and they offer checks that some CAT tools don’t.

Here’s how the main options compare:

Feature | memoQ QA | Trados QA | Xbench 3.0 | Verifika
Price | included in memoQ | included in Trados | €99/year (v2.9 free) | from $150/year
Out-of-box checks | 100+ | ~50 | ~40 | 55
Fix errors in tool | yes | yes | no (report only) | yes
Batch processing | yes | limited | yes | yes (fast)
Regex checks | yes | yes | yes | yes
File formats | memoQ projects | Trados projects | 30+ formats | 25+ formats
Terminology checks | yes (TB) | yes (TB) | yes (glossary + TB) | yes

memoQ QA

memoQ’s built-in QA is arguably the most thorough among CAT tools. Over 100 check types organized into categories: tag verification, number checks, punctuation consistency, terminology enforcement, segment length, and more. You can create QA profiles - saved configurations that specify which checks to run and how strictly. For a software localization project, you’d crank tag checks to maximum severity. For a marketing text, you might relax punctuation rules.

The real-time QA feature highlights errors as you type, so you can fix them before moving to the next segment. The full QA scan runs across the entire project and generates a report you can click through.

Trados QA

Trados Studio’s QA verification covers the essentials: tag pairs, numbers, punctuation, segments left untranslated, double spaces, inconsistencies. Around 50 check types in total. It’s more limited than memoQ’s offering, but it handles the most common error categories.

One frustration with Trados QA: batch processing across multiple files can be slow, and the interface for reviewing results isn’t as smooth as it could be. For large projects, translators often export to Xbench for the final QA pass.

Xbench

ApSIC Xbench has been a staple in the translator’s toolkit for years. Version 2.9 is free and still widely used. Version 3.0 is a subscription at €99/year with expanded features.

Xbench’s strength is cross-format support. It reads SDLXLIFF, MQXLIFF, XLIFF, TMX, TBX, PO files, Excel glossaries, and about 25 more formats. You can load files from different CAT tools into the same project and run checks across all of them. This is huge when you’re working on a project that spans multiple tools or when you’re the reviewer checking someone else’s work.

The catch: Xbench is a reporting tool. It shows you the errors, but you can’t fix them in Xbench itself. You go back to your CAT tool, find the segment, and fix it there. It also works well as a terminology search engine - load your TMs and glossaries and search across all of them instantly.

Verifika

Verifika positions itself as the speed demon of QA tools. It handles large files fast and supports 25+ formats. At around $150/year, it’s more expensive than Xbench but offers inline editing - you can fix errors directly in Verifika without going back to the CAT tool.

Verifika’s interface is clean and modern compared to Xbench’s more utilitarian look. It also has strong regex support for custom checks and a built-in terminology verification module.

Which should you use?

If you work exclusively in memoQ, memoQ’s built-in QA is good enough for most projects. Same for Trados - its QA covers the basics. But if you want a safety net, or if you work across multiple CAT tools, add Xbench or Verifika as your second-pass tool. Many translators run the CAT’s built-in QA during translation and then Xbench or Verifika as a final check before delivery.

For a deeper comparison of CAT tools themselves, including their QA capabilities, see the CAT tools comparison for 2026.

According to Nimdzi’s research on translation quality tools, the trend is toward integrating QA more deeply into the translation workflow rather than treating it as a separate step. Real-time checks during translation, combined with a full scan at delivery, is becoming the standard approach.

Regex: When Standard Checks Aren’t Enough

Standard QA checks cover the common patterns. But every project has its quirks. Maybe your client uses a specific date format (DD-Mon-YYYY). Maybe part numbers follow a pattern (ABC-1234-XY). Maybe phone numbers in your target locale need to follow a specific grouping. This is where regex (regular expressions) comes in.

What regex is, in plain terms

Regex is a pattern-matching language. Instead of searching for a specific string like “2026,” you search for a pattern like “any four digits in a row.” Instead of checking for one phone number, you check for “any string that looks like a phone number in the format +XX XXX XXX XXXX.”

If you’ve never used regex, it looks intimidating at first - something like \b\d{1,3}(,\d{3})*\.\d{2}\b (that matches numbers in US format like 1,500.00). But you don’t need to write complex patterns from scratch. Most QA tools let you build them incrementally, and there are excellent regex testers online where you can paste your pattern and test text and see what matches.
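To see that pattern in action outside a QA tool, you can try it in Python’s built-in re module. The sample text is made up, and regex dialects differ slightly between tools, so treat this as a sketch.

```python
import re

# The US-format amount pattern quoted above.
US_AMOUNT = re.compile(r"\b\d{1,3}(,\d{3})*\.\d{2}\b")

text = "Total: 1,500.00 USD, deposit 250.00, order no. 12345"

# finditer + group(0) returns the full match; findall would return only the
# content of the parenthesized group, which is a common source of confusion.
matches = [m.group(0) for m in US_AMOUNT.finditer(text)]
```

Here matches contains “1,500.00” and “250.00”; the order number is skipped because it has no two-digit decimal part.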

When you need custom regex checks

  • Custom date formats - the standard date check doesn’t know about your client’s “DD-Mon-YYYY” format. Regex: \b\d{2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{4}\b
  • Part numbers - verifying that alphanumeric product codes survived translation. If codes follow a pattern like “ABC-1234,” regex: \b[A-Z]{2,4}-\d{3,5}\b
  • Phone numbers - checking target locale formatting. German phone numbers: \+49\s?\d{2,5}\s?\d{3,8}
  • Measurement patterns - verifying that numbers followed by unit abbreviations are present. Like \d+\s?(mm|cm|m|kg|lb|°C|°F)
  • URLs and email addresses - making sure they weren’t accidentally translated (yes, this happens)
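One simple way to use such patterns is to count matches on both sides of a segment and flag any difference. The sketch below does this with three of the patterns from the list; the dictionary, the function name, and the sample sentences are illustrative, and each QA tool has its own way of pairing source and target patterns.

```python
import re

# Patterns from the list above, with non-capturing groups so findall returns full matches.
CUSTOM_PATTERNS = {
    "date": r"\b\d{2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{4}\b",
    "part number": r"\b[A-Z]{2,4}-\d{3,5}\b",
    "measurement": r"\d+\s?(?:mm|cm|m|kg|lb|°C|°F)",
}

def count_mismatches(source: str, target: str) -> dict[str, tuple[int, int]]:
    """Return {check_name: (source_count, target_count)} wherever the counts differ."""
    report = {}
    for name, pattern in CUSTOM_PATTERNS.items():
        in_source = len(re.findall(pattern, source))
        in_target = len(re.findall(pattern, target))
        if in_source != in_target:
            report[name] = (in_source, in_target)
    return report

source = "Replace part ABC-1234 before 05-Jan-2026 (max 40 °C)."
clean = count_mismatches(source, "Teil ABC-1234 vor dem 05-Jan-2026 ersetzen (max. 40 °C).")
lost_code = count_mismatches(source, "Teil vor dem 05-Jan-2026 ersetzen (max. 40 °C).")
```

Counting is cruder than comparing the matched strings themselves, but it tolerates legitimate changes such as a converted measurement while still catching a code that vanished.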

Regex in QA tools

All four major tools (memoQ, Trados, Xbench, Verifika) support regex-based custom checks. In memoQ, you add regex rules to your QA profile. In Xbench, you create custom checklists with regex patterns. Trados and Verifika have similar features.

A practical approach, as described by translators using Trados: start with 3-5 simple rules targeting the most common errors in your language pair or domain. Don’t try to build 50 regex rules on day one. Start with:

  1. Number format check for your target locale
  2. URL/email preservation
  3. Client-specific code or ID pattern
  4. Date format for target locale
  5. Double space detection (if your standard QA doesn’t already catch it)

Run these for a few projects. See what they catch. Refine and add more gradually. Building a good regex QA checklist is an iterative process, not a one-time setup.
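Two of the starter rules are easy to prototype before you configure them in your QA tool. The sketch below covers URL/email preservation and double-space detection; the patterns are intentionally simple and the function name is made up for this example.

```python
import re

# Deliberately simple patterns; real URLs and addresses can be messier.
URL_OR_EMAIL = re.compile(r"https?://\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DOUBLE_SPACE = re.compile(r" {2,}")

def starter_checks(source: str, target: str) -> list[str]:
    """Flag URLs/emails that did not survive verbatim and double spaces in the target."""
    issues = []
    for item in URL_OR_EMAIL.findall(source):
        item = item.rstrip(".,;:!?)")  # drop sentence punctuation glued to the match
        if item not in target:
            issues.append(f"URL/email changed or missing: {item}")
    if DOUBLE_SPACE.search(target):
        issues.append("double space in target")
    return issues

issues = starter_checks(
    "Contact support@example.com or visit https://example.com/help.",
    "Wenden Sie sich an support@example.com oder besuchen Sie  https://beispiel.com/hilfe.",
)
```

In this sample the email address survived, but the URL was “translated” and an extra space crept in before it, so both rules fire.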

A word of caution

Regex checks produce false positives. A pattern that matches “any number” will flag segment numbers, page references, footnote markers - not just the content numbers you care about. You’ll need to tune your patterns to reduce noise. Too many false positives and you’ll start ignoring the results, which defeats the purpose.

The goal is a QA check that flags real errors more than 80% of the time. If your regex rule produces 50 results and 45 of them are false positives, rewrite the rule. A noisy QA check is almost worse than no check at all.
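The 80% target is easy to track per rule once you have reviewed a report. A tiny sketch, with a hypothetical function name:

```python
def rule_precision(total_flags: int, false_positives: int) -> float:
    """Share of a rule's flags that turned out to be real errors."""
    if total_flags <= 0:
        raise ValueError("the rule produced no flags")
    return (total_flags - false_positives) / total_flags

noisy_rule = rule_precision(total_flags=50, false_positives=45)   # 5 / 50 = 0.1
needs_rewrite = noisy_rule < 0.8
```

The rule from the example above is right only 10% of the time, far below the target, so it goes back for tuning.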

Building a QA Workflow That Actually Works

Having QA tools is one thing. Using them effectively is another. Here’s a practical workflow that balances thoroughness with sanity.

Before you start translating

Set up your term base. If the client provides a glossary, import it into your CAT tool’s terminology module. If they don’t, build one as you go - but start it before you translate, not after. Every term you add early is a term that gets checked consistently across the whole project.

Configure your QA profile. Don’t use the default settings blindly. For a software localization project, tag checks should be at maximum strictness. For a literary translation, you probably want to turn off number checks entirely (character names that contain numbers, stylistic uses of numerals). Spend 10 minutes setting this up before a project, and you’ll save hours of false-positive review later.

Check the source. Run QA on the source file before you start. Source files have errors too - broken tags, inconsistent terminology, formatting issues. Better to know about them upfront than to discover mid-translation that your source is a mess.

During translation

Keep real-time QA on. Both memoQ and Trados can highlight QA warnings as you type. Yes, the little warning icons can be annoying. But fixing a tag error in the moment takes two seconds. Finding it in a report of 200 errors and navigating back to the segment takes much longer.

Don’t ignore warnings. This sounds obvious, but it’s the most common failure mode. Translators get a real-time warning, glance at it, think “that’s probably fine,” and move on. Then the same error pattern repeats 50 times. Treat every warning as real until you’ve confirmed it’s a false positive.

Run intermediate QA. On large projects (50+ pages), run a full QA check after every major section. Don’t wait until the end. Finding 15 errors after chapter 3 is manageable. Finding 300 errors after 12 chapters is demoralizing and time-consuming.

After translation, before delivery

Run the CAT tool’s built-in QA. Full project scan, all checks enabled for your project type. Review every error. Fix or mark as false positive.

Run a standalone tool. Export your bilingual file and run it through Xbench or Verifika. This catches things the CAT tool missed - or things you marked as false positive too hastily. Different tools have different check implementations; running two gives you better coverage.

Check the exported file. Open the final delivered file (not the bilingual project file, but the actual output - the .docx, .html, .xml, whatever format). This catches export-related issues: tags that look fine in the CAT but render incorrectly in the final format, encoding problems, font issues.
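For XML-based deliverables, part of this last check can be automated by asking a parser whether the exported file is still well-formed. The sketch below uses Python’s standard library; it applies to XML and XHTML only (ordinary HTML is more forgiving and needs a different parser), and it says nothing about fonts or visual rendering.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """True when the text parses as XML, False on any structural error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

intact = is_well_formed("<p>Press <b>Start</b>.</p>")
broken = is_well_formed("<p>Press <b>Start.</p>")  # closing </b> was lost
```

A file that fails this test will almost certainly fail in the client’s system too, so it is worth running before anything leaves your machine.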

Human review

After all automated checks pass, the text still needs human review. Automated QA catches maybe 30-40% of all translation errors - the structural, formal, pattern-based ones. The remaining 60-70% are semantic, stylistic, and contextual - things only a qualified human reviewer can catch.

The ideal workflow matches the TEP model: you translate with automated QA running, then a human editor reviews the target against the source, then a human proofreader reads the target text on its own. Automated QA doesn’t replace any of these steps. It makes each step faster and more focused, because the reviewer doesn’t waste time on tag errors and number mismatches - those are already handled.

For a full breakdown of what the TEP process covers and where automated QA fits in, see Translation QA: TEP Model and Multi-Step Quality Checks. And for a realistic look at what quality actually costs per word, check The Real Cost of Translation.

The economics

QA tools are cheap compared to the cost of errors. Xbench 2.9 is free. Xbench 3.0 is €99/year. Verifika is around $150/year. For context, a single rejected translation due to tag errors or number mismatches costs far more than that in rework time, client trust, and potential penalties.

According to ISO 18587 standards for post-edited machine translation, automated QA checks are not optional - they’re part of the minimum quality process. If you’re doing MTPE work without running QA checks, you’re operating below the standard the industry considers baseline.

FAQ

What QA checks are built into memoQ and Trados?

memoQ includes over 100 individual checks organized into categories: tag verification (missing, extra, or mismatched tags), number consistency (format, value, decimal separators), terminology adherence (against loaded term bases), segment-level checks (untranslated segments, identical source/target, empty targets), punctuation (trailing periods, brackets, quotation marks), formatting (double spaces, leading/trailing whitespace, capitalization), consistency (same source with different targets, same target for different sources), length restrictions (character count, segment length ratio), and regex-based custom checks. You can save these configurations as QA profiles and reuse them across projects.

Trados Studio includes around 50 check types. The core checks cover tag pairs, number verification, punctuation consistency, untranslated segments, double spaces, target segment length, and basic terminology verification against MultiTerm. Trados QA is effective for individual files but historically less convenient for batch processing across large multi-file projects. The 2024 version improved batch QA capabilities somewhat.

Does automated QA replace human proofreading?

No. Automated QA catches formal, pattern-based errors - tags, numbers, formatting, terminology violations against a glossary. It doesn’t understand meaning. If the source says “the bridge cannot withstand 50 tons” and your translation says “the bridge withstands 50 tons,” QA tools see matching numbers, correct tags, and no issues. A human reviewer catches the missing negation.

A commonly quoted rough estimate is that automated QA catches 30-40% of total translation errors. The rest are semantic, contextual, or stylistic - things that require a human who understands both languages and the subject matter. The best results come from combining both: automated QA handles the tedious pattern checks, freeing the human reviewer to focus on meaning and quality.

Is Xbench free or paid?

Both. ApSIC Xbench version 2.9 is free and still functional - many translators use it daily. It handles the core checks (consistency, terminology, tag pairs, number verification) across a wide range of file formats. Version 3.0 is a paid subscription at €99/year and adds features like improved regex support, faster processing, expanded format support, and a more modern interface. For most freelance translators, the free version covers the essentials. If you’re doing high-volume QA work or need advanced features, the paid version is worth the upgrade. You can download both from xbench.net.

How do I check terminology consistency in a translation?

Three approaches, ideally combined. First, load your glossary or term base into your CAT tool before you start translating. memoQ, Trados, and most other CAT tools will flag segments where a glossary term appears in the source but the approved translation doesn’t appear in the target. Second, run a consistency check - this looks for cases where the same source phrase has different translations across the project, regardless of whether it’s in the glossary. Third, use a standalone tool like Xbench to search across your TMs and glossaries simultaneously. Xbench is particularly good at this - you can load multiple reference files and instantly search for any term across all of them. For large projects or ongoing client relationships, maintaining and enforcing a term base is one of the highest-ROI quality investments you can make.

What is MQM and how does it relate to QA?

MQM (Multidimensional Quality Metrics) is an error classification framework maintained at themqm.org. It’s not a tool - it’s a standardized way of categorizing and scoring translation errors. MQM divides errors into dimensions (accuracy, fluency, terminology, style, design, locale conventions) with sub-categories and severity levels (critical, major, minor). When you or your client evaluates a translation, MQM gives you a common language: instead of “the quality is bad,” you can say “12 terminology errors (3 major, 9 minor) and 4 accuracy errors (1 critical, 3 major) across a 5,000-word sample.” Automated QA tools map roughly to some MQM categories - tag errors fall under “design,” number errors under “locale conventions,” terminology mismatches under “terminology.” But MQM also covers categories that require human judgment - mistranslation, omission, style register. The combination of automated QA (catching formal errors) plus human review scored against MQM (catching everything else) gives you the most complete quality picture available today.
