You ordered a contract translation through an online service, got the file back, opened it - and there’s text in a language you barely understand. It looks clean, formatting’s intact, and Google Translate seems to confirm everything’s fine. You sign it. Three months later you find out that “the tenant is obligated to” turned into “the tenant has the right to,” and now it’s costing you 15,000 euros. If you want to avoid surprises like this, here’s a checklist that’ll help you evaluate translation quality even without deep knowledge of the target language.
Why “looks fine” isn’t a quality indicator¶
The main trap with machine translation is that it produces smooth, grammatically correct text - especially modern models like ChatGPT, Claude, or DeepL. Sentences are well constructed, the style matches, the punctuation is on point. And that's exactly what makes checking it so tricky.
Old Google Translate from ten years ago spit out nonsense you could spot immediately. You’d see the “translation” and know something was off. Modern AI makes elegant mistakes. It can add information that wasn’t in the original, skip a key word, or quietly change the meaning - and all of it will look perfect.
HSBC paid $10 million for a rebrand after their slogan “Assume Nothing” got translated as “Do Nothing.” Norway’s Olympic team ordered 15,000 eggs instead of 1,500 because of a Google Translate error with Korean counting systems. And Swedish Amazon at launch in 2020 displayed product cards with “hand-knitted penis” instead of “knitted pencil case” - and it sailed through automated translation without a single flag.
These examples are funny, but when it comes to legal translation or medical documentation - the consequences aren’t funny at all.
What exactly can go wrong: 6 types of MT errors¶
Before you check a translation, it helps to know what types of errors you’re looking for. Here are six main categories, from the most dangerous to the most obvious.
1. Omissions¶
The machine “eats” part of the text. The most treacherous error type because you can’t see what isn’t there. Especially dangerous when a negation disappears - and “the contractor shall not be liable” becomes “the contractor shall be liable.” One forum user described a situation where ChatGPT simply “swallowed” the probation period clause while translating an employment contract - and nobody noticed until signing.
2. Hallucinations (additions)¶
The opposite problem - the machine adds things that weren’t in the original. AI hallucinations are especially dangerous in legal texts: the model might “write in” a specific penalty amount, add a termination clause, or fabricate a reference to a law that doesn’t exist. According to 2025 benchmarks, even the best models hallucinate 0.7-4.4% of the time - and for legal texts those numbers are significantly higher.
3. Terminology errors¶
Is “Haftung” in a contract “liability” or “material liability”? Is “Gesellschaft” a “society,” a “company,” or a “partnership”? Machine translation often picks the common-use variant instead of the specialized legal term. Google Translate shows 92% accuracy for basic English-Spanish medical phrases but only 57.7% for complex medical terminology.
4. Meaning rewrites¶
“The tenant may extend the lease subject to the landlord’s approval” becomes “the tenant has the right to extend the lease.” The nuance disappears, the legal meaning changes. The machine “simplifies” the sentence, and along with the grammatical complexity, important conditions vanish.
5. Numbers, dates, units of measurement¶
In medical translation, mixing up mg and mcg (milli- and micrograms) can be life-threatening. In financial documents, confused commas and periods in numbers (1.500 in Germany means fifteen hundred, while in the US it’s one point five) can cost serious money.
6. Style mismatches¶
The least dangerous but most noticeable type - when a formal document is translated in casual language or vice versa. A birth certificate that reads like a text message isn’t what Ausländerbehörde will accept.
Checklist: 10 points for verifying translation quality¶
Even if you don’t speak the target language - these 10 points will help you catch problems. Take the list and start checking.
1. Compare text lengths¶
Open the original and translation side by side. If the original is 3 pages and the translation is 2, something’s missing. If the translation is 30%+ longer - the machine may have hallucinated extra content. For most European languages, translation differs from the original by 10-25% (German text is usually longer than English, but shorter than Russian or Ukrainian).
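If you want to automate this first pass, a few lines of Python are enough. The thresholds below are illustrative assumptions based on the 10-25% expansion range mentioned above, not industry standards - adjust them for your language pair:

```python
def length_ratio_flag(source_text: str, translated_text: str,
                      min_ratio: float = 0.75, max_ratio: float = 1.30) -> bool:
    """Return True if the word-count ratio looks suspicious.

    Roughly 10-25% expansion or shrinkage is normal between European
    languages, so anything outside the 0.75-1.30 band deserves a
    closer manual look (possible omissions or hallucinated content).
    """
    src_words = len(source_text.split())
    tgt_words = len(translated_text.split())
    if src_words == 0:
        return True  # empty source: nothing meaningful to compare
    ratio = tgt_words / src_words
    return not (min_ratio <= ratio <= max_ratio)
```

A flagged document is not automatically wrong - tables, boilerplate, and language-pair quirks all shift the ratio - but it tells you where to start reading.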
2. Check all numbers, dates, and amounts¶
You can do this even without knowing the language. Go through every number in the original and find it in the translation. 15,000 euros should be 15,000 euros, not 1,500 and not 150,000. Dates should match - and pay attention to format (DD.MM.YYYY in Europe vs MM/DD/YYYY in the US). Addresses, phone numbers, bank details - everything should be identical.
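This check is mechanical enough to script. A minimal sketch: extract every digit sequence from both files and flag anything that appears in one but not the other. Note that it deliberately treats `15,000` and `15.000` as different strings - a reformatted number is exactly the kind of thing you want to review by hand:

```python
import re

# Digit runs that may contain separators: amounts, dates, phone fragments
NUMBER_RE = re.compile(r"\d[\d.,/]*\d|\d")

def extract_numbers(text: str) -> list[str]:
    """Pull raw digit sequences (amounts, dates, IDs) out of a text."""
    return NUMBER_RE.findall(text)

def mismatched_numbers(source: str, translation: str) -> set[str]:
    """Numbers present in one text but not the other.

    Simplified: compares sets of raw strings, so locale reformatting
    (15,000 vs 15.000) is also flagged - intentionally.
    """
    return set(extract_numbers(source)) ^ set(extract_numbers(translation))
```

For the example from the intro, `mismatched_numbers("Rent: 15,000 euros", "Miete: 1.500 Euro")` would surface both variants and send you straight to the problem clause.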
3. Verify proper names and titles¶
Last names, company names, streets, cities - they’re either transliterated or kept in the original language, but never “translated.” If “Kovalchuk” became “Kowalczyk” or “Kyiv” became “Kiev” - that’s a red flag.
4. Find untranslated fragments¶
Sometimes the machine leaves chunks of text in the source language. This happens especially often with tables, footnotes, headings, or text in parentheses. One skipped paragraph can be critical.
5. Verify document structure¶
The number of paragraphs, list items, table rows should match. If the original has 12 contract clauses and the translation has 11 - there’s a problem. If a 5-column table turned into 4 - same thing.
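Counting structural units by hand is tedious for long documents, so here is a rough line-based heuristic you could run on both files and diff the results. It assumes plain-text input with blank-line paragraph breaks and `-`/`*` bullets - a sketch, not a parser:

```python
import re

def structure_profile(doc: str) -> dict:
    """Count coarse structural units: paragraphs, bullets, numbered items.

    A crude heuristic, but comparing the source profile against the
    translation profile is enough to spot a dropped clause or a
    collapsed list.
    """
    lines = [ln.strip() for ln in doc.splitlines()]
    paragraphs = len([block for block in doc.split("\n\n") if block.strip()])
    bullets = sum(1 for ln in lines if ln.startswith(("-", "*", "•")))
    numbered = sum(1 for ln in lines if re.match(r"\d+[.)]\s", ln))
    return {"paragraphs": paragraphs, "bullets": bullets, "numbered": numbered}
```

If the source reports 12 numbered items and the translation reports 11, you know exactly what kind of problem to look for.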
6. Do a back translation¶
Take the translation and run it through a different service back into the source language. Compare with the original. If the back translation is drastically different from the original - there’s a problem. This method isn’t perfect (double translation always introduces distortion), but major discrepancies are a clear signal.
7. Check terminology consistency¶
The same word should be translated the same way throughout the document. If “Vertrag” on page 1 is “contract,” on page 5 it’s “agreement,” and on page 8 it’s “arrangement” - the translation is inconsistent. For CAT tools this is a standard feature, but raw machine translation often “forgets” how it translated a term earlier.
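You can approximate this check without a CAT tool: list the candidate renderings of one source term and count how many distinct variants actually occur in the translation. A simplified sketch (substring counting, so "contracts" would also count toward "contract" - fine for a first pass):

```python
from collections import Counter

def term_variants(translation: str, variants: list[str]) -> Counter:
    """Count occurrences of each candidate rendering of one source term."""
    text = translation.lower()
    return Counter({v: text.count(v.lower()) for v in variants})

def is_consistent(translation: str, variants: list[str]) -> bool:
    """True if at most one of the candidate renderings is actually used."""
    counts = term_variants(translation, variants)
    return sum(1 for c in counts.values() if c > 0) <= 1
```

For the "Vertrag" example: run it with `["contract", "agreement", "arrangement"]` and see whether more than one variant shows up.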
8. Look for “AI markers”¶
Machine translation has telltale signs: excessively long sentences, template-like constructions, repetitive transitions between paragraphs (“furthermore,” “moreover,” “it is worth noting”). If the text reads like a 2005 textbook - chances are no human reviewed it.
9. Evaluate formatting¶
Tables should be tables, lists should be lists. If the original had a numbered list with 8 items and the translation merged them into one paragraph - document processing quality is low. Preserving formatting during translation is a separate topic, but it needs checking too.
10. Ask a native speaker to read it¶
The most reliable method - find someone who speaks the target language and ask them to read the translation. Doesn’t have to be a specialist - even a friend who understands the language can catch obvious nonsense, unnatural phrasing, or meaningless segments.
Quality metrics: what BLEU, COMET, and MQM mean¶
If you’ve ever encountered discussions about machine translation quality, you may have seen these acronyms. Here’s what they mean in plain language.
BLEU (Bilingual Evaluation Understudy) - the oldest and most well-known metric. It counts how many words and phrases in the machine translation match a “reference” human translation. Scale from 0 to 100. Above 30 is “acceptable,” above 50 is “good.” The problem: two equally correct translations with different wording will score low on BLEU because the words don’t match. That’s why BLEU is gradually falling out of favor.
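The mechanics are easy to show in a toy implementation. This sketch uses only unigrams and bigrams with a brevity penalty; real implementations (e.g. sacrebleu) use n-grams up to 4 plus smoothing, so treat the numbers as illustrative:

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Toy BLEU: geometric mean of clipped n-gram precisions (n=1..max_n)
    times a brevity penalty, scaled to 0-100. For mechanics only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # avoid log(0)
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

This also demonstrates BLEU's weakness in one line: `simple_bleu("the feline was seated", "the cat sat")` scores near zero even though the meaning is preserved, because the words don't overlap.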
COMET - a modern metric based on a neural network trained on human quality judgments. It understands meaning, not just word overlap. If a translation conveys the right meaning with different words - COMET will recognize that. It’s considered far more accurate than BLEU for modern translation systems.
MQM (Multidimensional Quality Metrics) - not a metric but an evaluation framework with 100+ error types. Each error gets a “weight”: minor (1 point), major (5 points), critical (25 points). Categories: terminology, accuracy, fluency, style, locale conventions. This is the gold standard for professional quality assessment, used by major translation companies and defined in the new ISO 5060:2024 standard.
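The arithmetic behind an MQM review is just a weighted sum. One common normalization (an assumption here - implementations vary) reports the penalty per 1,000 words so documents of different lengths are comparable:

```python
# Severity weights per the MQM convention cited above
MQM_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}

def mqm_penalty(errors: list[str], word_count: int, per: int = 1000) -> float:
    """Weighted error penalty normalized per `per` words (lower is better).

    `errors` is the list of severity labels a human reviewer assigned.
    Normalization per 1,000 words is one common convention, not the
    only one used in practice.
    """
    total = sum(MQM_WEIGHTS[severity] for severity in errors)
    return total * per / word_count
```

So a 2,000-word document with two minor errors, one major, and one critical gets a penalty of (1 + 1 + 5 + 25) x 1000 / 2000 = 16 per 1,000 words - and a single critical error outweighs two dozen minor ones.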
For you as a client, these metrics mean one thing: if a provider can show COMET or MQM results - they’re serious about quality. If they’re bragging about a “high BLEU score” - that’s an outdated metric, and you should ask follow-up questions.
Red flags: when you should definitely worry¶
Some signs scream “there’s a problem” before you even open the translation file.
Price is too low. If someone offers to translate 4,000 words for $50 - it's either raw machine translation without editing, or price dumping to win the job. A professional translator does 2,000-2,500 words per day. Specialized translation (legal, medical) costs $0.15-0.30 per word. If the price is below $0.08 per word - ask whether editing is included.
Turnaround is unrealistically fast. 20 pages in 2 hours - that’s definitely machine translation. Maybe with post-editing (MTPE), maybe without. Ask.
The provider doesn’t ask questions. A good translator always clarifies: what’s the purpose of the translation, which authority it’s for, whether there’s a glossary or previous translations. If they silently took the document and returned it an hour later - be suspicious.
No certifications or reviews. For certified translation you need a translator with the appropriate credentials. For regular translation - at least client reviews or a portfolio. If there’s neither - that’s a risk.
Translation came back as .txt instead of original format. If you sent a .docx with tables and formatting and got back plain text - the document was run through basic MT without any processing.
How much does a translation quality check cost¶
Sometimes it’s cheaper to pay for an independent review than to deal with the fallout of a bad translation.
| Service | Approximate price |
|---|---|
| Independent proofreading | $0.03-0.08 per word |
| Editing + proofreading | $0.05-0.12 per word |
| Full quality review (LQA) by MQM | $25-75 per hour |
| Back translation for verification | cost of a regular translation |
| Second opinion from a native speaker | often free (if you know someone) |
For important documents (contracts, medical reports, court documents) - the review always pays for itself. Checking a 10-page contract costs $50-100. An error in a contract can cost tens of thousands.
When machine translation is good enough (and when it’s not)¶
Not every translation needs perfect quality. Here’s an honest breakdown.
MT without editing works for:

- Internal communication (getting the gist of a colleague's email)
- Browsing foreign-language content (articles, reviews, news)
- Preliminary document assessment (understanding what it's about before ordering professional translation)
MT with post-editing works for:

- Marketing materials on a limited budget
- Technical documentation with repetitive terminology
- Large volumes of text where perfect quality isn't critical
Professional translation only for:

- Legal documents - contracts, powers of attorney, articles of incorporation
- Medical documentation - diagnoses, prescriptions, discharge summaries
- Documents for government authorities - from Ausländerbehörde to Standesamt
- Financial statements for audit
- Anything requiring certified translation
FAQ¶
How do I check translation quality if I don’t speak the language?¶
Compare text lengths (a difference over 25% is suspicious), check all numbers, dates, and proper names, do a back translation through a different service and compare with the original. If you can - ask a native speaker to read the translation. These methods won’t replace professional review, but they’ll help catch major errors.
How much does an independent translation quality check cost?¶
Proofreading costs roughly $0.03-0.08 per word, full editing is $0.05-0.12 per word. For a 10-page document that’s approximately $50-150. For important documents (contracts, medical certificates, court documents) this investment pays for itself - a translation error can cost far more.
What’s a BLEU score and should I pay attention to it?¶
BLEU is a metric that counts word overlap between machine and “reference” translation. Scale from 0 to 100, above 30 is acceptable, above 50 is good. But BLEU is outdated: it doesn’t understand meaning, just compares words. Modern metrics like COMET are more accurate because they evaluate meaning, not just form. If a provider mentions BLEU - that’s fine, but also ask about COMET or MQM.
What are the most dangerous machine translation errors?¶
The most dangerous are omissions (when the machine “eats” part of the text, especially negations like “not”) and hallucinations (when the machine adds information that wasn’t in the original). Both types are very hard to spot without detailed comparison to the original, because the text looks smooth and correct. In legal documents, a missing “not” can flip the meaning entirely.
Can I trust back translation as a quality check method?¶
Back translation is a useful tool, but not a perfect one. You translate the result back into the source language using a different service and compare. Major discrepancies are a clear signal of a problem. But this method won’t catch style errors, wrong register, or subtle meaning shifts. Use it as one tool in your toolkit, not the only one.