AI Hallucinations in Legal Translation: Real Threat or Overhyped?

How ChatGPT and other AI tools fabricate legal terms, add nonexistent clauses to contracts, and why that's dangerous - with real court cases and hard numbers.


A translator ran a lease agreement through ChatGPT, got a smooth German text, skimmed through it - everything looked perfect. Then the client called: “It says here the tenant can terminate without penalty within 14 days. That wasn’t in the original.” And they were right - the model “added” a clause that didn’t exist in the Ukrainian source text. A classic hallucination that could’ve cost the client real money. If you work with legal texts and use AI even as a first draft - this one’s for you.

What’s an AI hallucination and why it’s worse than a regular mistake

A hallucination is when a model generates something that wasn’t in the input data and presents it as fact. Not just a wrong word choice (that’s a regular error), but inventing new information that didn’t exist in the original.

For regular text, this might just be awkward. For legal text - it’s potentially catastrophic. Here’s why:

  • A regular translation error (wrong case, calque, imprecise term) - you see it and fix it. It “looks like a mistake”
  • A hallucination - looks completely normal. Grammatically correct, stylistically consistent, terminologically plausible. But the information doesn’t exist in the source

Picture the difference: an NMT system like DeepL translates “Haftung” as “responsibility” instead of “liability” - that’s an error, but you’ll catch it because it looks off in context. An LLM might add “limited liability capped at EUR 10,000” - when the original just said “Haftung.” And you won’t catch that unless you’re checking sentence by sentence against the source.

Not all hallucinations are equal. In legal document translation, there are three categories, and each is dangerous in its own way.

Addition (invented content)

The model “writes in” information that wasn’t in the original. This is the most insidious type because the added content often looks logical and appropriate.

Real examples from practice:

  • A specific penalty amount appears in a contract translation, though the original just said “penalties”
  • ChatGPT “added” a specific drug dosage in a medical report translation, though the original only mentioned the medication name
  • In product documentation, a product suddenly became “fully waterproof” when the original made no such claim - which could trigger legal consequences if a customer files a complaint

Omission (dropped content)

The model “eats” part of the text. Dropped negations are especially dangerous - one missing “not” flips the meaning from “prohibited” to “permitted.”

On a translator forum, someone described a situation where ChatGPT translating a contract from Ukrainian to German dropped the negation in a liability clause. Instead of “The contractor shall not be liable for…” they got “Der Auftragnehmer haftet für…” - literally the opposite meaning. If that had gone to the client without review, the consequences could’ve been serious.

Rewriting (meaning shift)

The model “rephrases” in a way that changes the meaning. Grammar is perfect, style matches, but the legal substance is different.

For example, “the tenant may extend the agreement subject to the landlord’s approval” becomes “the tenant has the right to extend the agreement” - the nuance vanished, and the legal consequences are completely different. One is a conditional possibility, the other is an unconditional right.

Real cases: when hallucinations got expensive

This isn’t theoretical. Over the past two years, more than 300 documented cases of AI hallucinations led to real consequences in the legal field.

Mata v. Avianca (US, 2023): a lawyer submitted a court filing citing six court decisions generated by ChatGPT. None of those decisions existed. The judge called it “an unprecedented circumstance.” Both lawyers and their firm were fined $5,000.

California court (2024): two law firms were fined $31,000 for submitting briefs with fake AI-generated citations. The lawyers didn’t verify the output before filing.

Deloitte and fake sources (Australia, 2025): a Deloitte report submitted to the Australian government contained fabricated academic sources and a fake quote from a “court ruling.” At stake - a contract worth AUD 440,000.

In July 2025 alone, Thomson Reuters documented 22 instances where courts or opposing parties found nonexistent cases in submitted filings. And a Stanford University study found that when LLMs answer legal questions, they hallucinate at least 75% of the time when asked about a court’s core ruling.

For translators, this is a direct warning: if AI fabricates entire court precedents - what stops it from fabricating a legal term or adding a nonexistent clause to a contract translation?

Why legal texts are especially risky

Not every text is equally risky for AI translation. Legal documents are the “perfect storm” for hallucinations, and here’s why.

Terminological precision. The German word “Gesellschaft” in a business contract - is it “partnership” or “company”? If AI renders it as “society,” the entire contract loses its meaning. Legal terms have precise definitions that depend on jurisdiction, document type, and context - and AI doesn’t always grasp that.

Legal formulae. Fixed legal phrases - “without prejudice to,” “subject to compliance with,” “taking into account the requirements of” - have established translations in the target language. AI might “rephrase” them, and the legal meaning shifts.

Chain reaction. One error in a contract can invalidate the entire document or change the parties’ rights. This isn’t like marketing copy, where one bad word is just a style issue.

No “approximately correct.” In regular translation, “close enough” might work. In legal translation - it’s either precise or it’s wrong. “The landlord may” vs. “the landlord shall” - one word difference, thousands of euros in consequences.

Numbers: how often does AI actually mess up

Here’s the concrete data so you understand the scale.

Model             | Overall hallucination rate | Note
------------------|----------------------------|----------------------------
Gemini 2.0 Flash  | 0.7%                       | Lowest rate in 2025
GPT-4o            | 1.5%                       | Consistently solid
Claude 3.7 Sonnet | 4.4%                       | Decent, but higher than GPT
Claude 3 Opus     | 10.1%                      | High hallucination rate

But these are general numbers across all tasks. For legal texts, the situation is worse. Stanford’s 2025 study found that even RAG-based systems (AI with access to legal databases) hallucinate on 17% to 34% of legal queries.

For translation specifically: AI systems add nonexistent content, drop critical words, and silently rewrite meaning. Studies from SemEval 2025 (Mu-SHROOM) and ACL 2025 (CCHall) confirmed that translation into less-supported languages (including Ukrainian) remains a hotspot for hallucinations.

For comparison: NMT systems like DeepL make mistakes too, but their errors are more predictable - wrong case, calque, mismatched term. You spot the error immediately. With LLM hallucinations, you might not notice the problem until the client calls.

How to protect yourself: a checklist for translators

Here are concrete steps to minimize hallucination risk when working with AI and legal texts.

1. Compare the translation to the original sentence by sentence

No exceptions. Even if ChatGPT produced a perfect-looking translation - check every sentence against the original. Yes, it takes time. Yes, it’s tedious. But it’s the only reliable way to catch a hallucination.
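
If you want to semi-automate the side-by-side pass, here’s a minimal Python sketch that pairs source and target sentences for review. The regex splitter is a deliberate simplification (legal abbreviations like “Abs.” or “z.B.” will trip it up - a real workflow would use a proper sentence tokenizer), and a sentence-count mismatch is itself worth investigating: it often means the model merged, split, dropped, or invented a sentence.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    # Legal abbreviations ("Abs.", "z.B.") cause false splits - swap in
    # a real tokenizer (nltk, spacy) for anything beyond a quick check.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def pair_for_review(source: str, translation: str) -> None:
    src, tgt = split_sentences(source), split_sentences(translation)
    if len(src) != len(tgt):
        # A count mismatch is a red flag in itself.
        print(f"WARNING: {len(src)} source vs {len(tgt)} target sentences")
    for i, (s, t) in enumerate(zip(src, tgt), 1):
        print(f"[{i}] SRC: {s}")
        print(f"[{i}] TGT: {t}\n")
```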

2. Check numbers, dates, and amounts separately

AI hallucinates with specific figures especially often. If the original says “penalty of 5% of the contract value” - make sure the translation didn’t turn into “penalty of EUR 5,000” or “penalty of 50% of the value.”
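
A crude first-pass filter for this can be scripted. The sketch below (an illustration, not a production tool) extracts every digit sequence from source and translation and reports figures that appear on only one side. Numbers written out as words (“five percent”) or reformatted between locales (“5.000,00” vs “5,000.00”) will show up as mismatches - noisy, but noisy in the safe direction.

```python
import re
from collections import Counter

NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")  # matches 5, 5.5, 5.000,00, 5,000.00

def numbers_in(text: str) -> Counter:
    return Counter(NUM_RE.findall(text))

def compare_numbers(source: str, translation: str) -> None:
    src, tgt = numbers_in(source), numbers_in(translation)
    added = tgt - src    # figures present only in the translation
    missing = src - tgt  # figures that vanished from the translation
    if added:
        print("Only in translation:", sorted(added))
    if missing:
        print("Missing from translation:", sorted(missing))
    if not (added or missing):
        print("Digit-level match - still verify units and context by hand.")
```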

3. Hunt for negations

Open the translation and check every “not,” “never,” “prohibited,” “except when” - make sure they exist in the original, and make sure they didn’t vanish from the translation. A dropped negation is the most dangerous hallucination.
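
Negations can be counted the same way. A minimal sketch, under stated assumptions: the marker lists below are illustrative for a Ukrainian-to-German pair, not exhaustive, and counts won’t map one-to-one across languages (German may use “kein” where Ukrainian negates the verb), so treat a lower target count as a flag to re-read, not as proof.

```python
import re

# Illustrative markers for a Ukrainian -> German pair; extend per your texts.
NEGATIONS = {
    "uk": {"не", "ні", "ніколи", "заборонено"},
    "de": {"nicht", "kein", "keine", "niemals", "verboten"},
}

def count_negations(text: str, lang: str) -> int:
    # \w+ handles Cyrillic and umlauts; punctuation is stripped.
    return sum(1 for w in re.findall(r"\w+", text.lower())
               if w in NEGATIONS[lang])

def negation_check(source: str, translation: str) -> None:
    n_src = count_negations(source, "uk")
    n_tgt = count_negations(translation, "de")
    print(f"Negation markers: {n_src} in source, {n_tgt} in translation")
    if n_tgt < n_src:
        # Fewer negations in the target is the classic dropped-"not" pattern.
        print("WARNING: possible dropped negation - review clause by clause")
```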

4. Use AI as a draft, not a final product

The MTPE approach (machine translation post-editing) is the ideal format for legal texts. AI generates the first version, you review and correct. It’s faster than translating from scratch but safer than blindly trusting AI.

5. For official documents - humans only

If the document is going to court, a notary, or any government agency - an AI translation without certification by a sworn translator won’t be accepted anyway. For these cases, AI can only be a tool to speed up the translator’s work, never a replacement.

6. Pick the right model

For legal texts, NMT systems (DeepL, Google Translate) produce more predictable errors that are easier to catch. LLMs (ChatGPT, Claude) handle context better in long documents, but the hallucination risk is higher. We covered the NMT vs LLM comparison in detail in a separate article.

Where AI actually helps

After everything above, it might seem like AI for legal texts is pure evil. It’s not. AI is a powerful tool when used correctly.

First draft. AI can save 40-60% of time on the first pass. You get a structured translation with correct baseline terminology - and then you edit rather than translate from scratch.

Cross-checking your own work. Translated a document manually - run the original through AI and compare. Sometimes the model catches terms you missed or suggests a better phrasing.

Terminology lookup. Instead of a dictionary, you can ask ChatGPT “what’s the correct translation of this legal term from Ukrainian to German in the context of employment law” - and get not just the translation but an explanation of usage context.

Terminology consistency. For long documents, an LLM can “hold” a glossary in context and translate consistently throughout - something classic NMT engines can only approximate with rigid term-pair glossaries.
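
For illustration, here’s what “holding a glossary in context” can look like with the OpenAI Python SDK - a minimal sketch, assuming an API key in the environment; the glossary entries and model choice are placeholders, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative glossary - in real work this comes from your termbase.
GLOSSARY = {
    "Haftung": "liability",
    "Gesellschaft": "company",
    "Vertragsstrafe": "contractual penalty",
}

def translate_with_glossary(text: str) -> str:
    terms = "\n".join(f"- {de} -> {en}" for de, en in GLOSSARY.items())
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model works; this one is named above
        messages=[
            {"role": "system", "content":
                "Translate German legal text into English. "
                "Use EXACTLY these term translations, no synonyms:\n" + terms +
                "\nDo not add, omit, or reinterpret any clause."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # favor consistent output over creative phrasing
    )
    return response.choices[0].message.content
```

Note that even an explicit “do not add, omit, or reinterpret” instruction reduces but does not eliminate hallucinations - the checklist above still applies to the output.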

The key distinction: use AI as an assistant, not a substitute. Legal translation is the field where a human translator remains irreplaceable, and that’s unlikely to change anytime soon.

FAQ

Can ChatGPT translate legal documents?

ChatGPT can create a decent draft translation of a legal document, but the result absolutely requires careful human review. The main problem is hallucinations: the model can add nonexistent clauses, drop negations, or “rephrase” text in ways that change the legal meaning. For official documents going to court or government agencies, an AI translation without review by a sworn translator won’t cut it.

What’s the difference between a translation error and an AI hallucination?

A regular error is when the model mistranslates a word or term - you see something’s off and fix it. A hallucination is when the model invents information that wasn’t in the original but does it so smoothly that without comparing to the source you might not notice. Hallucinations are more dangerous because they look convincing - correct grammar, matching style, but fabricated content.

How do you catch a hallucination in a translation?

The only reliable method is comparing the translation to the original sentence by sentence. Pay special attention to: numbers and amounts (did new figures appear?), negations (did any “not” disappear?), specific conditions and deadlines (were any restrictions added or changed?). There’s no mature automated tool for detecting translation hallucinations yet - simple scripts can flag suspicious numbers or missing negations, but the final check is manual.

Which AI hallucinates the least when translating?

Based on 2025 benchmarks, Gemini 2.0 Flash shows the lowest overall hallucination rate (0.7%), followed by GPT-4o (1.5%). But for legal texts specifically, these numbers are higher. NMT systems (DeepL, Google Translate) technically don’t “hallucinate” in the classic sense - they make different types of errors (calques, wrong terms) that are easier to spot and fix.

Is it worth using AI for legal translation at all?

Yes, but as a tool, not a replacement. AI can save 40-60% of time on the first translation pass, help with terminology lookup, and maintain terminology consistency in long documents. But the final review must be done by a qualified translator, and for official documents, certification by a sworn translator remains mandatory.
