AI Translation and Privacy: What Happens to Your Data

March 2023. A Samsung engineer pastes confidential source code into ChatGPT to debug an error. A week later, a colleague uploads a transcript of an internal meeting. A third optimizes a test sequence for detecting chip defects. Within 20 days, employees of a $350 billion company sent secret data to OpenAI’s servers three separate times, where it could be retained and used for model training. If Samsung - with its army of security professionals - couldn’t prevent this, you, as a translator who handles confidential client documents every day, need to understand exactly what happens to text after you hit “Translate.”

What happens to your text after you click “Translate”

When you paste text into any online translator - DeepL, Google Translate, ChatGPT - it doesn’t get translated magically on your computer. The text is encrypted and sent to the company’s servers, sometimes in Germany, sometimes in the US, sometimes to multiple data centers simultaneously.

The translation comes back in seconds, but what happens to the original text afterward? That’s where things get interesting. The difference between “deleted immediately” and “stored for model training” is the difference between client confidentiality and a potential data leak.

The core rule: free versions almost always store and use your text. Paid versions usually don’t. But “usually” is a dangerous word when you’re dealing with a client’s €500,000 contract. So let’s break down each service.

Who stores, who deletes: AI translator comparison

Service	Plan	Stores text?	Trains on data?	GDPR
DeepL	Free	Yes, temporarily	Yes	Partial
DeepL	Pro (from $8.74/mo)	No	No	Yes
ChatGPT	Free / Plus	Yes by default	Yes (can disable)	Partial
ChatGPT	API / Enterprise	No	No	Yes
Google Translate	Free	Yes	Likely	Partial
Google	Cloud Translation API	No	No	Yes
Claude	Free / Pro	Yes (since 09.2025)	Yes by default	Partial
Claude	API	No (7-day logs)	No	Yes
Microsoft	Free apps	Partially	Possibly	Partial
Microsoft	Azure Translator	No (No-Trace)	No	Yes

Now the details.

DeepL: free vs Pro

Free DeepL stores text temporarily and uses it to improve its neural networks. The terms of use explicitly state: you’re not allowed to translate texts containing personal data through the free version. DeepL themselves are telling you - the free version isn’t meant for confidential documents.

DeepL Pro (from $8.74/mo for the individual plan) works differently: text is encrypted, not stored after translation, and not used for training. For teams, there are Team ($28.74/user) and Business ($57.49/user) plans with additional security guarantees. For more on DeepL’s capabilities, see the DeepL vs Google Translate comparison.

ChatGPT and OpenAI

This one’s trickier. By default, anything you type into ChatGPT Free or Plus may be used to train future models. When you paste a client’s contract for translation, it potentially becomes part of the training data for the next GPT version.

You can disable this: Settings → Data Controls → “Improve the model for everyone.” But even with training turned off, OpenAI still retains conversation logs for monitoring.

For Team and Enterprise accounts, training is off by default. The API is also safer: data is kept for up to 30 days for abuse monitoring and is not used for training. If you’re already using ChatGPT for translations, there are specific prompts and approaches for document translation that help you get better results.

Google Translate

Free Google Translate operates under Google’s general terms. The company can analyze your texts to improve services. There’s no specific promise of “we don’t store your translations” for the free version.

Google Cloud Translation API (the paid version) is a different story. Google explicitly states: client content is not used for any purpose other than providing the service. Texts aren’t stored after translation.

There’s also an offline mode in the mobile app - it runs entirely on your device without sending data to servers. But translation quality is noticeably lower.

Claude (Anthropic)

Since September 2025, Anthropic uses conversations from Claude Free, Pro, and Max to train models - unless you’ve disabled this in your privacy settings. Previously, data was kept for 30 days; now it’s up to 5 years for accounts with training enabled.

Incognito mode excludes a specific conversation from training data. API access is safer: data never goes to training, logs are automatically deleted after 7 days.

Microsoft Translator

Azure Translator has the strictest policy in this comparison - “No-Trace.” Text isn’t stored before or after translation. For companies working with sensitive data, that’s a strong selling point.

But the free apps (Microsoft Translator, Bing Translator, Edge) may store small fragments for quality improvement.

Real leaks: when “I only pasted it once” went wrong

Samsung and ChatGPT (2023)

Three separate incidents in 20 days. One engineer pasted confidential source code to find a bug. Another uploaded an internal meeting transcript. A third optimized a test sequence for detecting chip defects.

Samsung launched disciplinary investigations against all three, restricted input to 1,024 bytes per prompt, and blocked ChatGPT company-wide for months. They eventually restored access, but with strict internal rules. The problem wasn’t that AI “stole” data - it’s that people didn’t understand where that data was going.

Statoil and Translate.com (2017)

Norwegian oil giant Statoil (now Equinor, $68 billion revenue) used the free Translate.com for internal documents. In September 2017, journalists from NRK discovered that contracts, workforce reduction plans, termination letters, and even passwords had become accessible through regular Google searches. Anyone could find and read them.

The reason: Translate.com stored texts in the cloud for volunteer translators and failed to implement proper access controls. The Oslo Stock Exchange blocked its employees’ access to the site, but the data was already public.

Why this matters for translators

These cases aren’t just about big corporations. If you’re a translator working with legal documents, medical records, or corporate contracts, one accidental leak could mean a lawsuit from a client, reputation damage, and the end of your business. According to IBM (2024), the average cost of a data breach is $4.88 million globally. For a freelancer, even a minimal fine would be catastrophic.

Data leaks are a separate threat from AI hallucinations in legal translations, but the consequences can be even more severe.

Since 2018, EU authorities have issued over 2,245 GDPR fines totaling €5.65 billion. In 2025 alone - €2.3 billion, a 38% increase from the previous year. Maximum fine: €20 million or 4% of annual revenue (whichever is higher).
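To make the cap concrete, here is a minimal sketch of the “whichever is higher” rule quoted above. The function name `gdpr_max_fine` is hypothetical, and the result is only the ceiling stated in the paragraph above, not a prediction: actual fines are set case by case.

```python
def gdpr_max_fine(annual_revenue_eur: float) -> float:
    """Ceiling from the text: EUR 20 million or 4% of annual revenue, whichever is higher.

    Hypothetical helper for illustration; real fines are decided case by case.
    """
    four_percent = annual_revenue_eur * 4 / 100
    return max(20_000_000.0, four_percent)


# The flat EUR 20 million figure dominates until revenue passes EUR 500 million.
print(gdpr_max_fine(60_000))         # freelancer-scale revenue
print(gdpr_max_fine(1_000_000_000))  # large-company revenue
```

For a freelancer with €60,000 in annual revenue the ceiling is still the flat €20 million; at €1 billion in revenue it rises to €40 million.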

Here are the specific risks for translators:

NDA. Signed an NDA with a client and pasted their document into free ChatGPT? You have likely breached the non-disclosure agreement, even if no leak occurred: under most NDAs, the mere act of transmitting data to a third party without the client’s consent is a violation.

Attorney-client privilege. Legal documents may be covered by attorney-client privilege. Uploading them to a cloud service without security guarantees can qualify as a breach of confidentiality.

Medical data. Translating discharge summaries, diagnoses, medical reports - that’s processing sensitive personal data under GDPR. Requirements here are at their strictest.

EU AI Act. The full compliance deadline for high-risk systems is August 2, 2026. This adds another layer of regulation for companies using AI tools with personal data. Understanding the difference between LLMs and traditional NMT helps you assess which tools fall under stricter requirements.

Checklist: how to protect client data when working with AI

1. Paid versions for confidential documents

DeepL Pro, ChatGPT API, Google Cloud Translation API, Claude API, Azure Translator - all exclude your text from training and either don’t store it or keep only short-lived logs. $8-10/mo for DeepL Pro is less than the cost of one lost client. Free versions - only for non-sensitive texts: restaurant menus, tourist brochures, personal correspondence.
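One way to turn this rule into a habit is a small allowlist you check before pasting anything. The sketch below is only an illustration: the names `NO_TRAINING_PLANS` and `allowed_for` are hypothetical, and the entries are a snapshot of the comparison table in this article, which you should re-check against each vendor’s current terms.

```python
# Snapshot of the comparison table in this article. Vendor policies change,
# so treat these entries as assumptions to re-verify, not as a live source.
NO_TRAINING_PLANS = {
    ("DeepL", "Pro"),
    ("ChatGPT", "API / Enterprise"),
    ("Google", "Cloud Translation API"),
    ("Claude", "API"),
    ("Microsoft", "Azure Translator"),
}


def allowed_for(service: str, plan: str, confidential: bool) -> bool:
    """Return True if this service/plan may be used for the document."""
    if not confidential:
        # Menus, brochures, personal correspondence: any plan is acceptable.
        return True
    return (service, plan) in NO_TRAINING_PLANS


print(allowed_for("DeepL", "Free", confidential=True))
print(allowed_for("DeepL", "Pro", confidential=True))
```

Keeping the list as plain data makes it easy to paste into the written security policy described later in the checklist.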

2. Disable training on your data

ChatGPT: Settings → Data Controls → turn off “Improve the model for everyone.” Claude: check Privacy Settings. It’s not perfect (logs are still kept temporarily), but it’s significantly better than the default settings.

3. Anonymize before translating

Before pasting a document into AI, replace real names, addresses, account numbers with placeholders: [NAME], [ADDRESS], [NUMBER]. After translation, restore the real data. It’s an extra step, but for critical documents it’s worth it. Proper prompts for translation also help you control what the model does with your text.
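The replace-and-restore step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete anonymizer: the function names `anonymize` and `restore`, the two regular expressions, and the numbered placeholder format (`[NAME_1]` rather than a bare `[NAME]`, so each value can be restored unambiguously) are all hypothetical choices. Names are not detected automatically; you pass in the ones you know appear in the document.

```python
import re

# Hypothetical patterns for illustration only; real documents need
# patterns tuned to the client's data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def anonymize(text, names=()):
    """Mask sensitive values; return (masked_text, placeholder -> original)."""
    mapping = {}
    counters = {}

    def mask(label, value):
        # Reuse the placeholder if the same value was already masked.
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"[{label}_{counters[label]}]"
        mapping[placeholder] = value
        return placeholder

    # Longest names first, so "Anna Schmidt" is masked before "Anna".
    for name in sorted(names, key=len, reverse=True):
        if name in text:
            text = text.replace(name, mask("NAME", name))

    for label, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, label=label: mask(label, m.group(0)), text)

    return text, mapping


def restore(text, mapping):
    """Put the original values back into the translated text."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


source = "Anna Schmidt (anna@example.com) pays to DE89370400440532013000."
masked, mapping = anonymize(source, names=["Anna Schmidt"])
print(masked)                    # this is what you would paste into the translator
print(restore(masked, mapping))  # after translation, restore the real values
```

Keep the mapping on your own machine only, and check that the translator returned every placeholder unchanged before restoring: a model may reword or drop a placeholder, in which case that value has to be put back by hand.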

4. Check where data is physically stored

DeepL stores data in the EU (Germany, Finland). OpenAI - primarily in the US. For GDPR compliance, personal data shouldn’t be transferred outside the EU without appropriate safeguards. If your clients are in the EU, this matters when choosing a tool.

5. Document your security policy

Create a short document: which tools you use, what security measures you apply, how you handle confidential data. This protects you legally and gives you an edge over competitors. Clients increasingly check translator qualifications before ordering - and the question about data security is becoming standard.

6. Consider local models for top-secret documents

For critically confidential texts, there are models that run offline on your computer without sending data to the internet. Quality still falls short of cloud solutions, but your data is guaranteed to stay put.

FAQ

Is it safe to use DeepL for confidential documents?

Free DeepL - no. Texts are stored temporarily and used to improve models. DeepL even prohibits translating texts with personal data in the free version. DeepL Pro (from $8.74/mo) - yes, texts are encrypted and deleted immediately after translation.

Does ChatGPT use my translations for training?

By default - yes, on Free and Plus plans. You can disable this in Settings → Data Controls. On Team and Enterprise, training is off by default. Through the API, data is never used for training.

What should I do if I’ve already pasted confidential documents into a free translator?

Stop using free versions for confidential texts. Assess the risk - if there’s an NDA, you may need to notify the client. Switch to a paid plan or API. Create an internal data handling policy so this doesn’t happen again.

Does GDPR apply to freelance translators?

Yes, if you process personal data of EU residents - and that’s almost always the case when translating documents containing names, addresses, medical or financial information. GDPR applies to any processing of personal data, regardless of business size.

What’s the safest alternative for translating secret documents?

For maximum security - local translation models that run offline on your computer. For documents that need legal force - a certified translation from a sworn translator who is bound by professional secrecy and doesn’t use cloud services when handling originals.
