MTPE for annual reports: when AI translation helps—and when it fails

A segment-level decision framework for using machine translation post-editing on your Geschäftsbericht—from 17 reporting seasons of actually doing the work

I translate DE↔EN financial documents for a living, full-time since 2008, and I have watched machine translation go from a punchline to a genuinely useful tool—on the right text. The honest answer to “should I run MTPE on my annual report” isn’t yes or no. It’s: which part of the report? The savings are real in some segments and an illusion in others, and the difference between the two is where most LSP project managers lose money. Here is the framework I actually use.

First, define your three tiers precisely

Most confusion about MTPE comes from people pricing one thing and buying another. There are three distinct products, and only one of them is publishable for a listed company.

Raw MT—no human touch

The machine output, untouched. The mechanical cost here is almost nothing: running a neural engine over roughly 50,000 words—about 200 pages—costs on the order of ten dollars of compute. Turnaround is minutes. But raw MT is a draft for internal reading at best. For a published Geschäftsbericht it is a starting point, never a deliverable. Anyone quoting you a near-zero “translation” price is quoting raw MT.
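To sanity-check the compute figure, here is a minimal sketch assuming a character-billed MT API. The price of 20 (in any currency) per million characters and the average of 6.5 characters per word including spaces are both illustrative assumptions, not any provider's published rates.

```python
# Rough raw-MT compute cost for a character-billed API.
# Both defaults are illustrative assumptions, not a real price list.
CHARS_PER_WORD = 6.5             # assumed average incl. spaces; German runs longer
PRICE_PER_MILLION_CHARS = 20.0   # assumed pay-as-you-go price, any currency


def raw_mt_cost(words: int,
                chars_per_word: float = CHARS_PER_WORD,
                price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Approximate engine cost of machine-translating `words` words."""
    characters = words * chars_per_word
    return characters / 1_000_000 * price_per_million


# About 200 pages: 50,000 words is roughly 325,000 characters.
print(f"{raw_mt_cost(50_000):.2f}")  # → 6.50
```

Doubling both assumptions still only brings the figure to about 26, which is the point: the engine is not where the money in an annual-report project goes.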

Light post-editing—fluency pass only

ISO 18587, the post-editing standard, describes light PE as correcting only the most serious errors so that the text is understandable, but it pointedly excludes light PE from its scope: only full post-editing can be certified against it. Light PE typically runs at 40–50% of the full per-word rate, and a post-editor can move 4,000–8,000 words a day. It is fine for an internal management read of a foreign subsidiary’s numbers. It is not fine for anything an investor will see.

Full post-editing—human-translation quality

ISO 18587 sets the bar for full PE as output comparable to professional human translation: no omissions or additions, ambiguous sentences reformulated, and grammar, syntax, semantics, terminology, punctuation, and style all checked. This is the only tier I would put near a published report. It runs at roughly 50–70% of the full per-word rate, with throughput of around 3,000–6,000 words a day against a human baseline of about 2,000. Against my own specialist rate (starting at €0.85 per standard line, B2B net), full PE on genuinely repetitive content lands at roughly €0.45–0.60 per line. The qualifier “genuinely repetitive” is doing all the work in that sentence.
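The tier ranges are easier to compare side by side. Below is a minimal sketch that encodes them as quoted (light PE at 40–50% of the rate and 4,000–8,000 words a day, full PE at 50–70% and 3,000–6,000, a human baseline of 2,000); the `Tier` class and helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rate_fraction: tuple[float, float]  # share of the full per-word/per-line rate
    words_per_day: tuple[int, int]      # post-editor throughput, slow to fast


# Ranges as quoted in the text; raw MT is left out because it has no human rate.
LIGHT_PE = Tier("light PE", (0.40, 0.50), (4_000, 8_000))
FULL_PE = Tier("full PE", (0.50, 0.70), (3_000, 6_000))
HUMAN_WORDS_PER_DAY = 2_000


def price_band(tier: Tier, full_rate: float) -> tuple[float, float]:
    """Low and high price per unit for a tier, given the full human rate."""
    low, high = tier.rate_fraction
    return round(full_rate * low, 3), round(full_rate * high, 3)


def days_band(tier: Tier, words: int) -> tuple[float, float]:
    """Best-case and worst-case working days for `words` at the tier's throughput."""
    slow, fast = tier.words_per_day
    return words / fast, words / slow


print(price_band(FULL_PE, 0.85))     # → (0.425, 0.595)
print(days_band(FULL_PE, 30_000))    # → (5.0, 10.0)
print(30_000 / HUMAN_WORDS_PER_DAY)  # → 15.0
```

At a full rate of €0.85 per line, full PE comes out at roughly €0.43–0.60, and a 30,000-word notes section drops from about 15 human working days to 5–10 post-editing days.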

Where MTPE reliably works

The MT-safe zones in a financial report share two properties: high translation-memory match rates and structural repetition. The engine isn’t being asked to think; it’s being asked to reproduce stable, near-identical wording it has seen many times. That’s exactly what neural MT does well.

  • Standard IFRS and HGB notes—accounting-policy language that barely changes year to year and is heavily shared across filers.
  • Governance and remuneration tables—fixed labels, fixed structure, numbers carried straight through.
  • Boilerplate risk disclosures—stable, reusable wording that the company has approved in prior seasons.
  • Repetitive sustainability metrics tables—units and KPIs in a grid, low ambiguity, high repetition.

A segment-type analysis of annual reports backs this up, naming repeated table phrases (“high repetition, low ambiguity”) and boilerplate policies (“stable, reusable wording”) as the safe zones for machine-assisted translation. And there is peer-reviewed evidence specifically in finance: a 2019 study of NMT post-editing in the banking and finance domain found it “allows for substantial time savings and leads to equal or slightly better quality”—even with limited in-domain training data. Specialist providers lean into this with domain-trained engines; KERN, for instance, runs an in-house “finance engine” trained for the annual-report domain inside an MTPE workflow.

In my own projects the pattern is consistent. On an MDAX group’s notes section—call it 30,000 words, much of it recurring season to season against a mature TM—full PE comfortably delivers the 40–60% cost reduction the benchmarks promise. The engine plus a strong translation memory carries the repetition; I post-edit for the deltas. That is MTPE working as advertised.
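One way to test whether a section really is “genuinely repetitive” before pricing it is to measure how much of this year’s text fuzzy-matches last year’s. Here is a minimal sketch using Python’s standard `difflib.SequenceMatcher`, assuming the text is already split into sentence segments. The 0.85 threshold and the sample sentences are illustrative, and CAT tools score fuzzy matches with their own algorithms, so treat the result as a rough proxy.

```python
from difflib import SequenceMatcher


def best_match(segment: str, memory: list[str]) -> float:
    """Highest similarity ratio (0 to 1) between `segment` and any TM segment."""
    return max((SequenceMatcher(None, segment, m).ratio() for m in memory),
               default=0.0)


def leverage_share(new_segments: list[str], memory: list[str],
                   threshold: float = 0.85) -> float:
    """Share of new words sitting in segments that match the TM at >= threshold."""
    total = sum(len(s.split()) for s in new_segments)
    if total == 0:
        return 0.0
    matched = sum(len(s.split()) for s in new_segments
                  if best_match(s, memory) >= threshold)
    return matched / total


last_year = [
    "Der Konzernabschluss wurde nach IFRS aufgestellt, wie sie in der EU anzuwenden sind.",
    "Vorräte werden zu Anschaffungs- oder Herstellungskosten bewertet.",
]
this_year = last_year + [
    "Im Geschäftsjahr 2024 wurde die Segmentberichterstattung neu strukturiert.",
]
print(f"{leverage_share(this_year, last_year):.2f}")  # → 0.71
```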

Where MTPE consistently fails

Now the other half of the report—and the half that defines the reader’s impression of the company:

  • CEO and CFO letters—register and credibility signalling carry the meaning; tone is the message.
  • CSRD and ESRS narrative disclosures—legally defined terminology with regulatory exposure.
  • MD&A qualitative commentary—syntactic ambiguity and judgement on what to emphasise.
  • Forward-looking statements—approved wording that must be preserved exactly.

Here the machine actively costs you money. A practitioner at the financial-translation specialist EnglishBusiness put it bluntly: given the linguistic standards an annual report demands, post-editing machine output “would be more time-consuming than human translation” on their proven workflow. The same segment analysis flags strategy narrative (“nuance and tone carry meaning”), risk factors (“legal exposure if wording drifts”), and forward-looking statements (“approved wording must be preserved”) as human-only zones.

CSRD makes this sharper. Terms like “materiality,” “taxonomy,” and “risks and opportunities” carry clearly defined, legally relevant meaning under the ESRS standards. A plausible-looking mistranslation isn’t a style nit—it’s direct regulatory exposure, alongside the usual MT failure modes of inconsistent terminology and wrong number and unit formats. My own rework rate tells the story: post-editing a CEO letter from raw MT, I keep maybe one sentence in five. Reworking four-fifths is slower than translating the whole thing once, properly, with the tone right from the first draft.
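Because the ESRS terms carry defined meanings, a cheap automated pass can at least flag segments where the approved rendering is missing. Here is a minimal sketch, assuming the client glossary is available as a simple source-to-target mapping. The three entries are an illustrative excerpt, not an authoritative ESRS term list, and a check like this supplements the human translator rather than replacing them.

```python
# Hypothetical excerpt of an approved DE→EN glossary; supply the client's own.
GLOSSARY = {
    "Wesentlichkeit": "materiality",
    "Taxonomie": "taxonomy",
    "Risiken und Chancen": "risks and opportunities",
}


def term_violations(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Source terms found in `source` whose approved English term is absent from `target`.

    Plain case-insensitive substring matching: it catches German compounds such as
    'Wesentlichkeitsanalyse' but not inflected or reworded target terms.
    """
    src, tgt = source.casefold(), target.casefold()
    return [de for de, en in glossary.items()
            if de.casefold() in src and en.casefold() not in tgt]


de = "Die Wesentlichkeitsanalyse umfasst wesentliche Risiken und Chancen."
en = "The materiality assessment covers material risks and rewards."
print(term_violations(de, en, GLOSSARY))  # → ['Risiken und Chancen']
```

It will not catch a term that is present but used in the wrong sense, which is exactly why these segments stay with a human translator.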

The hidden cost trap

Here is where headline numbers mislead. A 60% reduction on the translation line item almost never produces a 60% reduction on the project. The per-word (or per-line) rate is the most quoted metric in the industry and the most incomplete; treating it as your total investment is how budgets overrun. A realistic total-cost-of-ownership view has to add what the rate leaves out:

  • Linguistic QA, which is frequently a separate billable service requiring its own reviewers and tools.
  • Error-propagation risk in tables—a single misplaced decimal or transposed unit that QA must catch before publication.
  • Client review rounds, which multiply when the machine output reads fluently but says something subtly off.
  • Project management and file engineering—segmenting the document, routing tiers correctly, reassembling the deliverable.
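Error propagation in tables is the item on that list most amenable to automation. Here is a minimal sketch of a number-consistency check between a German source cell and its English target, assuming German uses “,” as the decimal separator and “.” for thousands and English the reverse. The helper names are hypothetical, not features of any particular QA tool, and the check does not handle unit conversions or deliberate rounding changes.

```python
import re

# A run of digits, optionally with inner '.' or ',' separators, ending in a digit.
NUMBER = re.compile(r"\d[\d.,]*\d|\d")


def parse_numbers(text: str, decimal_sep: str) -> list[float]:
    """Extract numbers using the given decimal separator (',' for DE, '.' for EN)."""
    thousands_sep = "." if decimal_sep == "," else ","
    return [float(tok.replace(thousands_sep, "").replace(decimal_sep, "."))
            for tok in NUMBER.findall(text)]


def numbers_match(de_cell: str, en_cell: str) -> bool:
    """True if both cells contain the same numbers; order may differ between languages."""
    return sorted(parse_numbers(de_cell, ",")) == sorted(parse_numbers(en_cell, "."))


print(numbers_match("Umsatzerlöse 2024: 1.234,5 Mio. €", "Revenue 2024: €1,234.5 million"))  # → True
print(numbers_match("Umsatzerlöse 2024: 1.234,5 Mio. €", "Revenue 2024: €1.234,5 million"))  # → False
```

The second call fails because the German number format was carried over untranslated, so the English side reads it as 1.2345.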

To find your true break-even, don’t apply one blended discount to the whole report. Split the word count by segment type, apply the realistic tier saving only to the MT-safe portion, and add the fixed QA and PM overhead back on top. A report that is 70% templated tables and notes and 30% narrative behaves completely differently from one that is the reverse. And if you are standing up an MTPE programme from scratch rather than buying it ready-made, expect a 6–12 month break-even before the tooling and trained post-editors pay for themselves.
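The split-by-segment method takes only a few lines. Here is a minimal model, assuming a single per-word full rate, one full-PE fraction for every MT-safe segment, and a fixed QA/PM overhead. The 0.20 rate, 50% fraction, 2,000 overhead, and the set-up figures for the break-even helper are hypothetical inputs chosen only to show the shape of the calculation.

```python
import math


def project_cost(segment_words: dict[str, int], mt_safe: set[str], rate_per_word: float,
                 full_pe_fraction: float, fixed_overhead: float) -> float:
    """Full PE price on MT-safe segments, full rate elsewhere, plus fixed QA/PM overhead."""
    cost = 0.0
    for segment, words in segment_words.items():
        fraction = full_pe_fraction if segment in mt_safe else 1.0
        cost += words * rate_per_word * fraction
    return cost + fixed_overhead


def saving_vs_human(segment_words: dict[str, int], mt_safe: set[str], rate_per_word: float,
                    full_pe_fraction: float, fixed_overhead: float) -> float:
    """Project-level saving compared with translating every word at the full rate."""
    human = sum(segment_words.values()) * rate_per_word
    mtpe = project_cost(segment_words, mt_safe, rate_per_word, full_pe_fraction, fixed_overhead)
    return 1 - mtpe / human


def break_even_months(setup_cost: float, monthly_saving: float) -> int:
    """Months until an in-house MTPE set-up pays for itself (rounded up)."""
    return math.ceil(setup_cost / monthly_saving)


SAFE = {"notes_and_tables"}
templated = {"notes_and_tables": 70_000, "narrative": 30_000}  # 70/30 report
narrative = {"notes_and_tables": 30_000, "narrative": 70_000}  # the reverse
for report in (templated, narrative):
    print(f"{saving_vs_human(report, SAFE, 0.20, 0.50, 2_000):.0%}")  # → 25%, then 5%
print(break_even_months(60_000, 7_500))  # → 8
```

With identical inputs, the same 50% line-item discount yields about a 25% project saving on the templated report and about 5% on the narrative-heavy one, which is the gap a blended discount hides.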

The data-confidentiality red line

Everything above assumes you are allowed to send the text to an engine at all. For a pre-publication annual report under NDA—ad-hoc-relevant numbers before they’re public—the engine choice is a confidentiality decision, not a quality one. The tiers that matter:

  • Consumer (free) DeepL: submitted text may be used to improve the service, which includes model training. Off-limits for material non-public financial data.
  • DeepL Pro API: a contractual commitment that your content isn’t used for training, stored only as long as technically required, with at most a 72-hour encrypted debugging buffer that is then auto-deleted; access logs are scrubbed of any actual source or translated text.
  • ModernMT (enterprise): strict per-customer isolation—your content and TMs sit in your private area, unusable by other customers or by ModernMT itself—on ISO 27001-certified infrastructure, with on-premise deployment available for finance, legal, and other regulated sectors.
  • In-house / on-premise engine: the text never leaves your or your supplier’s controlled environment. The strongest option where data residency is non-negotiable.

One nuance regulated buyers miss: “not training” and “not retaining” are different promises. Paying for Pro removes the training concern, but a true zero-retention guarantee covering backups, caches, and server logs is a contractual matter—you secure it in the data-processing agreement, not by upgrading your subscription. Two practical moves. First, name the permitted engine in the translation brief explicitly; don’t leave it to the post-editor’s default web tool. Second, read the supplier’s DPA for retention duration, sub-processors, and EU data-residency before the first file moves. The segment analysis above adds a third: log which content was machine-assisted, so legal counsel understands the exposure. If you want to know how I handle confidential pre-publication material in my own workflow, that’s on my privacy page and we can put it in writing.
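The “name the engine” and “log machine-assisted content” moves can be enforced in a pipeline rather than left to memory. Here is a minimal sketch, assuming each segment passes through a routing step that knows its NDA status. The engine labels are hypothetical identifiers for the tiers above, not product API names, and which engines belong in the approved set is a contractual decision that code cannot make for you.

```python
import json
from datetime import datetime, timezone

# Hypothetical engine labels; the approved set should come from the signed DPA,
# not from the subscription tier alone.
APPROVED_FOR_NDA = frozenset({"deepl_pro_api", "modernmt_enterprise", "on_premise"})


def check_engine(engine: str, under_nda: bool,
                 approved: frozenset[str] = APPROVED_FOR_NDA) -> None:
    """Refuse any engine not cleared for pre-publication material."""
    if under_nda and engine not in approved:
        raise PermissionError(f"{engine!r} is not cleared for NDA content")


def log_mt_use(segment_id: str, segment_type: str, engine: str, under_nda: bool) -> str:
    """Check the engine, then return a JSON audit record for legal counsel."""
    check_engine(engine, under_nda)
    return json.dumps({
        "segment_id": segment_id,
        "segment_type": segment_type,
        "engine": engine,
        "machine_assisted": True,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })


print(log_mt_use("notes-4.2-017", "ifrs_hgb_notes", "on_premise", under_nda=True))
try:
    log_mt_use("letter-001", "ceo_cfo_letters", "deepl_free_web", under_nda=True)
except PermissionError as err:
    print(err)  # → 'deepl_free_web' is not cleared for NDA content
```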

A nine-point MTPE briefing checklist

Hand this to whoever scopes the job. Each item changes either the price or the risk.

  1. File format and CAT-tool compatibility—a clean, segmentable source (not a flattened PDF) is what makes TM and MT leverage possible at all.
  2. Target quality tier—light or full PE, specified per segment type, not one tier for the whole document.
  3. Existing TM and terminology assets—your translation memory and glossary are what make the MT-safe zones actually safe; supply them.
  4. Preferred engine—named explicitly, chosen for both domain fit and confidentiality.
  5. In-house review scope—who on your side reviews, and how many rounds, so the schedule is honest.
  6. Hard deadline—the real filing date, because MTPE’s throughput advantage only helps if the QA gate still fits.
  7. NDA and data-residency status—flagged up front; it dictates the engine, not the other way around.
  8. Subject-matter context for the post-editor—what the company does, the reporting standard in play, last year’s report for reference.
  9. Post-delivery QA gate—who signs off on numbers and terminology before publication, and against what checklist.
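If briefs arrive as free-form e-mails, a structured version makes gaps visible before scoping. Here is a minimal sketch with one field per checklist item. The field names, the rule that a PDF source counts as flattened, and the example values are assumptions for illustration, not a standard brief format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MTPEBrief:
    """One field per checklist item; every default counts as 'not yet specified'."""
    source_format: str = ""                                        # 1. e.g. "docx", "idml"
    tier_by_segment: dict[str, str] = field(default_factory=dict)  # 2.
    tm_supplied: bool = False                                      # 3.
    glossary_supplied: bool = False                                # 3.
    engine: str = ""                                               # 4.
    review_rounds: int = 0                                         # 5.
    filing_date: str = ""                                          # 6. ISO date
    under_nda: Optional[bool] = None                               # 7.
    context_docs: list[str] = field(default_factory=list)          # 8.
    qa_signoff: str = ""                                           # 9.

    def problems(self) -> list[str]:
        """List checklist items that are missing or risky."""
        issues = []
        if self.source_format.lower() in ("", "pdf"):
            issues.append("1: source missing or a flattened PDF")
        if not self.tier_by_segment:
            issues.append("2: no tier specified per segment type")
        if not (self.tm_supplied and self.glossary_supplied):
            issues.append("3: TM or glossary not supplied")
        if not self.engine:
            issues.append("4: engine not named")
        if self.review_rounds < 1:
            issues.append("5: in-house review scope undefined")
        if not self.filing_date:
            issues.append("6: no hard deadline")
        if self.under_nda is None:
            issues.append("7: NDA status not flagged")
        if not self.context_docs:
            issues.append("8: no subject-matter context")
        if not self.qa_signoff:
            issues.append("9: no post-delivery QA owner")
        return issues


brief = MTPEBrief(source_format="pdf",
                  tier_by_segment={"ifrs_hgb_notes": "full PE", "ceo_cfo_letters": "human"},
                  tm_supplied=True, glossary_supplied=True, engine="on_premise",
                  review_rounds=2, filing_date="2025-03-20", under_nda=True,
                  context_docs=["prior-year report"], qa_signoff="Head of IR")
print(brief.problems())             # → ['1: source missing or a flattened PDF']
print(len(MTPEBrief().problems()))  # → 9
```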

Quick-reference decision table

Segment type → recommended tier → indicative price band (against a full specialist line rate starting at €0.85). Ranges are practitioner estimates, not a quote.

  • Standard IFRS / HGB notes → Full PE → ~50–60% of full rate; with a mature TM, a reliable 40–60% saving.
  • Governance & remuneration tables → Full PE → ~50% of full rate; high repetition, watch number formats.
  • Boilerplate risk disclosures → Full PE → ~50–60% of full rate, only against an approved TM.
  • Repetitive sustainability metrics tables → Full PE → ~50–60% of full rate; verify units rigorously.
  • MD&A qualitative commentary → Human translation → full rate; MTPE saving evaporates.
  • CSRD / ESRS narrative disclosures → Human translation → full rate; legally defined terms, regulatory exposure.
  • CEO / CFO letters → Human translation → full rate; tone and credibility are the deliverable.
  • Forward-looking statements → Human translation → full rate; approved wording must be preserved exactly.
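For teams that pre-segment reports in a script or a CAT-tool export, the table can double as a lookup. Here is a minimal sketch, assuming segments have already been tagged with a segment type (the snake_case labels are hypothetical). The price shares restate the practitioner estimates above and are not a quote.

```python
FULL_PE, HUMAN = "full PE", "human translation"

# Segment type → (tier, low share of full rate, high share), mirroring the table above.
DECISION_TABLE = {
    "ifrs_hgb_notes": (FULL_PE, 0.50, 0.60),
    "governance_remuneration_tables": (FULL_PE, 0.50, 0.50),
    "boilerplate_risk": (FULL_PE, 0.50, 0.60),
    "sustainability_metrics_tables": (FULL_PE, 0.50, 0.60),
    "mdna_commentary": (HUMAN, 1.0, 1.0),
    "csrd_esrs_narrative": (HUMAN, 1.0, 1.0),
    "ceo_cfo_letters": (HUMAN, 1.0, 1.0),
    "forward_looking": (HUMAN, 1.0, 1.0),
}


def route(segment_type: str, full_rate: float = 0.85,
          approved_tm: bool = True) -> tuple[str, float, float]:
    """Return (tier, low price, high price) per line for a segment type."""
    # Unknown segment types fall back to human translation, the conservative default.
    tier, low, high = DECISION_TABLE.get(segment_type, (HUMAN, 1.0, 1.0))
    # Boilerplate risk text only qualifies for PE against an approved TM.
    if segment_type == "boilerplate_risk" and not approved_tm:
        tier, low, high = HUMAN, 1.0, 1.0
    return tier, round(full_rate * low, 3), round(full_rate * high, 3)


print(route("ifrs_hgb_notes"))                       # → ('full PE', 0.425, 0.51)
print(route("boilerplate_risk", approved_tm=False))  # → ('human translation', 0.85, 0.85)
print(route("chairman_interview"))                   # → ('human translation', 0.85, 0.85)
```

Defaulting unknown segment types to human translation is deliberate: in line with the argument above, a misrouted narrative segment is the expensive mistake.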

For the record: I don’t offer sworn or certified translation—court-sworn colleagues handle those—and the framework above is about quality and confidentiality, never about being the cheap option. If you want help splitting your next report by segment and pricing each tier honestly, that’s exactly the conversation I like having. See services and pricing or request a quote.