SIX

//Leverage translation assets for MT customization//

Prepare existing data (previous translations & terminology)

– Is the format suitable?

– Is the content suitable?

  • are there any gaps in the aligned TM pairs?
  • are there any duplicates in the translation memory?
  • how are translation units defined? (word level, sentence level, paragraph level)
  • are TMs free of formatting data?
  • are sentences too long?
  • is terminology properly identified?
  • are term equivalents correctly established?
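Several of the content checks above can be automated before the data goes anywhere near an MT engine. The sketch below is a minimal illustration, assuming the TM has already been exported as a list of (source, target) string pairs and the terminology as a simple source-term to target-term dictionary. The function names `strip_formatting` and `audit_tm`, the tag pattern, and the `max_words=40` threshold are all hypothetical choices made for this example, not part of any particular CAT tool or MT platform. The terminology check uses plain substring matching, so it is only a rough first filter.

```python
import re

# Hypothetical pattern for inline formatting: XML-style tags such as <b>, </b>,
# <ph x="1"/> and numbered placeholders such as {1}. Real TM exports may differ.
TAG_RE = re.compile(r"</?[^<>]+?>|\{\d+\}")


def strip_formatting(text):
    """Remove inline tags/placeholders and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub("", text).split())


def audit_tm(pairs, glossary, max_words=40):
    """Run simple content checks on (source, target) translation units.

    glossary maps a source term to its approved target equivalent.
    max_words is an arbitrary, illustrative limit for "too long".
    Returns a report of flagged unit indexes and the cleaned, de-duplicated pairs.
    """
    report = {"gaps": [], "duplicates": [], "too_long": [], "term_mismatches": []}
    kept, seen = [], set()
    for i, (src, tgt) in enumerate(pairs):
        src, tgt = strip_formatting(src), strip_formatting(tgt)
        # Gap: one side of the aligned pair is empty after cleaning.
        if not src or not tgt:
            report["gaps"].append(i)
            continue
        # Duplicate: the same cleaned pair was already kept.
        if (src, tgt) in seen:
            report["duplicates"].append(i)
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
        # Long segments tend to align and train poorly.
        if len(src.split()) > max_words:
            report["too_long"].append(i)
        # Crude terminology check: source term present, approved equivalent absent.
        for term, equivalent in glossary.items():
            if term.lower() in src.lower() and equivalent.lower() not in tgt.lower():
                report["term_mismatches"].append((i, term))
    return report, kept


if __name__ == "__main__":
    tm = [
        ("Click <b>Save</b>.", "Klicken Sie auf <b>Speichern</b>."),
        ("Click Save.", "Klicken Sie auf Speichern."),
        ("Open the file.", ""),
        ("Delete the file.", "Löschen Sie die Datei."),
        ("Save the file.", "Speichern Sie das Dokument."),
    ]
    report, kept = audit_tm(tm, {"file": "Datei"})
    print(report)
```

On the sample data, the second unit is flagged as a duplicate of the first once the `<b>` tags are stripped, the third as a gap, and the last as a terminology mismatch because "file" is not rendered as "Datei". Stripping tags before comparing is a deliberate choice: otherwise formatting differences would hide real duplicates.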

This can make a real difference in MT output quality: from 28% to 33% BLEU (a score of 30% is rated as "understandable" content). Some reports indicate you can reach up to 64.2% (an experiment on the German-English pair, Schnaider 2012).
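BLEU compares MT output with a human reference translation by counting overlapping word n-grams. To make the percentages quoted above more concrete, here is a minimal sentence-level BLEU sketch. It assumes whitespace tokenization, a single reference, n-grams up to length 4 and no smoothing; the function names are hypothetical. Published figures such as those above are normally corpus-level scores computed with standard tools (for example sacreBLEU), so this toy version will not reproduce them exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU in [0, 1] (multiply by 100 for %)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

For the example pair, the clipped precisions are 5/6, 3/5, 2/4 and 1/3 for 1- to 4-grams, and both sentences have six tokens, so the score is (1/12)^(1/4), roughly 0.537 (53.7%). A single wrong word therefore costs far more than one token's worth of score, which is one reason clean, consistent training data moves BLEU noticeably.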

Are you feeding the wrong data into the system?

