RE: targetPointer Requirement update from Yves Savourel on 2012-05-07 (public-multilingualweb-lt@w3.org from May 2012)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 7 May 2012 13:51:00 -0600
To: "'Dave Lewis'" <dave.lewis@cs.tcd.ie>, <public-multilingualweb-lt@w3.org>
Message-ID: <assp.04748b8ecd.assp.0474872d15.005a01cd2c8a$ba045ed0$2e0d1c70$@com>

Hi Dave,

> Where there is already an element structure in 
> the host document that indicates source and target
> content, what is the use case where the implementer 
> wouldn't read the relevant XLIFF or TMX schema 
> document to figure out how to parse this themselves?

When the implementer wants to develop a generic tool that relies on ITS, and only ITS, to access the documents it processes. Such a tool should not need to know anything about XLIFF or TMX specifics beyond the information it gets through the ITS rules.


> This seems simpler than defining a new standard tag
> in ITS to essentially explain the schema of XLIFF 
> and TMX.

It's simpler only if you develop just for XLIFF or just for TMX. If you target "any XML format", targetPointer is not only simpler, it is the only way to go. With the proper ITS rules, you don't need to know each format you are working with: you can make your tool generic, and it can even work for formats that do not exist yet.

Let's start with the translate rule:

A given XML tool that implements ITS should be able to learn from the ITS rules (and only from them) which parts of the text of an XML format ABC are translatable and which are not. It shouldn't need to know anything else about format ABC.
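To make that statement concrete, here is a minimal Python sketch of such a tool, using only the standard library. It reads `its:translateRule` elements from an `its:rules` document (ITS namespace `http://www.w3.org/2005/11/its`) and reports which text in an arbitrary document is translatable, with no format-specific code. Assumptions: selectors are limited to the XPath subset ElementTree understands (a leading `//` is rewritten to `.//`), local `its:translate` attributes and attribute values are ignored, and the sample document format is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

def load_translate_rules(rules_xml):
    """Read (selector, translatable) pairs from its:translateRule, in document order."""
    rules_root = ET.fromstring(rules_xml)
    rules = []
    for rule in rules_root.iter(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get("selector")
        # Assumption: ElementTree only accepts relative paths, so '//x' becomes './/x'.
        if selector.startswith("//"):
            selector = "." + selector
        rules.append((selector, rule.get("translate") == "yes"))
    return rules

def translatable_texts(doc_xml, rules):
    """Return the translatable text runs of doc_xml, decided only by the rules."""
    root = ET.fromstring(doc_xml)
    explicit = {}
    # Later rules override earlier ones when they select the same element.
    for selector, flag in rules:
        for el in root.iterfind(selector):
            explicit[el] = flag
    texts = []

    def walk(el, inherited):
        # An element's value comes from a matching rule, else from its parent.
        flag = explicit.get(el, inherited)
        if flag and el.text and el.text.strip():
            texts.append(el.text.strip())
        for child in el:
            walk(child, flag)
            # Text after a child element (its tail) belongs to the parent.
            if flag and child.tail and child.tail.strip():
                texts.append(child.tail.strip())

    walk(root, True)  # element content is translatable by default
    return texts

RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//meta" translate="no"/>
</its:rules>"""

DOC = "<doc><title>Manual</title><p>Run <code>make all</code> now.</p><meta>internal</meta></doc>"

print(translatable_texts(DOC, load_translate_rules(RULES)))  # ['Manual', 'Run', 'now.']
```

Nothing in `translatable_texts` knows what `<code>` or `<meta>` mean; swapping in another rules file is enough to handle another format.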

I assume we are all in agreement with that statement. If not, we need to stop here and debate that specific point, because I think it is one of the foundations of ITS.

Assuming we agree on that... Now, among the various XML formats, some store the same text in several languages. XLIFF and TMX are two examples of such formats, but there are other cases: translation formats like TS, some CMS exports (e.g. Vignette), some types of resource files, etc.

With those types of formats, a tool may need to know not only where the translatable text is, but also where the translated version of that same text resides in relation to the source. The targetPointer feature would provide that.
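A rough sketch of what a tool could do with that information, in Python with ElementTree. The rule is modeled here as a plain (selector, target pointer) pair; that representation, and the function names, are hypothetical. The pointer is a path evaluated relative to each selected source node. As a simplification, only leading `../` steps are resolved (through a parent map, since ElementTree elements do not know their parents) and the rest is passed to ElementTree's limited XPath. The sample is a cut-down Qt Linguist TS file.

```python
import xml.etree.ElementTree as ET

def resolve_pointer(node, pointer, parents):
    """Evaluate a relative pointer such as '../translation' starting from node."""
    ctx = node
    # Climb one level for each leading '../' step.
    while pointer.startswith("../"):
        ctx = parents.get(ctx)
        if ctx is None:
            return None
        pointer = pointer[len("../"):]
    # The remaining steps use ElementTree's XPath subset.
    return ctx.find(pointer)

def source_target_pairs(doc_xml, selector, target_pointer):
    """Yield (source text, target text or None) for each node the selector picks."""
    root = ET.fromstring(doc_xml)
    parents = {child: parent for parent in root.iter() for child in parent}
    for source in root.iterfind(selector):
        target = resolve_pointer(source, target_pointer, parents)
        yield source.text, (target.text if target is not None else None)

TS_DOC = """<TS version="2.1" language="fr_FR">
  <context>
    <name>MainWindow</name>
    <message><source>Open</source><translation>Ouvrir</translation></message>
    <message><source>Quit</source><translation>Quitter</translation></message>
  </context>
</TS>"""

# Hypothetical rule: sources are message/source, targets are the sibling <translation>.
for src, tgt in source_target_pairs(TS_DOC, ".//message/source", "../translation"):
    print(src, "->", tgt)
```

A source with no target yields `None` for the target, so the tool can tell untranslated entries apart without knowing the format.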



> Is there some class of usage of XLIFF and TMX
> that makes the interpretation of their source-target
> binding difficult to parse directly in practice?

The idea is that the tool does not necessarily have to know about XLIFF, TMX, etc. It can work in an abstract way by understanding the ITS rules.

Sure, if the work you want to do is complex, it may make sense to use an actual XLIFF or TMX parser. But we shouldn't assume that is always the case: you can do plenty of things generically. Look at what applications such as ITS-Tool or Rainbow can do with XML documents they know only through their ITS rules.


> Also, considering non-translation use cases such as 
> semantic tagging or parallel text extraction, it doesn't 
> seem likely that you'd do these without needing either 
> to write to the file or understand say the distinction 
> between translation and an alt-trans - in which case 
> you'd need a working understanding of XLIFF/TMX anyway.

Parallel text is actually a good example. Imagine you write an XSLT-based tool that takes the source and target entries of an XLIFF file and creates one plain-text file for the source entries and one for the target entries (a bit like the two parallel files needed to train Moses).

You can do it by hard-coding which XLIFF element stores the source and which one stores the target. ...Or you can use the ITS translateRule, with its handy targetPointer information, to write a generic tool that works not only on XLIFF, but also on TMX, TS, and any other XML format for which you can define a targetPointer.
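The generic version can be sketched as follows. This uses Python's ElementTree rather than XSLT, and the per-format (selector, target pointer) pairs in `RULES` are hypothetical stand-ins for what each format's ITS rules would supply. Pointer resolution is simplified to leading `../` steps plus ElementTree's XPath subset. The samples omit namespaces and most required attributes (real XLIFF 1.2 lives in the `urn:oasis:names:tc:xliff:document:1.2` namespace), so this is an illustration, not a production extractor.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-format rules: (source selector, target pointer relative to each source).
# In a real tool these would be read from each format's ITS rules, not hard-coded.
RULES = {
    "xliff": (".//trans-unit/source", "../target"),
    "tmx": (".//tu/tuv[1]/seg", "../../tuv[2]/seg"),
}

def resolve_pointer(node, pointer, parents):
    """Climb one parent per leading '../', then let ElementTree find the rest."""
    ctx = node
    while pointer.startswith("../"):
        ctx = parents.get(ctx)
        if ctx is None:
            return None
        pointer = pointer[len("../"):]
    return ctx.find(pointer)

def flat_text(el):
    """All text inside el (including inline markup), with whitespace normalized."""
    return " ".join("".join(el.itertext()).split())

def parallel_text(doc_xml, selector, target_pointer):
    """Return (source_file, target_file) contents, one aligned segment per line."""
    root = ET.fromstring(doc_xml)
    parents = {child: parent for parent in root.iter() for child in parent}
    src_lines, tgt_lines = [], []
    for source in root.iterfind(selector):
        target = resolve_pointer(source, target_pointer, parents)
        src = flat_text(source)
        tgt = flat_text(target) if target is not None else ""
        if not src or not tgt:
            continue  # skip incomplete pairs so both files stay line-aligned
        src_lines.append(src)
        tgt_lines.append(tgt)
    return "".join(s + "\n" for s in src_lines), "".join(t + "\n" for t in tgt_lines)

XLIFF_DOC = """<xliff version="1.2"><file source-language="en" target-language="fr"
  datatype="plaintext" original="ui.txt"><body>
<trans-unit id="1"><source>Open</source><target>Ouvrir</target></trans-unit>
<trans-unit id="2"><source>Save as</source><target>Enregistrer sous</target></trans-unit>
<trans-unit id="3"><source>Quit</source></trans-unit>
</body></file></xliff>"""

TMX_DOC = """<tmx version="1.4"><header srclang="en"/><body>
<tu><tuv xml:lang="en"><seg>Open</seg></tuv><tuv xml:lang="fr"><seg>Ouvrir</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Save as</seg></tuv><tuv xml:lang="fr"><seg>Enregistrer sous</seg></tuv></tu>
</body></tmx>"""

for name, doc in (("xliff", XLIFF_DOC), ("tmx", TMX_DOC)):
    src_file, tgt_file = parallel_text(doc, *RULES[name])
    print(name, repr(src_file), repr(tgt_file))
```

The same `parallel_text` function handles both formats; only the rule pair changes. Supporting TS, or a format that does not exist yet, would mean adding a rule, not code.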

Cheers,
-yves

Received on Monday, 7 May 2012 20:01:57 UTC