RE: ACTION-447: Make a batch transformation of the test suite to xliff

From: Yves Savourel <ysavourel@enlaso.com>
Date: Tue, 19 Feb 2013 06:10:03 -0700
To: 'Mārcis Pinnis' <marcis.pinnis@Tilde.lv>, "'Multilingual Web LT Public List Public List'" <public-multilingualweb-lt@w3.org>
Message-ID: <004d01ce0ea2$6ec41c80$4c4c5580$@com>

Hi Mārcis,

Thanks for the feedback.


> I believe that either I am missing something (not understanding 
> where the ITS 2.0 data is in the XLIFF documents) or there is 
> some backwards compatibility of content lost when converting 
> from the HTML/XML examples to XLIFF.

You're not missing anything except some background information:
You are looking at the output of a work in progress (it says so in the readme.txt file, but that is easy to miss).
The tables at http://www.opentag.com/okapi/wiki/index.php?title=ITS_Components give the current status of the filters. Quite a few data categories are supported by the engine but not yet mapped to XLIFF.


> 1. I had a look at the Terminology part and I could not find ITS 
> 2.0 related terminology annotation in the XLIFF documents. 
> I have attached my findings to this e-mail.

It's not mapped to ITS markup yet.
Currently the filter puts the information in a <note> element. No one was implementing ITS (or even mtype='term') in XLIFF, but we had to provide the information in a way tools could use in real projects, so we used <note>.


[from docx file comment]
> This is the only ITS specific mark-up (except the xml:lang)
> I see in the XLIFF ... no other presence of ITS is visible in the XLIFF examples.
> Am I missing something? I believe that a lot of information (all ITS 2.0 only 
> data categories) has been lost during the transformation.

As above: Terminology is just not mapped yet to ITS.
The current output is using a <note>:
<note annotates="source">Terms: discoursal point of view [REF:#TDPV];</note>
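A tool that wants to recover the terms from such a note could do something like the sketch below. This is only an illustration: the "Terms: term [REF:ref];" layout is inferred from the single example above, and the function name parse_term_note is hypothetical, not part of the filter.

```python
import re

# Assumed layout, inferred only from the one example note:
#   "Terms: <term text> [REF:<reference>]; <term text> [REF:<reference>]; ..."
# The [REF:...] part is treated as optional.
TERM_ENTRY = re.compile(
    r"\s*(?P<term>[^;\[]+?)\s*(?:\[REF:(?P<ref>[^\]]+)\])?\s*;"
)

def parse_term_note(note_text):
    """Return a list of (term, ref) pairs; ref is None when no [REF:...] is given."""
    prefix = "Terms:"
    if not note_text.startswith(prefix):
        return []  # not a terminology note
    body = note_text[len(prefix):]
    return [(m.group("term"), m.group("ref")) for m in TERM_ENTRY.finditer(body)]

note = "Terms: discoursal point of view [REF:#TDPV];"
print(parse_term_note(note))
```

The reference value (here "#TDPV") is returned as-is; resolving it against the original document is left out of the sketch.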


[from docx file comment]
> This was originally the <a> tag. But ... according to the HTML5 
> ITS mark-up it is also a term. Where did the terminology 
> information disappear (<mrk mtype='term'>
> )?

Same as above.


[from docx comment]
> The language is FR – why is the text then in EN? Shouldn’t it 
> be empty as the translation has not yet been performed?

Practicality, and a remnant of the past: many tools didn't know how to deal with XLIFF (and some still do not do it correctly), so the filter has a default option to copy the source text into the target (and I didn't unset it). Seeding the <target> has been useful in many real-life scenarios. But technically, for those examples, you're correct: it would be better to avoid the copy.
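For anyone who wants to post-process the current files in the meantime, a rough sketch of removing the seeded targets follows. It assumes XLIFF 1.2 documents and assumes that a <target> whose text is identical to its <source> is a seeded copy rather than a real translation; drop_seeded_targets is a hypothetical helper, not an option of the filter.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # keep XLIFF as the default namespace on output

def drop_seeded_targets(xliff_text):
    """Remove each <target> whose text content equals that of its <source>."""
    root = ET.fromstring(xliff_text)
    # Materialise the list first so the tree is not modified while being walked.
    for unit in list(root.iter(f"{{{XLIFF_NS}}}trans-unit")):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        target = unit.find(f"{{{XLIFF_NS}}}target")
        if source is None or target is None:
            continue
        # Assumption: comparing text only (inline codes ignored) is good enough here.
        if "".join(source.itertext()) == "".join(target.itertext()):
            unit.remove(target)
    return ET.tostring(root, encoding="unicode")

sample = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
    '<file original="a" source-language="en" target-language="fr" datatype="html"><body>'
    '<trans-unit id="1"><source>Hello</source><target>Hello</target></trans-unit>'
    '<trans-unit id="2"><source>Hello</source><target>Bonjour</target></trans-unit>'
    '</body></file></xliff>'
)
print(drop_seeded_targets(sample))
```

Note that a segment whose translation is legitimately identical to the source (a product name, for instance) would also lose its target with this heuristic.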


> 2. With the Locale Filter I see that instead of having ITS 2.0 mark-up,
> the whole fragment has been removed and replaced with a placeholder
> (is that because it is not possible to add Locale Filter mark-up in 
> XLIFF at all?). This does not preserve the content, but filters out 
> fragments based on ITS 2.0 consumption/production Use Case scenarios 
> (which is I guess an internal process and not for data exchange purposes).
> And ... it actually does not show an XLIFF document with the Locale Filter 
> data category metadata in it (that was what we wanted to see, but the 
> examples, I believe do not show that). Is this because XLIFF would not be 
> able to handle ITS 2.0 annotation or because of some other reasons 
> (I am a bit confused here ... so I would like to clarify)?

In this case the data category is implemented by the filter. You guessed correctly: the filter filters out the source parts that are not for the target locale (basically treating them as if they had translate='no').
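The decision the filter has to make can be sketched roughly as below. This is a simplified illustration, not the filter's code: it takes the ITS 2.0 localeFilterList and localeFilterType values as plain strings, and it uses simple prefix matching on language tags rather than the full extended filtering of RFC 4647. The function names are hypothetical.

```python
def range_matches(lang_range, locale):
    """Simplified match: '*' matches anything, otherwise exact or prefix match."""
    lang_range = lang_range.strip().lower()
    locale = locale.strip().lower()
    if lang_range == "*":
        return True
    return locale == lang_range or locale.startswith(lang_range + "-")

def applies_to_locale(filter_list, filter_type, target_locale):
    """Return True if content with these Locale Filter values is for target_locale."""
    ranges = [r for r in filter_list.split(",") if r.strip()]
    in_list = any(range_matches(r, target_locale) for r in ranges)
    # 'include': only locales in the list; 'exclude': only locales not in the list.
    return in_list if filter_type == "include" else not in_list

print(applies_to_locale("fr, de", "include", "fr-CA"))
print(applies_to_locale("fr", "exclude", "fr-CA"))
print(applies_to_locale("*", "include", "ja"))
```

Content for which applies_to_locale returns False is what ends up filtered out of the extracted text.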


> 3. The Language Information as I understand it, will be fully 
> passed on to xml:lang (that is clear).

Actually, we currently don't do anything with Language Information.


> 4. The Domain metadata seems to be transformed from ITS into 
> an OKAPI internal structure.

Yes, it is an attribute in a user-defined namespace. We do this because there is no local attribute for Domain in ITS. The namespace we use for now is temporary; one needs to be defined, presumably by the XLIFF TC.


> 5. The Elements Within Text information as I understand it,
> is just structural, so no mark-up is necessary (that is clear).

Yes, as with Locale Filter, the filter applies the data category and uses inline markup for the elements 'within text'. Note that 'nested' elements are not properly handled currently: they should be in separate trans-units, but are treated as inline for now.


> Maybe I have just misunderstood what the XLIFF examples 
> would contain? I had the understanding that the transformation 
> to XLIFF would preserve ITS 2.0 metadata. Did I understand it wrong?

My understanding is that some ITS data categories should be preserved (mapped in some way), others should be applied.


> Then ... I had a look also at the files in the "roundtrip-example" 
> directory. As I understand from Yves e-mail, these are not valid XLIFF 
> files, right?!

Correct, most are invalid. But it's mostly syntax problems. I think they are much better examples than trying to map the test files.


> I still had a look at the examples that contained terminology 
> annotation. I believe Terminology is used incorrectly:
> <mrk its:terminology="yes" its:termInfoRef="#ge1">Arizona</mrk>
> The attribute is its:term="yes" rather than terminology... 
> (or am I again missing out some information?)

You're correct. I missed that one.
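Until the examples are corrected, the wrong attribute name can be patched with a simple text-level rewrite, sketched below. It assumes the "its" prefix is the one bound to the ITS namespace in those files and it is not namespace-aware; fix_term_attribute is a hypothetical helper name.

```python
import re

# Matches the mistaken attribute name only; its:termInfoRef is left alone
# because the pattern requires the full word "terminology" before "=".
WRONG_ATTR = re.compile(r"\bits:terminology(\s*=)")

def fix_term_attribute(xml_text):
    """Rename its:terminology="..." to its:term="..." in raw markup."""
    return WRONG_ATTR.sub(r"its:term\1", xml_text)

broken = '<mrk its:terminology="yes" its:termInfoRef="#ge1">Arizona</mrk>'
print(fix_term_attribute(broken))
```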


cheers,
-yves
Received on Tuesday, 19 February 2013 13:10:36 UTC