
Re: ACTION-447: Make a batch transformation of the test suite to xliff

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 21 Feb 2013 21:43:50 +0100
Message-ID: <51268706.1040809@w3.org>
To: Yves Savourel <ysavourel@enlaso.com>
CC: public-multilingualweb-lt@w3.org
On 21.02.13 21:24, Yves Savourel wrote:
>
> > Independent of whether this is normative or not: what would the guidance look like?
>
> Maybe something like:
>
> In order to have identical default behavior, ITS processors SHOULD 
> apply the following default ITS rules when processing HTML5 (and the 
> link to the rules file).
>
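As a rough illustration of what such guidance could mean in practice, the sketch below treats processor-level default rules as the lowest-precedence layer. They sit below the document's own global rules and its local markup. The ordering of the last two follows ITS: local markup overrides global rules, and a later global rule overrides an earlier one. Placing the shared defaults first is an assumption. The layer dictionaries, the string node identifiers (stand-ins, not evaluated XPath selectors) and the `effective_value` helper are all hypothetical.

```python
from typing import Optional

# A layer maps a node identifier to a data category value (here: Translate).
# Layers are ordered from lowest to highest precedence (hypothetical model).
Layer = dict[str, str]


def effective_value(node: str, layers: list[Layer]) -> Optional[str]:
    """Return the value set by the highest-precedence layer that covers `node`."""
    value = None
    for layer in layers:  # later layers override earlier ones
        if node in layer:
            value = layer[node]
    return value


# Hypothetical defaults, e.g. loaded from a shared HTML5 rules file.
defaults: Layer = {"meta[@name='keywords']/@content": "yes"}
# Global rules found in the document's own <script type="application/its+xml">.
document_rules: Layer = {"meta[@name='keywords']/@content": "no"}
# Local markup, e.g. translate="no" on the first <p>.
local_markup: Layer = {"p[1]": "no"}

layers = [defaults, document_rules, local_markup]
print(effective_value("meta[@name='keywords']/@content", [defaults]))  # yes
print(effective_value("meta[@name='keywords']/@content", layers))      # no
print(effective_value("title", layers))                                # None
```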

OK ... but in the example below Fredrik said that Okapi specifies 
keywords to be translatable as a default. Can we expect that this is a 
default for all HTML filters? And related: Fredrik wrote "One of the 
default global html5 rules (in Okapi) specifies <meta name="keywords"…’s 
content to be translatable". Is this really a global rule (which
somebody who knows where it is stored could modify), or default
processing that is not based on ITS rules?
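For reference on the keywords question: in ITS, the Translate default is that element content is translatable and attribute values are not. The `content` attribute of `<meta name="keywords">` is therefore extracted only if some rule says so. The following Python sketch illustrates the difference, assuming a simplified XHTML head. `keywords_rule` is a hypothetical stand-in for the Okapi default rule Fredrik describes, not its actual implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML = "http://www.w3.org/1999/xhtml"

# Simplified, well-formed head/body (the real test file is HTML5, not XHTML).
doc = ET.fromstring(
    f'<html xmlns="{XHTML}"><head>'
    '<meta name="keywords" content="SPORTS LAW, Judicial Matters"/>'
    "</head><body><p>Some text about sport and law.</p></body></html>"
)


def its_translate_default(is_attribute: bool) -> bool:
    """ITS Translate default: element content yes, attribute values no."""
    return not is_attribute


def keywords_rule(elem: ET.Element, attr: str) -> Optional[bool]:
    """Hypothetical default rule: the content of <meta name="keywords"> is translatable."""
    if elem.tag == f"{{{XHTML}}}meta" and elem.get("name") == "keywords" and attr == "content":
        return True
    return None  # the rule does not select this attribute


def translatable_attributes(root: ET.Element, use_rule: bool) -> list[str]:
    """Collect attribute values that would be extracted for translation."""
    found = []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            decision = keywords_rule(elem, attr) if use_rule else None
            if decision is None:
                decision = its_translate_default(is_attribute=True)
            if decision:
                found.append(value)
    return found


print(translatable_attributes(doc, use_rule=False))  # []
print(translatable_attributes(doc, use_rule=True))   # ['SPORTS LAW, Judicial Matters']
```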

Also, would the link be to a rules file in the spec or in the wiki (so
that it could be updated more easily)? To see why that may be needed:
HTML5.1 will have new elements, e.g.
https://dvcs.w3.org/hg/html-extensions/raw-file/tip/maincontent/index.html
and we state that ITS 2.0 covers HTML5 or its successor.

Also ... in the example below two data categories are processed by 
Okapi: Translate and Domain. What do we expect from a tool that only 
implements domain, with regard to the defaults of data categories not
being implemented?

And asking again (I may have missed the answer; if so, sorry): why was
this not needed for ITS 1.0? See
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Feb/0159.html
("what I still don't get" part).

Finally, let me ask about the

- Felix

> -ys
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Thursday, February 21, 2013 1:05 PM
> To: public-multilingualweb-lt@w3.org
> Subject: Re: ACTION-447: Make a batch transformation of the test
> suite to xliff
>
> On 21.02.13 20:55, Fredrik Liden wrote:
>
>     Hi Mārcis and Yves,
>
>     Regarding the comment on domain4html.html: trans-unit id=1 is
>     actually the extraction of the content of <meta name="keywords"…
>
>     One of the default global HTML5 rules (in Okapi) specifies the
>     content of <meta name="keywords"… to be translatable, and a second
>     rule specifies that it contains the default domain for the entire
>     <html> element.
>
>     The <script> rules apply the combined rules to the <body> element
>     only. So what you're seeing is the <p> (trans-unit id=2) with the
>     combined domains. But <meta name="keywords"…, being outside the
>     body, only has the default domain applied.
>
>     Another example of why we need some guidelines/expectations for
>     HTML5 behavior.
>
>
> Independent of whether this is normative or not: what would the
> guidance look like?
>
> Best,
>
> Felix
>
>
> Cheers,
>
> Fredrik
>
> domain4html.html
>
> <!DOCTYPE html>
> <html>
> <head>
> <meta charset=utf-8>
> <meta name="keywords" content="SPORTS LAW, Judicial Matters"/>
> <meta name="x-mykeywords" content="Sport, Law "/>
> <script type="application/its+xml">
> <its:rules xmlns:its="http://www.w3.org/2005/11/its"
>   xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
>   <its:param name="domainParam">keywords</its:param>
>   <its:domainRule selector="/h:html/h:body"
>     domainPointer="/h:html/h:head/h:meta[@name='x-mykeywords' or @name=$domainParam]/@content"
>     domainMapping="'sports law' LAW, 'labor law' LAW, 'contract law' LAW, 'competition law' LAW,'tort law' LAW"/>
> </its:rules>
> </script>
> </head>
> <body>
> <p>Some text about sport and law.</p>
> </body>
> </html>
>
> domain4html.html.xlf
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
>   xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its">
>   <file original="/Copy of domain4html.html" source-language="en-us" target-language="fr-fr" datatype="html">
>     <body>
>       <trans-unit id="1" okp:itsDomain="SPORTS LAW, Judicial Matters">
>         <source xml:lang="en-us">SPORTS LAW, Judicial Matters</source>
>         <target xml:lang="fr-fr">SPORTS LAW, Judicial Matters</target>
>       </trans-unit>
>       <trans-unit id="2" okp:itsDomain="Sport, Law, SPORTS LAW, Judicial Matters">
>         <source xml:lang="en-us">Some text about sport and law.</source>
>         <target xml:lang="fr-fr">Some text about sport and law.</target>
>       </trans-unit>
>     </body>
>   </file>
> </xliff>
>
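Here is a rough way to trace the domain values in this output. The rule's domainPointer selects the `content` attribute of the `x-mykeywords` meta element. Because `domainParam` is `keywords`, it also selects the `keywords` meta element. The Python sketch below assumes a simplified, well-formed XHTML version of domain4html.html. ElementTree supports neither XPath `or` nor ITS parameters, so the predicate is emulated in code. The processing steps reflect our reading of the ITS 2.0 Domain data category, not Okapi's code: split on commas, trim, drop duplicates, and apply domainMapping case-sensitively. Okapi may also order the values differently.

```python
import shlex
import xml.etree.ElementTree as ET

H = "{http://www.w3.org/1999/xhtml}"

# Simplified XHTML version of domain4html.html (no <meta charset>, no <script>).
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta name="keywords" content="SPORTS LAW, Judicial Matters"/>
<meta name="x-mykeywords" content="Sport, Law "/>
</head><body><p>Some text about sport and law.</p></body></html>"""

DOMAIN_PARAM = "keywords"  # value of <its:param name="domainParam">
MAPPING = ("'sports law' LAW, 'labor law' LAW, 'contract law' LAW, "
           "'competition law' LAW,'tort law' LAW")


def parse_mapping(mapping: str) -> dict[str, str]:
    """Parse a domainMapping string into {source value: target value}."""
    pairs = {}
    for chunk in mapping.split(","):
        source, target = shlex.split(chunk)  # shlex removes the quotes
        pairs[source] = target
    return pairs


def pointer_values(root: ET.Element, param: str) -> list[str]:
    """Emulate /h:html/h:head/h:meta[@name='x-mykeywords' or @name=$domainParam]/@content."""
    head = root.find(f"{H}head")
    return [m.get("content", "") for m in head.findall(f"{H}meta")
            if m.get("name") in ("x-mykeywords", param)]


def domain_values(raw: list[str], mapping: dict[str, str]) -> list[str]:
    """Split on commas, trim, map (case-sensitive) and drop duplicates."""
    result = []
    for text in raw:
        for value in (v.strip() for v in text.split(",")):
            value = mapping.get(value, value)
            if value and value not in result:
                result.append(value)
    return result


root = ET.fromstring(DOC)
print(domain_values(pointer_values(root, DOMAIN_PARAM), parse_mapping(MAPPING)))
# ['SPORTS LAW', 'Judicial Matters', 'Sport', 'Law']
```

With a case-sensitive mapping, "SPORTS LAW" is not mapped to "LAW". This is consistent with the okp:itsDomain values in the XLIFF output above.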
> -----Original Message-----
> From: Yves Savourel
> Sent: Tuesday, February 19, 2013 6:03 AM
> To: 'Mārcis Pinnis'; 'Multilingual Web LT Public List Public List'
> Subject: RE: ACTION-447: Make a batch transformation of the test suite 
> to xliff
>
> Hi Mārcis,
>
> I missed a few comments in your docx file.
>
> Here is the file with my additional notes (nothing major).
>
> (BTW: your comment about Domain in domain4html.html is interesting.
>
> I'll try to look at the test output and see if it matches the info 
> output in the XLIFF file.
>
> If it does, this may be an interesting overriding case.)
>
> -ys
>
> -----Original Message-----
> From: Mārcis Pinnis [mailto:marcis.pinnis@Tilde.lv]
> Sent: Tuesday, February 19, 2013 5:24 AM
> To: Yves Savourel; 'Multilingual Web LT Public List Public List'; Dave
> Lewis (dave.lewis@cs.tcd.ie)
> Cc: Felix Sasaki (fsasaki@w3.org)
> Subject: RE: ACTION-447: Make a batch transformation of the test suite
> to xliff
>
> Hi Yves, all,
>
> I had a look at the examples. I believe that either I am missing
> something (not understanding where the ITS 2.0 data is in the XLIFF
> documents) or some of the content (and thus backward compatibility) is
> lost when converting from the HTML/XML examples to XLIFF.
>
> 1. I had a look at the Terminology part and I could not find ITS 2.0 
> related terminology annotation in the XLIFF documents. I have attached 
> my findings to this e-mail.
>
> 2. With the Locale Filter, I see that instead of having ITS 2.0
> mark-up, the whole fragment has been removed and replaced with a
> placeholder (is that because it is not possible to add Locale Filter
> mark-up in XLIFF at all?). This does not preserve the content; instead
> it filters out fragments based on the ITS 2.0 consumption/production
> use case scenarios (which I guess is an internal process and not meant
> for data exchange). And ... it actually does not show an XLIFF
> document with the Locale Filter data category metadata in it (that was
> what we wanted to see, but I believe the examples do not show that).
> Is this because XLIFF would not be able to handle ITS 2.0 annotation,
> or for some other reason (I am a bit confused here ... so I would like
> to clarify)?
>
> Some other findings (more in the attached file):
>
> 3. The Language Information, as I understand it, will be fully passed
> on to xml:lang (that is clear).
>
> 4. The Domain metadata seems to be transformed from ITS into an OKAPI 
> internal structure.
>
> 5. The Elements Within Text information, as I understand it, is just
> structural, so no mark-up is necessary (that is clear).
>
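Some context on point 2. ITS 2.0 Locale Filter carries a comma-separated list of language ranges (its:localeFilterList) and a type, include or exclude (its:localeFilterType). A filter that consumes it for a given target locale can drop non-matching content, which would explain a placeholder in place of the fragment. The following Python sketch shows that decision, assuming RFC 4647 extended filtering for the range matching. The function names are hypothetical, and this is not the Okapi implementation.

```python
def extended_filter_match(lang_range: str, tag: str) -> bool:
    """RFC 4647 extended filtering (section 3.3.2), compared case-insensitively."""
    r = lang_range.strip().lower().split("-")
    t = tag.strip().lower().split("-")
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":          # non-initial wildcard is skipped
            ri += 1
        elif ti >= len(t):        # range has subtags left, tag does not
            return False
        elif r[ri] == t[ti]:      # subtags match: advance both
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:     # cannot skip over a singleton
            return False
        else:                     # skip a non-matching tag subtag
            ti += 1
    return True


def keep_for_locale(filter_list: str, filter_type: str, target: str) -> bool:
    """Decide whether content annotated with Locale Filter is kept for `target`."""
    ranges = [x.strip() for x in filter_list.split(",") if x.strip()]
    matched = any(extended_filter_match(r, target) for r in ranges)
    return matched if filter_type == "include" else not matched


print(keep_for_locale("en-US, en-GB", "include", "fr-FR"))  # False: content dropped
print(keep_for_locale("*", "include", "fr-FR"))             # True
print(keep_for_locale("fr", "exclude", "fr-FR"))            # False
```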
> Maybe I have just misunderstood what the XLIFF examples would contain? 
> I had the understanding that the transformation to XLIFF would 
> preserve ITS 2.0 metadata. Did I understand it wrong?
>
> Then ... I also had a look at the files in the "roundtrip-example"
> directory. As I understand from Yves' e-mail, these are not valid XLIFF
> files, right?!
>
> I still had a look at the examples that contained terminology 
> annotation. I believe Terminology is used incorrectly:
>
> <mrk its:terminology="yes" its:termInfoRef="#ge1">Arizona</mrk>
>
> The attribute is its:term="yes" rather than terminology... (or am I 
> again missing out some information?)
>
> The files seemed not to have Domain and LocaleFilter metadata in them 
> - it would be great to see these categories in action as well.
>
> Best regards,
>
> Mārcis ;o)
>
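The `its:term` point can be checked mechanically. The following Python sketch assumes the ITS 2.0 local Terminology attributes are `its:term`, `its:termInfoRef` and `its:termConfidence`. The checker is hypothetical and only knows about Terminology, so it reports any other ITS-namespace attribute on `<mrk>` for manual review rather than declaring it invalid.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Only the local Terminology attributes; any other ITS attribute is reported.
KNOWN_TERM_ATTRS = {"term", "termInfoRef", "termConfidence"}

SAMPLE = f"""<source xmlns="{XLIFF_NS}" xmlns:its="{ITS_NS}">
<mrk its:terminology="yes" its:termInfoRef="#ge1">Arizona</mrk>
</source>"""


def unexpected_its_attributes(xml_text: str) -> list[tuple[str, str]]:
    """Return (marked text, attribute local name) for ITS attributes on <mrk>
    that are not local Terminology attributes."""
    root = ET.fromstring(xml_text)
    problems = []
    for mrk in root.iter(f"{{{XLIFF_NS}}}mrk"):
        for name in mrk.attrib:
            if name.startswith(f"{{{ITS_NS}}}"):
                local = name.split("}", 1)[1]
                if local not in KNOWN_TERM_ATTRS:
                    problems.append((mrk.text or "", local))
    return problems


print(unexpected_its_attributes(SAMPLE))  # [('Arizona', 'terminology')]
```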
> -----Original Message-----
> From: Yves Savourel [mailto:ysavourel@enlaso.com]
> Sent: Monday, February 18, 2013 4:52 PM
> To: 'Multilingual Web LT Public List Public List'
> Subject: ACTION-447: Make a batch transformation of the test suite to
> xliff
>
> Hi all,
>
> I've done this action item.
>
> A batch file as well as the XLIFF output have been added to GitHub:
>
> https://github.com/finnle/ITS-2.0-Testsuite/commit/294018ba576799dcbee7b9566da83837dd69f4ae
>
> Notes:
>
> -- The XLIFF outputs are often identical because the test files are
> just different ways to mark up the same content.
>
> -- The XLIFF output often makes little sense because the input
> exercises only one data category. For example, a storage size
> limitation set on a span ("inline") element will not show up on an
> inline element in XLIFF, because there is no information in the input
> file that says the span element is 'within text' (since the test case
> is about the storage size). IMHO the outputs are rather useless.
>
> -- Most data categories have output, but only when the extraction uses
> them. For example, there is no output for directionality because, while
> the Okapi ITS engine processes and provides that data category, the
> filter does nothing with it.
>
> Cheers,
>
> -yves
>
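Many outputs are identical, so one quick way to see which test files collapse to the same XLIFF content is to group them by a digest of their trans-units. The digest ignores `<file original=...>`, which names the input file and therefore always differs. The following Python sketch makes several assumptions that are not part of the test-suite repository: the `*.xlf` pattern, the choice of what goes into the digest, and both function names.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

XLIFF = "{urn:oasis:names:tc:xliff:document:1.2}"


def content_key(xlf: bytes) -> str:
    """Digest of trans-unit attributes and source text; <file> attributes are ignored."""
    root = ET.fromstring(xlf)
    parts = []
    for tu in root.iter(f"{XLIFF}trans-unit"):
        source = tu.find(f"{XLIFF}source")
        text = "".join(source.itertext()) if source is not None else ""
        parts.append(repr((sorted(tu.attrib.items()), text)))
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def group_identical_outputs(directory: Path, pattern: str = "*.xlf") -> list[list[str]]:
    """Return groups (of two or more file names) whose XLIFF content is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(directory.rglob(pattern)):
        groups[content_key(path.read_bytes())].append(path.name)
    return [names for names in groups.values() if len(names) > 1]
```

For example, `group_identical_outputs(Path("xliff-output"))` would list the duplicate groups, where the directory name is hypothetical.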
Received on Thursday, 21 February 2013 20:44:18 UTC
