RE: [xliff] ITS: Preserve space and Language Information

From: Yves Savourel <ysavourel@enlaso.com>
Date: Fri, 24 Oct 2014 08:31:34 -0600
To: "XLIFF Main List" <xliff@lists.oasis-open.org>, "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
Message-ID: <00ab01cfef97$369a8470$a3cf8d50$@enlaso.com>

Hi David, all,

> ...
> In case of terminology, we did say that all terminology is encoded as inline,
> even though it may apparently exist at structural elements in various source formats.
> We said that the use case where the whole element is terminology is not statistically 
> significant enough to warrant different handling.
>
> The situation is opposite but analogous here. IMHO and AFAIK whitespace handling and 
> language information are inherently structural characteristics when encoding natural 
> language text, and we actually do NOT inhibit expressivity of XLIFF by not introducing 
> the truly inline variants that could possibly be transformed into <sm/>/<em/> pairs.
> ...

Going from a structural element to an inline one in the Terminology case is easy: you don't lose anything.
But forcing inline information such as whitespace handling or language to drive segmentation is completely different and very restrictive.
In addition to losing granularity, you also assume that segmentation is done by the extractor agent.

I see plenty of technical documents where inline formatting mixes spans of true text with fixed-space sections. Elements like <code>, <var>, <kbd>, etc. in HTML (and their counterparts in DITA, DocBook, etc.) are examples of such spans where the style often requires preserving the spaces. There is no way we can reasonably use segmentation to apply that information.
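As a rough illustration of this point (a sketch under assumptions, not part of any XLIFF or ITS specification): suppose a hypothetical extractor rule says that text inside <code>, <var> and <kbd> keeps its spaces. Walking a single HTML sentence then yields runs with different whitespace modes, so no single per-segment value can describe it. The `PRESERVE_ELEMENTS` set and the `text_runs` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical extractor rule (an assumption for illustration only):
# text inside these inline HTML elements must keep its whitespace.
PRESERVE_ELEMENTS = {"code", "var", "kbd"}

def text_runs(elem, mode="default"):
    """Yield (text, space_mode) pairs in document order."""
    if elem.tag in PRESERVE_ELEMENTS:
        mode = "preserve"
    if elem.text:
        yield elem.text, mode
    for child in elem:
        yield from text_runs(child, mode)
        # A child's tail lies outside the child, so it uses this element's mode.
        if child.tail:
            yield child.tail, mode

para = ET.fromstring(
    "<p>Run <kbd>ls  -l</kbd> and set <var>HOME</var> before   continuing.</p>"
)
runs = list(text_runs(para))
for text, mode in runs:
    print(f"{mode:8} {text!r}")
```

One sentence, which would normally become a single segment, produces both `default` and `preserve` runs; splitting it at every <kbd> or <var> boundary just to carry that flag is the kind of restriction described above.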

The bottom line is that if we didn't have <sm/>, we would not have this discussion and everyone would see xml:space and xml:lang as perfectly natural in <mrk>. This tells me the issue is how to represent those two features with <sm/>.
Trying to rationalize how we can avoid inline cases is just wishful thinking.

Ideally, what we should have done in 2.0 was to allow xml:lang and xml:space in <mrk> and declare XLIFF Core attributes ‘space’ and ‘lang’ for <sm/> to work around the scope issue.
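The scope issue can be seen with plain XML semantics: xml:space (like xml:lang) applies to the content of the element that carries it. The following is a hypothetical sketch; the `effective_space` helper is invented, namespaces are omitted, and neither markup variant is valid XLIFF 2.0 Core. On a spanning <mrk>, the enclosed text inherits the value; on an empty <sm/>, the text that follows is outside the element, so the attribute covers nothing.

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for the predefined xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def effective_space(elem, inherited="default"):
    """Yield (text, mode) pairs using ordinary XML inheritance of xml:space."""
    mode = elem.get(XML_SPACE, inherited)
    if elem.text:
        yield elem.text, mode
    for child in elem:
        yield from effective_space(child, mode)
        if child.tail:
            # The tail is outside the child, so the child's xml:space does not apply.
            yield child.tail, mode

# Hypothetical markup (not allowed by XLIFF 2.0 Core; namespaces omitted).
with_mrk = ET.fromstring(
    '<source>Type <mrk id="m1" xml:space="preserve">a  b</mrk> now.</source>')
with_sm = ET.fromstring(
    '<source>Type <sm id="m1" xml:space="preserve"/>a  b<em startRef="m1"/> now.</source>')

print(list(effective_space(with_mrk)))
print(list(effective_space(with_sm)))
```

With <mrk>, "a  b" comes out as `preserve`; with the <sm/>/<em/> pair it stays `default`. This is presumably why attributes on <sm/> would need their own defined semantics (applying up to the matching <em/>) instead of relying on XML inheritance.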

But we are at 2.1 now, and we can't modify the Core. So, in my opinion, using the ITS module to get an inline solution seems to be the best we can do now.

Cheers,
-yves
Received on Friday, 24 October 2014 14:32:04 UTC
