RE: [xliff] Preserve Space / Language Info at the inline level

Hi Yves, all

I thought a bit more about this after today's XLIFF call, and I'm starting to think that we cannot do this properly without a change to the core specification if we want it as a generic feature rather than an ITS module feature. Long discussion below. TL;DR: put the ITS space and language mapping in a separate namespace from the rest of the ITS mapping module, and add a processing requirement (PR) to set "xml:space" to "preserve" on a higher-level element if any span needs it.

As it currently stands, the validity of an XLIFF document (and its adherence to our processing requirements) is not changed if the document is passed through an XML pretty-printing application that respects the "xml:space" attribute and uses the schema. The default is "default" on every element except <data>, where the value is restricted to "preserve" and defaults to it. Sometimes translatable content needs to include leading or trailing whitespace that must be preserved. That can be done by setting "xml:space" to "preserve" on an ancestor of such content. That would also handle <ignorable> elements, which will often contain only whitespace, quite possibly spaces and newlines mixed together.
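To make the inheritance rule concrete, here is a minimal Python sketch (standard library only) of how an xml:space-aware tool could work out the value in effect for each element. The function name effective_space and the sample unit are illustrative only; the sketch reads just the attributes present in the document and does not apply schema-supplied defaults.

```python
import xml.etree.ElementTree as ET

# xml:space belongs to the reserved XML namespace; ElementTree reports it
# in Clark notation.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(root, default="default"):
    """Map every element to its in-scope xml:space value.

    An element uses its own xml:space if present, otherwise the value of its
    nearest ancestor, otherwise `default`. Schema-supplied defaults (such as
    the fixed "preserve" on <data>) are not applied here.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(XML_SPACE, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, default)
    return result


X = "{urn:oasis:names:tc:xliff:document:2.0}"
unit = ET.fromstring(
    '<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="u1">'
    '<segment><source xml:space="preserve"> Hello </source></segment>'
    '<ignorable><source>  </source></ignorable>'
    '</unit>'
)
spaces = effective_space(unit)
seg_source = unit.find(f"{X}segment/{X}source")
ign_source = unit.find(f"{X}ignorable/{X}source")
print(spaces[seg_source], spaces[ign_source])  # preserve default
```

The point is that the value is decided purely by the element tree: only an ancestor's "xml:space" can protect the whitespace in the <ignorable>.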

If we want an inline notation to control whitespace handling on the kind of logical spans we allow across segments, there is no way we can tell an arbitrary XML processor how they work. So a pretty printer (or a storage system, or some other kind of processor) would never see that we want spaces preserved on a particular span. This could lead, for example, to pretty printing violating our processing requirements. I doubt you will find any XML pretty printer or formatter that will leave untouched a run of whitespace sitting alone in an <ignorable> when the only thing protecting it is an <sm/>/<em/> pair in sibling subtrees.
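A rough Python illustration of the problem, under stated assumptions: normalize_default_space is a hypothetical toy formatter that knows about "xml:space" and nothing else, the itsm:space attribute on <sm> is the hypothetical span-level annotation being discussed, and the itsm namespace URI is assumed. Because the span exists only as a pair of empty markers, the whitespace in the <ignorable> is still governed by "default".

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def normalize_default_space(elem, inherited="default"):
    """Toy stand-in for a formatter that honors xml:space and nothing else.

    Whitespace-only text is emptied wherever the in-scope xml:space is
    "default". Only the XML tree is consulted, so an inline span delimited by
    <sm/> and <em/> markers has no effect on what gets rewritten.
    """
    space = elem.get(XML_SPACE, inherited)
    if space == "default":
        # Text directly inside elem, and text following each child, are
        # both content of elem and therefore governed by elem's xml:space.
        if elem.text is not None and elem.text.strip() == "":
            elem.text = ""
        for child in elem:
            if child.tail is not None and child.tail.strip() == "":
                child.tail = ""
    for child in elem:
        normalize_default_space(child, space)


XLIFF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"  # assumed namespace URI
X = "{" + XLIFF + "}"
unit = ET.fromstring(
    f'<unit xmlns="{XLIFF}" xmlns:itsm="{ITSM}" id="u1">'
    '<segment><source>Press<sm id="m1" itsm:space="preserve"/></source></segment>'
    '<ignorable><source>  \n  </source></ignorable>'
    '<segment><source><em startRef="m1"/>Enter.</source></segment>'
    '</unit>'
)
normalize_default_space(unit)
ignorable = unit.find(f"{X}ignorable/{X}source")
print(repr(ignorable.text))  # prints '' : the whitespace inside the span is gone
```

A real pretty printer would reindent rather than empty the text, but the effect on the span is the same: nothing in the tree tells it to keep its hands off.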

To make the space handling safe with respect to non-XML spans and generic XML processors, we should at a minimum make <source> and <target> use "xml:space" set to "preserve" as their default and only possible value. Treating content that has "xml:space" set to "default" as if it were set to "preserve" is not an error; the opposite is. So using standard XML constructs to enforce the more restrictive mode, and then allowing a non-standard mechanism to relax it to the less restrictive mode, should always be safe.

If we change the default value and add a constraint that "preserve" must always be set explicitly, a 2.x document would still be 100% compatible with 2.0. A 2.0 document could not make use of the new non-XML span space handling, but a 2.0 processor would not harm a 2.x document.

The same goes for ITS-specified space handling in XLIFF. A non-ITS-aware 2.1 processor (or a 2.0 processor) would have no way to know that whitespace in a particular span must be preserved. So either the ITS module must allow processors that are not aware of the module to change whitespace tagged "preserve", which seems bad, or it must require that "xml:space" be set to "preserve" at a higher level whenever a span contains whitespace that must be preserved.
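A minimal Python sketch of how a validator might check that second option. It assumes a hypothetical itsm:space attribute (namespace URI assumed) and checks at the <unit> level, since a <unit> is the smallest element guaranteed to contain a whole <sm/>/<em/> span. The function name units_violating_space_rule is illustrative; this is one possible reading of the rule, not proposed spec text.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XLIFF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"  # assumed namespace URI
ITSM_SPACE = "{" + ITSM + "}space"  # hypothetical itsm:space attribute
UNIT = "{" + XLIFF + "}unit"


def units_violating_space_rule(root):
    """Return ids of <unit> elements that carry itsm:space="preserve" on some
    descendant while the unit's in-scope xml:space is not "preserve".

    The check is made at <unit> level because <sm/>/<em/> spans can cross
    segment and ignorable boundaries inside a unit.
    """
    violations = []

    def walk(elem, inherited):
        space = elem.get(XML_SPACE, inherited)
        if elem.tag == UNIT:
            needs_preserve = any(
                d.get(ITSM_SPACE) == "preserve" for d in elem.iter()
            )
            if needs_preserve and space != "preserve":
                violations.append(elem.get("id"))
            return  # units do not nest, no need to descend further
        for child in elem:
            walk(child, space)

    walk(root, "default")
    return violations


doc = ET.fromstring(
    f'<file xmlns="{XLIFF}" xmlns:itsm="{ITSM}" id="f1">'
    '<unit id="u1"><segment><source>'
    'a<sm id="m1" itsm:space="preserve"/>  b<em startRef="m1"/>'
    '</source></segment></unit>'
    '<unit id="u2" xml:space="preserve"><segment><source>'
    'a<sm id="m1" itsm:space="preserve"/>  b<em startRef="m1"/>'
    '</source></segment></unit>'
    '</file>'
)
print(units_violating_space_rule(doc))  # ['u1']
```

Unit u2 passes because the "preserve" it needs is visible to any generic XML processor, whereas u1 relies on the span annotation alone.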

I'd like us to eventually have a solution for space handling that is not tied to ITS, one that the ITS module would then make use of, and where things like pretty printing would not violate XLIFF processing requirements. If that can't be done right now, we at least need to add the requirement to set "xml:space" to "preserve" at a higher level to the ITS mapping module to make it safe.

Perhaps we could break this (and xml:lang) out into a namespace of its own. That way a later version of core could adopt it from the ITS module without semantic changes. That would avoid the situation where one core feature and one module feature do exactly the same thing, or where we need to make an incompatible change.

Regards,
Fredrik Estreen

> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: 18 November 2014 16:20
> To: XLIFF Main List
> Cc: 'public-i18n-its-ig'
> Subject: [xliff] Preserve Space / Language Info at the inline level
> 
> Hi David, all,
> 
> Looking at:
> http://tools.oasis-open.org/version-control/browse/wsvn/xliff/trunk/xliff-21/xliff-core-v2.1.pdf
> 
> Section "5.9.4.1 ITS Preserve Space Annotation"
> 
> I'm not sure if this is the best way to define the annotation for Preserve
> Space and, I assume, Language Information later.
> 
> There are two options:
> 
> a) We have two attributes itsm:space and itsm:lang that can be set in any
> <mrk>/<sm> element, regardless of the type (just like translate).
> 
> In that case we get this type of annotations:
> 
> <mrk id='m1' translate='no' itsm:space='preserve' itsm:lang='zxx'>3x + 5y = 2</mrk>
> 
> <mrk id='m2' type='term' itsm:lang='fr-CA'>poutine</mrk>
> 
> <mrk id='m3' type='itsm:any' itsm:space='preserve'>[  ]=2s</mrk>
> 
> Etc.
> 
> 
> Or b) we decide to force a specific annotation for Preserve Space and for
> Language Information that are not mixed with others.
> 
> In that second case, the simplest way to define them would be:
> 
> <mrk id='1' type='itsm:space' value='preserve'>...</mrk>
> 
> <mrk id='2' type='itsm:lang' value='fr-CA'>...</mrk>
> 
> 
> Also, it seems to me that it would be a lot clearer for the reader to have
> just one ITS Module section (no appendix) and have each data category
> defined there, regardless of how they are mapped.
> 
> Cheers,
> -yves

Received on Tuesday, 18 November 2014 19:14:26 UTC