
Re: [xliff] Preserve Space / Language Info at the inline level

From: Dr. David Filip <David.Filip@ul.ie>
Date: Thu, 20 Nov 2014 11:16:22 +0000
Message-ID: <CANw5LKkVWMqXieP-oh=TVtbbB4Ggh+-1FWJ=pa==-KEuxvAm5w@mail.gmail.com>
To: "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>
CC: Yves Savourel <ysavourel@enlaso.com>, XLIFF Main List <xliff@lists.oasis-open.org>, public-i18n-its-ig <public-i18n-its-ig@w3.org>
Fredrik, thanks for this detailed explanation.

IMHO, we cannot change the core, and that is the baseline for the following.

1) The point about pretty printing is well made; still, it does not warrant
a core change AFAIK.
We can add a Warning explaining that generic XML processors will not
understand inline XLIFF/ITS notations for whitespace handling, so people
who want to make their XLIFF files safe for such processors should make the
set or inherited value of xml:space on <source> and <target> "preserve".

That should do the trick, no matter what the other decisions under 2) are.
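
A minimal sketch of how a tool might check for the condition described in
1), assuming Python's standard xml.etree.ElementTree and the XLIFF 2.0 core
namespace. The function name unsafe_elements and the SAMPLE document are
hypothetical and only illustrate the idea; this is not taken from the
specification.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
SPACE_ATTR = f"{{{XML_NS}}}space"
CHECKED_TAGS = (f"{{{XLIFF_NS}}}source", f"{{{XLIFF_NS}}}target")


def unsafe_elements(xliff_text):
    """Return (local name, effective xml:space) for every <source>/<target>
    whose set or inherited xml:space value is not "preserve"."""
    root = ET.fromstring(xliff_text)
    problems = []

    def walk(elem, inherited):
        # xml:space is inherited from the nearest ancestor that sets it.
        effective = elem.get(SPACE_ATTR, inherited)
        if elem.tag in CHECKED_TAGS and effective != "preserve":
            problems.append((elem.tag.rsplit("}", 1)[-1], effective))
        for child in elem:
            walk(child, effective)

    # Assumption: "default" applies when no ancestor sets xml:space.
    walk(root, "default")
    return problems


# Hypothetical sample: unit u1 is safe for generic processors, u2 is not.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">
  <file id="f1">
    <unit id="u1" xml:space="preserve">
      <segment><source>  padded  </source></segment>
    </unit>
    <unit id="u2">
      <segment><source>plain</source></segment>
    </unit>
  </file>
</xliff>"""

print(unsafe_elements(SAMPLE))
```

Only the <source> in unit u2 is reported, because it neither sets nor
inherits "preserve".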

2) I now understand what you mean by an ITS dependency.
IMHO this is low risk. As in W3C ITS, our conformance clause for the
module should say that it is enough to support one data category.
[Similarly, slr says it is enough to support just the predefined profiles.]
That category can just as well be the Preserve Space category.

This said, ITS categories can be thematically grouped and modules made
smaller. This has been done informatively to ease adoption, as 20
categories is a lot.

Or we can decide to make a module for inline handling of xml namespace
attributes. This would have its own namespace and the ITS mapping would be
using it for expressing the Preserve Space and Language information
categories.
Possible names of the module could be:
Inline Handling of Space and Language
OR
Core Supplement [;-)]

Finally,
in the light of the above discussion I think it was a good 2.0 decision to
disallow xml namespace attributes on inlines. Having them on <mrk> and the
specialized XLIFF handling only on <sm> would further complicate the issue
of generic XML processors, which would interpret only part of the inline
set and inherited values, i.e. the part on <mrk> elements, and ignore the
set values on <sm> elements that are likely to override the inherited <mrk>
values.
It also shows there is value in the Preserve Space strategies that use
core-only means.
Hiding the whitespace to be preserved in the original data, where
xml:space is restricted to "preserve", seems the only fully expressive and
foolproof way.
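
As an illustration of that core-only strategy, here is a small sketch,
assuming XLIFF 2.0 core markup (<originalData>, <data>, <ph dataRef>) and
Python's standard xml.etree.ElementTree. The UNIT fragment and the helper
expand_source are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
NS = {"x": XLIFF_NS}

# Hypothetical unit: the four spaces to preserve live in <data>, where
# xml:space is restricted to "preserve"; <source> only references them.
UNIT = """<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="u1">
  <originalData>
    <data id="d1">    </data>
  </originalData>
  <segment>
    <source>Name:<ph id="1" dataRef="d1"/>value</source>
  </segment>
</unit>"""


def expand_source(unit_text):
    """Rebuild the source text, replacing each <ph> by its original data."""
    unit = ET.fromstring(unit_text)
    data = {d.get("id"): d.text or ""
            for d in unit.iterfind("x:originalData/x:data", NS)}
    source = unit.find("x:segment/x:source", NS)
    parts = [source.text or ""]
    for child in source:
        if child.tag == f"{{{XLIFF_NS}}}ph":
            parts.append(data.get(child.get("dataRef"), ""))
        parts.append(child.tail or "")
    return "".join(parts)


print(repr(expand_source(UNIT)))
```

The whitespace never appears as text inside <source>, so a processor that
normalizes or reindents <source> content has nothing to damage there.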

Cheers
dF


Dr. David Filip
=======================
OASIS XLIFF TC Secretary, Editor, and Liaison Officer
LRC | CNGL | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
*cellphone: +353-86-0222-158*
facsimile: +353-6120-2734
http://www.cngl.ie/profile/?i=452
mailto: david.filip@ul.ie

On Tue, Nov 18, 2014 at 7:13 PM, Estreen, Fredrik <
Fredrik.Estreen@lionbridge.com> wrote:

> Hi Yves, all
>
> I thought a bit more on this after today's XLIFF call and I'm starting to
> think that we cannot properly do this without a change to the core
> specification if we want it as a generic feature and not an ITS module
> feature. Long discussion below. Tl;dr sentence: Put ITS space and language
> mapping in a separate namespace from the rest of the ITS mapping module,
> add PR to set "xml:space" to "preserve" on a higher level element if any
> span needs it.
>
> As it currently stands, the validity of an XLIFF document (or adherence to
> our processing requirements) is not changed if the document is passed
> through an XML pretty printing application that respects the "xml:space"
> attribute and uses the schema. The default is "default" on everything
> except on the <data> element, where it is restricted to only ever being
> "preserve" and defaults to that. Sometimes translatable content needs to
> include leading or trailing whitespace that must be preserved. That can be
> done by setting "xml:space" to "preserve" on an ancestor of such content.
> That would also handle <ignorable> elements, which much of the time will
> contain only whitespace, quite possibly spaces and newlines mixed
> together.
>
> If we want to allow an inline notation to control treatment of whitespaces
> on the type of logical spans we allow across segments there is no way we
> can tell an arbitrary XML processor about how they work. So a pretty
> printer (or storage system or some other kind of processor) would never see
> that we want preservation of spaces on a particular span. That could lead,
> for example, to pretty printing violating our processing requirements. I
> doubt you will find any XML pretty printer / formatter that will not make
> some change to a bunch of whitespace sitting alone in an <ignorable> but
> protected by a <sm/><em/> pair in other sibling trees.
>
> To make the space handling safe with respect to non-XML spans and generic
> XML processors we should at a minimum make <source> and <target> use
> "xml:space" set to "preserve" as their default and only possible value.
> Treating content that has "xml:space" set to "default" as if it were set to
> "preserve" is not an error; the opposite is an error. So using standard
> XML constructs to enforce the more restrictive mode and then allowing a
> nonstandard mechanism to relax to a less restrictive mode should always be
> safe.
>
> If we make the change to the default value and add a constraint that it
> must be set explicitly if set to "preserve", a 2.x document would still be
> 100% compatible with a 2.0 document. A 2.0 document could not make use of
> the new non-XML span space handling but a 2.0 processor would not cause
> harm to a 2.x document.
>
> The same goes for ITS specified space handling in XLIFF. A non ITS aware
> 2.1 processor (or a 2.0 processor) would not have any way to know that
> whitespace in a particular span must be preserved. So the ITS module must
> allow non ITS module aware processors to make changes to "preserve" tagged
> whitespace, which seems bad. Or it must require that "xml:space" is set to
> "preserve" on a higher level if any span containing space that should be
> preserved is encountered.
>
> I'd like us to eventually have a solution for the space handling that is
> not tied to ITS and that the ITS module would make use of that feature. And
> that things like pretty printing would not violate XLIFF processing
> requirements. If that can't be done right now we need to at least add the
> higher level setting of "xml:space" to "preserve" to the ITS mapping
> module to make it safe.
>
> Perhaps we could break out this (and xml:lang) into a namespace of its
> own. That way a later version of core could adopt it from the ITS module
> without semantic changes. Thus avoiding the situation where we would have
> one core and one module feature do the exact same thing, or needing to make
> an incompatible change.
>
> Regards,
> Fredrik Estreen
>
> > -----Original Message-----
> > From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
> > Sent: 18 November 2014 16:20
> > To: XLIFF Main List
> > Cc: 'public-i18n-its-ig'
> > Subject: [xliff] Preserve Space / Language Info at the inline level
> >
> > Hi David, all,
> >
> > Looking at:
> >
> > http://tools.oasis-open.org/version-control/browse/wsvn/xliff/trunk/xliff-21/xliff-core-v2.1.pdf
> >
> > Section "5.9.4.1 ITS Preserve Space Annotation"
> >
> > I'm not sure if this is the best way to define the annotation for
> > Preserve Space, and I assume, Language Information later.
> >
> > There are two options:
> >
> > a) We have two attributes itsm:space and itsm:lang that can be set in any
> > <mrk>/<sm> element, regardless of the type (just like translate).
> >
> > In that case we get this type of annotations:
> >
> > <mrk id='m1' translate='no' itsm:space='preserve' itsm:lang='zxx'>3x + 5y = 2</mrk>
> >
> > <mrk id='m2' type='term' itsm:lang='fr-CA'>poutine</mrk>
> >
> > <mrk id='m3' type='itsm:any' itsm:space='preserve'>[  ]=2s</mrk>
> >
> > Etc.
> >
> >
> > Or b) we decide to force a specific annotation for Preserve Space and for
> > Language Information that are not mixed with others.
> >
> > In that second case, the simplest way to define them would be:
> >
> > <mrk id='1' type='itsm:space' value='preserve'>...</mrk>
> >
> > <mrk id='2' type='itsm:lang' value='fr-CA'>...</mrk>
> >
> >
> > Also, it seems to me that it would be a lot clearer for the reader to
> > have just one ITS Module section (no appendix) and have each data
> > category defined there, regardless of how they are mapped.
> >
> > Cheers,
> > -yves
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe from this mail list, you must leave the OASIS TC that
> > generates this mail.  Follow this link to all your TCs in OASIS at:
> > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
Received on Thursday, 20 November 2014 11:17:31 UTC
