Comment on B.2.1.2 Inline Elements section from Yves Savourel on 2014-12-15 (public-i18n-its-ig@w3.org from December 2014)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 15 Dec 2014 09:55:22 -0700
To: "XLIFF Main List" <xliff@lists.oasis-open.org>
CC: <public-i18n-its-ig@w3.org>
Message-ID: <002501d01887$ea304d30$be90e790$@enlaso.com>
Hi all,

Currently (Dec-15) the section "B.2.1.2 Inline Elements", which deals with Preserve Space, reads:

[[
B.2.1.2 Inline Elements

It is not possible to use [XML namespace] on XLIFF inline elements. It is advised that mixed Preserve Space behavior is NOT used
inline in source formats. In case of extraction of source format inline elements with mixed Preserve Space behavior, it is advised
to extract all discernable portions with uniform whitespace handling into different <unit> elements that can have their whitespace
handling set independently.

Whitespace handling can be also set independently for text segments and ignorable text portions within an Extracted unit and for the
source and target language within the same <segment> or <ignorable> element using the OPTIONAL xml:space attribute at the <source>
and <target> elements. However, mixed whitespace handling behavior is not likely to survive Segmentation Modification. So this
method is not advised unless the <segment> elements are protected by the canResegment flag value set to or inherited as no.

Preserved whitespaces can be also extracted as original data stored outside of the translatable content at the unit level and
referenced from placeholder codes. It is important to note that the value of the xml:space attribute is restricted to preserve on
the <data> element.

...
]]

I think this section is a bit incorrect and harmful as it is written now.

> It is advised that mixed Preserve Space behavior is 
> NOT used inline in source formats.

I'm not sure what this means. Mixed PS (Preserve Space) behavior exists in XML and HTML (and other formats), so I don't
understand the statement, especially since the next sentence says it can occur.


> In case of extraction of source format inline elements 
> with mixed Preserve Space behavior, it is advised to 
> extract all discernable portions with uniform whitespace 
> handling into different <unit> elements that can have 
> their whitespace handling set independently.

I disagree: creating different units must never be driven by inline whitespace behavior. It must be driven by the normal
structural markup of the source document.


> Whitespace handling can be also set independently for text 
> segments and ignorable text portions within an Extracted 
> unit and for the source and target language within the 
> same <segment> or <ignorable> element using the OPTIONAL 
> xml:space attribute at the <source> and <target> elements.
> However, mixed whitespace handling behavior is not likely 
> to survive Segmentation Modification. So this method is 
> not advised unless the <segment> elements are protected by 
> the canResegment flag value set to or inherited as no.

It can survive by using the ITS module (or whatever module PS ends up in). The whole point of the section is about just that.


> Preserved whitespaces can be also extracted as original data 
> stored outside of the translatable content at the unit level 
> and referenced from placeholder codes. It is important to 
> note that the value of the xml:space attribute is restricted 
> to preserve on the <data> element.

This is, in my opinion, bad advice. We do not want to preserve whitespace by turning it into inline codes. The fewer inline codes
there are, the better. There are better ways to avoid losing white space than doing this.


The section should be about how to use the ITS module to preserve (or not) spaces inline, and that's all.

If we want to give advice on how to avoid losing white space inline without using the module, that should be a note in the section
"4.3.2.2 xml:space", which is about whitespace in the Core.

I would propose to have a paragraph there that says something like this:

"
The xml:space attribute is not available in inline elements; instead use itsm:space.
If you cannot use that attribute (e.g. if your file is processed by tools not supporting the attribute), one of the safest ways to
extract content with mixed preserved space behavior is the following:
- when creating the extracted unit, normalize the parts of it where whitespace does not need to be preserved
- then set xml:space='preserve' for the whole unit.
"

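To illustrate the two steps proposed above, here is a minimal Python sketch, not a reference implementation. It assumes a hypothetical extractor that has already split the content into (text, preserve) pairs; the names normalize and build_unit, the unit id "u1", and the sample text are made up for the example, and the preserved portion is kept as plain text rather than wrapped in an inline code.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI bound to the reserved "xml" prefix (xml:space, xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def normalize(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text)


def build_unit(unit_id, parts):
    """Build a <unit> from (text, preserve) pairs.

    `parts` is a hypothetical extractor output: `preserve` is True for
    portions whose whitespace is significant in the source format.
    """
    # Step 1: normalize only the portions that do not need preserving.
    source_text = "".join(
        text if preserve else normalize(text) for text, preserve in parts
    )
    # Step 2: mark the whole unit as xml:space="preserve" so that the
    # remaining (significant) whitespace is not touched downstream.
    unit = ET.Element("unit", {"id": unit_id, f"{{{XML_NS}}}space": "preserve"})
    segment = ET.SubElement(unit, "segment")
    ET.SubElement(segment, "source").text = source_text
    return unit


parts = [
    ("Type   the\n  command ", False),  # collapsible whitespace
    ("ls   -l", True),                  # whitespace must be kept as-is
    ("  and   press Enter.", False),    # collapsible whitespace
]
unit = build_unit("u1", parts)
print(ET.tostring(unit, encoding="unicode"))
```

Because the non-significant whitespace is already collapsed at extraction time, setting xml:space='preserve' on the whole unit costs nothing for those parts, while the significant runs survive without needing any inline attribute or extra inline codes.
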
Cheers,
-yves
Received on Monday, 15 December 2014 16:55:50 UTC