
RE: [xliff] ITS: Preserve space and Language Information

From: Yves Savourel <ysavourel@enlaso.com>
Date: Thu, 23 Oct 2014 08:13:48 -0600
To: "'Dr. David Filip'" <David.Filip@ul.ie>
CC: "'XLIFF Main List'" <xliff@lists.oasis-open.org>, "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
Message-ID: <003801cfeecb$902db4c0$b0891e40$@enlaso.com>
Hi David, all,

 

While in some cases (like multiple spaces between sentences) using <ignorable> with xml:space could be a solution, that can’t solve all use cases, and, as pointed out, that will cause trouble when re-segmenting.

 

The other solution (using inline codes to store spans of whitespace) looks like asking for trouble: the main reason for such a complicated option would be that xml:space can’t be set on <mrk>. It would also not solve the xml:lang case. In general, we do not want to encourage using more inline codes.

 

I think the simplest and most comprehensive solution is to have its:space and its:lang defined and behaving just like xml:space and xml:lang, but with the <sm>-specific scope. That doesn’t preclude anyone from using the other options if they really want to go down that road.
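
A minimal sketch of what this could look like, not a normative example. The its:space and its:lang attributes and the its:any type value are the names proposed in this thread; the namespace URI bound to the its prefix is a placeholder assumption, and the sample text is made up. The first span uses <sm>/<em> because it crosses a segment boundary; the second is a plain <mrk>.

```xml
<!-- Sketch only: its:space / its:lang as proposed in this thread.
     The its namespace URI below is a placeholder, not a published one. -->
<unit id="1"
      xmlns="urn:oasis:names:tc:xliff:document:2.0"
      xmlns:its="urn:example:its-module-placeholder">
  <segment id="s1">
    <!-- The preserved-space span opens here and stays open past the segment end -->
    <source>Column A<sm id="m1" type="its:any" its:space="preserve"/>    Column B.</source>
  </segment>
  <segment id="s2">
    <!-- The matching <em> closes the span that m1 started -->
    <source>Value 1    Value 2<em startRef="m1"/> Trailing text.</source>
  </segment>
  <segment id="s3">
    <!-- Language Information on a well-formed span -->
    <source>The word <mrk id="m2" type="its:any" its:lang="iu">sample</mrk> is in another language.</source>
  </segment>
</unit>
```

A processor that supports this part of the module would treat the content between <sm id="m1"> and its <em> as whitespace-preserving, the same way it already treats xml:space='preserve' at the structural level.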

 

It simply means that if you want to handle Preserve Space or Language Information at the inline level, you have to support that part of the ITS module (which is really not complicated when you already have to handle xml:space and xml:lang for the Core). That means one cannot guarantee that Core-only processors will preserve those features, but that is already the case in 2.0.

 

Cheers,

-yves

 

 

From: Dr. David Filip [mailto:David.Filip@ul.ie] 
Sent: Thursday, October 23, 2014 7:04 AM
To: Yves Savourel
Cc: XLIFF Main List; public-i18n-its-ig
Subject: Re: [xliff] ITS: Preserve space and Language Information

 

Thanks, Yves,

 

I was thinking about two possible solutions.

One of them would be, as you propose, to introduce its: attributes that could work with empty markers as span delimiters.

 

Another way would be to use the fact that the two relevant XML namespace attributes are still available on <source> and <target>.

Not sure if this is an omission; probably not, as we have a PR for resegmentation that accounts for it.

 

This would be somewhat restrictive, but it would have the advantage that the related markup would always be well formed.

 

I tried to write up such a restrictive solution for Preserve Space in the current working draft.

It also notes that you can use originalData to preserve whitespace.

 

I copy and paste it here:

 

Preserve Space

Indicates how to handle whitespace in a given content portion. See [ITS] Preserve Space for details.

Structural Elements

 Whitespace handling at the structural level is indicated with xml:space in XLIFF Core and extensions: 

Extraction of preserved whitespace at the structural level

Original:

 

<listing xml:space='preserve'>Line 1
Line 2</listing>

Extraction:

<unit id='1' xml:space='preserve'>
 <segment>
  <source>Line 1
Line 2</source>
 </segment>
</unit>

        

 

Inline Elements

 It is not possible to use [XML namespace] attributes on XLIFF inline elements. It is advised that mixed Preserve Space behavior is NOT used inline in source formats. When source format inline elements with mixed Preserve Space behavior have to be extracted, it is advised to extract all discernible portions with uniform whitespace handling into different <unit> elements that can have their whitespace handling set independently.

Whitespace handling can also be set independently for text segments and ignorable text portions within an Extracted unit, and for the source and target language within the same <segment> or <ignorable> element, using the optional xml:space attribute on the <source> and <target> elements. However, mixed whitespace handling behavior is not likely to survive Segmentation Modification, so this method is not advised unless the <segment> elements are protected by the canResegment flag set to, or inherited as, no.
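
The restricted method described above can be sketched as follows. This is an illustrative fragment under stated assumptions, not text from the draft: the sentences are made up, and canResegment is set on the <unit> so that every <segment> inherits it.

```xml
<!-- Sketch only: mixed whitespace handling inside one unit,
     protected against Segmentation Modification. -->
<unit id="1" canResegment="no"
      xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <segment>
    <!-- Default whitespace handling applies here -->
    <source>First sentence.</source>
  </segment>
  <ignorable>
    <!-- The run of spaces between the sentences is preserved -->
    <source xml:space="preserve">    </source>
  </ignorable>
  <segment>
    <source>Second sentence.</source>
  </segment>
</unit>
```

Without canResegment="no", a tool that merges or re-splits these segments has no single xml:space value to give the merged <source>, which is why the method is not advised in the unprotected case.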

Preserved whitespace can also be extracted as original data stored outside of the translatable content at the unit level and referenced from placeholder codes. It is important to note that the value of the xml:space attribute is restricted to preserve on the <data> element.

Extraction of preserved whitespaces as referenced original data

Original:

 

 <p>
   <span xml:space='preserve'>Item 1      Item 2      Item n+1 
   </span> are all used to build Item n+2.
 </p>

Extraction:

<unit id='1'>
  <originalData>
    <data id="d1">&lt;span xml:space='preserve'></data>
    <data id="d2">&lt;/span></data>
    <data id="d3">      </data>
    <data id="d4"> 
    </data>
  </originalData>
  <segment>
    <source><pc id="1" dataRefStart="d1" dataRefEnd="d2">Item 1<ph id="2" dataRef="d3"/>Item 2<ph id="3" dataRef="d3"/>Item n+1<ph id="4" dataRef="d4"/></pc> are all used to build Item n+2.</source>
  </segment>
</unit>

        

 

Not really sure which solution is better, but I'd say we should explore both.

 

Cheers

dF




Dr. David Filip

=======================

OASIS XLIFF TC Secretary, Editor, and Liaison Officer 

LRC | CNGL | CSIS

University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

http://www.cngl.ie/profile/?i=452

mailto: david.filip@ul.ie

 

On Thu, Oct 23, 2014 at 1:41 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

Hi all,

It seems to me that we don't have a good solution for the inline cases of the Preserve Space and Language Information data
categories.

In the original draft mapping we used xml:space and xml:lang on <mrk>.
But, as David pointed out, this can't work because these attributes are not allowed on <mrk>/<sm>.
I believe we did this because of <sm>: both xml:lang and xml:space scopes would apply to an empty element.

But we cannot leave those two data categories without an inline solution.
So it seems they would fall into the class of the data categories only partially supported directly by the core, and we need
ITS-module attributes to handle them inline. Something like this: <mrk id='1' type="its:any" its:space="preserve" its:lang="iu">.

Cheers,
-yves






 
Received on Thursday, 23 October 2014 14:14:17 UTC
