
RE: ACTION-54: Try to come up with example of xliff+its test format / output

From: Yves Savourel <ysavourel@enlaso.com>
Date: Thu, 6 Nov 2014 15:13:46 -0700
To: "'Felix Sasaki'" <fsasaki@w3.org>, "'Estreen, Fredrik'" <Fredrik.Estreen@lionbridge.com>
CC: "'public-i18n-its-ig'" <public-i18n-its-ig@w3.org>
Message-ID: <009f01cffa0e$ef54c530$cdfe4f90$@enlaso.com>
What do you mean by 'strip the whitespace'? 
1- delete all whitespace?
2- normalize the whitespace?
	a- trim the leading and trailing?
	b- normalize the leading/trailing?

[  Text  Text ] -> [TextText]

[  Text  Text ] -> [ Text Text ]

[  Text  Text ] -> [Text Text]
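
To make the three readings concrete, here is a minimal sketch of each one. The function names are made up for illustration, and the mapping of the examples to options 1, 2b and 2a is my reading of the list above, not something taken from the specification.

```python
import re

def delete_all(text: str) -> str:
    """Option 1: remove every whitespace character."""
    return re.sub(r"\s+", "", text)

def normalize_keep_edges(text: str) -> str:
    """Option 2b (as read here): collapse each whitespace run, including
    the leading and trailing ones, to a single space."""
    return re.sub(r"\s+", " ", text)

def normalize_and_trim(text: str) -> str:
    """Option 2a (as read here): collapse inner runs to a single space,
    then trim leading and trailing whitespace."""
    return re.sub(r"\s+", " ", text).strip()

sample = "  Text  Text "
for fn in (delete_all, normalize_keep_edges, normalize_and_trim):
    print(f"[{sample}] -> [{fn(sample)}]")
```

Each reading produces a different string length, so any offsets computed afterwards depend on which one is picked.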

Thanks,
-yves


-----Original Message-----
From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Thursday, November 6, 2014 2:40 PM
To: Estreen, Fredrik
Cc: Yves Savourel; public-i18n-its-ig
Subject: Re: ACTION-54: Try to come up with example of xliff+its test format / output

Hi Fredrik and Yves, all,

I would calculate the offset based on the element's textual content: zero is the start of the element content, the tags themselves
are not counted, and whitespace is always stripped. Since roundtripping is not needed, the whitespace stripping does not hurt.
See the NIF conversion at
http://www.w3.org/TR/its20/#conversion-to-nif
including the note about whitespace stripping.
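
A rough sketch of that calculation, under stated assumptions: "stripped" is taken here to mean collapsing whitespace runs to one space and trimming both ends, which is only one possible reading, and the helper names `stripped_text` and `offsets_of` are hypothetical. Tags contribute nothing to the offsets because only text nodes are concatenated.

```python
import re
import xml.etree.ElementTree as ET

def stripped_text(xml_fragment: str) -> str:
    """Concatenate all text nodes of the element (tags are not counted),
    then collapse whitespace runs and trim. The normalization rule is an
    assumption, not a quotation of the ITS 2.0 text."""
    root = ET.fromstring(xml_fragment)
    raw = "".join(root.itertext())
    return re.sub(r"\s+", " ", raw).strip()

def offsets_of(xml_fragment: str, word: str):
    """Return (start, end) of the first occurrence of `word` in the
    stripped textual content, or None if it does not occur."""
    text = stripped_text(xml_fragment)
    start = text.find(word)
    return (start, start + len(word)) if start >= 0 else None

fragment = "<source>Text<sm id='1' translate='no'/>data</source>"
print(stripped_text(fragment))       # Textdata
print(offsets_of(fragment, "Text"))  # (0, 4)
print(offsets_of(fragment, "data"))  # (4, 8)
```

Because the offsets refer to the stripped string rather than the serialized XML, they cannot be mapped back to the original bytes, which is acceptable only as long as roundtripping is not required.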

Best,

Felix 

Am 06.11.2014 um 22:24 schrieb Estreen, Fredrik <Fredrik.Estreen@lionbridge.com>:

> Hi Yves, Felix,
> 
> How would this work in cases where xml:space != "preserve"? A generic XML processor might normalize the space and thus invalidate
> the offsets if insignificant whitespace is not preserved.
> 
> Regards,
> Fredrik Estreen
> 
>> -----Original Message-----
>> From: Yves Savourel [mailto:ysavourel@enlaso.com]
>> Sent: den 6 november 2014 15:34
>> To: 'Felix Sasaki'
>> Cc: 'public-i18n-its-ig'
>> Subject: RE: ACTION-54: Try to come up with example of xliff+its test 
>> format / output
>> 
>> Hi Felix,
>> 
>> Can you specify a bit more how the offset would be computed?
>> It seems the zero is the start of the element (e.g. <source>) content.
>> But how would we count the inline element?
>> 
>> <source>Text<sm id='1' translate='no'/>data</source>
>> 
>> "Text" = 0,4
>> "data" = 31,35
>> 
>> The problem is that we don't always know how long the inline tag is 
>> in the document (you can have extra spaces between attributes, some 
>> attributes with default values may be omitted, etc.)
>> 
>> Or should we count each inline tag as 1 character?
>> 
>> Which would give:
>> 
>> "Text" = 0,4
>> "data" = 5,9
>> 
>> 
>> Thanks,
>> -yves
>> 
> 
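
The two counting schemes from the quoted message (inline tag counted at its serialized length versus counted as one character) can be compared with a small sketch. The function `offsets` and its `tag_width` parameter are hypothetical, and the regular expression is a simplification that assumes no `>` inside attribute values.

```python
import re

# Simplified tag matcher: assumes attribute values never contain '>'.
TAG = re.compile(r"<[^>]+>")

def offsets(content: str, tag_width=None):
    """Return (text, start, end) for each text run in `content`, the inner
    markup of an element. With tag_width=None each tag counts at its
    serialized length; with an integer, each tag counts as that many
    characters."""
    result, pos, last = [], 0, 0
    for m in TAG.finditer(content):
        run = content[last:m.start()]
        if run:
            result.append((run, pos, pos + len(run)))
            pos += len(run)
        pos += len(m.group()) if tag_width is None else tag_width
        last = m.end()
    run = content[last:]
    if run:
        result.append((run, pos, pos + len(run)))
    return result

content = "Text<sm id='1' translate='no'/>data"
print(offsets(content))     # [('Text', 0, 4), ('data', 31, 35)]
print(offsets(content, 1))  # [('Text', 0, 4), ('data', 5, 9)]
```

The first result changes as soon as the tag is serialized differently (extra spaces, omitted default attributes), while the second depends only on the number of inline tags.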
Received on Thursday, 6 November 2014 22:14:15 UTC