AW: I18N-ISSUE-246: Clarify character encoding behavior when calculating storage size [ITS-20] from Stephan Walter on 2013-02-28 (www-international@w3.org from January to March 2013)

From: Stephan Walter <stephan.walter@cocomore.com>
Date: Thu, 28 Feb 2013 14:20:39 +0000
To: Norbert Lindenberg <w3@norbertlindenberg.com>, Yves Savourel <ysavourel@enlaso.com>, Felix Sasaki <fsasaki@w3.org>
CC: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>, 'www-international' <www-international@w3.org>
Message-ID: <9F17297605BABC4F8C79050627BCF0F20F8A9D7C@AMSPRD0511MB548.eurprd05.prod.outlook.>

Hi Yves, all,

While I do agree that the property applies to the content, I still think that it will generally be interpreted only with respect to translation results, not source text. I'm not quite sure I understand when it would make sense to use 'storage size' to say that 'this source data may not be longer than x bytes'.
At least it seems to me that what we would expect a consumer of the information to do depends somewhat on context. In the use case described in the specification, the constraint will normally not matter as long as you are looking at data that will be translated but has not been yet. The constraint stems from a property of the target data store. The source text may come from a data store where the size constraint doesn't apply, and it will never be written to the target data store that has the size limit. The storage size constraint is then intended to be imposed on the result of translating the content.
On the other hand, if the data you are checking is actually the translation result, then the constraint applies directly to the content.

So you need to know whether you are looking at source or translation to be able to react appropriately when processing 'storage size' information.

Of course this is only the case when storage size is used as suggested, to transport information about the maximum size available to store a string.
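
To make the check concrete, here is a minimal sketch in Python of how a consumer might test content against a storage size limit, following the proposed wording quoted below (encode with the given encoding and line break type, then compare the byte length with the limit, inclusive). The function name `fits_storage_size`, its parameters and defaults, and the mapping of line break types to characters are my own assumptions for illustration; they are not taken from the specification.

```python
# Hypothetical helper: names, defaults and the line break mapping are
# illustrative assumptions, not defined by the specification text.
LINE_BREAKS = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}


def fits_storage_size(content, storage_size, encoding="utf-8", line_break_type="lf"):
    """Return True if the encoded content is not longer than storage_size bytes."""
    # Normalize every line break to "\n", then to the requested type.
    normalized = content.replace("\r\n", "\n").replace("\r", "\n")
    normalized = normalized.replace("\n", LINE_BREAKS[line_break_type])
    # Characters the encoding cannot represent raise UnicodeEncodeError here;
    # what to do in that case is left to the implementation.
    encoded = normalized.encode(encoding)
    # The limit is a maximum, so the comparison is inclusive.
    return len(encoded) <= storage_size


translation = "Speichern\nüber"  # 15 bytes in UTF-8 with "lf" line breaks
print(fits_storage_size(translation, 15))                          # True
print(fits_storage_size(translation, 15, line_break_type="crlf"))  # False
```

In the use case I describe above, a consumer would run such a check on the translation result rather than on the source text, since only the translation is written to the size-limited data store.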

Do you share this view, and is this maybe something that we should make a bit clearer in the explanation of what 'storage size' could be used for?

Best
Stephan

On Jan 30, 2013, at 7:23, Yves Savourel wrote:

> Hi Stephan, Norbert, all,
> 
> Sorry I missed looking at the text before. I have two small corrections:
> 
> - the property applies to the content (source or translation, not just 
> its translation)
> - the limit is a maximum (thus inclusive -> s/shorter/not longer/)
> 
> This would make the text:
> 
> ==>
> The storage size is expressed in bytes and is provided along with the character set encoding, and optionally the line break type, that will be used when the content will be stored. In order to check if the content will fit inside the storage limit given by storageSize, it has to be calculated whether the byte sequence that results from encoding this content is not longer than the given limit. This byte sequence is to be produced using the specified encoding (and possibly line break type).
> <==
> 
> -yves
> 
> 
> From: Stephan Walter [mailto:stephan.walter@cocomore.com]
> Sent: Tuesday, January 29, 2013 5:34 AM
> To: public-multilingualweb-lt-comments@w3.org
> Subject: Re: I18N-ISSUE-246: Clarify character encoding behavior when 
> calculating storage size [ITS-20]
> 
> Dear Norbert,
> 
> 
> this concerns issues 106 [1] and 107 [2] which you reported on the draft specification of the ITS 2.0 Tagset Standard.
> 
> At our last working group meeting we had a discussion on the topic. We came to the conclusion that the specification need not impose any requirements on implementations regarding supported character encodings, behavior on encoding problems or representation of line breaks.
> 
> Imposing such requirements was not intended in the context addressed by your issues. The storageEncoding and lineBreakType will enable an implementation to make use of the storageSize information in a sensible way. The details of how this information is used are, however, up to the implementation. We agreed to make this clearer by adding a clarification to the end of the definition of Storage Size (8.21.1) as follows:
> 
> ===
> The storage size is expressed in bytes and is provided along with the character set encoding used to store the content.
> 
> ==>
> The storage size is expressed in bytes and is provided along with the character set encoding, and optionally the line break type, that will be used when the content will be stored. In order to check if the translation of the content will fit inside the storage limit given by storageSize, it has to be calculated whether the byte sequence that results from encoding this translation is shorter than the given limit. This byte sequence is to be produced using the specified encoding (and possibly line break type).
> ===
> 
> Would you agree that this answers the questions raised in your issues in an appropriate way?
> 
> Best regards
> 
> Stephan Walter
> 
> [1] 
> https://www.w3.org/International/multilingualweb/lt/track/issues/106, 
> http://www.w3.org/International/track/issues/246
> 
> [2] 
> https://www.w3.org/International/multilingualweb/lt/track/issues/107, 
> http://www.w3.org/International/track/issues/247
> 
> ________________________________________
> Dr. Stephan Walter, Senior IT-Consultant
> Tel.: +49 69 972 69 x Fax: +49 69 972 69 x; E-Mail: stephan.walter@cocomore.com
> Cocomore AG, Gutleutstraße 30, D-60329 Frankfurt
> Internet: http://www.cocomore.de Facebook: http://www.facebook.com/cocomore Google+: http://plus.cocomore.de
> Cocomore is an active member of the World Wide Web Consortium (W3C) and of the Bundesverband Digitale Wirtschaft (BVDW)
> Executive Board: Dr. Hans-Ulrich von Freyberg (Chair), Dr. Jens Fricke, Marc Kutschera; Chair of the Supervisory Board: Martin Velasco; Registered office: Frankfurt/Main; Amtsgericht Frankfurt am Main, HRB 51114
> 
> 
>

Received on Thursday, 28 February 2013 14:21:25 UTC