- From: Michael Kruppa <Michael.Kruppa@cocomore.com>
- Date: Mon, 27 Aug 2012 15:29:18 +0000
- To: Yves Savourel <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Hi Yves, all,

OK. Now I got it! ;-) Thanks, Yves, for taking the time to explain this to me.

Cheers
Michael

________________________________________
Dr. Michael Kruppa, Senior IT-Consultant
Cocomore AG, Frankfurt

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Saturday, 25 August 2012 21:46
To: public-multilingualweb-lt@w3.org
Subject: RE: Call for consensus - storageSize and displaySize

Hi Michael,

> I'm still confused about storage size. In my understanding: if I state a
> storage size limit in bytes, then I'm done. I would interpret this limit as:
> whatever content you put here, it shall not exceed the maximum number of bytes.
> Whether I use encoding A or B should be irrelevant, since I have to ensure
> that my text using my encoding does not exceed the byte limit.
> I think one would only need the additional encoding attribute if we
> based storage on character counts.
> Or is this a totally wrong understanding?
It seems you got it backwards: you wouldn't need the encoding if the unit of storage were the character (presumably the Unicode code point), but you do need it when the unit is the byte. And for storage, the byte is the only unit one can use.

Let's say you have a storage field that cannot take more than 11 bytes.

Let's say your original English text is "It's summer" (11 Unicode code points), and your file/db/whatever uses UTF-8 to store the field. "It's summer" gives you:
49,74,27,73,20,73,75,6d,6d,65,72 = 11 bytes.

Now we are translating into French. The text is "C'est l'été" (11 Unicode code points). In UTF-8 that is encoded as:
43,27,65,73,74,20,6c,27,c3,a9,74,c3,a9 = 13 bytes.
It's too long to fit into your field!

If the encoding used to store the field were ISO-8859-1, we would have:
43,27,65,73,74,20,6c,27,e9,74,e9 = 11 bytes.

The difference is the two 'é' characters: in ISO-8859-1 each is encoded in one byte (0xE9), but in UTF-8 each takes two bytes (0xC3,0xA9). That's why we have two 'extra' bytes in UTF-8.

That is why, when a tool checks whether a given text fits the storage, it must know which encoding is used; otherwise it simply cannot calculate the size.

Those byte/char/encoding-related matters are often confusing; I hope this helps.

Cheers,
-yves
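The byte arithmetic in the example above can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the helper names `storage_size` and `fits_storage` are hypothetical (they are not part of ITS or any tool), and the 11-byte limit is the one from the example. It shows that both strings have the same number of code points but different byte counts depending on the encoding.

```python
# Minimal sketch: check whether a text fits a byte-limited storage field.
# The helper names are hypothetical, chosen for illustration only.

def storage_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies once encoded with `encoding`."""
    return len(text.encode(encoding))


def fits_storage(text: str, max_bytes: int, encoding: str) -> bool:
    """True if the encoded text does not exceed `max_bytes`."""
    return storage_size(text, encoding) <= max_bytes


if __name__ == "__main__":
    limit = 11  # the field from the example holds at most 11 bytes
    samples = [
        "It's summer",
        "C'est l'\u00e9t\u00e9",  # "C'est l'été" with precomposed é (U+00E9)
    ]
    for text in samples:
        for enc in ("utf-8", "iso-8859-1"):
            data = text.encode(enc)
            # len(text) counts code points; len(data) counts bytes
            print(f"{text} [{enc}] code points={len(text)} "
                  f"bytes={data.hex(',')} ({len(data)}) "
                  f"fits={fits_storage(text, limit, enc)}")
```

For the French string, the sketch reports 13 bytes in UTF-8 (does not fit) and 11 bytes in ISO-8859-1 (fits), matching the figures above, even though `len(text)` is 11 in both cases. Note that `bytes.hex` with a separator argument requires Python 3.8 or later.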
Received on Monday, 27 August 2012 15:29:56 UTC