- From: Norbert Lindenberg <w3@norbertlindenberg.com>
- Date: Wed, 27 Feb 2013 00:26:47 -0800
- To: Yves Savourel <ysavourel@enlaso.com>
- Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, <public-multilingualweb-lt-comments@w3.org>, "'www-international'" <www-international@w3.org>
On Feb 21, 2013, at 8:58 , Yves Savourel wrote:

> Hi Norbert,
>
> Related to:
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Feb/0028.html
>
>> I don't see in your messages any justification why the standard should
>> not require at least support for UTF-8, and why it should not specify
>> error handling for commonly occurring situations. Can you please explain?
>> If an application can't rely on any encoding being supported, can't
>> find out whether a requested encoding is supported, and can't rely
>> on unsupported characters being handled in some reasonable way,
>> then using this data category seems quite risky.
>
> The original comments were:
>
>> Several aspects of the interpretation of the character encoding
>> given as storageEncoding need to be clarified:
>> - Which character encodings is an implementation required to support?
>>   Support for UTF-8 must be mandatory.
>> - What's the required behavior if storageEncoding specifies a character
>>   encoding that the implementation doesn't support?
>> - What's the required behavior if the selected nodes contain characters
>>   that the specified character encoding cannot represent?
>
> I'm not sure why support for UTF-8 would be mandatory (I'm not against, but just asking why).
> The encoding is the one used in the store where the data resides and can be anything
> (resource file, database, etc.), not necessarily some XML-based system. What would be
> the rationale to force support for an arbitrary encoding?

UTF-8 isn't arbitrary; it's the Unicode encoding most commonly used in files. If databases are involved, we might relax that a bit and require that any implementation support at least one of UTF-8, UTF-16, or UTF-32. I think allowing implementations to support no Unicode encoding at all and risk data loss is no longer acceptable. If this were an IETF standard, I'd point to RFC 2277; the W3C Character Model isn't quite as strongly worded.

> One can imagine a user having the data stored in Latin-1, the data extracted to some
> XML export format (in UTF-8) where the storage size encoding would be set to iso-8859-1,
> and his checking tool supporting only that encoding. Why would such a user have to
> implement support for UTF-8 if he doesn't use it?

Do you really want to let systems that can represent less than 1% of Unicode advertise themselves as ITS 2.0 conformant?

> Note also that we have no way to check conformance of the applications using the ITS
> data for such mandatory support: ITS processors just pass the data along, they don't
> act on them (in the case of this data category).

So who does actually act if a string is too long to fit into the specified storage?

Norbert
Received on Wednesday, 27 February 2013 08:27:16 UTC
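The question of who acts on the data is easiest to see with a small example. The sketch below is not defined by ITS 2.0 or by this thread; it is only an illustrative assumption of how a downstream checking tool might use the Storage Size values (storageSize and storageEncoding), including the two error cases raised in the original comments: an unsupported encoding and characters the encoding cannot represent. The function name check_storage_size and its return convention are invented for this sketch.

    # Hypothetical sketch (not part of ITS 2.0): a downstream checker encodes the
    # selected text with the declared storageEncoding and compares the byte count
    # against storageSize, reporting the error cases discussed in the thread.
    import codecs

    def check_storage_size(text: str, storage_size: int, storage_encoding: str = "UTF-8"):
        """Return a list of human-readable issues; an empty list means the text fits."""
        issues = []
        try:
            codecs.lookup(storage_encoding)
        except LookupError:
            # Case 1: the tool does not support the declared encoding at all.
            issues.append("unsupported storageEncoding: %s" % storage_encoding)
            return issues
        try:
            encoded = text.encode(storage_encoding)
        except UnicodeEncodeError as err:
            # Case 2: the text contains characters the encoding cannot represent.
            issues.append("character not representable in %s: %r"
                          % (storage_encoding, err.object[err.start:err.end]))
            return issues
        if len(encoded) > storage_size:
            # Case 3: the encoded form is too long for the declared storage.
            issues.append("%d bytes exceed storageSize of %d" % (len(encoded), storage_size))
        return issues

    # Example: a Latin-1 backend field limited to 25 bytes.
    print(check_storage_size("Fichier introuvable : élément", 25, "ISO-8859-1"))

In this reading, the "actor" is whatever tool consumes the passed-along metadata (a checker, a CAT tool, a back-conversion step), which is precisely why the thread debates whether conformance requirements on encoding support can be tested at all.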