- From: Shaun McCance <shaunm@gnome.org>
- Date: Wed, 15 Aug 2012 12:54:30 -0400
- To: public-multilingualweb-lt@w3.org
On Mon, 2012-08-13 at 09:30 +0000, Michael Kruppa wrote:
> Dear all,
>
> please find attached a first draft of the data categories:

Display Size:

I think I missed the discussion on this. Is it really intended that it limits the number of characters for each word, and not for the whole string? Is "word" a well-defined concept in East Asian languages?

What's an actual use case for this? The proposal says it can be used to "limit the maximum number of characters to be used for each word", which just restates the description. I'm not clear what real-world things would require you to do word-by-word character limits.

Forbidden Characters:

"list of pointers to unicode code points identifying chars which may not be used"

The word "pointers" threw me. I looked and looked for a term for the U+ lexical representation and couldn't find one. Perhaps "list of Unicode code points using the U+ representation"?

At any rate, I do think that *at least* some sort of basic ranges are going to be necessary. Ideally we should enable well-defined character classes so people don't have to reinvent them.

Storage Size:

Storage seems like an inherently byte-based thing. If it's giving a maximum number of characters, I don't see why you would specify the storage encoding.

Of course, XML files could be reserialized using any character encoding. I assume the point of this category is to say "This data will be pushed to another medium using this character encoding, and when stored in that encoding, this is the maximum number of bytes". Is that correct?

--
Shaun
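To illustrate the Storage Size reading above, here is a minimal sketch of why a byte limit only makes sense together with a named encoding. The function name fits_storage and the sample string are made up for illustration; nothing here comes from the draft itself.

```python
def fits_storage(text: str, encoding: str, max_bytes: int) -> bool:
    """Return True if `text`, serialized in `encoding`, needs at most `max_bytes` bytes.

    Hypothetical helper: the draft does not define any such function.
    """
    return len(text.encode(encoding)) <= max_bytes


sample = "Größe"  # 5 characters

# The same 5 characters occupy a different number of bytes per encoding.
byte_lengths = {
    enc: len(sample.encode(enc))
    for enc in ("iso-8859-1", "utf-8", "utf-16-le")
}

for enc, size in byte_lengths.items():
    print(f"{enc}: {size} bytes for {len(sample)} characters")
```

So a limit of, say, 6 would pass for ISO-8859-1 and fail for UTF-8 on the same string, which is presumably why the encoding has to be stated alongside the size.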
Received on Wednesday, 15 August 2012 16:54:54 UTC