Re: [ACTION 189] Split special requirements into several data categories

On Mon, 2012-08-13 at 09:30 +0000, Michael Kruppa wrote:
> Dear all,
> please find attached a first draft of the data categories:

Display Size:
I think I missed the discussion on this. Is it really intended that
it limits the number of characters for each word, and not for the
whole string? Is "word" a well-defined concept in East Asian languages?

What's an actual use case for this? The proposal says it can be used
to "limit the maximum number of characters to be used for each word",
which just restates the description. I'm not clear what real-world
scenario would require per-word character limits.
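
To make the two readings concrete, here is a rough Python sketch. It
assumes a "word" is a whitespace-delimited token and that length is
counted in code points; both are my assumptions, not something the
draft states, and the function names are made up for illustration.

```python
def longest_word_length(text):
    """Length, in code points, of the longest whitespace-delimited token."""
    return max((len(token) for token in text.split()), default=0)


def fits_per_word(text, limit):
    """Reading 1: the limit applies to each word separately."""
    return longest_word_length(text) <= limit


def fits_whole_string(text, limit):
    """Reading 2: the limit applies to the whole string."""
    return len(text) <= limit


english = "Save changes now"
japanese = "東京都に住んでいます"  # contains no whitespace at all

print(fits_per_word(english, 7), fits_whole_string(english, 7))
print(longest_word_length(japanese))
```

Under the whitespace assumption the Japanese string is a single
10-character "word", so a per-word limit silently turns into a
whole-string limit there.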

Forbidden Characters:
 "list of pointers to unicode code points identifying chars which
  may not be used"
The word "pointers" threw me. I looked and looked for a term for the
U+ lexical representation and couldn't find one. Perhaps "list of
Unicode code points using the U+ representation"?

At any rate, I do think that *at least* some sort of basic ranges
are going to be necessary. Ideally we should enable well-defined
character classes so people don't have to reinvent them.
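
As a strawman for what basic ranges could look like, here is a small
Python sketch. The syntax (space-separated items, each either "U+XXXX"
or "U+XXXX-U+YYYY") is purely hypothetical and not taken from the
draft; the function names are mine as well.

```python
import re

# Hypothetical syntax: "U+XXXX" or "U+XXXX-U+YYYY", 4 to 6 hex digits each.
_ITEM = re.compile(r"^U\+([0-9A-Fa-f]{4,6})(?:-U\+([0-9A-Fa-f]{4,6}))?$")


def parse_forbidden(spec):
    """Return a list of inclusive (low, high) code point ranges."""
    ranges = []
    for item in spec.split():
        m = _ITEM.match(item)
        if m is None:
            raise ValueError("not a U+ code point or range: " + item)
        low = int(m.group(1), 16)
        # A single code point is stored as a one-element range.
        high = int(m.group(2), 16) if m.group(2) else low
        if low > high:
            raise ValueError("reversed range: " + item)
        ranges.append((low, high))
    return ranges


def find_forbidden(text, ranges):
    """Return the characters of text that fall inside any forbidden range."""
    return [ch for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges)]


ranges = parse_forbidden("U+0026 U+003C-U+003E")
print(find_forbidden("a<b&c", ranges))
```

Named character classes would then just be predefined sets of such
ranges, so nobody has to spell them out by hand.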

Storage Size:
Storage seems like an inherently byte-based thing. If it's giving
a maximum number of characters, I don't see why you would specify
the storage encoding. Of course, XML files could be reserialized
using any character encoding. I assume the point of this category
is to say "This data will be pushed to another medium using this
character encoding, and when stored in that encoding, this is the
maximum number of bytes". Is that correct?
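
If that reading is right, the check amounts to something like the
following Python sketch. The function name and parameters are mine,
not the draft's, and I am assuming the limit is a plain byte count of
the encoded string.

```python
def fits_storage(text, max_bytes, encoding="utf-8"):
    """True if text, serialized in the given encoding, fits in max_bytes."""
    return len(text.encode(encoding)) <= max_bytes


sample = "Grüße"  # 5 characters

# The same string needs a different number of bytes in each encoding.
for enc in ("iso-8859-1", "utf-8", "utf-16-le"):
    print(enc, len(sample.encode(enc)), fits_storage(sample, 6, enc))
```

The same five characters take 5, 7, or 10 bytes depending on the
target encoding, which is why a byte limit only makes sense together
with the encoding, and a character limit does not need one.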


Received on Wednesday, 15 August 2012 16:54:54 UTC