- From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
- Date: Wed, 04 Aug 1993 02:02:28 +0900 (JST)
- To: lwj@cs.kun.nl (Luc Rooijakkers)
- Cc: harald.t.alvestrand@delab.sintef.no, ietf-charsets@INNOSOFT.COM
> This means that we should stay out of areas that may be touched by ISO
> (in terms of encoding space), even so-called "reserved for private use"
> areas, since ISO seems to have a habit of retracting such reservations later.
> It follows that whatever encoding we agree on, should have the "UCS" space
> totally separated from the "extended" space.

We don't have to. As the encoding space has 2^31 positions, it is unlikely
that ISO will need to change the private space.

Moreover, even if the private space is remapped, it does not matter at all.

If we use a UCS-like encoding, it will be used outside of a processor. We may
also use our own internal representation. Then, for the sake of (meaningless)
compatibility with ISO 10646, we may define a mapping from our internal
representation to the UCS4 representation. The mapping should map our
non-standard characters to the private use area of UCS4.

Then, even if ISO changes the private area, it only means that the
meaningless mapping needs to change, without any change to our internal or
external form.

The only requirement is that the private use area of ISO 10646 be larger
than we need (2^21 or, at most, 2^24, I think). As the current private use
area is 2^27 (or 2^28, I'm not sure), it is unlikely that the area will be
shrunk so much.

So, don't worry.

> Otha, what happened to your "uniqueness" constraint, i.e., the
> requirement that some class of characters have only one representation?

What? I have never required such uniqueness. Rather, I said, at the last BOF,
that uniqueness is unnecessary.

What I said was perhaps "equality", meaning that equality between two texts
must be defined. That is of course asking too much, so I omit it this time.

It should be noted, though, that ISO 10646 level 2 or 3 only defines
equality between two "characters" (with its own definition of "character"),
and not between two strings or texts, which makes almost all text processing
impossible without further profiling.
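
As a rough illustration of the mapping argument made earlier in this message,
here is a minimal Python sketch. Everything named in it is an assumption for
illustration only: the constants UCS4_PRIVATE_BASE and INTERNAL_PRIVATE_BASE,
their values, and both function names are hypothetical and are not taken from
ISO 10646 or from any agreed encoding. It also assumes the internal range for
non-standard characters does not collide with any standardized character.

```python
# Hypothetical values, chosen only for illustration.
UCS4_PRIVATE_BASE = 0x60000000      # assumed start of a UCS4 private use region
INTERNAL_PRIVATE_BASE = 0x01000000  # assumed start of our non-standard characters
PRIVATE_SIZE = 1 << 24              # at most 2^24 non-standard characters needed


def internal_to_ucs4(code: int) -> int:
    """Map an internal code to UCS4; non-standard characters go to the private area."""
    offset = code - INTERNAL_PRIVATE_BASE
    if 0 <= offset < PRIVATE_SIZE:
        return UCS4_PRIVATE_BASE + offset
    return code  # standardized characters are assumed to map to themselves


def ucs4_to_internal(code: int) -> int:
    """Inverse mapping from UCS4 back to the internal representation."""
    offset = code - UCS4_PRIVATE_BASE
    if 0 <= offset < PRIVATE_SIZE:
        return INTERNAL_PRIVATE_BASE + offset
    return code
```

If the private area were ever moved, only UCS4_PRIVATE_BASE in this sketch
would change; internal codes would stay as they are.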

> Although not in general achievable, it is *very* useful for a restricted
> class, e.g. all of 8859-X.

If uniqueness is not achievable, it is very useful to have some short
regular-expression notation to represent all the equivalent characters.

Masataka Ohta
Received on Tuesday, 3 August 1993 10:07:25 UTC