RE: General policy from Masataka Ohta on 1993-08-03 (ietf-charsets@w3.org from July to September 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Wed, 04 Aug 1993 02:02:28 +0900 (JST)
To: lwj@cs.kun.nl (Luc Rooijakkers)
Cc: harald.t.alvestrand@delab.sintef.no, ietf-charsets@INNOSOFT.COM
Message-id: <9308031702.AA07850@necom830.cc.titech.ac.jp>

> This means that we should stay out of areas that may be touched by ISO
> (in terms of encoding space), even so-called "reserved for private use"
> areas, since ISO seems to have a habit of retracting such reservations later.
> It follows that whatever encoding we agree on, should have the "UCS" space
> totally separated from the "extended" space.

We don't have to. As the encoding space has 2^31 code positions, it is
unlikely that ISO will need to change the private space. Moreover, even
if the private space is remapped, it does not matter at all. If we use
a UCS-like encoding, it will be used outside of a processor. We
may also use our own internal representation. And then we may,
for the sake of the meaningless compatibility with ISO 10646, define a
mapping from our internal representation to the UCS4 representation.
The mapping should map our non-standard characters to the
private use area of UCS4.

Then, even if ISO changes the private area, it only means that the meaningless
mapping needs to change, without changing our internal or external form.
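
The mapping described above can be sketched roughly as follows. This is
only an illustration: the flag bit used for the internal form, the
function names, and the private-area base addresses are all assumptions
made up for the example, not values taken from ISO 10646 or from any
agreed proposal. The size 2^21 is the estimate given in this message.

```python
# Hypothetical sketch; every name and constant here is an assumption.
EXT_FLAG = 1 << 31    # assumed internal marker for a non-standard character
EXT_SPACE = 1 << 21   # number of non-standard characters we expect to need


def to_ucs4(internal, private_base, private_size):
    """Map an internal code to UCS4, placing non-standard characters
    in whatever private use area ISO currently provides."""
    if private_size < EXT_SPACE:
        raise ValueError("private use area is smaller than we need")
    if internal & EXT_FLAG:
        index = internal & (EXT_FLAG - 1)
        if index >= EXT_SPACE:
            raise ValueError("non-standard character index out of range")
        return private_base + index
    # Standard characters are assumed to share their UCS4 value.
    return internal


def from_ucs4(code, private_base):
    """Inverse mapping: recover the internal code from a UCS4 value."""
    if private_base <= code < private_base + EXT_SPACE:
        return EXT_FLAG | (code - private_base)
    return code


# The same internal character survives a relocation of the private area:
# only the two base arguments (placeholders here) change.
ext_char = EXT_FLAG | 5
old_ucs4 = to_ucs4(ext_char, 0x60000000, 1 << 27)
new_ucs4 = to_ucs4(ext_char, 0x70000000, 1 << 27)
```

Because the base and size are parameters, a change by ISO touches only
the arguments passed to the mapping, and the internal form is untouched.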

The only requirement is that the private use area of ISO 10646 should
be larger than we need (2^21 or, at most 2^24, I think).

As the current private use area is 2^27 (or 2^28, I'm not sure), it
is unlikely that the area will be shrunk so much.

So, don't worry about it.

> Otha, what happened to your "uniqueness" constraint, i.e., the
> requirement that some class of characters have only one representation?

What? I have never required such uniqueness. Rather, I said at the last
BOF that uniqueness is unnecessary.

What I said was perhaps "equality", meaning that equality
between two texts must be defined. That is too much, of course, so
I omit it this time. It should be noted, though, that ISO 10646
level 2 or 3 defines equality only between two "characters" (with
its own definition of "character"), and not between two strings or
texts, which makes almost all text processing impossible
without further profiling.

> Although not in general achievable, it is *very* useful for a restricted 
> class, e.g. all of 8859-X.

If uniqueness is not achievable, it is very useful to have
some short regular-expression notation that represents all the
equivalent characters.
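
As a rough illustration of such a notation, the sketch below expands
each character of a search string into an alternation over its
equivalent representations. The equivalence table and the function name
are hypothetical and cover only two characters; a real profile would
have to define the full table.

```python
import re

# Hypothetical equivalence table: each key lists every representation
# that should be treated as the same character.
EQUIVALENTS = {
    "\u00e9": ["\u00e9", "e\u0301"],            # e with acute accent
    "\u00c5": ["\u00c5", "A\u030a", "\u212b"],  # A with ring, angstrom sign
}


def equivalence_pattern(text):
    """Build a regular expression matching text in any equivalent form."""
    parts = []
    for ch in text:
        forms = EQUIVALENTS.get(ch)
        if forms:
            # Non-capturing alternation over all equivalent forms.
            parts.append("(?:" + "|".join(re.escape(f) for f in forms) + ")")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


pattern = equivalence_pattern("caf\u00e9")
matches_precomposed = pattern.fullmatch("caf\u00e9") is not None
matches_combining = pattern.fullmatch("cafe\u0301") is not None
```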

						masataka Ohta

Received on Tuesday, 3 August 1993 10:07:25 UTC