Re: Proposed changes to UTF-8 draft

At 21:05 03/01/10 +0100, Keld Jørn Simonsen wrote:
>On Fri, Jan 10, 2003 at 02:47:53PM -0500, Francois Yergeau wrote:
> > Keld Jørn Simonsen wrote:
> > > I think we should keep ourselves to open standards whenever possible,
> > > and avoid industry standards like Unicode if we can.
> >
> > I dispute the characterization of ISO standards as open.  The
> > standardization process is totally closed (only National Bodies can play)
> > and the standards themselves, with few exceptions not including 10646, are
> > available only for money.
>
>I would characterize that as FUD.

FUD or not, it is so to about the same degree as your comments
about Unicode.


>You can join the national bodies,
>and that is feasible even for one-person firms. That they are national
>means that you can influence the specifications without having big
>travel expenses.

I have certainly influenced the Unicode standard without travel
expenses, and I know of people who have had much more influence
than me, without travel expenses. Attending ISO meetings is hardly
free of travel expenses, is it?
Another relevant point is that participation in ISO member
organizations may be very different depending on country.
It's definitely not in all cases free of fees and travel
expenses.


>Yes, most ISO standards cost money, but if you want the
>information, the standards are available to you in most countries via
>the public library systems, for free.

Ok. Most public libraries these days have some kind of Internet
access, and I guess most would order the Unicode Standard
(or make it accessible through interlibrary loan), too.


> > Please re-read Annex D.  The only mention is this Note:
> >
> >   NOTE 1 - Values of x in the range 0000 D800 .. 0000 DFFF
> >   are reserved for the UTF-16 form and do not occur in UCS-4.
> >   The values 0000 FFFE and 0000 FFFF also do not occur
> >   (see clause 8). The mappings of these code positions in
> >   UTF-8 are undefined.
> >
> > There's a later section D.7 "Incorrect sequences of octets: Interpretation
> > by receiving devices" which is totally silent on decoding surrogates and
> > overlong sequences.
>
>It is because UTF-8 in the ISO 10646 definition only encodes characters
>defined in 10646. And "surrogates" are not characters. So they "do not
>occur" in UTF-8.

Ok. So formally speaking, this might be okay. But what's important
here is not that somebody can figure things out after some detailed
research, but that people get the message up front.
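
To make concrete what "up front" would mean for an implementer, here is
a minimal sketch of a decoder that refuses both surrogates and overlong
sequences. It is only an illustration, not text from Annex D or from the
draft: the function name decode_utf8_strict is made up, and the limits
of four octets and 0x10FFFF are an assumption taken from the restricted
form of UTF-8 rather than from the note quoted above.

```python
def decode_utf8_strict(data: bytes) -> list:
    """Decode UTF-8 octets to code points, rejecting overlong forms and surrogates."""
    # Smallest code point that legitimately needs a sequence of each length.
    min_for_length = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead octet 0x{lead:02X} at offset {i}")

        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        for octet in tail:
            if octet & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation octet at offset {i}")
            cp = (cp << 6) | (octet & 0x3F)

        # Overlong: the value would have fit in a shorter sequence.
        if cp < min_for_length[length]:
            raise ValueError(f"overlong sequence at offset {i}")
        # Surrogates are reserved for UTF-16 and are not characters.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} at offset {i}")
        # Assumed upper limit (restricted UTF-8), not stated in the quoted note.
        if cp > 0x10FFFF:
            raise ValueError(f"value out of range at offset {i}")

        code_points.append(cp)
        i += length
    return code_points
```

With this, the octets C0 AF (an overlong form of "/") and ED A0 80 (the
surrogate D800) both raise ValueError instead of being decoded silently.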


Regards,     Martin.

Received on Friday, 10 January 2003 16:17:42 UTC