Re: revised "generic syntax" internet draft from Martin J. Duerst on 1997-04-21 (uri@w3.org from April 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Mon, 21 Apr 1997 14:53:41 +0200 (MET DST)
To: Chris Newman <Chris.Newman@innosoft.com>
Cc: John C Klensin <klensin@mci.net>, IETF URI list <uri@bunyip.com>
Message-Id: <Pine.SUN.3.96.970421145201.245I-100000@enoshima>

On Tue, 15 Apr 1997, Chris Newman wrote:

> On Tue, 15 Apr 1997, John C Klensin wrote:

[About length problems with UTF-8.]

> UTF-8 requires 2 octets to encode characters from the 8859-1 set which
> normally take 1 octet.  UTF-8 requires 3 octets to encode ideographic
> characters from UCS-2 which normally require 2 octets.  So
> western Europeans take a worse storage hit from UTF-8 than ideographic
> languages do.

This is not exactly true. Western European text consists mostly
of ASCII characters, which take one byte each in UTF-8, and only
occasionally contains a character that needs two bytes, so the
overall size increase is small. But anyway, I think we agree that
the size of UTF-8 is not really an issue.
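
To make the comparison concrete, here is a minimal sketch in Python
that measures whole strings rather than single characters. The
sample sentences and the helper name encoded_sizes are my own
illustrations, not anything from the draft, and I am assuming that
UTF-16 (big-endian, no byte order mark) is an acceptable stand-in
for UCS-2, since both use two octets per character for the text
shown here.

```python
def encoded_sizes(text):
    """Return (legacy_octets, utf8_octets) for a piece of text.

    "Legacy" means ISO 8859-1 when the text fits in it, and otherwise
    UTF-16-BE, used here as an assumed stand-in for UCS-2.
    """
    try:
        legacy = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        # Text outside 8859-1: two octets per character for BMP text.
        legacy = len(text.encode("utf-16-be"))
    return legacy, len(text.encode("utf-8"))


# Hypothetical sample sentences, chosen only for illustration.
german = "Die Größe von UTF-8 ist für Zürich kein Problem."
japanese = "日本語の文字"

for label, sample in (("German", german), ("Japanese", japanese)):
    legacy, utf8 = encoded_sizes(sample)
    growth = utf8 / legacy
    print(f"{label}: {legacy} -> {utf8} octets ({growth:.2f}x)")
```

For these particular samples the German sentence grows from 48 to
52 octets, while the Japanese string grows from 12 to 18. Real text
will of course vary with how often non-ASCII characters occur.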

Regards,	Martin.

Received on Monday, 21 April 1997 08:54:49 UTC