RE: draft-hoffman-utf16-01.txt available

From: Martin J. Duerst <duerst@w3.org>
Date: Thu, 04 Feb 1999 00:10:52 +0900
To: Francois Yergeau <yergeau@alis.com>
Cc: medavis2@us.ibm.com, Larry Masinter <masinter@parc.xerox.com>, Paul Hoffman / IMC <phoffman@imc.org>, MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@iana.org
Message-id: <199902031755.CAA11531@sh.w3.mag.keio.ac.jp>
At 22:14 99/02/02 -0500, Francois Yergeau wrote:
> At 13:21 02/02/99 -0800, medavis2@us.ibm.com wrote:

> Two problems here: 1) use UTF-16 only if you *know* that the source has a
> BOM or that it's BE, otherwise you might send LE without a BOM; "might
> come" is not enough; not every Windows file will have a BOM, it depends on
> the creating application.  2) Are we sure  that we want to drop the more
> specific BE|LE tags, and thereby force the receiver to peek inside the
> object, just because we know there is a BOM?  I'm not convinced yet.

I don't understand this. Either some processor is going to do something
with the entity for which it matters whether the entity is BE or LE;
in that case, it has to work on the content anyway. Or it's going to
do something where endianness doesn't matter, and then it won't have
to peek inside. Am I missing something?
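
To illustrate the receiver side, here is a minimal Python sketch, under
stated assumptions. The function name decode_utf16 is hypothetical, and
the fallback to big-endian for an unmarked "UTF-16" entity is an
assumption, not something settled in this thread. It shows that a
processor that needs the characters has to touch the bytes anyway, and
that only the plain "UTF-16" label requires looking for a BOM.

```python
import codecs

def decode_utf16(data: bytes, label: str) -> str:
    """Decode `data` according to its charset label (hypothetical helper)."""
    label = label.upper()
    if label == "UTF-16BE":
        # Byte order is stated by the label; no peeking needed.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        return data.decode("utf-16-le")
    if label == "UTF-16":
        # Only here does the receiver inspect the first two bytes.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # Assumption: no BOM under the plain label means big-endian.
        return data.decode("utf-16-be")
    raise ValueError("not a UTF-16 label: " + label)
```

Note that with the BE/LE labels this sketch does not strip anything, so
a BOM that is present after all would come through as a leading U+FEFF
character.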


> > If it doesn't (any other Unicode string!), then I will send
> >UTF-16BE/LE (depending on the polarity).
> 
> And perhaps generate illegal stuff, because there turns out to be a BOM
> after all.  I'm questioning the enforceability of forbidding BOMs.  Let's
> not make a rule that cannot be obeyed by reasonable implementations in most
> cases.

I agree. But I think going the other way would be equally fatal:
not defining things clearly enough, and just hand-waving.
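
As one possible reading of a clearly defined sender rule, here is a
minimal Python sketch. The function name choose_label and its byte_order
parameter are hypothetical, and the rule itself (BOM present means plain
"UTF-16", BE/LE tags only without a BOM) is just one of the options
under discussion, not a decision.

```python
import codecs

# The two possible UTF-16 byte order marks.
BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

def choose_label(data: bytes, byte_order=None) -> str:
    """Pick a charset label for UTF-16 `data` (hypothetical helper).

    `byte_order` is "big", "little", or None when the sender does not
    know how the data was produced.
    """
    if data.startswith(BOMS):
        # A BOM is present, so the plain label is safe.
        return "UTF-16"
    if byte_order == "big":
        return "UTF-16BE"
    if byte_order == "little":
        return "UTF-16LE"
    # No BOM and no knowledge of the byte order: refuse to guess.
    raise ValueError("no BOM and byte order unknown; cannot label safely")
```

The check for a BOM comes first, so a sender following this sketch never
emits a BE/LE tag on data that turns out to start with a BOM.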


> Facing this, there doesn't seem to be any ideal solution, guaranteed to
> always work. We have to decide on the best compromise.  I think it's
> slightly better not to forbid BOMs completely with the BE/LE tags, but I'm
> willing to go for forbidding them if that's the consensus.  At this point,
> the most important thing is to register those $%?&* tags in a workable
> manner and make UTF-16 useable on the Internet.

I agree. I think we have a compromise. The compromise is to agree that
there are two different ways of handling things, and that since there are
two different ways, they should be clearly separated, and not muddled.


Regards,   Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org
Received on Wednesday, 3 February 1999 13:02:54 GMT