RE: XHTML and charset's [was: Re: XHTML questions] from Ian Graham on 2000-06-30 (www-html@w3.org from June 2000)

From: Ian Graham <igraham@smaug.java.utoronto.ca>
Date: Fri, 30 Jun 2000 09:29:08 -0400
To: Christian Smith <csmith@barebones.com>
cc: www-html@w3.org, Chris Croome <chris@webarchitects.co.uk>, Ian Graham <ian.graham@utoronto.ca>
Message-ID: <Pine.SGI.4.05.10006300924400.123455-100000@smaug.java.utoronto.ca>

On Thu, 29 Jun 2000, Christian Smith wrote:

> On Thursday, June 29, 2000 at 16:35, igraham@ic-unix.ic.utoronto.ca (Ian Graham) wrote:
> 
> > Bertilo is correct -- things are fine if your document only
> > contains ASCII characters, as they map onto the same byte
> > sequence in UTF-8.
> > 
> > However, things go wrong if you have non-ASCII characters
> > in the document. They also fail (on Navigator 4 and earlier)
> > if you have character references in the document that 
> > reference non-Latin-1 characters. For example, character
> > references like 
> > 
> > &#3124;
> > 
> > (this is a made up number I'm afraid), which references the
> > 3124th character in Unicode, will only work if you explicitly
> > set UTF-8 using a META element.
> 
> And if you save a file as UTF-8 and include the UTF-8 byte order mark, IE
> for the Macintosh at least doesn't deal with this very well (it renders
> the byte order mark as a garbage character). I don't know how well other
> browsers handle this.
> 

I think you mean UTF-16 (the two-byte encoding). UTF-8 doesn't
require a byte order mark, as every character is encoded as a
sequence of one or more bytes, and the encoding rules uniquely
define the order of those bytes in the stream.
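
To make the byte-ordering point concrete, here is a rough Python sketch
(my own illustration, not anything from the specs). The helper name
byte_forms is made up for this example, and 3124 is just the made-up
number from the character reference quoted above. It checks that plain
ASCII text encodes to the same bytes in UTF-8, and shows that one
character has a single UTF-8 byte sequence but two possible UTF-16
sequences depending on byte order.

```python
def byte_forms(ch):
    """Return the encoded bytes of ch in UTF-8 and in both UTF-16 byte orders."""
    return {
        "utf-8": ch.encode("utf-8"),
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-16-le": ch.encode("utf-16-le"),
    }

# Plain ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_same = "XHTML".encode("ascii") == "XHTML".encode("utf-8")

# chr(3124) is the code point named by the reference &#3124;
# (an arbitrary example number, as in the quoted message).
forms = byte_forms(chr(3124))

if __name__ == "__main__":
    print("ASCII bytes equal UTF-8 bytes:", ascii_same)
    for name, data in forms.items():
        # Show each encoding as space-separated hex bytes.
        print(name, " ".join(f"{b:02x}" for b in data))
```

The two UTF-16 results contain the same two bytes in opposite order,
which is the ambiguity a byte order mark is there to resolve. The
UTF-8 result has only one possible form.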

Ian

Received on Friday, 30 June 2000 09:29:11 UTC