
Re: Fallback to UTF-8

From: Michael Adams <linux_mike@paradise.net.nz>
Date: Thu, 01 May 2008 00:42:42 +1200
To: www-validator@w3.org
Message-id: <20080501004242.3dd2cd16.linux_mike@paradise.net.nz>

On Wed, 30 Apr 2008 13:05:23 +0200
Frank Ellermann wrote:

> Michael Adams wrote:
> > I found the ECMA tie-in to ISO 8859-1 Latin-1, dated March 85
> > and June 86. http://en.wikipedia.org/wiki/ISO_8859-1#History
> Yes, I used the ECMA standards because I'm not aware of a free
> version for the later ISO standards.

All ISO standards are free as in libre. It chafes me that they charge
for the PDFs as well. But I agree with Jukka: ECMA don't set
worldwide (internet) standards, they propose them to ISO. And the ECMA
standard you quote has been superseded by the ISO ones.

> > I also see this, perhaps surprising, paragraph which is a
> > little thin on references for its claims:
> > *** quote ***
> > ISO-8859-1 is (according to the standards at least) the default
> > encoding of documents delivered via HTTP with a MIME type
> > beginning with "text/".
> Yes, you could add RFC 2616 as a reference...

*** quote RFC 2616, section 3.7.1 ***
   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.
*** end quote ***
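
The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the text of section 3.7.1, not how any particular validator or browser behaves; the function name effective_charset and the constant HTTP_TEXT_DEFAULT are made up for the example, and the header parsing is borrowed from the standard library's email package.

```python
from email.message import Message

# Default named in RFC 2616 section 3.7.1 for "text" subtypes over HTTP.
HTTP_TEXT_DEFAULT = "ISO-8859-1"


def effective_charset(content_type):
    """Return the charset a strict RFC 2616 recipient would assume.

    Hypothetical helper: returns the explicit charset parameter if one
    is present, the ISO-8859-1 default for text/* without a charset,
    and None for any other media type.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")
    if explicit is not None:
        return explicit
    if msg.get_content_maintype() == "text":
        return HTTP_TEXT_DEFAULT
    return None


if __name__ == "__main__":
    for header in ("text/html", "text/html; charset=utf-8", "image/png"):
        print(header, "->", effective_charset(header))
```

An explicit charset parameter always wins; the ISO-8859-1 default only applies when a text/* response carries no charset at all.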

> > (HTML 4.0, however, is based on Unicode).
> > *** end quote ***
> ...but that is incomplete, HTML i18n specified in RFC 2070, a
> predecessor of HTML 4, already had the same concept.
> The hyphen in iso-8859-1 is merely a trick in the IANA charset
> registry to avoid a space.  That's why you can use a parameter
> charset=iso-8859-1 for MIME, with a space you would have to put
> it in quotes charset="iso 8859-1", but IIRC the IANA registry
> wisely doesn't allow to register charset names with spaces. ;-)

Well, I definitely learned something today. Before this discussion I was
not aware that these are eight-bit encodings; I thought they were
16 bits or more.
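
For anyone else who had the same impression, a small Python check makes the size difference visible. It is only a sketch: the sample word and the variable names are arbitrary, and the codec names are the ones Python's standard library uses.

```python
# "cafe" with an acute accent on the e (U+00E9), written with an escape
# so the source file's own encoding does not matter.
text = "caf\u00e9"

latin1 = text.encode("iso-8859-1")  # one byte per character
utf8 = text.encode("utf-8")         # U+00E9 takes two bytes here
utf16 = text.encode("utf-16-le")    # two bytes per character, no BOM

print(len(text), len(latin1), len(utf8), len(utf16))

# Every one of the 256 possible byte values is a character in
# ISO-8859-1, which is what makes it an eight-bit encoding.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
print(len(decoded))
```

The same four characters take 4 bytes in ISO-8859-1, 5 in UTF-8 and 8 in UTF-16.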

I'll back out, as I don't really have anything of value to add to the
discussion; it was merely a learning exercise for me.


All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416
Received on Wednesday, 30 April 2008 12:43:00 UTC
