
Re: Fallback to UTF-8

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 30 Apr 2008 13:05:23 +0200
To: www-validator@w3.org
Message-ID: <fv9jkj$uor$1@ger.gmane.org>

Michael Adams wrote:
 
> I found the ECMA tie-in to ISO 8859-1 Latin-1, dated March 85
> and June 86. http://en.wikipedia.org/wiki/ISO_8859-1#History

Yes, I used the ECMA standards because I'm not aware of a free
version for the later ISO standards.
  
> I also see this, perhaps surprising, paragraph which is a
> little thin on references for its claims:
 
> *** quote ***
> ISO-8859-1 is (according to the standards at least) the default
> encoding of documents delivered via HTTP with a MIME type
> beginning with "text/".

Yes, you could add RFC 2616 as a reference...

> (HTML 4.0, however, is based on Unicode).
> *** end quote ***

...but that is incomplete: HTML i18n, specified in RFC 2070, a
predecessor of HTML 4, already had the same concept.
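
A minimal sketch of the RFC 2616 default mentioned above, assuming
Python's standard email.message module is an acceptable way to parse
a Content-Type value. The function name effective_charset is made up
for illustration and is not from any specification.

```python
from email.message import Message


def effective_charset(content_type):
    """Return the charset for a Content-Type value, or None if unknown."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # lowercased, or None if absent
    if declared is not None:
        return declared
    # RFC 2616 default for text/* received via HTTP without a charset.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None


print(effective_charset("text/html"))
print(effective_charset("text/html; charset=UTF-8"))
print(effective_charset("application/octet-stream"))
```

This only models the HTTP rule; HTML's own mechanisms for declaring
an encoding are outside its scope.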

The hyphen in iso-8859-1 is merely a trick in the IANA charset
registry to avoid a space.  That's why you can use a parameter
charset=iso-8859-1 for MIME; with a space you would have to put
it in quotes, charset="iso 8859-1", but IIRC the IANA registry
wisely doesn't allow registering charset names with spaces. ;-)
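
The quoting rule can be sketched as follows, assuming the RFC 2045
definition of a token (printable US-ASCII without space or
tspecials). The helper names is_token and format_param are
hypothetical, chosen only for this illustration.

```python
# tspecials as listed in RFC 2045; a space is excluded from tokens too.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value):
    """True if value may appear unquoted as a MIME parameter value."""
    return bool(value) and all(
        0x21 <= ord(ch) <= 0x7E and ch not in TSPECIALS for ch in value
    )


def format_param(name, value):
    """Format name=value, quoting the value only when it is not a token."""
    if is_token(value):
        return f"{name}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


print(format_param("charset", "iso-8859-1"))  # unquoted form
print(format_param("charset", "iso 8859-1"))  # needs quotes
```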

The <http://www.iana.org/assignments/character-sets> entry is:
| Name: ISO_8859-1:1987                         [RFC1345,KXS2]
| MIBenum: 4
| Source: ECMA registry
| Alias: iso-ir-100
| Alias: ISO_8859-1
| Alias: ISO-8859-1 (preferred MIME name)
| Alias: latin1
| Alias: l1
| Alias: IBM819
| Alias: CP819
| Alias: csISOLatin1
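
As a rough check of how these names behave in practice, the sketch
below asks Python's codecs module to resolve each one. Python keeps
its own alias table, which is not the IANA registry, so a name that
fails to resolve here says nothing about its registration status.
The helper name resolve is hypothetical.

```python
import codecs

# Names copied from the registry entry above.
REGISTRY_NAMES = [
    "ISO_8859-1:1987", "iso-ir-100", "ISO_8859-1", "ISO-8859-1",
    "latin1", "l1", "IBM819", "CP819", "csISOLatin1",
]


def resolve(name):
    """Return Python's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for name in REGISTRY_NAMES:
    print(f"{name:16} -> {resolve(name)}")
```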

The registrant used RFC 1345 as the source, and that includes the
normal control characters (the ISO 8859-1 standard doesn't),
but please note that RFC 1345 is not very reliable as a source.

ISO-IR-100 is obscure as an alias, but handy for finding "official"
ISO sources:  <http://www.itscj.ipsj.or.jp/ISO-IR/2-3.htm>

It starts to get seriously confusing when you try to track
down what the octets 0x00 .. 0x9F are supposed to mean "in"
iso-8859-1: you need at least four other standards to arrive
at first conclusions, and those do not necessarily cover
0x80 .. 0x9F.
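
To see one common interpretation of those octets, the sketch below
inspects Python's "latin-1" codec, which maps every octet straight
to the Unicode code point of the same value, so 0x00 .. 0x1F and
0x80 .. 0x9F come out as control characters. That is how this codec
behaves, not a statement about what the ISO standard itself defines.
windows-1252 (Python's "cp1252") is shown only for contrast.

```python
import unicodedata

octets = bytes(range(256))
text = octets.decode("latin-1")

# Every octet N decodes to the code point U+00NN.
identity = all(ord(ch) == i for i, ch in enumerate(text))

# Both control ranges land in Unicode category Cc (control).
c0 = all(unicodedata.category(text[i]) == "Cc" for i in range(0x00, 0x20))
c1 = all(unicodedata.category(text[i]) == "Cc" for i in range(0x80, 0xA0))

# windows-1252 assigns a printable character to 0x80 instead.
euro = bytes([0x80]).decode("cp1252")

print(identity, c0, c1, f"U+{ord(euro):04X}")
```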

 Frank
-- 
OT, apart from Mr. Prilop, who won't need the info anymore,
I hope it is clear what a Reply-To address is.
Received on Wednesday, 30 April 2008 11:03:49 GMT
