Re: Fallback to UTF-8 from Jukka K. Korpela on 2008-04-30 (www-validator@w3.org from April 2008)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 30 Apr 2008 13:11:14 +0300
To: <www-validator@w3.org>
Message-ID: <010c01c8aaaa$869086c0$0500000a@DOCENDO>

Michael Adams wrote:

> I found the ECMA tie-in to ISO 8859-1 Latin-1, dated March 85 and June
> 86. http://en.wikipedia.org/wiki/ISO_8859-1#History

References to ECMA standards are irrelevant when discussing "Latin 1" in 
the Internet context, since the authority on encoding names for Internet 
use is IANA, and the IANA registry does not list any ECMA-based _name_ 
for ISO-8859-1, even though it cites "ECMA registry" as the source.

> Though not the definitive authority,

Wikipedia is not an authority of any kind, and it tends to be just 
confusing on muddled issues like this one.

> There
> are two encodings: "ISO 8859-1" and "ISO-8859-1" note the hyphen
> versus space after ISO.

That's nonsense. Registered encoding names have no spaces. When "ISO 
8859-1" is used, it refers to the _standard_ (proper) defining the 
encoding. It might be a good idea to accept it, as error recovery, as 
meaning "ISO-8859-1" when it appears as a charset parameter value, but 
it would still be an error.
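
As a rough sketch of that kind of error recovery (the function name and 
the tiny table of names are made up for illustration; the real IANA 
registry is far larger, and this is not how any particular validator 
does it), a charset parameter value could be resolved so that the 
space-for-hyphen mistake is accepted but still reported:

```python
# Hypothetical, deliberately tiny subset of registered charset names.
# Names are compared case-insensitively.
REGISTERED = {"iso-8859-1", "windows-1252", "utf-8"}


def resolve_charset(label):
    """Return (name, is_error); name is None when no recovery is possible."""
    key = label.strip().lower()
    if key in REGISTERED:
        return key, False
    # Error recovery: read runs of spaces as hyphens, but keep flagging
    # the input as an error, since registered names contain no spaces.
    recovered = "-".join(key.split())
    if recovered in REGISTERED:
        return recovered, True
    return None, True


print(resolve_charset("ISO-8859-1"))  # registered name, no error
print(resolve_charset("ISO 8859-1"))  # recovered, but still an error
print(resolve_charset("Latin 1"))     # loose name, no recovery
```

The point of returning the error flag separately is that the recovery 
and the diagnosis are independent: the document can still be processed 
as ISO-8859-1 while the author is told that the parameter value is wrong.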

> I merely wanted to acquaint myself
> with the issues being debated here to see what the fuss is about.

I'm afraid the Wikipedia stuff just adds to the general confusion.

In this discussion, loose names like "Latin-1" should not be used at 
all. This is about encodings, so "ISO-8859-1" and "windows-1252" are the 
preferred names for the two encodings that people have in mind, 
informally called ISO Latin 1 and Windows Latin 1. (For the latter, 
"windows-1252" is the _only_ registered name, oddly enough, since names 
like "cp-1252" or "cp1252" have often been used, and it would cause no 
harm to define them as aliases; but I digress.)
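
To make the difference between the two encodings concrete, here is a 
small illustration. It assumes Python's codec names, where "cp1252" is 
Python's own name for windows-1252 (not a registered name), and the 
sample bytes are picked arbitrarily:

```python
# Four sample bytes: 0x93 and 0x94 fall in the 0x80-0x9F range where the
# two encodings disagree; 0x41 and 0xE9 mean the same thing in both.
data = bytes([0x93, 0x41, 0x94, 0xE9])

as_win = data.decode("cp1252")      # curly quotes around "A", then e-acute
as_iso = data.decode("iso-8859-1")  # C1 control characters instead of quotes

# Collect the bytes that decode differently under the two encodings.
differing = [
    hex(b)
    for b in data
    if bytes([b]).decode("cp1252") != bytes([b]).decode("iso-8859-1")
]

print(ascii(as_win))
print(ascii(as_iso))
print(differing)
```

Only the bytes in the range 0x80 to 0x9F come out differently, which is 
why mislabeling one encoding as the other so often goes unnoticed until 
"smart quotes" or a euro sign turn up in the data.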

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Received on Wednesday, 30 April 2008 10:11:29 UTC