Make C1 range non-errors with ISO-8859-1 declared from Henri Sivonen on 2008-03-16 (public-html@w3.org from March 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 16 Mar 2008 17:08:04 +0200
To: HTML WG <public-html@w3.org>
Message-Id: <F791FDE0-9E20-4210-8B65-49884213F185@iki.fi>

> 17:30 < hsivonen> I regret suggesting that 0x80-0x9F bytes be errors when
>                   ISO-8859-1 is declared
> 17:32 < annevk> i don't think it makes much sense
> 17:32 < annevk> iso-8859-1 is just an another alias
> 17:33 < annevk> just need IANA to fix the registry :)
> 17:33 < hsivonen> It'll probably create a huge amount of error noise
> 17:34 < hsivonen> Hixie: can I just say I was wrong and ask this detail to be
>                   reversed? (especially since doing the consistent thing with
>                   GBK would be a PITA)
> 05:37 < Hixie> hsivonen: and you can just display something next to your line
>                reporting your encoding, e.g. "Encoding: Windows-1252 (but
>                incorrectly labelled as ISO-8859-1)" and "Valid HTML5 except for
>                _encoding errors_" where the "encoding errors" link shows some
>                more details
> 05:54 < Hixie> the encoding issue is easy for me to change in the spec, but the
>                hard work would be all yours in implementation, so let me know
>                if you want me to relax that (by, again, e-mailing the list)

I know I previously suggested making 0x80-0x9F bytes be errors when  
ISO-8859-1 is declared. I think my previous suggestion was a mistake.  
Since then, my views of what HTML5 conformance checking should be like  
have become even more user-oriented and less Charmod-oriented.

ISO-8859-1 is consistently treated as an alias of Windows-1252 in HTML
browsers. Therefore, using bytes in the 0x80-0x9F range in a document
labeled ISO-8859-1 is reliable in practice. Since it is reliable, having
a conformance checker whine about those bytes would not really help
anyone and would devalue conformance checking messages.
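
To illustrate the difference at the byte level, here is a minimal sketch
using Python's built-in codecs. The sample bytes are an assumption chosen
for illustration. Note that Python's "windows-1252" codec leaves a few
bytes in the range (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so the
sample avoids them; this says nothing about how any particular browser
or checker is implemented.

```python
# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252,
# but C1 control characters in strict ISO-8859-1.
data = bytes([0x93, 0x68, 0x69, 0x94])

# Strict ISO-8859-1: every byte maps to the code point of the same value,
# so 0x93 and 0x94 become the invisible controls U+0093 and U+0094.
as_latin1 = data.decode("iso-8859-1")

# Windows-1252 (what browsers use for this label): 0x93 and 0x94 become
# U+201C and U+201D, the characters the author almost certainly meant.
as_cp1252 = data.decode("windows-1252")

print([hex(ord(c)) for c in as_latin1])
print([hex(ord(c)) for c in as_cp1252])
```

Decoding the way browsers do yields the intended punctuation, which is
why flagging these bytes would mostly produce noise.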

As for putting encoding errors behind a link, if some errors are so  
useless that they can be hidden behind a link, why have them as errors  
at all? Besides, it would be silly to have to develop more complex UI  
and more complex decoder mechanics for errors that will be hidden from  
view.

Although it would be tempting to give an error about ISO-8859-1 not  
being the preferred IANA name for Windows-1252, I think the right way  
is to emit a warning saying that ISO-8859-1 is treated as  
Windows-1252. (Same for the Thai and Simplified Chinese encodings as  
appropriate.)
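
A rough sketch of what such a warning could look like in a checker
follows. Everything here is hypothetical: the function name, the message
wording, and the override table are assumptions for illustration only.
The Thai (TIS-620 to Windows-874) and Simplified Chinese (GB2312 to GBK)
pairs are my guess at the analogous cases, not a list taken from the
spec.

```python
# Hypothetical override table: declared label -> encoding actually used.
# Only the ISO-8859-1 entry comes from the message above; the other two
# are assumed analogues for Thai and Simplified Chinese.
OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "tis-620": "windows-874",
    "gb2312": "gbk",
}


def resolve_declared_encoding(label):
    """Return (effective_encoding, messages) for a declared label.

    A label with an override yields a warning rather than an error, and
    bytes are then decoded with the effective encoding without
    per-byte complaints.
    """
    name = label.strip().lower()
    effective = OVERRIDES.get(name, name)
    messages = []
    if effective != name:
        messages.append(
            ("warning", f"{label.strip()} is treated as {effective}.")
        )
    return effective, messages


effective, messages = resolve_declared_encoding("ISO-8859-1")
print(effective, messages)
```

The point of the design is that the user sees one message per document
about the label, instead of one error per 0x80-0x9F byte.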

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Sunday, 16 March 2008 15:08:42 UTC