[Bug 23646] "us-ascii" should not be an alias for "windows-1252"


--- Comment #26 from Addison Phillips <addison@lab126.com> ---
(In reply to Paul Eggert from comment #23)
> It was never common practice to use charset="us-ascii" when the text was
> actually Latin-1 or some other extension to ASCII. The default was Latin-1,
> and some validators would recommend charset="us-ascii" when the text was
> limited to characters in the range 00-7F. So the longstanding meaning of
> charset="us-ascii" was "This document is not using any characters outside
> the ASCII range, and I've checked it and that's what I want".

Look at it from the point of view of the browser (or search engine, or any
other document consumer). If you have a document that declares "us-ascii" but
in fact contains non-ASCII byte values, what should happen to those byte
values when the document is interpreted?
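To make the consequence concrete, here is a small sketch (mine, not from the
thread) using Python's built-in codecs: a strict reading of the "us-ascii"
label would make decoding fail on such bytes, while a browser following the
Encoding spec treats the label as windows-1252 and decodes them to printable
characters.

```python
# Bytes above 0x7F in a document labeled "us-ascii":
# 0xE9 is é and 0x93/0x94 are curly quotes in windows-1252.
data = b"caf\xe9 \x93quoted\x94"

# Strict reading of the label: decoding fails outright.
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print("strict us-ascii:", e.reason)

# What browsers actually do: decode with the superset encoding.
print("as windows-1252:", data.decode("windows-1252"))  # café “quoted”
```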

I find myself repeating things I already said in or around comment 1, so I
won't belabor them.

> Again, I'm not asking that the standard be *changed*, only that this issue
> be *explained*. Currently this stuff is entirely a mystery to a non-expert
> (and it appears, even to some experts). That's not right.

I agree that an explanation is desirable. There is no discussion of superset
encodings or why any of this occurs in the Encoding spec. A note is probably
called for so that it won't be a mystery. Perhaps just after the "violation of
UTS#22" note in section 4.2:

In many cases, the legacy single-byte encoding selected has a larger character
repertoire than the encoding named by the label actually used in the document.
For example, both the "iso8859-1" and "us-ascii" labels resolve to the
"windows-1252" encoding. This is because user agents have historically applied
the larger "superset" encoding in practice, since document authors tend to be
imprecise in identifying the correct label.
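The label-to-encoding step the note describes can be modeled as a simple
normalization plus table lookup. A minimal sketch, with only a small excerpt
of the spec's label table (the real table is much longer):

```python
# Excerpt of the Encoding Standard's label table: many labels map to
# one superset encoding. This is an illustrative subset, not the full list.
LABELS = {
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "iso8859-1": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
}

def get_encoding(label: str) -> str:
    """Trim ASCII whitespace, lowercase, and look up the label."""
    key = label.strip(" \t\n\f\r").lower()
    if key not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return LABELS[key]

print(get_encoding("  US-ASCII "))  # windows-1252
print(get_encoding("ISO8859-1"))    # windows-1252
```

Note that under this lookup, "iso8859-1" and "us-ascii" are indistinguishable
from "windows-1252" once a document is being decoded, which is exactly the
behavior the proposed note would explain.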


Received on Monday, 30 June 2014 15:57:15 UTC