[Bug 23646] "us-ascii" should not be an alias for "windows-1252" from bugzilla@jessica.w3.org on 2014-07-01 (www-international@w3.org from July to September 2014)

From: <bugzilla@jessica.w3.org>
Date: Tue, 01 Jul 2014 09:42:34 +0000
To: www-international@w3.org
Message-ID: <bug-23646-4285-8e5vDyQK9Y@http.www.w3.org/Bugs/Public/>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

Henri Sivonen <hsivonen@hsivonen.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsivonen@hsivonen.fi

--- Comment #36 from Henri Sivonen <hsivonen@hsivonen.fi> ---
(In reply to Jirka Kosek from comment #18)
> I agree that all should agree on how encodings work, but it seems that in
> this quest for unification everything except browsers is ignored.

I think the Encoding Standard should describe the required behavior for
implementations of the Web Platform--i.e. browser engines. Other software that
wants to be Web-compatible is welcome to implement the spec, too. However, I
think it would be wrong to change the spec and browser implementations to make
pre-existing non-browser behaviors "correct" per spec.

(In reply to Jirka Kosek from comment #9)
> Consider the following example. I have page containing copyright symbol
> (U+00A9). I want to save it in "us-ascii" encoding using XHTML syntax of
> HTML5.

You want something that's hostile to interop, then. The spec should not
accommodate what you want.

XML doesn't require implementations to support "us-ascii". UTF-8 support,
however, is required. If you generate XML in an encoding that XML processors
aren't required to support, you are being hostile to interop compared to
using an encoding that they are required to support.

In other words, the right solution is to always use UTF-8 when you create XML
documents. 
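
To make the trade-off concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree serializer. The element name and text are
hypothetical, chosen only to mirror the copyright-symbol example from
comment #9; this is an illustration, not something taken from the bug.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document holding the copyright sign (U+00A9).
doc = ET.Element("p")
doc.text = "Copyright \u00a9 2014"

# UTF-8: an encoding every XML processor must support; U+00A9 is written
# directly as the two bytes 0xC2 0xA9.
as_utf8 = ET.tostring(doc, encoding="utf-8")

# us-ascii: U+00A9 cannot be encoded, so this serializer writes it as a
# numeric character reference instead.
as_ascii = ET.tostring(doc, encoding="us-ascii")

print(as_utf8)
print(as_ascii)
```

Both outputs carry the same text once parsed; only the UTF-8 one relies
solely on an encoding that XML processors are required to support.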

(In reply to Paul Eggert from comment #27)
> I realize that in this context many browsers interpret non-ASCII bytes using
> a unibyte encoding for legacy reasons, but some newer browsers do treat it as
> UTF-8.  I just now tried eww (which will be part of the next GNU Emacs
> release; see <http://www.emacswiki.org/emacs/eww>) and that's how it works. 

You should be doing your Web compat reasoning from browsers with substantial
market share. eww isn't one.

> The standard should allow this behavior.  More generally, the standard
> should allow the browser to heuristically decode invalid bytes in ways
> appropriate for the current user and context.

I strongly disagree. We should move away from the heuristic mess we have and
towards determinism, not add more heuristics.
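
As a rough sketch of what determinism means here: the declared label alone
selects the decoder, and the bytes are never inspected to guess. The LABELS
table below is a small hypothetical excerpt, not the full label list from the
spec, and Python's built-in codecs are only a stand-in for a browser's
decoders (Python's windows-1252 codec, for instance, leaves a few bytes
unassigned and they come out as U+FFFD under errors="replace").

```python
# Assumption: a tiny illustrative excerpt of a label-to-encoding table.
LABELS = {
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "utf-8": "utf-8",
}


def decode_deterministically(data: bytes, label: str) -> str:
    """Decode using only the label; never sniff the bytes to pick a decoder."""
    encoding = LABELS[label.strip().lower()]
    # Bytes the chosen codec cannot map become U+FFFD instead of triggering
    # a per-user or per-context guess.
    return data.decode(encoding, errors="replace")


# The same label always yields the same result for the same bytes.
print(decode_deterministically(b"\xa9 2014", "us-ascii"))
print(decode_deterministically(b"\xc2\xa9 2014", "US-ASCII"))
```

In the second call the bytes happen to be valid UTF-8, but they are still
decoded as windows-1252 because that is what the label maps to.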

- -

I request this Bugzilla item be WONTFIXed.


Received on Tuesday, 1 July 2014 09:42:36 UTC