[Bug 23646] "us-ascii" should not be an alias for "windows-1252" from bugzilla@jessica.w3.org on 2014-07-01 (www-international@w3.org from July to September 2014)

From: <bugzilla@jessica.w3.org>
Date: Tue, 01 Jul 2014 10:51:48 +0000
To: www-international@w3.org
Message-ID: <bug-23646-4285-UR6q8tlq8R@http.www.w3.org/Bugs/Public/>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

--- Comment #40 from Henri Sivonen <hsivonen@hsivonen.fi> ---
(In reply to Jirka Kosek from comment #39)
> But are there any pages using us-ascii encoding in the wild?

It would be extremely surprising if there weren't.

> If not, then there
> is no problem with having different aliases for decoding/encoding.

As noted in my previous comment, when e.g. submitting a form, browsers use the
encoding of the submitting document. The document stores the identity of the
encoding. It doesn't store the original label, so you don't have a chance to
re-resolve the label according to a different mapping.
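A minimal sketch of the point above, assuming a hypothetical Document class and a tiny excerpt of the Encoding Standard label table (not a browser's actual code). Python's cp1252 codec stands in for windows-1252, which is close enough for this illustration. The document keeps only the resolved encoding, so nothing downstream can tell that it started out as "us-ascii".

```python
from dataclasses import dataclass

# Hypothetical excerpt of the Encoding Standard label table: several labels,
# including "us-ascii", resolve to the single encoding "windows-1252".
WEB_LABELS = {
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "UTF-8",
}

# Python codecs used to approximate each encoding (Python's cp1252 leaves a
# few bytes undefined that the Encoding Standard maps; ignored here).
PY_CODECS = {"windows-1252": "cp1252", "UTF-8": "utf-8"}


@dataclass(frozen=True)
class Document:
    # Only the resolved encoding is stored; the label that produced it is gone.
    encoding: str


def load_document(label: str) -> Document:
    # Labels are matched case-insensitively after trimming whitespace.
    return Document(encoding=WEB_LABELS[label.strip().lower()])


def encode_form_value(doc: Document, value: str) -> bytes:
    # Form submission encodes with the document's encoding; there is no
    # label left to re-resolve under a different mapping.
    return value.encode(PY_CODECS[doc.encoding])


doc = load_document("US-ASCII")
print(doc)                             # Document(encoding='windows-1252')
print(encode_form_value(doc, "café"))  # b'caf\xe9'
```

A document labeled "us-ascii" and one labeled "windows-1252" end up indistinguishable, which is why a separate encoder-side mapping for the label would have nothing to act on.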

As for the TextEncoding API, it doesn't support non-UTF-* encodings anyway, so
the issue of "us-ascii" is moot. 

> > > In an ideal world, yes, but when you have other constraints and you know
> > > that the receiver can handle us-ascii, why should it be broken?
> > 
> > What "other constraints"?
> 
> For example, a 15-year-old POS terminal with no UTF-8 support.

Without UTF-8 support, they can't have conforming XML support. It's not the
Encoding Standard's problem to accommodate XML interchange with fundamentally
XML-non-conforming legacy systems.

> > If you know what the receiver can handle, you don't need specs to bless your
> > bilateral arrangement.
> 
> If I'm asking an encoder to produce us-ascii output, I'm not expecting to get
> bytes with values larger than 127 in my output.

The point where things go wrong is asking an encoder to produce something other
than UTF-8. :-)
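For ASCII-only text, UTF-8 output is byte-for-byte identical to US-ASCII output, so a sender with a 7-bit-only receiver can ask for UTF-8 and enforce the bilateral constraint itself. A minimal sketch, assuming a hypothetical helper name encode_7bit_via_utf8 (not part of any spec or library):

```python
def encode_7bit_via_utf8(text: str) -> bytes:
    # Hypothetical helper: encode as UTF-8, then check the receiver's
    # 7-bit requirement explicitly instead of relying on a label's meaning.
    data = text.encode("utf-8")
    if any(b > 0x7F for b in data):
        raise ValueError("non-ASCII text cannot be sent to a 7-bit-only receiver")
    return data


# For ASCII-only text, UTF-8 and US-ASCII produce identical bytes.
assert "ORDER 42".encode("utf-8") == "ORDER 42".encode("ascii")
print(encode_7bit_via_utf8("ORDER 42"))  # b'ORDER 42'
```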

> > > Please note that the
> > > Encoding Standard changes how us-ascii encoding behaved in the past, so this
> > > change must be justified and well reasoned. 
> > 
> > Citation-needed for the Encoding Standard describing a change compared to
> > pre-Encoding Standard browser behavior.
> 
> I think that the definition of US-ASCII is pretty clear: it's a 7-bit encoding.

I said "browser behavior"--not (de jure) "definition".

> I'm talking about us-ascii in general not only in browsers because the
> Encoding Standard seems to apply to everything, not only to browsers. If the
> scope is narrowed to browsers only, then do as you wish. But it would be
> silly to have two different definitions of us-ascii -- one for browsers and
> second for other environments.

I think we should focus the spec on the Web Platform--i.e. browsers. As other
systems find the need to consume Web content, they'll eventually grow Encoding
Standard-compliant encoding subsystems.

It's clear that there exist encoding libraries whose label handling is
IANA-oriented. Those will probably stick around for a long time for
compatibility with their old selves. It's unfortunate that the Web behavior and
e.g. the IANA-oriented JDK behavior differ, but we should just admit the
existence of two different legacies and not try to mix e.g. the JDK legacy into
Web specs.
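The two legacies can be put side by side in a short sketch. Assumptions: WEB_WINDOWS_1252_LABELS is a hypothetical subset of the Encoding Standard labels for windows-1252, approximated with Python's cp1252 codec. Python's own codec registry plays the role of the IANA-oriented behavior (comparable to the JDK's "US-ASCII" charset), where "us-ascii" means 7-bit ASCII.

```python
# Hypothetical subset of Encoding Standard labels that resolve to windows-1252.
WEB_WINDOWS_1252_LABELS = {"us-ascii", "ascii", "iso-8859-1", "latin1", "windows-1252"}


def web_encode(label: str, text: str) -> bytes:
    # Web legacy: "us-ascii" is just another label for windows-1252
    # (approximated here with Python's cp1252 codec).
    if label.strip().lower() in WEB_WINDOWS_1252_LABELS:
        return text.encode("cp1252")
    raise LookupError(f"label not covered by this sketch: {label!r}")


def iana_encode(label: str, text: str) -> bytes:
    # IANA-oriented legacy: Python's codec registry treats "us-ascii"
    # as 7-bit ASCII and refuses characters above U+007F.
    return text.encode(label)


print(web_encode("us-ascii", "café"))  # b'caf\xe9'
try:
    iana_encode("us-ascii", "café")
except UnicodeEncodeError as err:
    print("IANA-style us-ascii refuses:", err.object[err.start])  # é
```

Neither function is wrong within its own legacy; they simply answer the question "what does this label mean?" differently, which is why mixing the two mappings in one spec causes trouble.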


Received on Tuesday, 1 July 2014 10:51:50 UTC