[Bug 23646] "us-ascii" should not be an alias for "windows-1252"

From: <bugzilla@jessica.w3.org>
Date: Sat, 28 Jun 2014 10:47:42 +0000
To: www-international@w3.org
Message-ID: <bug-23646-4285-L4FIBEaYRE@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

--- Comment #11 from Jirka Kosek <jirka@kosek.cz> ---
(In reply to Anne from comment #10)
> Why would the XML parser result in an error? Surely it should use the same
> encoding layer.

Because 0xA9 is an invalid byte in a 7-bit encoding. I have tried two randomly
chosen XML parsers and both choke on this example:

$ cat test.xml
<?xml version="1.0" encoding="us-ascii"?>
<test>©</test>

$ xmllint --noout test.xml 
I/O error : encoder error
test.xml:2: parser error : Premature end of data in tag test line 2
<test>
      ^

$ xjparse test.xml
Attempting validating, namespace-aware parse
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Byte
"169" is not a member of the (7-bit) ASCII character set.

So in my opinion the Encoding spec breaks compatibility with existing content
and implementations with regard to the "us-ascii" encoding.

Received on Saturday, 28 June 2014 10:47:43 UTC