Re: REMINDER: Last Call for: "Encoding"

From: Jirka Kosek <jirka@kosek.cz>
Date: Fri, 27 Jun 2014 10:00:38 +0200
Message-ID: <53AD24A6.3080402@kosek.cz>
To: Paul Grosso <paul@paulgrosso.name>, public-xml-core-wg@w3.org

On 26.6.2014 17:44, Paul Grosso wrote:
> Has anybody--or is anybody willing to--review this document?

I have read the document, although I haven't checked all the boring
decoding/encoding algorithms written in the HTML5 spec style.

From an XML point of view I see one possible problem here. The Encoding
spec says:

"In particular, this specification defines the encodings, their
algorithms to go from bytes to code points and back, and their canonical
names and identifying labels. ...

Historically encodings and their specifications (if any) were kept track
of by the IANA Character Sets registry. This specification renders that
registry obsolete."

The XML specification is not explicit about how encoding/decoding works.
For UTF-* encodings this can be found in the referenced Unicode standard;
for other encodings (like ISO-8859-*, windows-125*) it is largely
undefined. If we decide in the future to reference the Encoding spec in
order to fix this, there is a problem: the Encoding spec defines both
decoding and encoding. For encoding there is special support for HTML
in step 5 of the encoding process:

http://www.w3.org/TR/encoding/#concept-encoding-process

This step guarantees that if a character is not available in the output
encoding, it is replaced by a numeric character reference. If we think
that we might use the Encoding spec in the future, we could ask for a
similar feature for XML as well.
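
To illustrate the kind of fallback that step 5 describes, here is a
minimal sketch in Python. It relies on the standard library's
"xmlcharrefreplace" error handler, which is Python's own mechanism and
not the Encoding spec's algorithm; the helper name encode_with_ncr is
hypothetical and used only for this example.

```python
def encode_with_ncr(text: str, encoding: str) -> bytes:
    """Encode text; characters the target encoding cannot represent
    are written as decimal numeric character references."""
    # "xmlcharrefreplace" is a built-in Python error handler, used here
    # as a stand-in for the fallback behaviour discussed above.
    return text.encode(encoding, errors="xmlcharrefreplace")


# U+20AC (euro sign) has no representation in ASCII or ISO-8859-1.
ascii_bytes = encode_with_ncr("price: 10 \u20ac", "ascii")
# U+00E9 is available in ISO-8859-1, so only the euro sign is replaced.
latin1_bytes = encode_with_ncr("caf\u00e9 \u20ac", "iso-8859-1")

print(ascii_bytes)
print(latin1_bytes)
```

A reference written this way is only safe where the consumer will
expand character references, which is why the behaviour would have to
be defined separately for XML.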

The second issue I have found is that the "us-ascii" encoding is just an
alias for "windows-1252". This is the correct mapping for decoding, but
for encoding it is wrong: a "us-ascii" encoder should halt as soon as a
character with code point 128 or higher is present. In windows-1252
there are 128 such characters.
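
The difference can be sketched in Python, assuming Python's "ascii" and
"cp1252" codecs as stand-ins for a strict US-ASCII encoder and the
spec's windows-1252 (the two windows-1252 definitions are not
guaranteed to match byte for byte):

```python
text = "na\u00efve"  # contains U+00EF, code point 239

# A strict US-ASCII encoder stops at the first code point >= 128.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A windows-1252 encoder accepts the same character without complaint.
cp1252_bytes = text.encode("cp1252")

# For input made only of bytes below 0x80, decoding gives the same
# result under both labels.
same_decoding = b"plain".decode("ascii") == b"plain".decode("cp1252")

print(ascii_failed, cp1252_bytes, same_decoding)
```

So output labelled "us-ascii" but produced by a windows-1252 encoder
may contain bytes that a real ASCII consumer cannot interpret.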

This problem had already been marked as WONTFIX in Bugzilla; I have
reopened the issue (https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646).

					Jirka

-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
------------------------------------------------------------------
    Bringing you XML Prague conference    http://xmlprague.cz
------------------------------------------------------------------


Received on Friday, 27 June 2014 08:01:17 UTC