Re: XML 1.0 5th Ed. PER: Unicode upgrade from Richard Tobin on 2008-09-10 (xml-editor@w3.org from July to September 2008)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Wed, 10 Sep 2008 15:05:23 +0100 (BST)
To: xml-editor@w3.org
Cc: John Boyer <boyerj@ca.ibm.com>
Message-Id: <20080910140523.08301446449@macpro.inf.ed.ac.uk>

(I'm replying to a rather old message here:
http://lists.w3.org/Archives/Public/xml-editor/2008JanMar/0004.html)

> Some have commented that they believed the sentence "XML processors MUST 
> accept the UTF-8 and UTF-16 encodings of Unicode 3.1" meant that encodings 
> for characters not in Unicode 3.1 were not allowed. 

I don't think this sentence is about what characters are allowed.
It's about what encodings must be supported.
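To illustrate that the encodings themselves say nothing about which characters exist: UTF-8 and UTF-16 are defined as bit-level transformations over the whole code-point range, so a code point that was unassigned in Unicode 3.1 (or is still unassigned) is encoded by exactly the same rules as an assigned one. The sketch below is a minimal illustration under that reading; the helper names `utf8_encode` and `utf16be_encode` are hypothetical, and Python's built-in codecs are used only to cross-check the results.

```python
def _check_scalar(cp: int) -> None:
    """Reject values that are not Unicode scalar values (out of range or surrogates)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")


def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value as UTF-8 using only the bit-pattern rules."""
    _check_scalar(cp)
    if cp < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes: 11110xxx then three continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16be_encode(cp: int) -> bytes:
    """Encode one scalar value as big-endian UTF-16 (surrogate pair above U+FFFF)."""
    _check_scalar(cp)
    if cp < 0x10000:
        return cp.to_bytes(2, "big")
    v = cp - 0x10000                   # 20-bit value split across two surrogates
    high = 0xD800 | (v >> 10)
    low = 0xDC00 | (v & 0x3FF)
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")


if __name__ == "__main__":
    # No character database is consulted: assigned and unassigned code points alike.
    for cp in (0x41, 0x20AC, 0x10000, 0x1F600, 0x50000, 0x10FFFF):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
        assert utf16be_encode(cp) == chr(cp).encode("utf-16-be")
        print(f"U+{cp:04X}: UTF-8 {utf8_encode(cp).hex()}  UTF-16BE {utf16be_encode(cp).hex()}")
```

Nothing in either function depends on a Unicode version number, which is why naming a version alongside the encodings adds nothing about which characters are permitted.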

It originally (in the 1st edition) said "the UTF-8 and UTF-16 encodings of
10646", where "of 10646" was, I think, just there to name them fully: at that
time many people were unfamiliar with Unicode and wouldn't have known
that they were encodings of the 10646/Unicode character set.

"10646" was changed to "Unicode 3.1" in the third edition; again, I don't
think this was intended to imply anything about the allowed
characters, but was simply an updated reference.

The allowed characters are specified in production 2 (Char) immediately
above, and are the same in the 5th edition as they were in the 1st.  All
Unicode code points except for most C0 controls (tab, LF and CR are
allowed), surrogates, and FFFE/FFFF are allowed.  That is, all non-C0
potential Unicode characters have always been allowed, and we have not
changed that and are not changing it now.
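For concreteness, production [2] in XML 1.0 reads `Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. The sketch below transcribes that production as a range check; the function names `is_xml10_char` and `first_illegal_char` are hypothetical illustrations, not part of any real XML library. Note that it is purely a test on code-point ranges and never asks whether a code point is assigned in some Unicode version.

```python
from typing import Optional


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)             # the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF           # rest of the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD         # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF      # all supplementary planes
    )


def first_illegal_char(text: str) -> Optional[int]:
    """Return the index of the first character not allowed by Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return i
    return None


if __name__ == "__main__":
    print(first_illegal_char("tab\tok"))         # all allowed
    print(first_illegal_char("bell\x07here"))    # U+0007 is an excluded C0 control
```

Because the check is a fixed set of ranges, a code point assigned in a later Unicode version (or not yet assigned at all) is accepted exactly as it was under the 1st edition.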

The right change would be to remove the phrase "of Unicode X.Y"
and if desired replace it with references to the definitions of
those encodings.

-- Richard

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Received on Wednesday, 10 September 2008 14:09:27 UTC