personal comments about http://www.w3.org/TR/xml11/ (fwd) from John Cowan on 2003-03-11 (www-xml-blueberry-comments@w3.org from March 2003)

From: John Cowan <cowan@mercury.ccil.org>
Date: Mon, 10 Mar 2003 20:55:54 -0500 (EST)
To: www-xml-blueberry-comments@w3.org
Message-Id: <E18sYzi-00040n-00@mercury.ccil.org>
----- Forwarded message from Yung-Fong Tang -----

From ftang@netscape.com Mon Mar 10 20:47:15 2003
Message-ID: <3E6D3F1D.1030306@netscape.com>
Date: Mon, 10 Mar 2003 17:42:53 -0800
From: ftang@netscape.com (Yung-Fong Tang)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 (nscd2)
X-Accept-Language: en-us, en
To: jcowan@reutershealth.com
CC: unicode@unicode.org
Subject: personal comments about http://www.w3.org/TR/xml11/

Dear John Cowan:

Here are my personal comments on
http://www.w3.org/TR/xml11/
"Extensible Markup Language (XML) 1.1
W3C Candidate Recommendation 15 October 2002"

1. In section 2.2, Characters, you exclude only the characters U+FFFE
and U+FFFF. As of Unicode 3.1, the list of noncharacters has been
extended; see the following excerpt:


      Noncharacters

There are 34 specific code points in Unicode 3.0 that are characterized 
as noncharacters. Unicode 3.1 adds an additional 32 noncharacters. To 
clarify the status of all 66, a definition (page 41) is added, and 
conformance rules C5 and C10 (pages 38, 39) are amended as follows:

    D7b Noncharacter: a code point that is permanently reserved for
    internal use, and that should never be interchanged. In Unicode 3.1,
    these consist of the values U+nFFFE and U+nFFFF (where n is from 0
    to 10 hexadecimal) and the values U+FDD0..U+FDEF.

        * For more information, see the discussions under "Special
          Noncharacter Values" in Section 2.7, Special Character and
          Noncharacter Values, and under "Noncharacters" in Section
          13.6, Specials.
        * These code points are permanently reserved as noncharacters.
          In the future, it is possible that additional code points may
          be specified to represent noncharacters.

    C5 A process shall not interpret a noncharacter code point
    [replacing "either U+FFFE or U+FFFF"] as an abstract character.

        * The code points may be used internally, such as for sentinel
          values or delimiters, but should not be exchanged publicly.

    C10 A process shall make no change in a valid coded character
    representation other than the possible replacement of character
    sequences by their canonical-equivalent sequences or the deletion of
    noncharacter code points, if that process purports not to modify the
    interpretation of that coded character sequence.

        * If a noncharacter which does not have a specific internal use
          is unexpectedly encountered in processing, an implementation
          may signal an error or delete or ignore the noncharacter. If
          these options are not taken, the noncharacter should be
          treated as an unassigned code point. For example, an API that
          returned a character property value for a noncharacter would
          return the same value as the default value for an unassigned
          code point.


(quoted from http://www.unicode.org/reports/tr27/)
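
The definition D7b quoted above can be turned into a small membership test. The following is only a sketch in Python; the function name is_noncharacter is made up for illustration and is not taken from any specification or library.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 noncharacters of Unicode 3.1.

    Hypothetical helper: covers U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF
    for every plane n from 0 to 0x10, as described in D7b.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Masking off the lowest bit leaves 0xFFFE for both xxFFFE and xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Counting over the whole code space reproduces the total given in the text.
NONCHARACTER_COUNT = sum(is_noncharacter(cp) for cp in range(0x110000))
```

The count comes out as 32 code points in U+FDD0..U+FDEF plus two per plane for 17 planes, which matches the "all 66" in the excerpt.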

Therefore, should the following production be changed from
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] 
| [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate 
blocks, FFFE, and FFFF. */

to
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFDCF] 
| [#xFDF0-#xFFFD] | [#x10000-#x1FFFD] | [#x20000-#x2FFFD] | 
[#x30000-#x3FFFD] | [#x40000-#x4FFFD] | [#x50000-#x5FFFD] | 
[#x60000-#x6FFFD] | [#x70000-#x7FFFD] | [#x80000-#x8FFFD] | 
[#x90000-#x9FFFD] | [#xA0000-#xAFFFD] | [#xB0000-#xBFFFD] |  
[#xC0000-#xCFFFD] | [#xD0000-#xDFFFD] | [#xE0000-#xEFFFD] |  
[#xF0000-#xFFFFD] | [#x100000-#x10FFFD]  /* any Unicode character, 
excluding the surrogate blocks, FDD0 to FDEF, nFFFE, and nFFFF. */
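
To see how much the proposed production changes, the two versions of [2] can be compared as lists of inclusive ranges. This is a rough Python sketch, assuming the intent is one range nxxxx0000-nFFFD for each supplementary plane 1 through 16; all names below are illustrative only.

```python
# Char as quoted from the specification (inclusive code point ranges).
CURRENT_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Char as proposed: split the BMP range around U+FDD0..U+FDEF and stop
# each supplementary plane at xxFFFD (assumed: planes 1 through 0x10).
PROPOSED_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                 (0x20, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]
PROPOSED_CHAR += [(plane << 16, (plane << 16) | 0xFFFD)
                  for plane in range(1, 0x11)]


def size(ranges):
    """Number of code points covered by non-overlapping inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def contains(ranges, cp):
    """True if cp falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


# Code points allowed today that the proposal would exclude.
NEWLY_EXCLUDED = size(CURRENT_CHAR) - size(PROPOSED_CHAR)
```

The difference is 64 code points: the 32 in U+FDD0..U+FDEF plus nFFFE and nFFFF in each of the 16 supplementary planes. U+FFFE and U+FFFF are already excluded by the existing production.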


2. A similar change should apply to
[4] NameStartChar:
#xFDD0-#xFDEF should not be allowed in NameStartChar, and
neither nFFFE nor nFFFF should be allowed in NameStartChar either.

It looks as though NameStartChar does not allow the private use area 
[#xE000-#xF8FF]. If we follow that principle, then [#xF0000-#x10FFFF] 
should not be in NameStartChar either, since 
http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt defines it 
as Supplementary Private Use Area:

F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B

Also, I doubt we should allow

E0000..E007F; Tags

to be used as NameStartChar
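
The exclusions suggested in point 2 amount to subtracting a few spans from whatever ranges NameStartChar allows. Below is a Python sketch of that subtraction. The CANDIDATE ranges are a hypothetical stand-in chosen for illustration, not the actual NameStartChar production, and the helper name subtract is likewise made up.

```python
def subtract(ranges, excluded):
    """Remove every excluded (lo, hi) span from the inclusive ranges."""
    result = list(ranges)
    for ex_lo, ex_hi in excluded:
        kept = []
        for lo, hi in result:
            if hi < ex_lo or lo > ex_hi:
                kept.append((lo, hi))  # no overlap, keep unchanged
                continue
            if lo < ex_lo:
                kept.append((lo, ex_lo - 1))  # part below the excluded span
            if hi > ex_hi:
                kept.append((ex_hi + 1, hi))  # part above the excluded span
        result = kept
    return sorted(result)


# Spans named in the message: noncharacters, Tags, Supplementary PUA-A/B.
EXCLUDED = [(0xFDD0, 0xFDEF), (0xE0000, 0xE007F),
            (0xF0000, 0xFFFFF), (0x100000, 0x10FFFF)]
EXCLUDED += [((p << 16) | 0xFFFE, (p << 16) | 0xFFFF) for p in range(0x11)]

# Hypothetical starting ranges, for illustration only.
CANDIDATE = [(0xF900, 0xFFFD), (0x10000, 0x10FFFF)]

TRIMMED = subtract(CANDIDATE, EXCLUDED)
```

Under these assumptions nothing at or above U+F0000 survives, and the last remaining range starts just past the Tags block at U+E0080.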



Frank Yung-Fong Tang

----- End of forwarded message from Yung-Fong Tang -----

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_
Received on Monday, 10 March 2003 20:55:58 UTC