Re: Unicode character classes and XML parser from David Brownell on 2001-04-19 (www-dom@w3.org from April to June 2001)

From: David Brownell <david-b@pacbell.net>
Date: Thu, 19 Apr 2001 15:08:58 -0700
To: "John G. Spragge" <jgs@dancing-cat-software.com>, www-dom@w3.org
Message-id: <073c01c0c91d$561dc500$6800000a@brownell.org>

Trust those obnoxious large tables in the grammar productions,
not the text explaining how they were derived from Unicode tables.
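
For implementors, here is a rough sketch of the distinction being asked about. It is not taken from any particular parser. It assumes the Char production from the XML 1.0 specification and uses Python's standard unicodedata module, whose decomposition() call returns the field 5 mapping as a string. The helper names is_xml_char and has_compat_decomposition are made up for illustration. Python ships a much newer Unicode database than the Unicode 2.0 data the XML tables were derived from, so this cannot reproduce the name-character classes exactly. That is one more reason to rely on the tables in the productions.

```python
import unicodedata


def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production.

    This covers any character allowed in a document at all,
    not only the characters allowed in names.
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def has_compat_decomposition(ch: str) -> bool:
    """Return True if the decomposition field begins with "<".

    A leading "<tag>" marks a font or compatibility decomposition;
    canonical decompositions have no tag, and characters without a
    decomposition yield an empty string.
    """
    return unicodedata.decomposition(ch).startswith("<")


# Hypothetical sample: plain letter, canonical decomposition,
# compatibility ligature, superscript digit.
for ch in ("a", "\u00e9", "\ufb01", "\u00b2"):
    print(
        f"U+{ord(ch):04X}",
        "Char:", is_xml_char(ch),
        "compat decomposition:", has_compat_decomposition(ch),
    )
```

A character such as U+FB01 passes the Char check while still carrying a compatibility decomposition. The sentence quoted from the CharClasses appendix describes how the name-character classes were derived; it does not restrict which characters may appear in a document.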

- Dave


----- Original Message -----
From: "John G. Spragge" <jgs@dancing-cat-software.com>
To: <www-dom@w3.org>
Sent: Thursday, April 19, 2001 1:18 PM
Subject: Unicode character classes and XML parser


> Sorry if this doesn't belong in the DOM forum, but the two addresses available for the XML list
> from the W3C web site don't work.
>
> This has to do with parsing XML using Unicode. On page 29 of the (printed) specification (at
> http://www.w3.org/TR/2000/REC-xml-20001006#CharClasses), it says, and I quote:
>
> Characters which have a font or compatibility decomposition (i.e. those with a "compatibility
> formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not
> allowed.
> Question from an implementor of the parser: does this mean XML excludes characters with
> decompositions altogether (presumably to avoid normalisation issues), or does it mean XML
> identifiers exclude such characters?
> Thanks...
>
>
>
> ----
> J. G. Spragge
> Dancing Cat Software -- http://www.dancing-cat-software.com
>
>

Received on Thursday, 19 April 2001 18:17:48 UTC