Unicode character classes and XML parser from John G. Spragge on 2001-04-19 (www-dom@w3.org from April to June 2001)

From: John G. Spragge <jgs@dancing-cat-software.com>
Date: Thu, 19 Apr 2001 16:18:11 -0400
To: "'www-dom@w3.org'" <www-dom@w3.org>
Message-ID: <01C0C8EC.54C04E20@RUBIN>

Sorry if this doesn't belong in the DOM forum, but the two addresses available for the XML list from the W3C web site don't work. 

This concerns parsing XML with Unicode. Page 29 of the printed specification (online at http://www.w3.org/TR/2000/REC-xml-20001006#CharClasses) says, and I quote:

Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

A question from an implementor of a parser: does this mean XML excludes characters with such decompositions altogether (presumably to avoid normalisation issues), or does it mean only that XML identifiers exclude such characters?
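
To make the quoted rule concrete, here is a minimal sketch of the "field 5 begins with a <" test. It assumes Python's standard unicodedata module as a stand-in for the character database; the function name has_compatibility_decomposition is hypothetical. The module reflects whichever Unicode version ships with the interpreter, not the Unicode 2.0 database the specification's classes were derived from, so results for individual characters may differ. It only shows the test itself and does not settle which productions the rule applies to.

```python
import unicodedata


def has_compatibility_decomposition(ch: str) -> bool:
    """Return True if the decomposition mapping (field 5) starts with '<'.

    unicodedata.decomposition() returns the raw field 5 string, e.g.
    '<compat> 0066 0069' for a compatibility mapping, '0065 0301' for a
    canonical one, or '' when the character has no decomposition.
    """
    return unicodedata.decomposition(ch).startswith("<")


if __name__ == "__main__":
    # U+FB01 (fi ligature), U+00E9 (e with acute), and plain 'A'
    for ch in ("\ufb01", "\u00e9", "A"):
        print(f"U+{ord(ch):04X}", has_compatibility_decomposition(ch))
```

Characters with only a canonical decomposition, such as U+00E9, are not caught by this test, since their field 5 carries no formatting tag.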
Thanks...



----
J. G. Spragge 
Dancing Cat Software -- http://www.dancing-cat-software.com

Received on Thursday, 19 April 2001 16:24:22 UTC