Unicode character classes and XML parser

From: John G. Spragge <jgs@dancing-cat-software.com>
Date: Thu, 19 Apr 2001 16:18:11 -0400
Message-ID: <01C0C8EC.54C04E20@RUBIN>
To: "'www-dom@w3.org'" <www-dom@w3.org>
Sorry if this doesn't belong in the DOM forum, but the two addresses available for the XML list from the W3C web site don't work. 

This concerns parsing XML with Unicode. Page 29 of the printed specification (online at http://www.w3.org/TR/2000/REC-xml-20001006#CharClasses) says, and I quote:

Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.

A question from a parser implementor: does this mean that XML excludes characters with such decompositions altogether (presumably to avoid normalisation issues), or only that XML identifiers exclude them?
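
To make the quoted rule concrete, here is a minimal sketch of the "field 5 begins with a '<'" test. It assumes Python's standard unicodedata module, whose decomposition() function returns the decomposition field of the Unicode Character Database as a string. The helper name has_compat_decomposition is hypothetical, and the module ships a newer database than the one the specification was written against, so individual results may differ. The sketch only shows how such characters can be detected; it does not settle where the restriction applies.

```python
import unicodedata


def has_compat_decomposition(ch: str) -> bool:
    """Return True if ch has a font or compatibility decomposition.

    Hypothetical helper: the decomposition field (field 5 of the Unicode
    database) starts with a tag such as "<compat>" or "<font>" for these
    characters. It is empty or a bare code point list otherwise.
    """
    return unicodedata.decomposition(ch).startswith("<")


if __name__ == "__main__":
    # U+FB01 is the "fi" ligature, U+00E9 is e-acute (canonical decomposition),
    # and "A" has no decomposition at all.
    for ch in ("\ufb01", "\u00e9", "A"):
        print(f"U+{ord(ch):04X}", has_compat_decomposition(ch))
```

A parser could apply a check like this either to every input character or only to characters inside names, which is exactly the distinction the question above is asking about.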

J. G. Spragge 
Dancing Cat Software -- http://www.dancing-cat-software.com
Received on Thursday, 19 April 2001 16:24:22 UTC
