- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Wed, 02 Mar 2022 09:53:51 +0000
- To: public-ixml@w3.org
- Message-ID: <87tucghaz7.fsf@saxonica.com>
Norm Tovey-Walsh <norm@saxonica.com> writes:
> I haven’t worked out precisely what’s outside the intersection, but I

Challenge accepted. It turns out, in fact, to be quite interesting.

The fifth edition of XML ostensibly allows a huge range of characters to appear in names. But it doesn’t appear that parsers (at least not Xerces which is what I have at hand) allow them.

When I implemented checking, I took the XML 5e list of name characters and excluded Unicode “non characters” and surrogates. (I don’t recall where I got the inspiration for doing that.)

With those rules, there are only three characters allowed by ixml that are not allowed by XML:

  00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;
  00B5;MICRO SIGN;Ll;0;L;<compat> 03BC;;;;N;;;039C;;039C
  00BA;MASCULINE ORDINAL INDICATOR;Lo;0;L;<super> 006F;;;;N;;;;;

Apache excludes great rafts of characters that should be allowed. I wouldn’t be surprised to discover that they’ve never updated from the rules that preceded the fifth edition.

According to Apache, there are 21,277 characters allowed by ixml that aren’t allowed in XML. :-(

I think I’d be reluctant to dramatically reduce the name characters allowed by ixml just to suit the whims of the Apache implementation. Especially since many of the excluded characters are clearly letters.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
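For anyone who wants to reproduce the check described above, here is a rough sketch in Python. It is not the code used for the numbers in this message. The ranges are the NameStartChar and NameChar productions from XML 1.0 Fifth Edition, with noncharacters and surrogates excluded as described. The helper names are hypothetical, and `is_letter` is only a stand-in assumption for "allowed by ixml" (any Unicode general category beginning with L), not the actual ixml grammar rule. Results of the full scan depend on the Unicode version shipped with your Python.

```python
import unicodedata

# XML 1.0 (Fifth Edition) NameStartChar, as inclusive code point ranges.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters that NameChar adds on top of NameStartChar:
# "-" and ".", digits, middle dot, combining marks, undertie characters.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters (U+FDD0..U+FDEF, U+nFFFE, U+nFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF


def is_xml_name_char(cp):
    """XML 5e NameChar, minus noncharacters and surrogates."""
    if is_noncharacter(cp) or is_surrogate(cp):
        return False
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES + NAME_EXTRA_RANGES)


def is_letter(cp):
    """Assumption: approximates the ixml side by Unicode letter categories (L*)."""
    return unicodedata.category(chr(cp)).startswith("L")


def letters_outside_xml():
    """Code points that are letters but not XML 5e name characters."""
    return [cp for cp in range(0x110000)
            if is_letter(cp) and not is_xml_name_char(cp)]


if __name__ == "__main__":
    for cp in letters_outside_xml():
        print(f"{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

The three characters listed above (00AA, 00B5, 00BA) fall below the first non-ASCII range, which starts at 00C0, and only 00B7 is added back by NameChar, so they are rejected by `is_xml_name_char` even though Unicode classifies them as letters.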
Received on Wednesday, 2 March 2022 11:04:38 UTC