Re: Proposal: remove “.” from namefollower

Norm Tovey-Walsh <norm@saxonica.com> writes:
> I haven’t worked out precisely what’s outside the intersection, but I

Challenge accepted.

It turns out, in fact, to be quite interesting. The fifth edition of XML
ostensibly allows a huge range of characters to appear in names. But
parsers don’t appear to allow them (at least Xerces, which is what I
have at hand, doesn’t).

When I implemented checking, I took the XML 5e list of name characters
and excluded Unicode “noncharacters” and surrogates. (I don’t recall
where I got the inspiration for doing that.) With those rules, there are
only three characters allowed by ixml that are not allowed by XML:

  00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;
  00B5;MICRO SIGN;Ll;0;L;<compat> 03BC;;;;N;;;039C;;039C
  00BA;MASCULINE ORDINAL INDICATOR;Lo;0;L;<super> 006F;;;;N;;;;;
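
For anyone who wants to reproduce that comparison, here is a minimal
sketch of the check described above. The XML ranges are the NameStartChar
and NameChar productions from XML 1.0 Fifth Edition. The ixml side is an
assumption on my part (letters, "_", and for followers "-", ".", U+00B7,
U+203F, U+2040, plus the Nd and Mn categories), and every function name
here is hypothetical, not taken from any implementation.

```python
import unicodedata

# XML 1.0 (Fifth Edition) NameStartChar ranges, inclusive.
XML_NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Ranges that NameChar adds on top of NameStartChar.
XML_NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def is_noncharacter(cp):
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF


def is_xml_name_char(cp):
    # XML 5e NameChar, minus noncharacters and surrogates.
    if is_noncharacter(cp) or is_surrogate(cp):
        return False
    return any(lo <= cp <= hi for lo, hi in XML_NAME_START + XML_NAME_EXTRA)


def is_ixml_name_char(cp):
    # ASSUMPTION: approximates the ixml namestart/namefollower rules.
    ch = chr(cp)
    if ch in "_-.\u00B7\u203F\u2040":
        return True
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat in ("Nd", "Mn")


def ixml_only():
    # Code points accepted by the ixml rule but rejected by the XML rule.
    return [cp for cp in range(0x110000)
            if is_ixml_name_char(cp) and not is_xml_name_char(cp)]
```

Calling ixml_only() scans every code point and returns the ones outside
the intersection. The exact result depends on the Unicode version that
ships with your Python, so treat it as a cross-check rather than a
definitive list.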

Apache Xerces excludes great rafts of characters that should be allowed. I
wouldn’t be surprised to discover that they’ve never updated from the
rules that preceded the fifth edition.

According to Apache, there are 21,277 characters allowed by ixml that
aren’t allowed in XML. :-(

I think I’d be reluctant to dramatically reduce the name characters
allowed by ixml just to suit the whims of the Apache implementation.
Especially since many of the excluded characters are clearly letters.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Wednesday, 2 March 2022 11:04:38 UTC