>RE delenda est.
>I am not a number. I am an undefined character.

His first concern is valid, since files come in lines, and since REs are
arguably one of SGML's stickiest tar-pits.

His second concern is vastly overblown.  Given the use of the 10646
repertoire, the population of characters that are needed but undefined
falls dramatically. Those that do appear are either

 a) so exotic that a bit of extra work in encoding them seems a minor
    concern, or
 b) really graphics in disguise, such as a Xerox trademark.

I have no problem acknowledging that XML may not make it particularly easy 
or natural to deal with characters outside of the hundred thousand or so 
that 10646 provides.

Why is this problem important? [I'm not being sarcastic; I'd really like
to get some input on this.]

Cheers, Tim Bray
