[Bug 3838] fn:codepoints-to-string should allow any infoset character from bugzilla@wiggum.w3.org on 2006-10-16 (public-qt-comments@w3.org from October 2006)

From: <bugzilla@wiggum.w3.org>
Date: Mon, 16 Oct 2006 01:02:48 +0000
To: public-qt-comments@w3.org
CC:
Message-Id: <E1GZGsS-0004T7-IG@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3838


------- Comment #2 from per@bothner.com  2006-10-16 01:02 -------
(In reply to comment #1)
> There are of course use cases for such an enhancement, but this is not the time
> to be considering enhancements.

I can certainly understand that.

I might suggest making it implementation-defined what happens if a codepoint is
not an XML character, rather than requiring an error to be raised, but I
understand there would be reluctance even for that.
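To make the two options concrete, here is a minimal Python sketch. Python and every name in it are illustrative assumptions, not taken from any XQuery implementation. The strict path raises an error for any code point outside the XML 1.0 Char production, which is what the spec requires (err:FOCH0001). The lenient path models the implementation-defined alternative, assuming it would accept any Unicode scalar value.

```python
# Hypothetical sketch of fn:codepoints-to-string; names are illustrative only.

class CodepointError(ValueError):
    """Raised for a code point the function refuses (cf. err:FOCH0001)."""


def is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    #                          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_unicode_scalar(cp: int) -> bool:
    # Any code point except surrogates: the assumed lenient policy.
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def codepoints_to_string(codepoints, strict: bool = True) -> str:
    # strict=True: the current rule, an error on any non-XML character.
    # strict=False: assumed implementation-defined behaviour that accepts
    # any Unicode scalar value (including #x0 and the C0 controls).
    cps = list(codepoints)  # materialize so generators can be read twice
    allowed = is_xml10_char if strict else is_unicode_scalar
    for cp in cps:
        if not allowed(cp):
            raise CodepointError(f"code point {cp:#x} not allowed")
    return "".join(chr(cp) for cp in cps)
```

With this sketch, `codepoints_to_string([72, 105])` returns `"Hi"`, `codepoints_to_string([1])` raises `CodepointError`, and `codepoints_to_string([1], strict=False)` returns `"\x01"`.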

> There are also considerable technical complications in allowing the string data
> type to use a wider character set than XML permits. For example, it would
> require a careful look at the rules for regular expressions.

I can see one would want to read through the rules to avoid any
inconsistencies, but I can't imagine any fundamental problems.  Java (and Perl)
regular expressions obviously aren't restricted to XML characters.

> There's also a
> potentially significant performance penalty if characters in a string have to
> be checked for XML-validity at the time text or attribute nodes are
> constructed, rather than at the time the string is constructed.

Alternatively, you could do the checking for XML validity at serialization time.
You have to inspect every character then anyway, to determine which characters
need to be encoded or escaped.
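A rough sketch of that idea follows, written in Python with hypothetical function names. The data model would hold any code point, and only the serializer, which already walks every character to decide on escaping, rejects what the target XML version cannot represent. Escaping CR, and in XML 1.1 mode the restricted characters plus NEL and LINE SEPARATOR, as character references is an assumption about what a careful serializer would do, not a quotation of the serialization spec.

```python
# Hypothetical serializer sketch: the data model may hold any code point,
# and validity against the target XML version is checked only on output.

def is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    # XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def needs_char_ref_11(cp: int) -> bool:
    # XML 1.1 RestrictedChar ([#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
    # | [#x7F-#x84] | [#x86-#x9F]) may appear only as character references;
    # NEL (#x85) and LINE SEPARATOR (#x2028) are escaped too so that
    # end-of-line normalization in the parser does not change them.
    return ((0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD))
            or 0x7F <= cp <= 0x9F
            or cp == 0x2028)


def serialize_text(text: str, version: str = "1.0") -> str:
    """Escape text-node content, rejecting characters the target XML
    version cannot represent at all."""
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unsupported XML version {version!r}")
    is_char = is_xml10_char if version == "1.0" else is_xml11_char
    out = []
    for ch in text:
        cp = ord(ch)
        if not is_char(cp):
            raise ValueError(f"{cp:#x} cannot be serialized as XML {version}")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif cp == 0xD:
            # A literal CR would be normalized to LF by the parser.
            out.append("&#xD;")
        elif version == "1.1" and needs_char_ref_11(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)
```

Under these assumptions, `serialize_text("\x01", "1.1")` yields `&#x1;`, while `serialize_text("\x01")` raises, because XML 1.0 has no way to represent that character.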

This makes sense if one extends the data model to match the infoset model by
allowing arbitrary characters in text nodes (and also in attribute values,
comments, and processing instructions).  But I agree the issues to be
considered are too big to take on at this point.

> Apart from that, the XML working group chose consciously to disallow certain
> characters, and we should respect that decision unless there are very
> compelling reasons. Sometimes it's more important to be consistent than to be
> right.

The XML working group seems to have disallowed far fewer characters in XML 1.1,
which now disallows only 0, surrogates, FFFE, and FFFF.
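The difference between the two Char productions can be checked mechanically. The Python sketch below (the helper names are hypothetical) transcribes both productions and lists the code points each one excludes. One caveat: XML 1.1 still classes most C0 and C1 controls as RestrictedChar, which may appear only as character references, so "allowed" here means representable, not necessarily writable as a literal character.

```python
# Compare the code points excluded by the XML 1.0 and XML 1.1 Char productions.

def is_xml10_char(cp: int) -> bool:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    # [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def excluded(is_char):
    # Every code point in the Unicode range that the production rejects.
    return [cp for cp in range(0x110000) if not is_char(cp)]


if __name__ == "__main__":
    ex10, ex11 = excluded(is_xml10_char), excluded(is_xml11_char)
    print(len(ex10), len(ex11))                  # 2079 2051
    extra = sorted(set(ex10) - set(ex11))
    print(" ".join(f"{cp:#x}" for cp in extra))  # 0x1 ... 0x1f, 28 code points
```

XML 1.1 excludes exactly 0, the 2048 surrogates, FFFE, and FFFF (2051 code points); XML 1.0 additionally excludes the 28 C0 controls other than NUL, tab, LF, and CR.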

> The XPath data model should have XML as its foundation.

I have mixed feelings about this philosophy.  A more general data model that
just happens to subsume the XML infoset model can also be very useful.

One issue is being able to use general-purpose libraries that do not have to be
specially written for XQuery/XPath.  For example, one would like to use common
libraries for regular expressions, or more generally for string handling, across
multiple languages rather than writing XPath-specific string functions.  That
is often difficult because of minor differences between the specifications.

If there is a list of issues to be considered for XQuery 1.1, perhaps this
could be added to it?  Apart from that, I won't object to closing this issue.

Received on Monday, 16 October 2006 01:02:53 UTC