- From: <bugzilla@jessica.w3.org>
- Date: Thu, 17 Feb 2011 09:11:57 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12105

           Summary: [XDM 3.0] Allow any Unicode character in a string
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Data Model 3.0
        AssignedTo: ndw@nwalsh.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org

This is a request to enhance the data model so that any Unicode character is allowed in a string. It is raised in response to an action from the XSL Working Group.

In practice the proposed change means (a) all XML 1.1 characters are allowed by all processors, and (b) the Unicode NUL character (x0) is allowed by all processors.

Serialization would fail if a string contains a character not permitted in the version of XML that is the target of serialization. Tree construction, however, will not reject any characters as invalid. Parsing of lexical XML is still free to use XML 1.0 or XML 1.1 rules at the implementor's discretion.

Justification: we allow input from sources that are not constrained by the XML rules, notably by using unparsed-text() or codepoints-to-string(), or by calling external functions. Restricting the character set that can be returned by these functions creates work for implementors, imposes a performance penalty, and restricts what users can do with the language, all quite unnecessarily.

We want to allow import of JSON data, with full round-tripping. This is hampered by the fact that JSON strings allow characters that are not legal in XDM. The alternative is to hold such strings in escaped form, which is very inconvenient for users.

Casting to string will not reject characters disallowed by XML.

For validation of XDM nodes (e.g. using [xsl:]validation or XQuery validate{}) it will be implementation-defined whether the character set allowed in xs:string values is XML 1.0, XML 1.1, or the full XDM set.
This preserves the freedom of implementations to use an off-the-shelf validation engine.

[For the avoidance of doubt, "any character" does not include unpaired surrogates. It is of course possible that some external data sources will supply pseudo-strings containing unpaired surrogates. This is analogous to supplying a string that is supposed to be encoded in UTF-8 but contains bytes that cannot be decoded: it is not possible to interpret what is returned as a sequence of characters. An interface that wishes to handle octet streams containing such oddities must handle them as a sequence of integers, or as hexBinary.]

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
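
The serialization rule described in the report (strings are accepted freely in memory, and the check happens only when writing to a particular XML version) might be sketched roughly as follows. This is an illustration only, not part of the proposal: the name first_unserializable and the table XML_CHAR_RANGES are hypothetical, and the ranges are the Char productions of the XML 1.0 and XML 1.1 Recommendations.

```python
# Hypothetical sketch: names are illustrative, not taken from any spec or product.
# Ranges are the Char productions of XML 1.0 and XML 1.1 (inclusive code points).
XML_CHAR_RANGES = {
    "1.0": [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
            (0xE000, 0xFFFD), (0x10000, 0x10FFFF)],
    "1.1": [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)],
}


def first_unserializable(value, xml_version):
    """Return the index of the first character that the target XML version
    does not permit, or None if every character is permitted.

    Surrogate code points (xD800-xDFFF) fall outside both productions, so a
    pseudo-string holding an unpaired surrogate is reported as well.
    Note: XML 1.1 additionally requires most control characters to be written
    as character references; this sketch only checks whether they are allowed.
    """
    ranges = XML_CHAR_RANGES[xml_version]
    for index, ch in enumerate(value):
        code_point = ord(ch)
        if not any(low <= code_point <= high for low, high in ranges):
            return index
    return None


if __name__ == "__main__":
    # Built in memory without complaint; only the serialization check can fail.
    for sample in ["a\x01b", "nul\x00", "tab\tok"]:
        for version in ("1.0", "1.1"):
            print(repr(sample), version, first_unserializable(sample, version))
```

Under these assumptions a string containing x1 would fail only when the target is XML 1.0, while a string containing x0 would fail for both targets, which matches the behaviour the report describes for serialization.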
Received on Thursday, 17 February 2011 09:11:59 UTC