[Bug 28151] New: [F+O 3.1] Minor inconsistencies between parse-json() and json-to-xml()

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28151

            Bug ID: 28151
           Summary: [F+O 3.1] Minor inconsistencies between parse-json()
                    and json-to-xml()
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators 3.1
          Assignee: mike@saxonica.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org

There are some minor inconsistencies between the functions parse-json() and
json-to-xml() (newly transferred from XSLT 3.0) which ought to be addressed.

I have fixed those that are purely editorial. The others are:

A. parse-json() accepts an empty sequence as the first argument; json-to-xml()
does not.

RECOMMENDATION: json-to-xml() should also accept an empty sequence.

B. json-to-xml() explicitly permits a byte order mark at the start of the
content. parse-json() says nothing on the subject, except by reference to RFC
7159. The RFC says "In the interests of interoperability, implementations that
parse JSON texts MAY ignore the presence of a byte order mark rather than
treating it as an error."

RECOMMENDATION: parse-json() should accept (and ignore) a byte order mark.
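
For illustration only, the behaviour recommended in B can be sketched in
Python. This is not spec text: the helper name parse_json_ignoring_bom is
hypothetical, and Python's json module stands in for parse-json(). The sketch
drops a single leading U+FEFF before parsing instead of treating it as an
error.

```python
import json

BOM = "\ufeff"  # byte order mark as it appears in already-decoded text


def parse_json_ignoring_bom(text):
    """Parse JSON text, ignoring one leading byte order mark if present.

    Hypothetical helper: models "accept (and ignore) a byte order mark".
    """
    if text.startswith(BOM):
        text = text[len(BOM):]
    return json.loads(text)
```

Input without a byte order mark is parsed unchanged, so the two forms of the
same document give the same result.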

C. Duplicate keys in maps. parse-json() provides three options: reject,
use-first, and use-last. json-to-xml() says nothing on the subject, other than
what can be deduced (a) from the option validate=true, which validates the
resulting XML against a supplied schema, which prohibits duplicates, and (b)
from a statement that the mapping from JSON to XML preserves order.

I don't think it's appropriate to make json-to-xml() provide exactly the same
options as parse-json() here, firstly because of the interaction with schema
validation, and secondly because we want the conversion to be streamable and
therefore order-preserving. I think the appropriate options for json-to-xml()
would be retain, reject, and use-first: retain means that duplicates are
retained in the result (making it invalid against the schema, so this option is
incompatible with validate=true); reject and use-first have the same meaning as
for parse-json().

RECOMMENDATION: In json-to-xml(), add the option
duplicates=retain|reject|use-first. If the effective value of validate is true,
duplicates defaults to reject; otherwise it defaults to retain. The value
duplicates=retain is incompatible with validate=true.
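
As a rough sketch of how the proposed option and its defaulting could behave
(Python, for illustration only): the function names below are hypothetical,
and each JSON object is modelled as an ordered list of (key, value) pairs
rather than as XML elements, which is an assumption made to keep the example
short.

```python
import json


def resolve_duplicates(duplicates=None, validate=False):
    """Apply the proposed defaulting and compatibility rules."""
    if duplicates is None:
        # Proposed default: reject when validating, otherwise retain.
        return "reject" if validate else "retain"
    if duplicates not in ("retain", "reject", "use-first"):
        raise ValueError("unknown duplicates value: %r" % (duplicates,))
    if duplicates == "retain" and validate:
        raise ValueError("duplicates=retain is incompatible with validate=true")
    return duplicates


def object_entries(text, duplicates=None, validate=False):
    """Parse JSON; return each object as an ordered list of (key, value)."""
    policy = resolve_duplicates(duplicates, validate)

    def hook(pairs):
        if policy == "retain":
            return list(pairs)  # keep every entry, in document order
        seen = set()
        kept = []
        for key, value in pairs:
            if key in seen:
                if policy == "reject":
                    raise ValueError("duplicate key: %r" % (key,))
                continue  # use-first: drop later occurrences
            seen.add(key)
            kept.append((key, value))
        return kept

    return json.loads(text, object_pairs_hook=hook)
```

All three policies can be applied in a single forward pass that keeps entries
in document order. That is the property that matters for streaming, and it is
why use-last is left out.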

D. Unescaping and invalid characters. Both functions have an option controlling
whether to unescape JSON escape sequences, and both default to replacing
invalid characters with xFFFD. The json-to-xml() function has an additional
option ("fallback") to supply a user-written function to handle invalid
characters.

RECOMMENDATION: Add the fallback option to parse-json().

Note also that, for both functions, the substitution of xFFFD happens only for
non-XML characters represented as JSON escape sequences. The fallback option
also handles non-XML characters that are represented directly in unescaped form
(for example, C1 control characters, or cp1252 characters masquerading as C1
control characters).

RECOMMENDATION: Both functions should handle non-XML characters represented in
unescaped form in the same way as if they were written in escaped form (that
is, the handling depends on the unescape and fallback options).
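
To make the intended uniformity concrete, here is a small Python sketch,
illustrative only. It assumes the XML 1.0 Char production as the definition
of a valid character, and it assumes a fallback that receives the offending
character and returns its replacement; that signature is a simplification
chosen for this example, not the one in the specification. Because the check
runs on the decoded string, a character written as \uFFFF and the same
character written literally are handled identically.

```python
import json


def is_xml_char(ch):
    """True if ch matches the XML 1.0 Char production (assumed here)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def replace_invalid(text, fallback=None):
    """Replace non-XML characters in an already-decoded string.

    fallback is a hypothetical callable (character -> replacement string);
    when it is omitted, each invalid character becomes U+FFFD.
    """
    if fallback is None:
        fallback = lambda ch: "\ufffd"
    return "".join(ch if is_xml_char(ch) else fallback(ch) for ch in text)


# The same non-XML character, once as a JSON escape and once written directly.
escaped = json.loads('"a\\uFFFFb"')
literal = json.loads('"a\uffffb"')
```

Both escaped and literal decode to the same string, so replace_invalid()
gives the same result for each, with or without a fallback.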

The specification of json-to-xml() in the "Errors" section contains a spurious
error condition for invalid characters, which does not match what is said in
the rules. I will correct this.

Received on Friday, 6 March 2015 09:17:34 UTC