
[Bug 28476] New: [SER 3.1]JSON serialization: escaping strings

From: <bugzilla@jessica.w3.org>
Date: Sun, 12 Apr 2015 21:50:04 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-28476-523@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28476

            Bug ID: 28476
           Summary: [SER 3.1]JSON serialization: escaping strings
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Serialization 3.1
          Assignee: cmsmcq@blackmesatech.com
          Reporter: mike@saxonica.com
        QA Contact: public-qt-comments@w3.org

Section 9 says:

* An atomic value (XP31) in the data model instance of any other type is
serialized to a JSON string by outputting the result of applying the fn:string
function to the item.

* A node in the data model instance is serialized as a JSON string by
outputting the result of serializing the node using the method specified by the
json-node-output-method parameter. If the json-node-output-method parameter is
set to xml or xhtml then the node is serialized with the additional
serialization parameter omit-xml-declaration set to yes.

In both cases the text fails to mention the need to enclose the string in
quotes and to escape special characters (such as quotes) so that the result is
legal JSON.

Less obviously, it fails to describe the detailed escaping rules.

Suitable rules can be found in the xml-to-json function:

* Any occurrence of backslash (\) is replaced by \\.

* Any occurrence of quotation mark, backspace, form-feed, newline, carriage
return, or tab is replaced by \", \b, \f, \n, \r, or \t respectively, and any
other codepoint in the range 1-31 or 127-159 is replaced by an escape in the
form \uHHHH where HHHH is the hexadecimal representation of the codepoint
value.
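
As an illustration only, the two rules above, together with the enclosing
quotes, could be sketched in Python as follows. The function name
escape_json_string is hypothetical, and the use of upper-case hex digits in
\uHHHH is an assumption, since the quoted rules do not say which case to use.

```python
# Named escapes listed in the second rule quoted above.
_NAMED_ESCAPES = {
    '"': '\\"',
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_json_string(value: str) -> str:
    """Return value as a quoted JSON string (hypothetical helper)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch == "\\":
            # Rule 1: backslash becomes a doubled backslash.
            out.append("\\\\")
        elif ch in _NAMED_ESCAPES:
            # Rule 2, first half: characters with a short escape.
            out.append(_NAMED_ESCAPES[ch])
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            # Rule 2, second half: other control codepoints as \uHHHH.
            # Upper-case hex digits are an assumption of this sketch.
            out.append("\\u%04X" % cp)
        else:
            out.append(ch)
    # Enclose the escaped text in quotation marks.
    return '"' + "".join(out) + '"'
```

Under these assumptions, a value containing a quotation mark and a newline
comes out as a single quoted JSON string with \" and \n in place of the raw
characters.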

I wonder if we should reconsider the rule in 9.1.3: "If the instance of the
data model contains a character that cannot be represented in the encoding that
the serializer is using for output, the serializer MUST signal a serialization
error [err:SERE0008]." Would it not be friendlier to escape any such character?
It seems reasonable to ask for JSON in US-ASCII encoding, with the intent that
all non-ASCII characters should be represented using \u escape sequences.
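
To make the suggestion concrete, here is a rough Python sketch of escaping
instead of raising an error. The function name escape_unencodable is
hypothetical. Writing characters above U+FFFF as a pair of \u escapes for the
UTF-16 surrogates follows the JSON RFC convention and is not something this
report specifies; upper-case hex digits are likewise an assumption.

```python
def escape_unencodable(text: str, encoding: str = "us-ascii") -> str:
    """Replace characters the encoding cannot represent with \\u escapes.

    Hypothetical helper: it only handles the encoding problem and does not
    apply the other JSON string escaping rules.
    """
    out = []
    for ch in text:
        try:
            # Probe whether this character exists in the target encoding.
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            cp = ord(ch)
            if cp > 0xFFFF:
                # Outside the BMP: emit a UTF-16 surrogate pair
                # (JSON RFC convention, assumed here).
                cp -= 0x10000
                high = 0xD800 + (cp >> 10)
                low = 0xDC00 + (cp & 0x3FF)
                out.append("\\u%04X\\u%04X" % (high, low))
            else:
                out.append("\\u%04X" % cp)
    return "".join(out)
```

With the default US-ASCII target every non-ASCII character is escaped; with a
wider encoding such as ISO-8859-1, only the characters that encoding lacks
would be escaped.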

Received on Sunday, 12 April 2015 21:50:06 UTC
