Escaped characters in RDF-1.1 N-Triples literals for Canonical documents from Peter Ansell on 2013-11-17 (public-rdf-comments@w3.org from November 2013)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Mon, 18 Nov 2013 09:50:07 +1100
To: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <CAGYFOCR+ESkg2OSZapLP3gOxTcXz8aod=3PDpXszCuU+oG1dmw@mail.gmail.com>

The Conformance section (Section 4) of the RDF-1.1 N-Triples Candidate
Recommendation (05 November 2013) specifies the following for a
canonical document [1]:

    "Characters not allowed directly in STRING_LITERAL_QUOTE (U+0022,
U+005C, U+000A, U+000D) MUST use ECHAR not UCHAR. "

However, the escape sequences in ECHAR do not seem to include U+005C "\" [2]:

    [153s] ECHAR ::= '\' [tbnrf"']

That is, ECHAR defines escapes for \t \b \n \r \f \" and \', but the
grammar does not appear to allow \\. The backslash could be escaped
using UCHAR as \u005C, but that would violate the canonical rule,
which explicitly names U+005C.
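
A minimal Python sketch of the point above, assuming production [153s]
is transcribed literally into a regular expression. The names ECHAR_CR
and is_echar are hypothetical and exist only for this illustration:

```python
import re

# Literal transcription of [153s] ECHAR ::= '\' [tbnrf"'] as quoted above.
# Assumption: the character class is copied exactly from the CR grammar.
ECHAR_CR = re.compile(r"""\\[tbnrf"']""")

def is_echar(s: str) -> bool:
    """Return True if s is exactly one ECHAR escape under the transcribed production."""
    return ECHAR_CR.fullmatch(s) is not None

# Check each two-character escape sequence against the production.
for escape in [r"\t", r"\b", r"\n", r"\r", r"\f", r"\"", r"\'", r"\\"]:
    print(escape, is_echar(escape))
```

Under this transcription every sequence is accepted except the last one,
\\, which is the case the canonical rule requires.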

In addition, is it intentional that the list of characters mentioned
in the canonical section [1] does not include all of the characters
with escapes defined in ECHAR [2]? Should the characters that appear
in ECHAR [2] but not in the list in [1] be escaped using UCHAR in
Canonical documents, or be represented using their raw UTF-8 values?
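
The mismatch between the two lists can be made concrete with a short
Python sketch. The set contents are taken from the canonical rule in [1]
and from production [153s] in [2] as quoted above; the names
CANONICAL_LIST, ECHAR_TARGETS and code_points are hypothetical:

```python
# Characters the canonical rule in [1] says MUST use ECHAR.
CANONICAL_LIST = {"\u0022", "\u005C", "\u000A", "\u000D"}

# Characters that production [153s], as quoted, provides an ECHAR escape for:
# \t \b \n \r \f \" \'
ECHAR_TARGETS = {"\t", "\b", "\n", "\r", "\f", "\"", "'"}

def code_points(chars):
    """Format a set of characters as a sorted list of U+XXXX labels."""
    return sorted(f"U+{ord(c):04X}" for c in chars)

# Escapable via ECHAR but absent from the canonical list.
print(code_points(ECHAR_TARGETS - CANONICAL_LIST))
# prints ['U+0008', 'U+0009', 'U+000C', 'U+0027']

# Named in the canonical list but with no ECHAR escape in the quoted grammar.
print(code_points(CANONICAL_LIST - ECHAR_TARGETS))
# prints ['U+005C']
```

The first list (backspace, tab, form feed, apostrophe) is the set whose
canonical representation the question above asks about.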

Cheers,

Peter

[1] http://www.w3.org/TR/2013/CR-n-triples-20131105/#conformance
[2] http://www.w3.org/TR/2013/CR-n-triples-20131105/#grammar-production-ECHAR

Received on Sunday, 17 November 2013 22:50:34 UTC