
Re: Escaped characters in RDF-1.1 N-Triples literals for Canonical documents

From: Andy Seaborne <andy@apache.org>
Date: Mon, 18 Nov 2013 11:09:49 +0000
Message-ID: <5289F57D.60805@apache.org>
To: public-rdf-comments@w3.org

On 17/11/13 22:50, Peter Ansell wrote:
> The Conformance section (Section 4) of the RDF-1.1 N-Triples Candidate
> Recommendation (05 November 2013) specifies that for a canonical
> document [1] :
>      "Characters not allowed directly in STRING_LITERAL_QUOTE (U+0022,
> U+005C, U+000A, U+000D) MUST use ECHAR not UCHAR. "
> However, the escape sequences in ECHAR do not seem to include U+005C "\" [2]:
>      [153s] ECHAR ::= '\' [tbnrf"']
> That is, ECHAR defines escapes for \t \b \n \r \f \" \' , but it
> doesn't appear that \\ is allowed for in that grammar. It could be
> escaped using UCHAR as \u005C, but that seems to violate the canonical
> rule that specifically mentions it.
> In addition, is it intentional that the list of characters mentioned
> in the canonical section [1] does not include all of the characters
> with escapes defined in ECHAR [2]? Should the characters that appear
> in ECHAR [2] but not in the list in [1] be escaped using UCHAR in
> Canonical documents or be represented using their raw UTF-8 values.
> Cheers,
> Peter
> [1] http://www.w3.org/TR/2013/CR-n-triples-20131105/#conformance
> [2] http://www.w3.org/TR/2013/CR-n-triples-20131105/#grammar-production-ECHAR

Hi Peter,

Thanks for pointing that out.  It looks like a systematic bug in the tool 
chain that we failed to squash.

I've recorded it on the WG comments:


This is not a formal response to your comment.

I have fixed the documents (all subject to WG approval) as follows. 
If you are satisfied, please do send an early confirmation that this 
addresses your comment.


N-Triples and N-Quads:

ECHAR 	::= 	'\' [tbnrf"\]

which does not include ' because strings can't use '-quoting in 
N-Triples and N-Quads and there is a desire to minimise the number of 
ways of writing the same thing.
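
As a rough illustration only (not text from the specification), the 
canonical rule quoted above together with the corrected ECHAR could be 
sketched in Python as follows. The helper name escape_canonical_literal 
and the table CANONICAL_ESCAPES are hypothetical, and the sketch assumes 
that all other characters are written out directly.

```python
# Sketch under stated assumptions: only the four characters named in the
# canonical rule (U+0022, U+005C, U+000A, U+000D) are escaped, each with
# its ECHAR form. Everything else is passed through unchanged.
CANONICAL_ESCAPES = {
    '"': '\\"',    # U+0022 -> \"
    '\\': '\\\\',  # U+005C -> \\  (needs '\' in the ECHAR character class)
    '\n': '\\n',   # U+000A -> \n
    '\r': '\\r',   # U+000D -> \r
}


def escape_canonical_literal(value: str) -> str:
    """Hypothetical helper: escape the body of a STRING_LITERAL_QUOTE."""
    return ''.join(CANONICAL_ESCAPES.get(ch, ch) for ch in value)


def to_literal(value: str) -> str:
    """Wrap the escaped body in the double quotes N-Triples requires."""
    return '"' + escape_canonical_literal(value) + '"'
```

With the original grammar, the backslash entry in this table had no 
matching ECHAR production, which is the gap the fix closes.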

In addition, I've checked Turtle and TriG (Turtle already had a related 
fix recently) and put the characters in the same order, because \" is 
confusing (it is not escaping a " in the grammar itself).

ECHAR 	::= 	'\' [tbnrf'"\]

(Turtle and TriG have a ' as well)
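
To make the difference between the two productions concrete, here is a 
small Python sketch that transcribes each ECHAR rule into a regular 
expression. The names NT_ECHAR, TURTLE_ECHAR and accepts are 
illustrative assumptions, not identifiers from any of the documents.

```python
import re

# N-Triples / N-Quads: ECHAR ::= '\' [tbnrf"\]
NT_ECHAR = re.compile(r'\\[tbnrf"\\]')

# Turtle / TriG: ECHAR ::= '\' [tbnrf'"\]  (also allows \')
TURTLE_ECHAR = re.compile(r'''\\[tbnrf'"\\]''')


def accepts(pattern: re.Pattern, escape: str) -> bool:
    """Return True if the whole string is a single ECHAR for this grammar."""
    return pattern.fullmatch(escape) is not None


if __name__ == "__main__":
    for esc in ['\\\\', '\\"', "\\'"]:
        print(esc, accepts(NT_ECHAR, esc), accepts(TURTLE_ECHAR, esc))
```

Both patterns accept \\ and \" while only the Turtle/TriG pattern 
accepts \'.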

Links to the rule in the grammar in the editors' drafts:








Received on Monday, 18 November 2013 11:10:27 UTC
