Omissions, Errors, and Misleading Prose in the N-Triples Specification from Sean B. Palmer on 2003-09-05 (www-rdf-comments@w3.org from July to September 2003)

From: Sean B. Palmer <sean@mysterylights.com>
Date: Fri, 5 Sep 2003 05:23:02 +0100
To: "RDF Comments" <www-rdf-comments@w3.org>
Message-ID: <022e01c37365$65da43c0$1054ff3e@z5n9x1>

All section references are to rdf-testcases [1].

1) Section 3.2: "[t]he characters outside the US-ASCII range are made
available by \-escape sequences as follows". However, some of the
characters in the table are *inside* the US-ASCII range; i.e. #x5C,
#x22, #x0A, #x0D, and #x09.

2) It is not clear that #x0A, #x0D, and #x09 need to be encoded,
except that they are not allowed in the character production of
section 3.1.

3) #x5C and #x22 (backslash and quote marks) are not disallowed from
strings by the grammar, and there is no clear prose that disallows
them either. Therefore, it is not stated that they are to be encoded
within literals. This means that "\x" is a valid N-Triples literal,
and "\" and """ are very ambiguous, and possibly valid.

4) "#X5C" in the table in section 3.2 should be "#x5C".

5) Since, as stated in section 3.1, the employed "EBNF cannot perform
the counting required by the Primary-subtag and Subtag productions",
perhaps it would be useful to either a) switch to an EBNF that *can*
perform the counting, or b) note the counting in prose, and state
whether conformant N-Triples parsers are required to perform such
counting.

6) Conformance levels are not clearly specified. Does a conformant
N-Triples parser have to fully check URI syntax, for example?
Primary-subtag and Subtag counting?

7) It is not clear that the absoluteURI production in N-Triples
exactly matches (or imports) the absoluteURI production from RFC 2396,
though the RFC is cited.

8) Section 3.3: "[c]haracters above the US-ASCII range are made
available by the \u or \U escapes". I am aware that this has been
raised before, but this section should be removed, and UTF-8 + %HH
encoding of non-US-ASCII characters used for synchronicity with the
IRI mechanism (being employed in, e.g., XPointer, XInclude, and XML
Base).

9) Please indicate whether a charset parameter may, must, or must not
be used in conjunction with the text/plain MIME type, since according
to section 3.1 the only allowed encoding is us-ascii.
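
To make points 1 to 3 and 8 concrete, here is a minimal Python sketch of
the escaping that the section 3.2 table appears to intend, alongside the
UTF-8 + %HH alternative for URI references. The names escape_literal,
percent_encode_uri, SHORT_ESCAPES, and URI_SAFE are hypothetical, and the
choice of characters left unencoded in URI_SAFE is my assumption rather
than anything the specification states.

```python
from urllib.parse import quote

# Short escapes from the section 3.2 table. Every key is a code point
# inside the US-ASCII range, which is the subject of point 1.
SHORT_ESCAPES = {
    0x09: "\\t",
    0x0A: "\\n",
    0x0D: "\\r",
    0x22: '\\"',
    0x5C: "\\\\",
}

# Assumption: leave RFC 2396 reserved and mark characters, "#" and "%"
# untouched. Letters and digits are never encoded by quote().
URI_SAFE = ";/?:@&=+$,-_.!~*'()#%"


def escape_literal(text: str) -> str:
    """Escape text for use between the quotes of an N-Triples literal."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[cp])
        elif 0x20 <= cp <= 0x7E:
            # Printable US-ASCII other than quote and backslash.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def percent_encode_uri(uri: str) -> str:
    """UTF-8 + %HH encode every character outside URI_SAFE."""
    return quote(uri, safe=URI_SAFE, encoding="utf-8")
```

Under this sketch the backslash and quote mark are always escaped, which
is the behaviour that point 3 asks the specification to state explicitly.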

Note that many of the comments above are based on implementor
experience, in building a Python RDF API that includes N-Triples
tools.
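
As an illustration of the counting raised in points 5 and 6, a parser
could check language tags along the following lines. This is only a
sketch: it assumes the RFC 3066 limits of one to eight characters per
subtag and the lowercase-only form of the N-Triples language production,
and is_counted_language_tag and LANGUAGE_TAG are hypothetical names.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
# Assumption: lowercase only, as in the N-Triples language production.
LANGUAGE_TAG = re.compile(r"[a-z]{1,8}(-[a-z0-9]{1,8})*")


def is_counted_language_tag(tag: str) -> bool:
    """Return True if every subtag is between 1 and 8 characters long."""
    return LANGUAGE_TAG.fullmatch(tag) is not None
```

Whether a conformant parser must reject a tag such as "abcdefghi", which
this check refuses, is exactly what the specification leaves open.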

Thanks,

[1] http://www.w3.org/TR/rdf-testcases/
- W3C Working Draft 23 January 2003

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/

Received on Friday, 5 September 2003 00:26:11 UTC