Review of N-Triples draft from Gregory Williams on 2013-07-15 (public-rdf-comments@w3.org from July 2013)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 15 Jul 2013 20:25:13 +0300
To: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-Id: <B57DBF50-B29D-4717-8451-DE89DB198FB7@evilfunhouse.com>

After seeing Gavin indicate on the WG mailing list that N-Triples is nearing review/publication, I thought I'd send along some comments after reading through the current ED. These are in addition to my previous comments on N-Triples [1,2], in which I objected to the major proposed changes that add complexity to N-Triples. I continue to believe that these changes are a mistake for N-Triples.

== 1. Introduction

"When parsed by a Turtle parser, data in the N-Triples format will produce exactly the same triples as a parser for the restricted N-triples language." What is the 'restricted N-triples language'? This is the only place in the document it is mentioned.

== 2.2 IRIs

"IRIs are enclosed in '<' and '>' and may contain numeric escape sequences (described below)." The angle brackets used here should be styled similarly to all other token values (e.g. in orange, tt text). There are other places in the document where similar styling issues may be observed.

== 2.3 RDF Literals

"Literals may not contain the characters ", LF, or CR." Surely this needs rephrasing, as a literal can contain these characters, while the literal's serialized lexical form must ensure that these characters are escaped(?).

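To illustrate the distinction between a literal's value and its serialized lexical form, here is a minimal Python sketch. The helper name serialize_literal is hypothetical and not taken from the draft; it assumes the usual backslash escapes for the backslash, double quote, LF, and CR characters, and ignores datatypes and language tags.

```python
# Hypothetical helper (not from the draft): the literal's value may contain
# ", LF, or CR, but the serialized form must escape them.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}


def serialize_literal(lexical_form: str) -> str:
    """Return the quoted, escaped form of a plain literal's lexical form."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in lexical_form)
    return '"' + escaped + '"'


if __name__ == "__main__":
    value = 'say "hi"\nbye'  # the value contains a quote and a real line feed
    print(serialize_literal(value))  # a single line with \" and \n escapes
```

The value passed in contains a raw line feed, while the string that is written out does not.
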
"If there is no datatype IRI and no language tag, the datatype is xsd:string." Neither the xsd prefix, nor xsd:string is defined in this document, and no link is provided to its definition or fully qualified value.

== 2.4 RDF Blank Nodes

I believe the reference to "digits" in discussing the liberalization of PN_CHARS_BASE should be replaced with an orange, tt text '[0-9]', since in the Unicode context "digit" is insufficiently precise.

== 3. Changes from RDF Test Cases format

Is "Subset of Turtle rather than Notation 3" meant to hide grammar changes? If so, please explicitly enumerate the actual changes. For example, the BLANK_NODE_LABEL token in the new grammar allows bnode IDs to start with a digit, while the old RDF Test Cases N-Triples used the 'nodeID' production which in turn allowed bnode IDs with the 'name' production which had to start with [A-Za-z].
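
The difference can be checked with a small Python sketch. Both patterns are assumptions made for illustration: the old one follows the RDF Test Cases 'name' production as an alphabetic character followed by alphanumerics, and the new one is an ASCII-only approximation of BLANK_NODE_LABEL that leaves out the non-ASCII ranges of PN_CHARS_BASE and PN_CHARS.

```python
import re

# Old RDF Test Cases form (assumed): '_:' name, where name starts with [A-Za-z].
OLD_NODE_ID = re.compile(r"_:[A-Za-z][A-Za-z0-9]*")

# ASCII-only approximation of the new BLANK_NODE_LABEL token:
# '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
NEW_BLANK_NODE_LABEL = re.compile(
    r"_:[A-Za-z_0-9](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"
)

if __name__ == "__main__":
    for label in ("_:b1", "_:1b"):
        old_ok = bool(OLD_NODE_ID.fullmatch(label))
        new_ok = bool(NEW_BLANK_NODE_LABEL.fullmatch(label))
        print(label, "old:", old_ok, "new:", new_ok)
```

Under these assumptions, a label such as _:1b is accepted only by the new pattern, which is the kind of change that ought to be listed explicitly.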

== 4. Conformance

The description of a "canonical N-Triple document" must use MUST instead of SHOULD normative language to have any meaning. Otherwise any N-Triples document whatsoever is a valid "canonical N-Triples" document.

What is the rationale for a "canonical N-Triples" document encoding characters "directly and not by UCHAR"? This means that any existing N-Triples document that includes non-ASCII data is by definition not canonical, correct?

What is the rationale for disallowing a space after the object of a triple? A much simpler, and more regular rule for serializers wishing to produce "canonical N-Triples" would be that the only use of the WS token should be a single space after every term.
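
A serializer following the simpler rule suggested here could be as small as the Python sketch below. The function name serialize_triple is hypothetical, and the terms are assumed to be already serialized (IRIs in angle brackets, literals quoted and escaped).

```python
def serialize_triple(subject: str, predicate: str, obj: str) -> str:
    """Emit one triple with exactly one space after every term, then '.'."""
    # Each term is treated identically: the term followed by a single space.
    return "".join(term + " " for term in (subject, predicate, obj)) + "."


if __name__ == "__main__":
    print(serialize_triple("<http://example/s>", "<http://example/p>", '"o"'))
    # prints: <http://example/s> <http://example/p> "o" .
```

No term needs special-casing, which is the regularity argued for above.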

There should be another constraint on "canonical N-Triples" documents indicating when each of the two forms of UCHAR must be used. (Or, better, require *all* N-Triples documents, whether canonical or not, to conform to such a constraint, as the old RDF Test Cases N-Triples format did.)
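
One possible shape for such a constraint, sketched in Python: use the four-digit form for code points up to U+FFFF and the eight-digit form only above that. This is an assumption for illustration, not wording from either specification; the function name uchar is hypothetical, and characters with their own escapes (such as the backslash, double quote, tab, LF, and CR) are deliberately not handled.

```python
def uchar(ch: str) -> str:
    """Escape one character, choosing between the two UCHAR forms."""
    cp = ord(ch)
    if 0x20 <= cp <= 0x7E:
        return ch  # printable ASCII is written directly
    if cp <= 0xFFFF:
        return "\\u%04X" % cp  # short form: backslash-u plus four hex digits
    return "\\U%08X" % cp  # long form: backslash-U plus eight hex digits


if __name__ == "__main__":
    for cp in (0x61, 0xE9, 0x1F600):
        print(hex(cp), uchar(chr(cp)))
```

With a rule like this, every character has exactly one permitted escaped form, so two serializers cannot disagree on the output.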

== 6.1 RDF Term Constructors

Several of these descriptions reference productions or tokens that are not used in the relevant grammar rules, or simply do not exist in the current grammar.

For example, the procedure listed for handling the BLANK_NODE_LABEL production says: "The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated." However, BLANK_NODE_LABEL does not reference PN_LOCAL (which is not defined in the grammar). It is currently defined as:

'_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?

== A. N-Triples Internet Media Type, File Extension and Macintosh File Type

Why is the new media type for N-Triples "application/n-triples" and not "text/n-triples"? This format is explicitly described as a "plain text format" in the abstract of the document.

thanks,
.greg

[1] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Apr/0063.html
[2] http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0019.html

Received on Monday, 15 July 2013 17:25:40 UTC