Re: Review of N-Triples draft from Gavin Carothers on 2013-07-15 (public-rdf-comments@w3.org from July 2013)

From: Gavin Carothers <gavin@carothers.name>
Date: Mon, 15 Jul 2013 13:00:32 -0700
To: Gregory Williams <greg@evilfunhouse.com>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <CAPqY83zyWZfKkfsc2j+bZZ+hZ6YaEjc_HW6iBWe5NUak1OTnSQ@mail.gmail.com>
On Mon, Jul 15, 2013 at 10:25 AM, Gregory Williams <greg@evilfunhouse.com> wrote:

> After seeing Gavin indicate on the WG mailing list that N-Triples is
> nearing review/publication, I thought I'd send along some comments after
> reading through the current ED. These are in addition to my previous
> comments on N-Triples [1,2] where I have objected to the major proposed
> changes adding complexity to N-Triples, and continue to believe that these
> changes are a mistake for N-Triples.
>

Thank you very much for your review. Specific comments are answered inline
below; the wider objection to "major proposed changes adding complexity to
N-Triples" is not addressed in this response. If your other comments are
sufficiently addressed, please reply to this email with a subject starting
with "[RESOLVED]".


>
>
> == 1. Introduction
>
> "When parsed by a Turtle parser, data in the N-Triples format will produce
> exactly the same triples as a parser for the restricted N-triples
> language." What is the 'restricted N-triples language'? This is the only
> place in the document it is mentioned.
>

Fixed. That was leftover prose from when N-Triples was inside the Turtle
spec. Thanks for catching that.


>
> == 2.2 IRIs
>
> "IRIs are enclosed in '<' and '>' and may contain numeric escape sequences
> (described below)." The angle brackets used here should be styled similarly
> to all other token values (e.g. in orange, tt text). There are other places
> in the document where similar styling issues may be observed.
>

Fixed. Will check for other instances.


>
> == 2.3 RDF Literals
>
> "Literals may not contain the characters ", LF, or CR." Surely this needs
> rephrasing, as a literal can contain these characters, while the literal's
> serialized lexical form must ensure that these characters are escaped(?).
>

Fixed.
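
To illustrate the distinction (a literal's value may contain these
characters, but its serialized lexical form must escape them), here is a
minimal Python sketch. The helper name `escape_literal` is hypothetical and
not part of the spec; it only covers the quote, backslash, LF and CR escapes.

```python
# Hypothetical helper, not spec text: maps each character that cannot
# appear raw inside a double-quoted N-Triples literal to its escape.
_ECHAR = {
    "\\": "\\\\",  # backslash
    '"': '\\"',    # double quote
    "\n": "\\n",   # LF
    "\r": "\\r",   # CR
}


def escape_literal(value: str) -> str:
    """Return the double-quoted lexical form of `value`."""
    return '"' + "".join(_ECHAR.get(ch, ch) for ch in value) + '"'
```

The value `a"b` followed by a line feed and `c` still round-trips as a
literal; only its written form changes.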


>
> "If there is no datatype IRI and no language tag, the datatype is
> xsd:string." Neither the xsd prefix, nor xsd:string is defined in this
> document, and no link is provided to its definition or fully qualified
> value.
>

Fixed.


>
> == 2.4 RDF Blank Nodes
>
> I believe the reference to "digits" in discussing the liberalization of
> PN_CHARS_BASE should be replaced with an orange, tt text '[0-9]' as in the
> unicode context, "digit" is insufficiently precise.
>

Done.


>
> == 3. Changes from RDF Test Cases format
>
> Is "Subset of Turtle rather than Notation 3" meant to hide grammar
> changes?


No. It's meant to deal with this sentence:

     It was designed to be a fixed subset of N3
     [N3 <http://www.w3.org/TR/rdf-testcases/#n3>]
     [N3-Primer <http://www.w3.org/TR/rdf-testcases/#n3_primer>] and hence N3
     tools such as cwm [CWM <http://www.w3.org/TR/rdf-testcases/#ref_cwm>],
     n-triples2kif
     [N-TRIPLES2KIF <http://www.w3.org/TR/rdf-testcases/#ref_ntriples2kif>],
     and Euler [EULER <http://www.w3.org/TR/rdf-testcases/#ref_euler>] can be
     used to read and process it. cwm can output this format when invoked as
     "cwm -ntriples".

As mentioned elsewhere, N-Triples is now a subset of Turtle. What N3 tools
do with it is neither defined by nor of great interest to the WG.



> If so, please explicitly enumerate the actual changes. For example, the
> BLANK_NODE_LABEL token in the new grammar allows bnode IDs to start with a
> digit, while the old RDF Test Cases N-Triples used the 'nodeID' production
> which in turn allowed bnode IDs with the 'name' production which had to
> start with [A-Za-z].
>

Added "bnode IDs may start with a digit" to the list of changes.
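
As a rough illustration of that change, the following Python sketch compares
the old rule with an ASCII-only approximation of the new token. The names
`OLD_NODE_ID`, `NEW_LABEL` and `accepted` are hypothetical, and the new
pattern is an assumption: it narrows PN_CHARS_U to [A-Za-z_] and PN_CHARS to
[A-Za-z0-9_-], leaving out the non-ASCII ranges the real productions allow.

```python
import re
from typing import Tuple

# Old RDF Test Cases rule: '_:' followed by name, which had to start
# with [A-Za-z] and continue with [A-Za-z0-9].
OLD_NODE_ID = re.compile(r"_:[A-Za-z][A-Za-z0-9]*")

# ASCII-only approximation (an assumption, see above) of
#   '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
NEW_LABEL = re.compile(r"_:[A-Za-z_0-9](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?")


def accepted(label: str) -> Tuple[bool, bool]:
    """Return (accepted by old rule, accepted by approximated new rule)."""
    return (
        OLD_NODE_ID.fullmatch(label) is not None,
        NEW_LABEL.fullmatch(label) is not None,
    )
```

Under these assumptions a label such as `_:1b` is rejected by the old rule
and accepted by the new one, while `_:b1` is accepted by both.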


>
> == 4. Conformance
>
> The description of a "canonical N-Triple document" must use MUST instead
> of SHOULD normative language to have any meaning. Otherwise any N-Triples
> document whatsoever is a valid "canonical N-Triples" document.
>

Agreed.


>
> What is the rationale for a "canonical N-Triples" document encoding
> characters "directly and not by UCHAR"? This means that any existing
> N-Triples document that includes non-ASCII data is by definition not
> canonical, correct?
>
> What is the rationale for disallowing a space after the object of a
> triple? A much simpler, and more regular rule for serializers wishing to
> produce "canonical N-Triples" would be that the only use of the WS token
> should be a single space after every term.
>

The rationale is simplicity of explanation in terms of the current grammar.
The intent was: where the grammar requires whitespace, require it to be a
single space; otherwise require no whitespace. I think the rule is
reasonably simple. The optionality of whitespace after the object is in the
original N-Triples definition as well.
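
For comparison, here is a small Python sketch of the two candidate rules.
It assumes that whitespace is required only between subject and predicate
and between predicate and object, as this thread implies; the function names
are hypothetical and the terms are assumed to be already serialized.

```python
def canonical_line_draft(subject: str, predicate: str, obj: str) -> str:
    """Draft rule: a single space where whitespace is required, none elsewhere."""
    return f"{subject} {predicate} {obj}."


def canonical_line_alternative(subject: str, predicate: str, obj: str) -> str:
    """Reviewer's suggestion: a single space after every term."""
    return f"{subject} {predicate} {obj} ."
```

The two outputs differ only in the space before the final '.'.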


>
> There should be another constraint on "canonical N-Triples" documents
> indicating when either of the two forms of UCHAR must be used. (Or, better,
> require *all* n-triples documents, whether canonical or not, to conform to
> such a constraint as the old RDF Test Cases N-Triples format did.)
>

I'm sorry, I don't understand this comment. Is this addressing the
capitalization of HEX? If so, that's already mentioned. If it's about \u vs
\U, that's also addressed... ah, perhaps that's the issue?
Something along the lines of:

[#x7F-#xFFFF]       \u*HHHH*
    4 required hexadecimal digits *HHHH* encoding Unicode character *u*
[#x10000-#x10FFFF]  \U*HHHHHHHH*
    8 required hexadecimal digits *HHHHHHHH* encoding Unicode character *u*

for serialization?
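
A minimal Python sketch of such a serialization rule, assuming the two
ranges above and uppercase hexadecimal digits. The names `uchar` and
`escape_non_ascii` are hypothetical; characters below #x7F are passed
through untouched here, so quote, backslash and control-character escapes
would have to be handled separately.

```python
def uchar(ch: str) -> str:
    """Encode one character as a \\u or \\U escape with uppercase hex digits."""
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return f"\\u{code_point:04X}"   # 4 required hexadecimal digits
    return f"\\U{code_point:08X}"       # 8 required hexadecimal digits


def escape_non_ascii(text: str) -> str:
    """Escape every character in [#x7F-#x10FFFF]; leave the rest as-is."""
    return "".join(ch if ord(ch) < 0x7F else uchar(ch) for ch in text)
```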


>
> == 6.1 RDF Term Constructors
>
> Several of these descriptions reference productions or tokens that are not
> used in the relevant grammar rules, or simply do not exist in the current
> grammar.
>
> For example, the procedure listed for handling the BLANK_NODE_LABEL
> production says: "The string matching the second argument, PN_LOCAL, is a
> key in bnodeLabels. If there is no corresponding blank node in the map, one
> is allocated." However, the BLANK_NODE_LABEL does not reference PN_LOCAL
> (which is not defined in the grammar). It is currently defined as:
>
>   '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
>

Fixed that one, will check the others.
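
For reference, the intended bnodeLabels behaviour can be sketched in Python
as follows. This is only an illustration: the class names are hypothetical,
and it assumes the key is the label text that follows '_:'.

```python
from typing import Dict


class BlankNode:
    """Placeholder blank node; each instance is distinct by identity."""


class BnodeLabels:
    """Map from blank node label to blank node, allocating on first use."""

    def __init__(self) -> None:
        self._nodes: Dict[str, BlankNode] = {}

    def lookup(self, label: str) -> BlankNode:
        # If there is no corresponding blank node in the map, allocate one.
        if label not in self._nodes:
            self._nodes[label] = BlankNode()
        return self._nodes[label]
```

Repeated lookups of the same label within one document return the same
node, and different labels return different nodes.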


>
> == A. N-Triples Internet Media Type, File Extension and Macintosh File Type
>
> Why is the new media type for N-Triples "application/n-triples" and not
> "text/n-triples"? This format is explicitly described as a "plain text
> format" in the abstract of the document.
>


Summary of WG discussion on the issue:

* N-Triples is less readable than Turtle and more directed to machine
processing.
* text/* would default to ISO-8859-1 encoding, which is not the goal.
* application/* subtypes unknown to an implementation MUST be treated as
binary data.
* Opening text/* in a browser causes it to be displayed, while opening
application/* causes it to be downloaded.
* Mixed precedent. JSON uses application/*, HTML uses text/*, CSV uses
text/*.

In light of these points, the WG has chosen to use 'application/n-triples'
for N-Triples, as it tends to serve as a database dump format rather than as
human-readable text.


>
>
>
> thanks,
> .greg
>
> [1]
> http://lists.w3.org/Archives/Public/public-rdf-comments/2013Apr/0063.html
> [2]
> http://lists.w3.org/Archives/Public/public-rdf-comments/2013Jul/0019.html
>
>
>
>
Received on Monday, 15 July 2013 20:01:00 UTC