Re: Review of N-Triples draft from Gregory Williams on 2013-07-15 (public-rdf-comments@w3.org from July 2013)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Mon, 15 Jul 2013 23:27:53 +0300
To: Gavin Carothers <gavin@carothers.name>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-Id: <21F91002-EB10-43A3-8C0E-ACC3057D02EC@evilfunhouse.com>

Hi Gavin. Thanks for the very quick response. Happy with the state of most of these now. A few comments on ones I think still need discussion below.


On Jul 15, 2013, at 11:00 PM, Gavin Carothers <gavin@carothers.name> wrote:

>> What is the rationale for a "canonical N-Triples" document encoding characters "directly and not by UCHAR"? This means that any existing N-Triples document that includes non-ASCII data is by definition not canonical, correct?
>> 
>> What is the rationale for disallowing a space after the object of a triple? A much simpler, and more regular rule for serializers wishing to produce "canonical N-Triples" would be that the only use of the WS token should be a single space after every term.
> 
> Simplicity of explaining in the current grammar. Was going for if the grammar requires some whitespace require it to be a space, otherwise require no whitespace. I think the rule is reasonably simple. The optionality of whitespace after object is in the original n-triples definition as well.

FWIW, I found the current text regarding whitespace to be confusing, and had to read it several times to understand that it meant one space between s-p, and one between p-o, but none afterwards. E.g. the apparent conflict between "Space between terms (WS+) should be a single space" and "Space after or before terms (WS*) should be empty". I understand the trailing whitespace is optional in both this and the original N-Triples. I was hoping for insight into why "Canonical N-Triples" shouldn't just say "one space following every term" which I believe to be simpler both in describing the constrained grammar and in implementation.
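
To make the difference concrete, here is a minimal sketch of the two whitespace rules as described above. It assumes the three terms are already serialized strings; the function names are hypothetical and are not taken from the draft.

```python
def line_draft_rule(s: str, p: str, o: str) -> str:
    """Rule as read from the draft: one space between terms, none after the object."""
    return f"{s} {p} {o}.\n"


def line_space_after_every_term(s: str, p: str, o: str) -> str:
    """Proposed alternative: exactly one space following every term."""
    return "".join(term + " " for term in (s, p, o)) + ".\n"
```

With the second rule a serializer never has to special-case the last term, which is the implementation simplification argued for above.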

>> There should be another constraint on "canonical N-Triples" documents indicating when either of the two forms of UCHAR must be used. (Or, better, require *all* n-triples documents, whether canonical or not, to conform to such a constraint as the old RDF Test Cases N-Triples format did.)
> 
> I'm sorry, I don't understand this comment. Is this addressing the capitalization of HEX? If so that's already mentioned. If it's about \u vs \U that's also addressed... ah, perhaps that's the issue? 
> Something along the lines of:
> 
> [#x7F-#xFFFF]	\uHHHH
> 4 required hexadecimal digits HHHH encoding Unicode character u
> [#x10000-#x10FFFF]	\UHHHHHHHH
> 8 required hexadecimal digits HHHHHHHH encoding Unicode character u
> 
> for serialization?

Yes, the latter. The original N-Triples did this: you didn't have a choice between \u and \U forms. The codepoint value dictated which escape form had to be used. I remain convinced that the flexibility in the new N-Triples is a terrible idea, but if it has to stay in, I think the "canonical N-Triples" definition must include a rule like this which constrains the choice of escape form.
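
A minimal sketch of such a rule, following the two ranges in the table quoted above. The function name is hypothetical, and the sketch only covers the choice between \u and \U: it passes characters below #x7F through unchanged and ignores the other escapes (such as \n or \") that a real serializer would also need.

```python
def uchar_escape(ch: str) -> str:
    """Pick the escape form from the codepoint alone, so there is never a choice."""
    cp = ord(ch)
    if cp < 0x7F:
        # Simplifying assumption: characters below #x7F are left untouched here.
        return ch
    if cp <= 0xFFFF:
        # [#x7F-#xFFFF] -> \uHHHH, 4 uppercase hexadecimal digits
        return "\\u%04X" % cp
    # [#x10000-#x10FFFF] -> \UHHHHHHHH, 8 uppercase hexadecimal digits
    return "\\U%08X" % cp
```

Because the codepoint fully determines the output, two serializers following this rule would emit the same escape for the same character.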

>> == A. N-Triples Internet Media Type, File Extension and Macintosh File Type
>> 
>> Why is the new media type for N-Triples "application/n-triples" and not "text/n-triples"? This format is explicitly described as a "plain text format" in the abstract of the document.
> 
> 
> Summary of WG discussion on the issue:
> 
> * N-triples is less readable than Turtle and more directed to machine processing. 
> * text/* would default to ISO-8859-1 encoding, which is not the goal. 

It would default to that without the spec saying otherwise, but since the spec *does* say otherwise (in many places, but most relevantly in the "N-Triples Internet Media Type, File Extension and Macintosh File Type" section), I believe that should be enough per RFC 6657.

> * application/* subtypes unknown to an implementation MUST be treated as binary data.
> * Opening text/* in a browser causes it to be displayed, while opening application/* causes it to be downloaded.

If the WG feels this is important, I guess I can understand that. I've always found text/* to be much easier to deal with as 1) it's trivial to force a link to download in a browser with an extra key-press and 2) it *allows* peeking inside the file in the browser if desired (which is often impossible with application/*).



thanks,
.greg
Received on Monday, 15 July 2013 20:28:22 UTC