Comment on N-Triples document (was: Re: Comments regarding "Turtle and N-Triples Syntaxes for RDF")

RDF WG,

I had submitted comments on N-Triples back when it was part of the Turtle document, before Last Call, and Gavin indicated that N-Triples wasn't ready for publication and that my comments would hopefully be addressed in the future. Checking in on this a few days ago, I noticed that N-Triples now seems to be on the Note track ("First Public Working Group Note", though I'm not sure what that means in practice). The Note makes changes to N-Triples relative to the format defined in RDF Test Cases. This concerns me (as I have already mentioned to Gavin on Twitter), because I think the path of least resistance for the WG may be to leave this Note up without going through the process required of a REC-track document. I don't mean to suggest that the WG is intentionally trying to avoid addressing feedback, only that it may simply be hard to motivate the required work at this point.

My main concern was the (IMO significant) change from having a single way to encode each string codepoint in the old RDF Test Cases format (the encoding table in section 3.2) to the new format, where many codepoints have multiple valid encodings (raw UTF-8 data, \u and \U escapes with upper- or lower-case hex digits, character escapes such as "\r", etc.). Although I haven't gone through the most recent Note text in detail, I believe my comments still stand, and I would appreciate a response to them.
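To make the concern concrete, here is a minimal sketch in Python 3. The `unescape` helper is hypothetical and handles only the \u, \U and single-character escapes of a string body, not full literal parsing. It shows that several byte-distinct spellings of the same value, which sort/uniq would treat as different lines, collapse to one string only after the escapes are decoded:

```python
import re

# Hypothetical helper: decode \uXXXX, \UXXXXXXXX (hex digits in either case)
# and single-character escapes such as \r, so that every spelling of a
# codepoint maps to the same Python string.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\]))')
_SIMPLE = {'t': '\t', 'b': '\b', 'n': '\n', 'r': '\r', 'f': '\f',
           '"': '"', "'": "'", '\\': '\\'}

def unescape(body: str) -> str:
    def repl(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2)
        if hex_digits:
            return chr(int(hex_digits, 16))
        return _SIMPLE[m.group(3)]
    return _ESCAPE.sub(repl, body)

# Four spellings of the same value: raw UTF-8, upper-case \u, lower-case \u, \U.
spellings = ['Spïdermann', r'Sp\u00EFdermann', r'Sp\u00efdermann', r'Sp\U000000EFdermann']

# Compared as raw text (as sort/uniq would), they are four distinct values.
print(len(set(spellings)))                    # 4
# After decoding the escapes, they are all the same string.
print(len({unescape(s) for s in spellings}))  # 1
```

The same applies to character escapes: `a\rb` and `a\u000Db` differ textually but decode to the same string.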

thanks,
.greg


On Jul 11, 2012, at 7:50 AM, Gavin Carothers <gavin@carothers.name> wrote:

> On Sat, May 19, 2012 at 11:22 AM, Gregory Williams
> 
>> === 12 N-Triples
>> 
>> "These may be seperated by white space (spaces #x20 or tabs #x9)."
>> I assume "these" here refer to the RDF terms, not the triples?
> 
> N-Triples all gone from Turtle document. Will attempt to address
> issues before FPWD of N-Triples document.
> 
>> === 12.3 Grammar
>> 
>> I'm not happy with the change to make N-Triples a Unicode format. This change means that tools interacting with N-Triples will have to be Unicode-aware and support the \u style of Unicode escapes used in N-Triples. This is a big change from the old N-Triples format, where command-line tools such as sort/uniq/cut/join could be used to easily parse and perform simple processing of N-Triples data. With the Unicode change, this strategy is much less likely to work, as a single value now has many equivalent syntactic forms (e.g. "Spïdermann" vs. "Sp\u00EFdermann"). Moreover, even the Unicode escapes now have many equivalent forms, as the HEX production in the grammar has been made case-insensitive, accepting [0-9A-Fa-f] instead of the old [0-9A-F] (e.g. "Sp\u00EFdermann" vs. "Sp\u00efdermann"). As mentioned above, this is also an issue with case-insensitive language tags. Can you provide a pointer to any discussion that occurred in the WG about the reasoning behind this change?
>> 
>> 
>> No mention is made of comments in the N-Triples grammar section. They are mentioned in the introduction (section 1), used in the N-Triples example in section 12, and as a change from the test cases format (in section 12.2), but there are no specifics given. If N-Triples comment handling is intended to be identical to that of Turtle, this should be stated explicitly.
>> 
>> 
>> "[1]            ntriplesDoc             ::=     (triple)? (EOL triple)* (EOL)?"
>> This rule seems oddly restrictive. For example, it seems to forbid an N-Triples document with consecutive newline characters. The Turtle grammar has a subsection describing white space handling, but no such section exists for the N-Triples grammar. This makes it hard to know exactly how to interpret this rule.
> 
> Lots of stuff, here. In general N-Triples was NOT ready for Last Call.
> Will use as input into the FPWD of N-Triples.
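The ambiguity in rule [1] can be shown with a small sketch in Python 3. `TRIPLE` is a hypothetical stand-in for the real `triple` production. The two EOL definitions are assumptions for illustration, not quotations from the draft grammar. Whether a blank line is accepted depends entirely on how EOL is read:

```python
import re

# Hypothetical stand-in for the `triple` production: any non-empty run of
# characters that contains no line break.
TRIPLE = r'[^\r\n]+'

def ntriples_doc(eol: str):
    # Rule [1]: ntriplesDoc ::= (triple)? (EOL triple)* (EOL)?
    return re.compile(rf'(?:{TRIPLE})?(?:(?:{eol}){TRIPLE})*(?:{eol})?')

# Reading A (assumed): EOL is exactly one line break (CRLF, CR or LF).
single_eol = ntriples_doc(r'\r\n|\r|\n')
# Reading B (assumed): EOL is a run of line-break characters, [#xD#xA]+.
run_eol = ntriples_doc(r'[\r\n]+')

doc = '<a> <b> <c> .\n\n<d> <e> <f> .\n'
print(bool(single_eol.fullmatch(doc)))  # False: the blank line is rejected
print(bool(run_eol.fullmatch(doc)))     # True
```

Under reading A, every EOL except the last must be followed directly by a triple, so two consecutive newlines cannot be matched. Under reading B, the same document is valid.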

Received on Thursday, 25 April 2013 09:49:02 UTC