Re: n-quads & Turtle Levels from Ivan Herman on 2012-05-30 (public-rdf-wg@w3.org from May 2012)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 30 May 2012 11:31:06 +0200
To: Sandro Hawke <sandro@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <60BD36EB-3624-4008-99A7-F4E81BFF74B7@w3.org>
On May 30, 2012, at 01:52 , Sandro Hawke wrote:

> On Mon, 2012-05-28 at 14:01 +0100, Andy Seaborne wrote:
>> 
>> On 28/05/12 13:11, Ivan Herman wrote:
>>>> I don't see why. The only spec that has any reason to mention quads
>>>> is N-Quads. (Well, JSON-LD may too but it uses a definition that's
>>>> different from Sandro's.) Other uses of quads are implementation
>>>> strategies and those don't belong in the specs.
>>> Correct. My question was whether this WG would define NQuads as well
>>> or not. If we do define NQuads (and I do not believe this has been
>>> decided pro or con) then we have to properly define Quads, and do so in
>>> relation to any formalism we have on named graphs. If we decide that
>>> NQuads are not to be formally defined by this WG, then indeed this
>>> section may become unnecessary.
>>> 
>>> Ivan
>>> 
>> 
>> Firstly, I think we really ought to define N-Quads; it's in use and 
>> extending the N-Triples work to N-Quads is valuable.
> 
> I thought so too -- which is why I wrote it up for the rdf-spaces
> document, but the discussion with Manu in the last telecon gave me
> second thoughts.  
> 
> He was arguing how bad it was to be proliferating syntaxes.  I'm very
> sensitive to his criticism: in the OWL WG, having OWL 2 QL, EL, and RL,
> with the Direct and RDF-Based Semantics, ... it all made so much sense
> and seemed so necessary.  Outside the OWL WG?  Not so much.
> 
> So I was thinking we might frame it as:
> 
> Turtle Level 0 --- canonical n-triples
> Turtle Level 1 --- what we're now calling Turtle
> Turtle Level 2 --- something like Trig that's a superset of Turtle

- I am fine with something like Turtle Level 0, but I think that the term N-Triples should appear in the subtitle, so that current N-Triples users would not feel disenfranchised.

- I must admit I am not convinced, at this moment, about the necessity of this canonical N-Triples. What is the real use case we are trying to solve? I have not seen major complaints in the community about that. Any programming environment these days has some sort of regular expression library; using a regular expression to split an N-Triples line into its three constituents does not look like a very difficult operation to me.

(In some cases this might be even simpler. In Python, I believe that line.split() would just do the job, at least when no literal contains white space; 'split' treats any run of white space characters as a single separator and returns the pieces as a list.)
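
To make the point concrete, here is a minimal sketch of both approaches. The regular expression is a deliberate simplification and not the full N-Triples grammar: it assumes the subject and predicate contain no white space and treats everything up to the terminating dot as the object. The helper name split_triple and the example IRIs are made up for illustration.

```python
import re

# Simplified pattern, NOT the full N-Triples grammar: subject, predicate,
# then everything up to the terminating dot is taken as the object.
TRIPLE = re.compile(r'^\s*(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

def split_triple(line):
    """Return (subject, predicate, object), or None if the line does not match."""
    m = TRIPLE.match(line)
    return m.groups() if m else None

# Mixed spaces and a tab between terms, and a literal that contains a space.
line = '<http://example.org/s> \t <http://example.org/p>   "hello world" .'

print(split_triple(line))   # three terms, literal kept whole
print(line.split())         # bare split() also cuts the literal in two
print(line.split(None, 2))  # at most two splits: the literal stays together,
                            # but the trailing " ." is still attached to it
```

A bare split() is enough as long as no term contains white space; once literals contain spaces, either the maxsplit variant or a regular expression is needed.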

At the moment, until I am convinced otherwise, I would prefer to simply separate N-Triples into its own document (already done), call it, e.g., 'Turtle Level 0 - formerly known as N-Triples', or something like that, and move on.


> 
> I'm not sure N-Triples as currently defined even needs a name in the new
> regime; it could be Level 0.1 I guess.
> 
> So, the problem with N-Quads is that it doesn't fit into this scheme.

Indeed:-(


> It's an extension to a subset, forking the neat sequence.  I dunno; just
> a thought.   There's a lot to be said for having some trivial quad
> syntax.
> 
> Another thought about canonical syntaxes: let's specify a single TAB
> between terms.  And we'll require any tabs inside strings be escaped in
> this canonical form.  That way a TSV parser will correctly put the terms
> into the right columns, even for N-Quads, where the graph name goes
> after the literal.   (I think I'd put a tab before the trailing dot, so
> the last field doesn't end up in the last column's data.)  I believe
> this gets us past grep(1) all the way to join(1) and friends (sort, cut,
> uniq, ...).   Not that I've used join(1) in the past 20 years....

I agree with Richard's comments. Requiring TAB rather than other whitespace characters might be a serious issue in some editors.
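
For reference, this is roughly what the TAB-separated form would buy on the consuming side. It is only a sketch under the assumptions Sandro states above (a single TAB between terms, a TAB before the trailing dot, tabs inside literals escaped as the two characters \t); the sample data and IRIs are invented for illustration, and no such canonical form has been specified.

```python
import csv
import io

# Hypothetical canonical N-Quads: one TAB between terms, a TAB before the
# dot, and any tab inside a literal written as the two characters "\t".
data = (
    '<http://example.org/s>\t<http://example.org/p>\t"two\\twords"\t<http://example.org/g>\t.\n'
    '<http://example.org/s>\t<http://example.org/q>\t<http://example.org/o>\t<http://example.org/g>\t.\n'
)

# QUOTE_NONE: double quotes are ordinary characters, so literals pass through as-is.
reader = csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE)
rows = [row[:-1] for row in reader]  # drop the column holding the trailing "."

for s, p, o, g in rows:
    print(g, s, p, o)
```

The graph name lands in its own column even though it follows a literal, which is the property the proposal is after.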

Ivan


> 
> (FWIW I'm not adamant about quads being in the spec; I think they make
> it easier to read and talk about and think about.)
> 
>     -- Sandro
> 
>> Secondly, it does not mean we have to give quads as first class items in 
>> the extended data model.
>> 
>> N-Quads-the-format can be defined by:
>> 
>> <s> <p> <o> <g> .
>> 
>> is just a way of saying triple <s> <p> <o> in space <g>.  That fits 
>> nicely into the way Turtle uses state variables to explain parsing.
>> 
>> We do not strictly need to define a quad and then define how it is 
>> associated with a graph pair - just do it in one step.
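
A minimal sketch of that one-step reading, assuming a dataset is modelled simply as a mapping from graph name to a set of triples; the function name add_quad and the dictionary layout are illustrative choices, not anything taken from a spec.

```python
from collections import defaultdict

def add_quad(dataset, s, p, o, g):
    """Record triple (s, p, o) in the graph named g; no quad object is kept."""
    dataset[g].add((s, p, o))

dataset = defaultdict(set)  # graph name -> set of triples

# Each N-Quads line "<s> <p> <o> <g> ." becomes one call.
add_quad(dataset, '<s>', '<p>', '<o>', '<g>')
add_quad(dataset, '<s>', '<p>', '<o2>', '<g>')
add_quad(dataset, '<s>', '<p>', '<o>', '<g2>')

for graph, triples in dataset.items():
    print(graph, sorted(triples))
```

In this reading a graph with no triples never shows up, since nothing in the input names it; that is the empty graph case.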
>> 
>> It's a matter of simplicity - if quads are defined as a first class 
>> concept, we have to keep the dataset-based part of the specs in step 
>> with the quads-based parts (e.g. the empty graph case) .  c.f. MT and 
>> the rules.
>> 
>> SPARQL Query does not mention quads.
>> 
>> SPARQL syntax does for update (it's a rule name in the grammar)
>> 
>> SPARQL Update uses this as explanation for templates in the form
>> { ... GRAPH .... } and constructs a dataset out of them.
>> 
>> The definition of Graph Store doesn't mention quads.
>> 
>> 	Andy
>> 
>> 
> 
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 30 May 2012 09:27:24 UTC