Re: n-quads & Turtle Levels from Andy Seaborne on 2012-05-30 (public-rdf-wg@w3.org from May 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 30 May 2012 10:27:01 +0100
To: Sandro Hawke <sandro@w3.org>
CC: public-rdf-wg@w3.org
Message-ID: <4FC5E7E5.7030101@epimorphics.com>

On 30/05/12 00:52, Sandro Hawke wrote:
> On Mon, 2012-05-28 at 14:01 +0100, Andy Seaborne wrote:
>>
>> On 28/05/12 13:11, Ivan Herman wrote:
>>>> I don't see why. The only spec that has any reason to mention quads
>>>> is N-Quads. (Well, JSON-LD may too but it uses a definition that's
>>>> different from Sandro's.) Other uses of quads are implementation
>>>> strategies and those don't belong into the specs.
>>> Correct. My question was whether this WG would define NQuads as well
>>> or not. If we do define NQuads (and I do not believe this has been
>>> decided pro or con) then we have to properly define Quads and that in
>>> relations to any formalism we have on named graphs. If we decide that
>>> NQuads are not to be formally defined by this WG, then indeed this
>>> section may become unnecessary.
>>>
>>> Ivan
>>>
>>
>> Firstly, I think we really ought to define N-Quads; it's in use and
>> extending the N-Triples work to N-Quads is valuable.
>
> I thought so too -- which is why I wrote it up for the rdf-spaces
> document, but the discussion with Manu in the last telecon gave me
> second thoughts.
>
> He was arguing how bad it was to be proliferating syntaxes.

There are two facets to proliferation:

1/ The RDF/XML, RDFa and Turtle syntaxes have no family relationship.

2/ Turtle and N-Triples do have a family relationship (same DNA: IRIs 
written as <....> and literals in long form are common to both).

(and "we" expect Turtle for humans and N-Triples for dumps?)

> I'm very
> sensitive to his criticism: in the OWL WG, having OWL 2 QL, EL, and RL,
> with the Direct and RDF-Based Semantics, ... it all made so much sense
> and seemed so necessary.  Outside the OWL WG?  Not so much.)
>
> So I was thinking we might frame it as:
>
> Turtle Level 0 --- canonical n-triples
> Turtle Level 1 --- what we're now calling Turtle
> Turtle Level 2 --- something like Trig that's a superset of Turtle

A dataset is a set { default graph, (IRI, graph)* }: one default graph 
plus zero or more (IRI, graph) pairs.

A graph is not a dataset in the same way a triple or an IRI is not a graph.
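
As a rough sketch of that distinction (the type names Triple, Graph and Dataset are made up for illustration, and terms are kept as plain strings), a graph has to be wrapped to become a dataset; it is not one already:

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# Hypothetical, simplified types: each RDF term is just its written form.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass
class Dataset:
    """One default graph plus zero or more (IRI, graph) pairs."""
    default_graph: Graph = frozenset()
    named_graphs: Dict[str, Graph] = field(default_factory=dict)


# A graph on its own: a set of triples.
g: Graph = frozenset({("<http://example/s>", "<http://example/p>", '"o"')})

# Turning it into a dataset is an explicit step: here it becomes the
# default graph of a dataset that has no named graphs.
ds = Dataset(default_graph=g)

# The same graph could instead be attached under a name.
ds_named = Dataset(named_graphs={"<http://example/g>": g})
```

A Turtle Level 2 that is a superset of Turtle would implicitly make the first choice (graph becomes the default graph); the sketch only shows that it is a choice.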

> I'm not sure N-Triples as currently defined even needs a name in the new
> regime; it could be Level 0.1 I guess.

N-Triples is a dump format that systems like to use.  It is used, it has 
utility.  It needs a name - it has a name - and it needs a content type.

> So, the problem with N-Quads is that it doesn't fit into this scheme.
> It's an extension to a subset, forking the neat sequence.  I dunno; just
> a thought.   There's a lot to be said for having some trivial quad
> syntax.
>
> Another thought about canonical syntaxes: let's specify a single TAB
> between terms.  And we'll require any tabs inside strings be escaped in
> this canonical form.  That way a TSV parser will correctly put the terms
> into the right columns, even for N-Quads, where the graph name goes
> after the literal.   (I think I'd put a tab before the trailing dot, so
> the last field doesn't end up in the last column's data.)  I believe
> this gets us past grep(1) all the way to join(1) and friends (sort, cut,
> uniq, ...).   Not that I've used join(1) in the past 20 years....

I agree with Richard.

And I would add it invalidates existing data for no benefit to users.
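
A small sketch of what I mean, assuming the tab rule were made a requirement rather than an optional canonical form (the example IRIs and the helper name tsv_columns are made up): a line written with single spaces between terms, which is what a lot of existing data looks like, no longer splits into columns.

```python
from typing import List

# The same quad written two ways; only the separators differ.
space_line = '<http://example/s> <http://example/p> "a b" <http://example/g> .'
tab_line = '<http://example/s>\t<http://example/p>\t"a b"\t<http://example/g>\t.'


def tsv_columns(line: str) -> List[str]:
    """Split a line the way a naive TSV reader would: on TAB only."""
    return line.split("\t")


# Tab-separated form: subject, predicate, object, graph name, trailing dot.
tab_cols = tsv_columns(tab_line)

# Space-separated form: no tabs, so the whole line is a single column.
space_cols = tsv_columns(space_line)
```

Splitting the space-separated line on spaces instead does not help either, because the literal "a b" contains one.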

While grep etc. exist, are they the tools of choice for a majority of RDF 
applications?  I doubt it - or rather I hope not.

 Andy
Received on Wednesday, 30 May 2012 09:27:47 UTC