Re: n-quads & Turtle Levels from Sandro Hawke on 2012-05-30 (public-rdf-wg@w3.org from May 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 30 May 2012 07:20:40 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <1338376840.2332.224.camel@waldron>
On Wed, 2012-05-30 at 07:17 -0400, Sandro Hawke wrote:
> On Wed, 2012-05-30 at 10:27 +0100, Andy Seaborne wrote:
> > On 30/05/12 00:52, Sandro Hawke wrote:
> > > On Mon, 2012-05-28 at 14:01 +0100, Andy Seaborne wrote:
> > >>
> > >> On 28/05/12 13:11, Ivan Herman wrote:
> > >>>> I don't see why. The only spec that has any reason to mention quads
> > >>>> is N-Quads. (Well, JSON-LD may too but it uses a definition that's
> > >>>> different from Sandro's.) Other uses of quads are implementation
> > >>>> strategies and those don't belong into the specs.
> > >>> Correct. My question was whether this WG would define NQuads as well
> > >>> or not. If we do define NQuads (and I do not believe this has been
> > >>> decided pro or con) then we have to properly define Quads and that in
> > >>> relation to any formalism we have on named graphs. If we decide that
> > >>> NQuads are not to be formally defined by this WG, then indeed this
> > >>> section may become unnecessary.
> > >>>
> > >>> Ivan
> > >>>
> > >>
> > >> Firstly, I think we really ought to define N-Quads; it's in use and
> > >> extending the N-Triples work to N-Quads is valuable.
> > >
> > > I thought so too -- which is why I wrote it up for the rdf-spaces
> > > document, but the discussion with Manu in the last telecon gave me
> > > second thoughts.
> > >
> > > He was arguing how bad it was to be proliferating syntaxes.
> > 
> > There are two facets to proliferation:
> > 
> > 1/ RDF/XML / RDFa / Turtle syntaxes have no family relationship.
> > 
> > 2/ Turtle / N-Triples do have a family relationship (same DNA - IRIs and 
> > <....>, literals in long form are in common).
> > 
> > (and "we" expect Turtle for humans and N-Triples for dumps?)
> > 
> > > (I'm very
> > > sensitive to his criticism: in the OWL WG, having OWL 2 QL, EL, and RL,
> > > with the Direct and RDF-Based Semantics, ... it all made so much sense
> > > and seemed so necessary.  Outside the OWL WG?  Not so much.)
> > >
> > > So I was thinking we might frame it as:
> > >
> > > Turtle Level 0 --- canonical n-triples
> > > Turtle Level 1 --- what we're now calling Turtle
> > > Turtle Level 2 --- something like TriG that's a superset of Turtle
> > 
> > A dataset is a set { default graph , (IRI, graph) }
> > 
> > A graph is not a dataset in the same way a triple or an IRI is not a graph.
> 
> I happen to disagree with this. 

I should read better, sorry.

I disagree with the "IRI" analogy there, but I'm happy with the "triple"
one.   A triple is very much like a graph, to me.

    -- Sandro

> I think the relationship is much closer
> to   graph : dataset :: file : tar file
> 
> Or better:     graph : dataset  ::   character : string
> 
> In some languages (e.g. C, C++) the types for character and string are
> completely different.  In others (Python, JavaScript), there are no
> characters -- you just use strings of length 1.
> 
> I think the Python/JS approach works in RDF APIs and languages, too,
> saying that when you want to work with a graph, just use a dataset that
> doesn't happen to have any named graphs.
> 
> It's with these glasses that I think a Turtle document can/should just
> be an instance of our multiple-graph syntax which doesn't happen to have
> any named graphs.
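
To make the character/string analogy concrete, here is a minimal sketch of the "Python/JS approach" applied to RDF. It assumes a hypothetical Dataset type (not taken from any existing RDF library) in which a graph is simply a dataset whose set of named graphs is empty; the names Dataset, graph and is_plain_graph are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# A triple as three already-serialized terms (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass
class Dataset:
    """Hypothetical dataset: one default graph plus zero or more named graphs."""
    default_graph: Set[Triple] = field(default_factory=set)
    # Maps a graph name (an IRI string) to that graph's triples.
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)

    def is_plain_graph(self) -> bool:
        # A "graph" is a dataset that happens to have no named graphs,
        # much as a "character" in Python is a string of length 1.
        return not self.named_graphs


def graph(*triples: Triple) -> Dataset:
    """Build a plain graph, i.e. a dataset with only a default graph."""
    return Dataset(default_graph=set(triples))


g = graph(("<http://example.org/s>", "<http://example.org/p>", '"o"'))
d = Dataset(named_graphs={"<http://example.org/g1>": set(g.default_graph)})
print(g.is_plain_graph(), d.is_plain_graph())
```

Under this assumption, any code written against Dataset also accepts a plain graph, which is the sense in which a Turtle document could be read as a multiple-graph document with no named graphs.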
>  
> > > I'm not sure N-Triples as currently defined even needs a name in the new
> > > regime; it could be Level 0.1 I guess.
> > 
> > N-Triples is a dump format that systems like to use.  It is used, it has 
> > utility.  It needs a name - it has a name - and it needs a content type.
> > 
> > > So, the problem with N-Quads is that it doesn't fit into this scheme.
> > > It's an extension to a subset, forking the neat sequence.  I dunno; just
> > > a thought.   There's a lot to be said for having some trivial quad
> > > syntax.
> > >
> > > Another thought about canonical syntaxes: let's specify a single TAB
> > > between terms.  And we'll require any tabs inside strings be escaped in
> > > this canonical form.  That way a TSV parser will correctly put the terms
> > > into the right columns, even for N-Quads, where the graph name goes
> > > after the literal.   (I think I'd put a tab before the trailing dot, so
> > > the dot doesn't end up in the last column's data.)  I believe
> > > this gets us past grep(1) all the way to join(1) and friends (sort, cut,
> > > uniq, ...).   Not that I've used join(1) in the past 20 years....
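
The tab-separated idea above can be sketched roughly as follows. This is only an illustration of the proposal, not a specified canonical form: the function names canonical_line and split_line are made up, and the terms are assumed to be already serialized in N-Triples/N-Quads syntax, so the only remaining work is escaping raw tabs and joining the fields.

```python
from typing import List, Sequence


def canonical_line(terms: Sequence[str]) -> str:
    """Join 3 (triple) or 4 (quad) serialized terms with single TABs.

    Any raw tab inside a term (in practice, inside a literal) is written
    as the two-character escape backslash-t, so a tab in the output is
    always a field separator. A tab also precedes the trailing dot.
    """
    if len(terms) not in (3, 4):
        raise ValueError("expected 3 terms (triple) or 4 terms (quad)")
    escaped = [t.replace("\t", "\\t") for t in terms]
    return "\t".join(escaped + ["."])


def split_line(line: str) -> List[str]:
    """Recover the terms the way a naive TSV tool (e.g. cut) would."""
    fields = line.rstrip("\n").split("\t")
    if fields[-1] != ".":
        raise ValueError("line does not end with a tab-separated dot")
    return fields[:-1]


quad = [
    "<http://example.org/s>",
    "<http://example.org/p>",
    '"a literal with a\ttab"',   # contains a raw tab before escaping
    "<http://example.org/g>",    # graph name goes after the object
]
line = canonical_line(quad)
print(split_line(line))
```

Because the dot sits in its own column, the graph name (or the object, for a triple) comes out as a clean field without any trailing punctuation.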
> > 
> > I agree with Richard.
> > 
> > And I would add it invalidates existing data for no benefit to users.
> >
> > While grep etc. exist, are they the tools of choice of a majority of RDF 
> > applications?  I doubt it - or rather I hope not.
> 
> I thought I heard consensus or near-consensus that we should define a
> canonical form of N-Triples.
> 
> I guess not.
> 
>    -- Sandro
Received on Wednesday, 30 May 2012 11:20:55 UTC