Re: RDF-ISSUE-82 (TriG repeated graph iris): How should repeated graph iri labels be handled in TriG [RDF Turtle] from Sandro Hawke on 2012-01-04 (public-rdf-wg@w3.org from January 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 03 Jan 2012 21:36:16 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <1325644576.2589.128.camel@waldron>
On Wed, 2011-12-21 at 18:57 +0000, Andy Seaborne wrote:
> 
> On 21/12/11 18:01, Gavin Carothers wrote:
> 
> Good issue.
> 
> > On Wed, Dec 21, 2011 at 9:53 AM, RDF Working Group Issue Tracker
> > <sysbot+tracker@w3.org>  wrote:
> >>
> >> RDF-ISSUE-82 (TriG repeated graph iris): How should repeated graph iri labels be handled in TriG [RDF Turtle]
> >>
> >> http://www.w3.org/2011/rdf-wg/track/issues/82
> >>
> >> Raised by: Gavin Carothers
> >> On product: RDF Turtle
> >>
> >> There are a number of ways of handling the case of multiple instances of a graph iri labelling a number of graph statements.
> >>
> >> Sample TriG Document:
> >>
> >> @base <http://example.com/> .
> >> <graph> { <s> <p> <o> . }
> >> <graph> { <s2> <p> <o2> . }
> >>
> >> 1) Disallowed (TriG input document behaviour)
> >>
> >> "In a TriG document a graph IRI must not be used to label more than one graph."
> >>
> >> Result: Parse Error
> >
> > This is my personal preference, and what the original TriG input
> > document said. A merge based syntax would be N-Quads which -has- to be
> > merge based. But this is not a strongly held opinion.
> 
> 0 - Tolerable.
> 
> >
> >>
> >> 2) Merge
> >>
> >> "In a TriG document graph statements with the same graph IRI should be merged to form a single RDF Graph."
> >>
> >> Result:
> >> @base <http://example.com/> .
> >> <graph> { <s> <p> <o> .
> >>           <s2> <p> <o2> . }
> >>
> >> Note: BlankNode labels in each graph statement would either result in shared blank nodes or independent blank nodes (??)
> >
> > Some implementations do this already.
> 
> +1
> 
> Yes :-) The Jena RIOT TriG parser does.  But it will change to whatever 
> the WG decides.
> 
> This does not affect the fact that one IRI labels one graph - each {} 
> block is a part of a graph.
> 
> The graph slot is setting the graph-label-slot in any quads generated.
> 
> This is my preferred design because:
> 
> 1/ Tracking state over a parser run limits scalability
>     A parser that did generate errors needs to track previous use of 
> label IRIs. (e.g. the error checking ids in RDF/XML).
> 
> 2/ Sometimes the data you want to write does not come in 
> graph-sorted-clumps and converting to a graph-grouped form leads to an 
> additional pass over the data before writing starts.
> 
> 3/ (extreme of 2)
> 
> <graph> { <s1> <p1> <o1> }
> <graph> { <s2> <p2> <o2> }
> <graph> { <s3> <p3> <o3> }
> 
> is a cheap syntax that is both TriG and single line.
> 
> 4/ It can be made to look neater: if the default graph is the 
> manifest, producing output like this:
> 
> <graph1> { <s1> <p1> <o1> }
> {
>     <event1> :seenAs "2012-12-06" ;
>              :observed <graph1> ;
>              .... .
> }
> 
> <graph2> { <s2> <p2> <o2> }
> {
>     <event2> :seenAs "2012-12-25" ;
>              :observed <graph2> ;
>              .... .
> }
> 
> is convenient for placing the info near other stuff.
> 
> 
> 
> >> Note: BlankNode labels in each graph statement would either result in shared blank nodes or independent blank nodes (??)

That's pretty compelling, but I'm still waiting to see how the use cases
can even be addressed by something like TriG, before I get into details
like this.

(The use case we heard about last time -- shared crawling -- works fine
with TriG.  But what about others, like metadata about particular RDF
Graphs (not Graph Containers)?)

> My preference is that blank node labels are scoped to the file because then 
> one graph can be a subgraph of another.

Yes.  I think we've got some clear use cases for this, including
explanations of reasoning, and dumping SPARQL datasets, which are
allowed to have bnodes shared between graphs...  I think.  (I'm told
they do that.  I'm having trouble figuring out where in the spec it
would say.)
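
To make the scoping question concrete, here is a minimal Python sketch of the two choices. It is not taken from any existing parser: the function name `parse_blocks`, the tuple representation of triples, and the `bnodeN` identifiers are all made up for illustration, and it assumes the input has already been tokenised into graph blocks.

```python
from itertools import count


def parse_blocks(blocks, file_scoped=True):
    """Turn (graph_iri, triples) blocks into a set of quads.

    Terms starting with "_:" are treated as blank node labels and are
    replaced by fresh identifiers. With file_scoped=True one label table
    is shared by the whole file; otherwise each {} block gets its own.
    """
    fresh = count()
    file_map = {}
    quads = set()
    for graph_iri, triples in blocks:
        # Choose the label table: shared across the file, or new per block.
        label_map = file_map if file_scoped else {}

        def resolve(term):
            if term.startswith("_:"):
                if term not in label_map:
                    label_map[term] = f"bnode{next(fresh)}"
                return label_map[term]
            return term

        for s, p, o in triples:
            quads.add((resolve(s), p, resolve(o), graph_iri))
    return quads


blocks = [
    ("g1", [("_:x", "p", "o1")]),
    ("g2", [("_:x", "p", "o2")]),
]
shared = parse_blocks(blocks, file_scoped=True)
separate = parse_blocks(blocks, file_scoped=False)
print(len({q[0] for q in shared}))    # 1: the same node appears in both graphs
print(len({q[0] for q in separate}))  # 2: each block gets its own node
```

Only the file-scoped variant lets a node in one graph also appear in another, which is what the subgraph and dataset-dump cases need.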

    -- Sandro


> 
> >
> >>
> >> 3) Replace
> >>
> >> "Upon encountering a graph statement with the same graph IRI as another graph statement, the most recently parsed RDF Graph should replace the earlier one in the RDF Dataset."
> >>
> >> Result:
> >> @base <http://example.com/> .
> >> <graph> { <s2> <p> <o2> . }
> >
> > I am unaware of any implementations that do replacement this way with TriG.
> 
> -1
> 
> Seems "unhelpful" and "confusing" at best. File order matters.
> 
> >>
> >> 4) Ignore
> >>
> >> "Graph statements with a repeated graph IRI are ignored. Only the first graph statement is added to the RDF Dataset."
> >>
> >> Result:
> >> @base <http://example.com/> .
> >> <graph> { <s> <p> <o> . }
> >
> > While some implementations have done this from time to time, I'm
> > reasonably sure this was a BUG.
> 
> -1
> 
> Seems "unhelpful" at best.
> 
> 
> >
> >>
> >> 5) Document Decides
> >>
> >> Apply one of 1-4 on the basis of a directive "@policy". Default to Disallowed.
> >
> > Not really thrilled with the idea. But would allow Disallow and Merge
> > to co-exist. Default could go either way.
> 
> -1
> 
> Adds complexity and cost (implementation, testing, validation of data) 
> for insufficient utility.
> 
>  Andy
> 
>
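
For reference, options 1-4 from the issue can be compared side by side with a small Python sketch. This is only an illustration under stated assumptions: `load_trig_blocks` and the policy names are hypothetical, the input is assumed to be already parsed into (graph IRI, triples) blocks, and the dataset is held in memory as a dict, which a real streaming parser would not need to do.

```python
POLICIES = ("disallow", "merge", "replace", "ignore")


def load_trig_blocks(blocks, policy="merge"):
    """Build {graph_iri: set_of_triples} from (graph_iri, triples) blocks,
    handling a repeated graph IRI according to the chosen policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    dataset = {}
    for graph_iri, triples in blocks:
        triples = set(triples)
        if graph_iri not in dataset:
            dataset[graph_iri] = triples        # first use of this label
        elif policy == "disallow":
            raise ValueError(f"graph IRI used more than once: {graph_iri}")
        elif policy == "merge":
            dataset[graph_iri] |= triples       # union with earlier block
        elif policy == "replace":
            dataset[graph_iri] = triples        # last block wins
        # policy == "ignore": keep the first block, drop this one
    return dataset


# The sample document from the issue, with IRIs left relative for brevity.
sample = [
    ("graph", [("s", "p", "o")]),
    ("graph", [("s2", "p", "o2")]),
]
for policy in ("merge", "replace", "ignore"):
    print(policy, sorted(load_trig_blocks(sample, policy)["graph"]))
```

Of the four, only merge can be implemented by emitting quads block by block with no memory of earlier labels; the other three need at least the set of graph IRIs already seen, which is the state-tracking cost raised in point 1/ above.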
Received on Wednesday, 4 January 2012 02:36:28 UTC