Re: RDF-ISSUE-82 (TriG repeated graph iris): How should repeated graph iri labels be handled in TriG [RDF Turtle] from Steve Harris on 2011-12-21 (public-rdf-wg@w3.org from December 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 21 Dec 2011 23:33:21 +0000
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-Id: <C16642D4-6C1E-4883-A926-589C9C82F5B3@garlik.com>
On 21 Dec 2011, at 18:57, Andy Seaborne wrote:
> 
> 
> On 21/12/11 18:01, Gavin Carothers wrote:
> 
> Good issue.
> 
>> On Wed, Dec 21, 2011 at 9:53 AM, RDF Working Group Issue Tracker
>> <sysbot+tracker@w3.org>  wrote:
>>> 
>>> RDF-ISSUE-82 (TriG repeated graph iris): How should repeated graph iri labels be handled in TriG [RDF Turtle]
>>> 
>>> http://www.w3.org/2011/rdf-wg/track/issues/82
>>> 
>>> Raised by: Gavin Carothers
>>> On product: RDF Turtle
>>> 
>>> There are a number of ways of handling the case of multiple instances of a graph IRI labelling a number of graph statements.
>>> 
>>> Sample TriG Document:
>>> 
>>> @base <http://example.com/> .
>>> <graph>  {<s>  <p>  <o>  . }
>>> <graph>  {<s2>  <p>  <o2>  . }
>>> 
>>> 1) Disallowed (TriG input document behaviour)
>>> 
>>> "In a TriG document a graph IRI must not be used to label more than one graph."
>>> 
>>> Result: Parse Error
>> 
>> This is my personal preference, and what the original TriG input
>> document said. A merge based syntax would be N-Quads which -has- to be
>> merge based. But this is not a strongly held opinion.
> 
> 0 - Tolerable.
> 
>> 
>>> 
>>> 2) Merge
>>> 
>>> "In a TriG document graph statements with the same graph IRI should be merged to form a single RDF Graph."
>>> 
>>> Result:
>>> @base <http://example.com/> .
>>> <graph>  {<s>  <p>  <o>  .
>>>          <s2>  <p>  <o2>  . }
>>> 
>>> Note: BlankNode labels in each graph statement would either result in shared blank nodes or independent blank nodes (??)
>> 
>> Some implementations do this already.
> 
> +1
> 
> Yes :-) The Jena RIOT TriG parser does.  But it will change to whatever the WG decides.

+1

I'm mildly confident that this is what 4store and 5store do too, but I'd have to check.

I could live with it being an error; the options below don't seem great.

I think I have a preference for blank node labels to be scoped to the { }, but I'd have to think about it, and I'm open to being persuaded otherwise.
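
To make the two blank node scoping choices concrete, here is a minimal Python sketch of merging under each. It is not taken from 4store, 5store, Jena or any real parser: the function name merge_blocks, the bnode_scope flag and the "#b<index>" renaming are all hypothetical, and the input is assumed to be already-parsed blocks rather than TriG text.

```python
from collections import defaultdict


def merge_blocks(blocks, bnode_scope="block"):
    """Merge blocks that share a graph label into one graph per label.

    blocks: list of (graph_label, [(s, p, o), ...]) in document order.
    bnode_scope: "block" renames each "_:x" label per { } block
                 (hypothetical renaming scheme), "file" leaves labels
                 shared across the whole document.
    """
    dataset = defaultdict(set)
    for index, (label, triples) in enumerate(blocks):

        def scoped(term):
            # Only blank node labels are rewritten; IRIs pass through.
            if bnode_scope == "block" and term.startswith("_:"):
                return f"{term}#b{index}"
            return term

        for s, p, o in triples:
            dataset[label].add((scoped(s), p, scoped(o)))
    return dict(dataset)


# Two blocks with the same graph label, both using the label _:a.
blocks = [
    ("<graph>", [("_:a", "<p>", "<o>")]),
    ("<graph>", [("_:a", "<p>", "<o2>")]),
]

per_block = merge_blocks(blocks, bnode_scope="block")
per_file = merge_blocks(blocks, bnode_scope="file")

subjects_per_block = {s for s, _, _ in per_block["<graph>"]}
subjects_per_file = {s for s, _, _ in per_file["<graph>"]}
```

With labels scoped to the { }, the merged graph ends up with two distinct blank nodes; with labels scoped to the file, both triples share one blank node.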

- Steve

> This does not affect the fact that one IRI labels one graph - each {} block is a part of a graph.
> 
> The graph label on a block simply sets the graph-label slot in any quads generated.
> 
> This is my preferred design because:
> 
> 1/ Tracking state over a parser run limits scalability
>   A parser that did generate errors would need to track previous uses of label IRIs (e.g. the error checking of IDs in RDF/XML).
> 
> 2/ Sometimes the data you want to write does not come in graph-sorted-clumps and converting to a graph-grouped form leads to an additional pass over the data before writing starts.
> 
> 3/ (extreme of 2)
> 
> <graph> { <s1> <p1> <o1> }
> <graph> { <s2> <p2> <o2> }
> <graph> { <s3> <p3> <o3> }
> 
> is a cheap syntax that is both TriG and single line.
> 
> 4/ It can be made to look neater: so if the default graph is the manifest, producing output like this:
> 
> <graph1> { <s1> <p1> <o1> }
> {
>   <event1> :seenAs "2012-12-06" ;
>            :observed <graph1> ;
>            .... .
> }
> 
> <graph2> { <s2> <p2> <o2> }
> {
>   <event2> :seenAs "2012-12-25" ;
>            :observed <graph2> ;
>            .... .
> }
> 
> is convenient for placing the info near other stuff.
> 
> 
> 
> >> Note: BlankNode labels in each graph statement would either result in shared blank nodes or independent blank nodes (??)
> 
> My preference is for blank node labels to be scoped to the file, because then one graph can be a subgraph of another.
> 
> 
>> 
>>> 
>>> 3) Replace
>>> 
>>> "Upon encountering a graph statement with the same graph IRI of another graph statement, the most recently parsed RDF Graph should replace the earlier one in the RDF Dataset."
>>> 
>>> Result:
>>> @base <http://example.com/> .
>>> <graph>  {<s2>  <p>  <o2>  . }
>> 
>> I am unaware of any implementations that do replacement this way with TriG.
> 
> -1
> 
> Seems "unhelpful" and "confusing" at best. File order matters.
> 
>>> 
>>> 4) Ignore
>>> 
>>> "Graph statements with a repeated graph IRI are ignored. Only the first graph statement is added to the RDF Dataset."
>>> 
>>> Result:
>>> @base <http://example.com/> .
>>> <graph>  {<s>  <p>  <o>  . }
>> 
>> While some implementations have done this from time to time, I'm
>> reasonably sure this was a BUG.
> 
> -1
> 
> Seems "unhelpful" at best.
> 
> 
>> 
>>> 
>>> 5) Document Decides
>>> 
>>> Apply one of 1-4 on the basis of a directive "@policy". Default to Disallowed.
>> 
>> Not really thrilled with the idea. But would allow Disallow and Merge
>> to co-exist. Default could go either way.
> 
> -1
> 
> Adds complexity and cost (impl, testing, validation of data) for insufficient utility.
> 
> 	Andy
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 21 December 2011 23:33:54 UTC