Re: reconsidering: blank nodes as named-graph labels from Pat Hayes on 2013-05-11 (public-rdf-wg@w3.org from May 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 11 May 2013 12:59:18 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-Id: <8CF778CF-DE5E-4939-B56F-6CB3F3772064@ihmc.us>
Well, I can't speak to all the SPARQL complications, but as far as the semantics are concerned, it is easy to fix. The use of a bnodeID as a graph label has the semantic specification that that bnode always denotes the graph it labels, in any satisfying interpretation. Which means in practice that these bnodes are more like IRIs than like other bnodes, semantically speaking, except that (of course) their scope would be local to the dataset. They are Local Resource Identifiers, LRIs rather than IRIs, if you like. 

Other comments inline below.

Pat

On May 11, 2013, at 10:51 AM, Andy Seaborne wrote:

> LDP has not committed to use a TriG-ish format.  It's one possibility and this particular variant has some issues, raised before, that this proposal ignores.
> 
> Why not use a restricted SPARQL update?
> 
> DELETE DATA { .... }
> INSERT DATA { .... }
> 
> A big practical advantage is that a mix of DELETE/INSERT can be done and has the obvious meaning of applying in order given.
> 
> The TriG design requires that all changes are known before the TriG is written (client) or processed (server).  At scale, this can be a large burden.
> 
> A (DELETE|INSERT)* style can be created by recording changes as they happen - so it scales both at the client and at the server.  The declarative nature of the TriG design is a practical disadvantage here.
> 
> Restricted SPARQL update opens the door to more of SPARQL if the LDP-WG or an LDP-engine so chooses.  Useful additions would be CLEAR and the shorthand form DELETE WHERE { }.
> 
> On 11/05/13 04:08, Pat Hayes wrote:
>> I entirely agree.  I also note that SPARQL engines can surely just
>> treat the bnodeIDs as if they were skolemized IRIs, and nothing would
>> break. All that matters in either case is the ability to check
>> identity of identifiers.
> 
> Won't such systems have to skolemize all bNodes? A bnode can be used in the graph data as well as for a graph label.

Yes, but my point is that there is no need to actually generate the skolem IRIs. (Seriously: why would a processor care whether an identifier starts with "_" or not? )
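
To make the point concrete, here is a minimal sketch in Python. It is not taken from any actual SPARQL engine; the Dataset class and its method names are invented for illustration. It assumes graph labels are stored as plain strings and shows that a store needs only equality and hashing on a label, so a bnodeID and an IRI go through exactly the same code path.

```python
from collections import defaultdict


class Dataset:
    """Toy dataset (hypothetical): maps a graph label to a set of triples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def add(self, label, triple):
        # The label is an opaque key. Only equality and hashing are used,
        # so "_:b0" and "<#i1>" are handled identically.
        self.graphs[label].add(triple)

    def graph(self, label):
        # Look up without creating an empty graph as a side effect.
        return self.graphs.get(label, set())


ds = Dataset()
ds.add("_:b0", ("<#s>", "<#p>", "<#o1>"))
ds.add("<#i1>", ("<#s>", "<#p>", "<#o2>"))
ds.add("_:b0", ("<#s>", "<#p>", "<#o3>"))  # same label, same graph
```

Nothing in the sketch inspects whether a label starts with "_", which is the only claim being illustrated; it says nothing about engines that specialise storage for IRI-only positions.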

> 
> And have to do it all the time because an incoming document could have the manifest last (seems really quite sensible after all the data is known about):
> 
> -----------------------------------------------------
> @prefix ldp: <http://www.w3.org/ns/ndp#> .
> 
> <#i1> {  ... triples to add ... uses _:b0 ... }
> 
> _:b0 { ... triples to delete ... }
> 
> <#i1> {  ... more triples to add ... }
> 
> _:b0 { ... more triples to delete ... but before any inserts ... }
> 
> {
> [] a ldp:Patch;
>   ldp:delete _:b0;
>   ldp:insert <#i1>.
> }
> -----------------------------------------------------
> 
> so it does not know which are deletes and which are inserts until the end, nor whether skolemization is necessary, so it has to do it all the time.

I don't really follow this, but I can't see why having bnodes would be any different than doing the same thing using IRI labels. (Except of course that there would actually be a semantics behind it, whereas with our current WG decisions, using an IRI inside one graph and also as a graph label means that the two uses are semantically unrelated and have nothing to do with one another.)
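
As a rough sketch of the manifest-last case in Python: the function name apply_patch, the pre-parsed input shape, and the rule that all deletes are applied before any inserts are assumptions made for illustration, not anything LDP or TriG specifies. The graph blocks are buffered by label as they arrive, and the roles are resolved only once the manifest is known. The buffering step is the same whether the labels are bnodeIDs or IRIs.

```python
def apply_patch(target, blocks, manifest):
    """Apply a patch whose manifest may arrive after the graph blocks.

    target   -- set of triples to patch (left unmodified; a new set is returned)
    blocks   -- list of (label, triples) pairs in document order
    manifest -- dict with keys "delete" and "insert" naming graph labels
    """
    buffered = {}
    for label, triples in blocks:
        # Repeated blocks with the same label are merged into one graph.
        buffered.setdefault(label, set()).update(triples)

    result = set(target)
    # Assumption: all deletes are applied before any inserts.
    result -= buffered.get(manifest["delete"], set())
    result |= buffered.get(manifest["insert"], set())
    return result


blocks = [
    ("<#i1>", {("s", "p", "new1")}),
    ("_:b0", {("s", "p", "old1")}),
    ("<#i1>", {("s", "p", "new2")}),
    ("_:b0", {("s", "p", "old2")}),
]
target = {("s", "p", "old1"), ("s", "p", "old2"), ("s", "p", "keep")}

# Same patch, once with a bnode label for the delete graph and once with an IRI.
patched_bnode = apply_patch(target, blocks, {"delete": "_:b0", "insert": "<#i1>"})
iri_blocks = [("<#d1>" if label == "_:b0" else label, t) for label, t in blocks]
patched_iri = apply_patch(target, iri_blocks, {"delete": "<#d1>", "insert": "<#i1>"})
```

Both calls produce the same patched set, which is all the sketch is meant to show; it does not address bnodes that also occur inside the graph data.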

> If there are additional syntactic restrictions on the TriG (e.g. exactly two graph blocks, manifest first) then it's not helpful to use TriG.
> 
>>> At the last LDP F2F we talked about it and the group was
>>> overwhelmingly in favor of a dataset-based design.  They're very
>>> happy with the idea of patches that look something like this:
>>> 
>>> prefix ldp: <http://www.w3.org/ns/ndp#>
>>> # ... application data prefixes ...
>>> 
>>> [] a ldp:Patch
>>>    ldp:delete <#d1>;
>>>    ldp:insert <#i1>.
> 
> This is not valid TriG.
> 
>>> 
>>> <#d1> { ... triples to delete ... }
>>> <#i1> {  ... triples to add ... }
>>> 
>>> So I've been working out the details for how to do that, and mostly
>>> I think it'll work great.
> 
> 
>>> Thinking about why we decided against blank nodes, the main thing I
>>> believe was the SPARQL spec says that in datasets the labels are
>>> IRIs.   I think it's not a huge problem to live with two different
>>> kinds of datasets like this.
>>> It would mean some compliant SPARQL
>>> systems can only handle SPARQL 1.1 datasets, not full RDF
>>> Datasets.    People who wanted to use blank node graph names in
>>> SPARQL 1.1 would have to either lobby to get that extension put into
>>> their favorite SPARQL system (some have it already),
> 
> Which ones?
> 
>>> or they'd have
>>> to make do with Skolemization.   That's a bit painful, but the
>>> alternative is to require every client who wants this functionality
>>> (even non-SPARQL LDP ones) to Skolemize or pseudo-Skolemize with a
>>> UUID; that seems even more painful.
> 
> As has been pointed out, some systems do specific optimisations knowing that a position can only be a URI (Jena does not; 4Store & 5Store were mentioned).  

The actual syntactic difference between IRIs and bnodeIDs seems almost trivial. How hard would it be to adapt code used for the former to include both? 

But in any case, this kind of argument is a dead hand on any changes. It's bad enough when someone says, our implementation requires that you don't change X, but when they say, our *optimization* requires that you don't change X, it's time to push back. 

> You seem to have skipped that bit and other concerns.
> 
> As I recall it, not allowing bNodes also means we don't have to face impossible (future) formal semantics, and it's that area that argues for the safer, restricted choice.

There are no formal semantic problems. Allowing bnodeIDs as graph labels does not change the current semantics of RDF, it just adds one new constraint. It's a semantic extension, just like RDFS or D-entailment. 

Pat

> 
> 	Andy
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 11 May 2013 17:59:50 UTC