Re: Using bnode identifiers for predicates, graph names from Pat Hayes on 2013-02-06 (public-rdf-wg@w3.org from February 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 5 Feb 2013 23:19:23 -0600
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: public-rdf-wg@w3.org
Message-Id: <0663011E-FB1F-44B2-B239-1637A9854DEA@ihmc.us>
On Feb 5, 2013, at 8:27 PM, Manu Sporny wrote:

> On 02/05/2013 05:27 PM, Pat Hayes wrote:
>>>>> 
>>>>> Could we introduce the concept of a 'blank graph identifier'?
>>>> 
>>>> Sure - (1) create your own URI scheme or (2) a systematic way to
>>>> generate UUIDs based on doc and position of the graph in the
>>>> doc
>>> 
>>> If we do #1 or #2, we're basically going to re-invent blank node 
>>> identifiers.
>> 
>> No, if you do #1, you will be using IRIs, which is in conformance 
>> with (the letter of the) specs. So why not just do that? It's not hard
>> to do: just use a short alphanumeric string instead of "_" before the
>> colon.
> 
> Okay, in this case, we are going to use "_g:". Would that be okay?

I'm not sure, but I think you would have to escape the _ character in an IRI. Why not use, say, graph:1, graph:2, etc.? I *think* those are legal IRIs according to RFC3987. Nobody else on the planet, I am going to guess, will have yet registered the IRI scheme "graph", and if they have, then invent something else. Since you invented it, you own it, and you can specify how it is supposed to be used. OK, so using your IRIs "locally" is not cool, but you are free to be uncool, seems to me.
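
A minimal sketch of that check, assuming only the generic scheme grammar from RFC 3986 section 3.1 (which RFC 3987 reuses): a scheme is a letter followed by letters, digits, "+", "-" or ".". Under that rule "graph" qualifies and "_g" does not, since a scheme cannot begin with "_" and the grammar allows no escapes in that position. The helper name is hypothetical, and only the part before the first colon is checked, not the whole IRI.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*$")


def has_valid_scheme(identifier: str) -> bool:
    """Return True if the text before the first colon is a well-formed scheme.

    Hypothetical helper: it checks the scheme only, not the rest of the IRI.
    """
    scheme, sep, _rest = identifier.partition(":")
    return bool(sep) and SCHEME_RE.match(scheme) is not None


for candidate in ("graph:1", "graph:2", "_g:1", "_:b0"):
    print(candidate, has_valid_scheme(candidate))
```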

> 
>> Why are you making so much fuss about it?
> 
> _g: is effectively a blank-node-like identifier. We could generalize
> this and merge blank graph identifiers and blank node identifiers into a
> unified concept called the "dataset-local identifier". I think that
> would be a cleaner design than the "_g:" hack.

I think that would be a disaster, because it does not make sense. First, the basic RDF graph model does not in fact have bnodeIDs: it has blank nodes, which are, well, blank. Second, you are wanting to use these things as actual labels, but bnodeIDs aren't labels in the same sense at all. Any syntax that uses bnodeIDs to identify anything that has a syntax of its own is not using bnodeIDs properly. BnodeIDs don't really identify, they co-identify. The only point of using something like _:x is so that when you use it again, it means that those two uses are the same node. It doesn't identify something *else*, such as a graph.
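
To illustrate "co-identify" rather than "identify", here is a rough sketch, not any real parser's API. The `load` function and the `ex:` property names are made up for the example. Each document's labels are swapped for fresh node objects on loading, so _:x ties together two uses inside one document and carries no meaning outside it.

```python
from itertools import count

_fresh = count()  # source of fresh, otherwise meaningless node ids


def load(triples):
    """Replace each document-local label such as "_:x" with a fresh node.

    Hypothetical loader: the label-to-node map lives only for this one call,
    which is what makes the labels local to the document.
    """
    nodes = {}

    def term(t):
        if t.startswith("_:"):
            if t not in nodes:
                nodes[t] = ("bnode", next(_fresh))
            return nodes[t]
        return t  # IRIs and literals pass through unchanged

    return [(term(s), term(p), term(o)) for s, p, o in triples]


doc_a = [("_:x", "ex:knows", "_:y"), ("_:x", "ex:name", '"Alice"')]
doc_b = [("_:x", "ex:name", '"Bob"')]

g_a = load(doc_a)
g_b = load(doc_b)

# Same label, same document: the two uses are the same node.
print(g_a[0][0] == g_a[1][0])
# Same label, different documents: unrelated nodes.
print(g_a[0][0] == g_b[0][0])
```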

> 
> More importantly, "_g:" is not just going to be the solution we use for
> the Web Payments / PaySwarm work, but it will be a part of another W3C
> specification called "RDF Dataset Normalization", which explains how to
> normalize an RDF Dataset, assigning blank graph identifiers to graphs
> that were not provided a name by their producers.

How can they not have a name? The specification of an RDF dataset requires that all the graphs in it (other than the default graph) have names. In fact, it requires that they are given names in the form of IRIs. And this is in the SPARQL spec, not the RDF spec, by the way.

> So, this decision will
> bleed from JSON-LD, into RDF Dataset Normalization.
> 
> I want the RDF WG to be fully aware of the ramifications of their
> decisions (or non-decisions) before we march forward and implement
> things in this way.

Seems to me that defining your algorithm so as to conform to the existing specs is the best way you could proceed in order to keep everything harmonious and working together.

> 
> I'm also making a fuss about it because JSON-LD is being unnecessarily
> limited by having to align too closely to the RDF data model. I'd be
> just as happy if this is somewhere that we didn't have to align with the
> RDF data model, but that just kicks the can down the road because when
> we go to serialize these blank graph identifiers, we're going to
> serialize them to something NQuads like... and so NQuads will have to
> support blank graph identifiers at that point as well. So, the decision
> will bleed from RDF Dataset Normalization to NQuads. It won't stop
> there, because when RDFa Next comes along, it too will absorb the
> ability to use blank graph identifiers since JSON-LD, RDF Dataset
> Normalization, and NQuads all support it.
> 
> I'm making a fuss to try to make sure that the RDF WG makes a good
> design decision here, both for the data model and for developers. :)

Seems to me that all of these are very good points, but that their moral bears more on your design decision. It seems obvious that to align with the existing standards by using IRIs (perhaps using them creatively, but using them) is the only viable design choice for you, and that to introduce a new notion, which breaks at least three existing specifications for no particularly good reason, would be seriously irresponsible.

Frankly, the idea of "blank graph identifiers" does not make sense.  The idea of local graph IDs does make sense, but (in spite of TimBL's writings) IRIs are used in contextual ways all over the Web. You can take this and run with it, while working within the legal syntactic rules so that software does not break. If you do use something like graph:1, etc., and the normalization gets widely used, then no sane person is going to screw things up by using the 'graph' domain for anything else, so everything will work fine. Feels like a no-brainer to me. 
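
A rough sketch of what that could look like, and not the actual RDF Dataset Normalization algorithm. It assumes graphs arrive as (name or None, triples) pairs in document order. The function name, the input shape and the numbering rule are all assumptions made for illustration.

```python
def name_graphs(graphs, scheme="graph"):
    """Give each unnamed graph a local IRI such as graph:1, graph:2, ...

    Hypothetical helper: existing names are kept, and a counter value is
    skipped if the resulting IRI is already in use in this dataset.
    """
    taken = {name for name, _ in graphs if name is not None}
    result = []
    counter = 0
    for name, triples in graphs:
        if name is None:
            counter += 1
            while f"{scheme}:{counter}" in taken:
                counter += 1
            name = f"{scheme}:{counter}"
            taken.add(name)
        result.append((name, triples))
    return result


dataset = [
    (None, [("ex:a", "ex:p", "ex:b")]),
    ("http://example.org/g", [("ex:c", "ex:p", "ex:d")]),
    (None, [("ex:e", "ex:p", "ex:f")]),
]

for graph_name, _triples in name_graphs(dataset):
    print(graph_name)
```

Every name that comes out is an IRI, so the result stays within the dataset definition as it stands.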

Pat

> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: Aaron Swartz, PaySwarm, and Academic Journals
> http://manu.sporny.org/2013/payswarm-journals/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 6 February 2013 05:19:56 UTC