Re: Using bnode identifiers for predicates, graph names from Manu Sporny on 2013-02-05 (public-rdf-wg@w3.org from February 2013)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 05 Feb 2013 15:20:42 -0500
To: public-rdf-wg@w3.org
Message-ID: <5111699A.2010509@digitalbazaar.com>
On 02/02/2013 02:44 PM, Andy Seaborne wrote:
>> On 01/31/2013 06:41 AM, Sandro Hawke wrote:
>>> How is using bnodes to identify graphs any more absurd than
>>> using them to identify people (the canonical example)? Blank
>>> nodes make perfect logical sense as local (file scope)
>>> identifiers. They are clearly useful.
> 
> bnodes don't identify people: bnode + IFP, or bnode + some
> properties, identify.
> 
> [] foaf:name  "Andy" .
> 
> is not uniquely me. It can be anyone called "Andy".

I think what Sandro is saying is that you can use this sort of
identifier to refer to a bag of properties (a node) in the same way that
you can use it to refer to a bag of nodes (a graph).

At some conceptual level, they're effectively the same thing. The only
thing keeping them from being the same thing right now is the language
in the RDF Concepts document, which seems a bit misguided.

>> [
>>   { "@context": "http://example.org/mycontext.jsonld",
>>     "@graph": { "name": "Sandro" } },
>>   { "@context": "http://example.org/mycontext.jsonld",
>>     "@graph": { "name": "Pat" } }
>> ]
> 
> Tangent: how do you know that is 2 graphs, and not 2 fragments of one
> graph?

Yes, good question. We've always interpreted this as two different blank
graphs that are NOT the default graph. If the array only held one
member, it /could/ refer to the default graph, but I don't think we've
ever made a decision on that.

It doesn't make sense to look at the markup above and declare one of
them the default graph and the other a blank graph; in that case, which
one would be the default graph?

You could also interpret the markup above as both referring to the
default graph. However, even if you do that, it doesn't escape the
question of whether or not blank-node-like identifiers make sense for
graph names.

>> We need to digitally sign the document via the RDF Graph 
>> Normalization algorithm and generate something like this to 
>> digitally sign:
>> 
>> _:bnode1 <http://schema.org/name> "Pat" _:graph1 .
>> _:bnode2 <http://schema.org/name> "Sandro" _:graph2 .
> 
> How do you sign the document with bnode subjects and objects without
>  the same issue? (presumably by label and a deterministic 
> allocation).

Exactly. One of the purposes of the RDF Graph Normalization Algorithm,
which really should be renamed to the RDF Dataset Normalization
Algorithm, is to deterministically label blank nodes and blank graphs.
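The deterministic labeling described above can be illustrated with a much-simplified sketch in Python. This is not the Digital Bazaar algorithm. It assumes every blank node, including one used as a graph name, can be told apart just by hashing the quads it appears in, and it raises an error on ties instead of resolving them. The function names and the `_:c14n` label prefix are made up for illustration.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# (subject, predicate, object, graph) in N-Quads term syntax; graph None = default graph
Quad = Tuple[str, str, str, Optional[str]]


def is_blank(term: Optional[str]) -> bool:
    return term is not None and term.startswith("_:")


def first_degree_hash(dataset: List[Quad], bnode: str) -> str:
    """Hash every quad that mentions `bnode`, writing `bnode` as _:a and any
    other blank node as _:z, so the input labels do not affect the result."""
    lines = []
    for quad in dataset:
        if bnode not in quad:
            continue
        terms = []
        for term in quad:
            if term is None:
                continue  # default graph: no fourth term
            if term == bnode:
                terms.append("_:a")
            elif is_blank(term):
                terms.append("_:z")
            else:
                terms.append(term)
        lines.append(" ".join(terms))
    return hashlib.sha256("\n".join(sorted(lines)).encode("utf-8")).hexdigest()


def canonical_labels(dataset: List[Quad]) -> Dict[str, str]:
    """Map each blank node (subject, object, or graph name) to a stable label."""
    blanks = {t for quad in dataset for t in quad if is_blank(t)}
    hashes = {b: first_degree_hash(dataset, b) for b in blanks}
    if len(set(hashes.values())) != len(blanks):
        # A real algorithm must break such ties by looking further out.
        raise ValueError("blank nodes not distinguishable by first-degree hash")
    ordered = sorted(blanks, key=lambda b: hashes[b])
    return {b: f"_:c14n{i}" for i, b in enumerate(ordered)}


def normalize(dataset: List[Quad]) -> str:
    """Relabel blank nodes canonically and emit sorted N-Quads."""
    labels = canonical_labels(dataset)
    lines = []
    for quad in dataset:
        s, p, o, g = (labels.get(t, t) if t is not None else None for t in quad)
        lines.append(f"{s} {p} {o} {g} ." if g is not None else f"{s} {p} {o} .")
    return "\n".join(sorted(lines)) + "\n"


# The two graphs from the example above, then the same data with different
# blank node labels and in the opposite order.
doc = [
    ("_:bnode1", "<http://schema.org/name>", '"Pat"', "_:graph1"),
    ("_:bnode2", "<http://schema.org/name>", '"Sandro"', "_:graph2"),
]
swapped = [
    ("_:x", "<http://schema.org/name>", '"Sandro"', "_:g"),
    ("_:y", "<http://schema.org/name>", '"Pat"', "_:h"),
]
print(normalize(doc) == normalize(swapped))  # True
```

Because the labels are derived from content rather than from the input labels or the input order, relabeling the blank nodes or reversing the graphs yields byte-for-byte the same output.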

> (The Normalization document isn't REC-track, is it?)

No, it isn't. We didn't have enough time to do that and also finish the
REC-track JSON-LD 1.0 documents. The algorithm itself is done, and it's
going into a production financial system in the next few weeks; it's
just not clearly documented.

>> However, now we can't name it _:graph1, or anything else like
>> that, right?
> 
> Internally (within a JSON-LD processor), you can call them "graph1"
> and "graph2" if you want; it does not have to be a bnode, bnode label,
> string, or even an RDF thing of any kind.

Internally we can do whatever we want, yes. However, the problem is with
the quad representation of a blank graph identifier, because that is the
byte stream that is digitally signed.
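To make concrete why the graph label matters for signing, here is a minimal sketch. It assumes the quads have already been normalized and that the signature covers the UTF-8 bytes of the sorted N-Quads lines. SHA-256 stands in for an actual signature algorithm, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]  # graph None = default graph


def nquads_bytes(quads: Iterable[Quad]) -> bytes:
    """Serialize already-normalized quads as sorted N-Quads lines in UTF-8.
    These bytes, not any in-memory structure, are what gets signed."""
    lines = []
    for s, p, o, g in quads:
        # A named graph needs *some* term in the fourth position; only the
        # default graph may omit it.
        lines.append(f"{s} {p} {o} {g} .\n" if g is not None else f"{s} {p} {o} .\n")
    return "".join(sorted(lines)).encode("utf-8")


def digest(quads: Iterable[Quad]) -> str:
    # Stand-in for signing: a real scheme would sign these bytes with a
    # private key; SHA-256 only shows which bytes the signature covers.
    return hashlib.sha256(nquads_bytes(quads)).hexdigest()


named = [("_:c14n0", "<http://schema.org/name>", '"Pat"', "_:c14n1")]
relabeled = [("_:c14n0", "<http://schema.org/name>", '"Pat"', "_:c14n2")]
in_default_graph = [("_:c14n0", "<http://schema.org/name>", '"Pat"', None)]

print(nquads_bytes(named))
print(digest(named) == digest(relabeled))  # False: the graph label is signed too
```

Since the graph label is part of the signed bytes, a verifier that re-normalizes the dataset has to regenerate exactly the same label, which is why it must be assigned deterministically.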

> It will/will not round trip with signing any more or less than bnode 
> subjects do.

With the RDF Graph Normalization Algorithm as it stands right now, bnode
subjects are labeled deterministically, and do round-trip.

> Do you need to name them at all for signing?

Yes, because the quad output, which is signed, needs /something/ in the
graph position.

> Why not sign the two graphs, and combine the signings?

See the e-mail response to Gavin:

http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0011.html

> if order matters (and then it's not an RDF Dataset anyway), add a 
> counter to the combine step

I don't think that works. The whole problem is that order doesn't
matter, yet you need to deterministically order the graph based on the
content of each node and on the linkages between nodes and graphs.
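The difference between positional and content-based ordering shows up in a small sketch. It assumes each graph has already been normalized on its own into N-Quads text; the helper names are hypothetical. Mixing in a counter, as suggested above, makes the result depend on array order, while sorting the per-graph hashes does not. The sketch deliberately ignores the hard part named above: when a blank node is shared between graphs, hashing each graph in isolation loses that linkage, so a real algorithm has to order nodes and graphs together.

```python
import hashlib
from typing import List


def h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def combine_with_counter(graphs: List[str]) -> str:
    """Hash each graph, then mix in its array position (order-dependent)."""
    return h("".join(f"{i}:{h(g)}" for i, g in enumerate(graphs)))


def combine_by_content(graphs: List[str]) -> str:
    """Sort the per-graph hashes before combining (order-independent)."""
    return h("".join(sorted(h(g) for g in graphs)))


# Each graph as separately normalized N-Quads text (hypothetical input).
sandro = '_:c14n0 <http://schema.org/name> "Sandro" .\n'
pat = '_:c14n0 <http://schema.org/name> "Pat" .\n'

print(combine_with_counter([sandro, pat]) == combine_with_counter([pat, sandro]))  # False
print(combine_by_content([sandro, pat]) == combine_by_content([pat, sandro]))  # True
```

Both graphs call their node `_:c14n0` after being normalized separately; once the graphs are combined, nothing says whether those are the same node, which is the linkage problem in miniature.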

> Is it different to the same doc with the graphs reversed in the JSON 
> array?

Well, by definition, yes :). A document that contains the same
information, but doesn't include names for the graphs, would be a
different document. I'm afraid that doesn't answer your question,
though, does it?

>> So we need to come up with another naming scheme that is 
>> deterministic and it needs to match an IRI. It seems kind of 
>> strange to introduce a mechanism for something that is already 
>> basically there.
> 
> This seems to be the heart of it: bnodes don't match an IRI (see
> above)

Yes, that's one of the problems. The other problem is that RDF Concepts
goes to great lengths to state that a blank node identifier can only be
used to label a blank node.

I think we should generalize the blank node identifier into something
that can label either nodes or graphs that are local to the document.

> You seem to want more than document scoped labels - you want labels 
> that are stable across multiple parses of the document.

I don't think that's a requirement of RDF Concepts or JSON-LD. I do want
document-scoped labels that can be applied to nodes and graphs. I don't
expect those labels to be stable across multiple parses of the document
by an RDF serialization processor.

I do expect the RDF Dataset Normalization algorithm to generate stable
document-scoped labels across multiple runs on the same Dataset.

>> Even stranger, the IRI that will be generated will inevitably 
>> conflict with some other normalized graph IRI because it isn't 
>> scoped to the document. These identifiers need to be scoped to the
>>  document if the RDF graph normalization algorithm is going to work
>>  fairly cleanly.
>> 
>> Could we introduce the concept of a 'blank graph identifier'?
> 
> Sure - (1) create your own URI scheme or (2) a systematic way to 
> generate UUIDs based on doc and position of the graph in the doc

If we do #1 or #2, we're basically going to re-invent blank node
identifiers. We are then going to standardize the re-invention of blank
node identifiers by pushing the RDF Dataset Normalization document to
REC through W3C.
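Andy's option (2) can be sketched to show what it would and would not buy. The sketch assumes the name is a version 5 (name-based) UUID computed from the document IRI and the graph's index in the JSON array; the document IRI, the `#graph-N` suffix, and the function names are hypothetical, and the graphs are stood in for by plain strings. The names are stable across runs, but they depend on where the graph sits in the array and where the document lives, not on the graph's content.

```python
import uuid
from typing import Dict, List


def positional_graph_name(doc_iri: str, position: int) -> str:
    """Name-based (version 5) UUID from the document IRI and the graph's index.
    The '#graph-N' suffix is a hypothetical convention for this sketch."""
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_iri}#graph-{position}").urn


def name_graphs(doc_iri: str, graphs: List[str]) -> Dict[str, str]:
    # Graphs are keyed by a placeholder string standing in for their content.
    return {g: positional_graph_name(doc_iri, i) for i, g in enumerate(graphs)}


doc = "http://example.org/people.jsonld"
forward = name_graphs(doc, ["Sandro", "Pat"])
reverse = name_graphs(doc, ["Pat", "Sandro"])

print(forward == name_graphs(doc, ["Sandro", "Pat"]))  # True: stable across runs
print(forward["Sandro"] == reverse["Sandro"])  # False: tied to array position
```

Reversing the array swaps the names, and a copy of the same data published at another IRI gets different names, so a signer and a verifier would have to agree on both the array order and the document location.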

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
President/CEO - Digital Bazaar, Inc.
blog: Aaron Swartz, PaySwarm, and Academic Journals
http://manu.sporny.org/2013/payswarm-journals/
Received on Tuesday, 5 February 2013 20:21:12 UTC