Re: Blank Nodes Re: Toward easier RDF: a proposal from David Booth on 2018-11-30 (semantic-web@w3.org from November 2018)

From: David Booth <david@dbooth.org>
Date: Thu, 29 Nov 2018 20:07:48 -0500
To: W3C Semantic Web IG <semantic-web@w3.org>
Cc: Nathan Rixham <nathan@webr3.org>
Message-ID: <c03bf4e6-3ccf-3caa-97ac-7f98d8bf7db5@dbooth.org>

On 11/28/18 11:14 AM, Nathan Rixham wrote:
> . . . if we referred to 
> this address thing as an unidentified object, and looked in our 
> databases, documents, code, apis, we'd find a huge portion of them are 
> comprised of these unidentified objects, where the set of property value 
> pairs is their identity, an identity that's good enough for purpose.

Yes, exactly.  Their properties form a composite key.  In my experience 
this is *very* common, especially because it is very advisable to use 
properties that uniquely identify each thing, just as it is very 
advisable in relational tables to have a primary key -- possibly 
composite -- for every table.

> Under this unidentified object scenario, to be considering identifiers 
> for unidentified things seems like a strange question, as the whole 
> point is that it's unidentified.

But it *is* identified, by its properties that form a composite key. 
The author just didn't bother to give it an explicit URI.

> Realistically, saying we require everything to have a 
> name/identifier/uri is just a no go. Immediate real world first 
> responses would be (a) invalid rdf as the IDs would be omitted, or (b) 
> encoding of objects in strings as string values, as in a chunk of json 
> or xml frag in a string property.

I think there's a middle-way possibility here though.  I agree that we 
don't want to burden users with explicitly creating URIs for everything.  
But it could be helpful for the tooling to do this under the hood.

> Now, IMHO there's merit in generating IDs for bnodes, but behind the 
> interface not over wire, for use in canonicalization or storage engines 
> or code - *not* in a serialized document sent between parties. 

Exactly.  The user should not normally see them.

On 11/29/18 4:14 PM, Nathan Rixham wrote:
 > . . . as soon as we start skolemizing, throwing away redundant
 > nodes becomes a great deal more complex.

Not if the RDF indicates the (composite) key for each object.  If the 
key is known it becomes *easier* to collapse redundant nodes if URIs are 
used, because those nodes will have the same URI (assuming the URIs are 
predictably generated based on the key).
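
As a rough sketch of what "predictably generated based on the key" could 
look like, the Python below hashes a node's key properties into a URI. 
The base URI, the function name mint_uri, and the ex: address properties 
are illustrative assumptions, not part of any existing tool or agreed 
convention.

```python
import hashlib
import json

# Hypothetical namespace; any stable base under the publisher's control would do.
BASE = "http://example.org/.well-known/genid/"

def mint_uri(properties, key=None):
    """Return a URI derived from the (composite) key of a node.

    properties: dict mapping property name -> literal value.
    key: iterable of property names forming the key; defaults to all properties.
    """
    key_props = sorted(properties if key is None else key)
    # Canonical form: (property, value) pairs in sorted order, so the order
    # in which the author happened to write the properties does not matter.
    canonical = json.dumps([[p, properties[p]] for p in key_props])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return BASE + digest

a = {"ex:street": "1 Main St", "ex:city": "Boston"}
b = {"ex:city": "Boston", "ex:street": "1 Main St"}

# Same key values, written in a different order, yield the same URI.
same = mint_uri(a) == mint_uri(b)
```

Two descriptions with the same key values get the same URI regardless of 
property order, while a change to any key value gives a different one.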

To summarize, if conventions for n-ary relations allow the user to 
conveniently indicate which properties constitute a (composite) key -- 
perhaps defaulting to all properties -- then in theory tools could use 
that information to automatically collapse duplicate nodes, whether they 
use blank nodes or URIs.  But if this is done with URIs that are 
predictably generated from those keys -- instead of blank nodes -- then 
we get the advantage that existing tools *already* will collapse them, 
whereas they wouldn't if blank nodes are used.
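
To illustrate the difference, here is a minimal sketch in Python that 
uses a plain set of (subject, property, value) tuples as a stand-in for a 
triple store.  The helper names, the example.org base URI, and the 
address data are all hypothetical.  Blank nodes get a fresh label per 
document, as they would when documents are merged, while key-derived 
URIs are the same wherever the same key values appear.

```python
import hashlib
import itertools
import json

_counter = itertools.count()

def key_uri(props):
    """Hypothetical key-derived URI: hash of all sorted property/value pairs."""
    canonical = json.dumps(sorted(props.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "http://example.org/genid/" + digest

def as_bnode_triples(props):
    """Describe the node with a fresh blank node label, unique per document."""
    node = f"_:b{next(_counter)}"
    return {(node, p, v) for p, v in props.items()}

def as_uri_triples(props):
    """Describe the node with a URI generated from its key."""
    node = key_uri(props)
    return {(node, p, v) for p, v in props.items()}

address = {"ex:street": "1 Main St", "ex:city": "Boston"}

# Two documents that each describe the same address, then a plain set union.
merged_bnodes = as_bnode_triples(address) | as_bnode_triples(address)
merged_uris = as_uri_triples(address) | as_uri_triples(address)

print(len(merged_bnodes))  # 4: two distinct blank nodes, two triples each
print(len(merged_uris))    # 2: identical triples collapse in the set
```

The set union needs no knowledge of keys at all; the duplicates vanish 
simply because the triples are identical.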

David Booth

Received on Friday, 30 November 2018 01:08:11 UTC