Re: Pragmatics of Blank Nodes Re: Toward easier RDF: a proposal from Henry Story on 2018-12-06 (semantic-web@w3.org from December 2018)

From: Henry Story <henry.story@bblfish.net>
Date: Thu, 6 Dec 2018 11:53:32 +0100
To: Andy Seaborne <andy@seaborne.org>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <CA6DC651-1803-4800-8E0E-7DA01A2C42DF@bblfish.net>
> On 5 Dec 2018, at 19:28, Andy Seaborne <andy@seaborne.org> wrote:
> 
> 
> 
> On 05/12/2018 04:13, Patrick J Hayes wrote:
>>> On Dec 4, 2018, at 9:55 PM, David Booth <david@dbooth.org> wrote:
>>> 
>>> Hi Pat,
>>> 
>>> On 12/4/18 7:31 PM, Patrick J Hayes wrote:
>>>>> On Dec 4, 2018, at 2:30 PM, David Booth <david@dbooth.org> wrote:
>>>>> 
>>>>> On 12/3/18 8:29 AM, Henry Story wrote:
>>>>>> . . .  So what are the advantages of blank nodes
>>>>>> pragmatically? They make a description local to the graph
>>>>>> in which they appear and this locality is maintained
>>>>>> across merges. The meaning of URI referenced resources can
>>>>>> be completed by external information of course but the
>>>>>> description ensures that no further links need to be taken
>>>>>> into account when understanding the bnode's meaning. So it
>>>>>> looks like it's ideal for things that need to be entirely
>>>>>> defined by description.
>>>> OR that cannot be *defined* at all, which is closer to the
>>>> original idea. Henry, why would you assume that everything
>>>> that can be mentioned, can also be /defined/?
>>>>> 
>>>>> Interesting point!   That means that blank nodes enjoy a
>>>>> form of closed world assumption (CWA),
>>>> 
>>>> No. That is exactly the kind of mistake that one gets into
>>>> by being too loose with words like 'define'.
>>>> 
>>>>> in that there *cannot* be any other triples asserted
>>>>> (directly) about a blank node, other than the ones already
>>>>> in the document/graph/dataset at hand.  (Inference could
>>>>> add some though.)
>>>> 
>>>> Yes, it certainly could, if one has access to something
>>>> like OWL.
>>>>> 
>>>>> Of course, if we are dealing with implicit blank nodes --
>>>>> the ones generated by [] or () notation in Turtle -- then
>>>>> it's even more obvious that the only property connections
>>>>> to/from that blank node are the ones provided right there
>>>> 
>>>> Inference can add extra triples to those also.
>>> 
>>> Yes, of course.
>>> 
>>>> Suppose for example you know that the property rdf:rest is functional and you know that x:A rdf:rest _:x ., and someone
>>>> tells you that
>>>> x:A rdf:rest _:y .
>>>> _:y x:Q x:C .
>>>> then you now know that  _:x owl:sameAs _:y ., and hence that _:x x:Q x:C .
>>>> Now, someone might argue that such cases are vanishingly rare, or even that they shouldn’t be allowed or encouraged, but that would be a different argument.
>>>>> 
>>>>> This brings me to an interesting question.  To rephrase, the "identity" of a blank node object is determined entirely by the identities of its connected nodes, because it is guaranteed to not have any other connections.
>>>> It isn't, if we allow inferences.
>>> 
>>> Certainly we must allow inferences.  However, the results of inference constitute a different graph: the original graph + the entailments.
>>> 
>>> I put "identity" in quotes above because what I mean is the identity of that node *within* the graph, i.e., a name that allows us to distinguish that node from other nodes in the graph.  I am *not* referring to "all information known/knowable about that node", or "the properties of the node", or any other grand notion of identity like that.  I am talking about identity in the context of blank node labeling, in which the goal is to have a standard algorithm for labeling each blank node.
>>> 
>>>>> Therefore, a blank node labeling algorithm (or standard
>>>>> Skolemization algorithm) only needs to take into account the
>>>>> subgraph of that blank node's tightly connected neighbors.
>>>>> By "tightly connected" I mean the subgraph that is connected
>>>>> only through consecutive blank nodes.  (I think this may
>>>>> be slightly different from the Concise Bounded Description
>>>>> (CBD), because the CBD starts only with the *subject*
>>>>> of a triple.)  https://www.w3.org/Submission/CBD/
>>>>> Aiden (or someone else), is this correct?  If so, this would
>>>>> be very beneficial, because the labeling algorithm could
>>>>> then be guaranteed to generate the *same* label (or Skolem
>>>>> URI) for the blank nodes in that subgraph, regardless of any
>>>>> larger graph in which that subgraph appears.  This is very
>>>>> pertinent to n-ary relations, because it means that blank
>>>>> nodes for the same n-ary relation, appearing in different
>>>>> RDF graphs, could be automatically given the *same* label (or
>>>>> Skolem URI) -- even without knowing a key for that object.
>>>> That would be a wildly invalid conclusion. The coding of an n-ary atomic sentence into binary RDF basically says
>>>> that an 'event' (or a 'fact', or 'situation', or)  exists
>>>> which represents the fact of the relation holding between
>>>> the participants. So my hitting a wall with a hammer (a
>>>> three-place relation) might be encoded as a bnode of type
>>>> hitting with an agent being me and an object being the wall
>>>> and the means being the hammer. But there might be a whole
>>>> lot of hits of that wall with that hammer by me. You can't
>>>> infer that the many bnodes which encode various assertions
>>>> of this kind are all the same single entity with a single
>>>> global identifier: for one thing, that would imply that I
>>>> only hit the wall once.
>>> 
>>> No, it would imply that you hit the wall at *least* once.
>>> Asserting the same thing multiple times does *not* imply
>>> that it happened more than once.  It is logically equivalent
>>> to asserting it once, right?  So if these two statement groups
>>> appear in a graph:
>>> 
>>>  [ a :Hit ; :by :hammer ; :agent :pat ; :target :wall ] .
>>>  [ a :Hit ; :by :hammer ; :agent :pat ; :target :wall ] .
>>> 
>>> then they are logically equivalent to a single (lean) statement group:
>>> 
>>>  [ a :Hit ; :by :hammer ; :agent :pat ; :target :wall ] .
>>> 
>>> and hence they can share the same blank node.  Correct?
> 
> Lean graphs are all very well until an update happens.  New information arrives that breaks the equivalence.
> 
> For an "easier RDF", talking about how the graph is built seems quite natural.
> 
> Leaning has a place at the point of publishing (maybe).
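
To make the two positions above concrete, here is a minimal sketch in Python
(chosen only for illustration). It covers just the simplest case of leaning:
blank nodes that appear only as subjects and whose descriptions mention no
other blank nodes. The name lean_simple, the tuple encoding of triples and
the :time property are assumptions of this sketch, not any library's API.

```python
# Illustrative only: a graph is a set of (subject, predicate, object) tuples,
# and blank nodes are strings starting with "_:" (assumptions of this sketch).

def is_bnode(term):
    return term.startswith("_:")

def lean_simple(graph):
    """Merge blank nodes with identical, ground, subject-only descriptions.

    This is NOT a general leaning algorithm; it only removes exact
    duplicates such as the two :Hit descriptions quoted above.
    """
    descriptions = {}
    for s, p, o in graph:
        if is_bnode(s):
            descriptions.setdefault(s, set()).add((p, o))
    objects = {o for _, _, o in graph}
    canonical = {}  # description -> the blank node kept for it
    rename = {}     # blank node -> blank node that replaces it
    for b in sorted(descriptions):
        desc = frozenset(descriptions[b])
        # Skip anything this simple case cannot safely handle.
        if b in objects or any(is_bnode(o) for _, o in desc):
            continue
        rename[b] = canonical.setdefault(desc, b)
    return {(rename.get(s, s), p, o) for s, p, o in graph}

hit = [("a", ":Hit"), (":by", ":hammer"), (":agent", ":pat"), (":target", ":wall")]
before = {(b, p, o) for b in ("_:b1", "_:b2") for p, o in hit}

# Hypothetical update: each hit now carries its own time.
updated = before | {("_:b1", ":time", ":monday"), ("_:b2", ":time", ":tuesday")}

print(len(before), len(lean_simple(before)))
print(len(updated), len(lean_simple(updated)))
```

With only the two identical descriptions, the duplicate blank node is merged
away. Once each hit has its own :time, the descriptions differ and nothing
can be merged any more, which is the situation an update can create.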

Why could not the RDF library implement bnodes as a triple 

   type BNode = GraphId × LocalNodeId × Lean 

which could of course be done efficiently with 

   type GraphId     = Long
   type LocalNodeId = Int or Long
   type Lean        = Boolean

where Lean would be a flag recording that the node has been computed to be
lean, using (as I understand it) the algorithms detailed in
"Everything you always wanted to know about blank nodes"
https://www.sciencedirect.com/science/article/pii/S1570826814000481

?
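
A minimal sketch of that triple, written in Python purely for illustration.
The field names graph_id, local_id and lean mirror the types above, and the
merge helper is hypothetical; none of this is an existing library's API.
One design choice here is an assumption of the sketch: the lean flag is
excluded from equality, so that marking a node as lean does not change which
node it is.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BNode:
    graph_id: int   # GraphId: the graph in which the node was created
    local_id: int   # LocalNodeId: unique only within that graph
    # Lean: bookkeeping flag, deliberately ignored by == and hash()
    lean: bool = field(default=False, compare=False)

def merge(*graphs):
    """Merge graphs by plain set union (hypothetical helper).

    No renaming is needed: blank nodes from different graphs already
    differ in graph_id, so they stay apart after the merge.
    """
    return set().union(*graphs)

g1 = {(BNode(1, 0), ":agent", ":pat")}
g2 = {(BNode(2, 0), ":agent", ":pat")}
merged = merge(g1, g2)
```

Both graphs use local id 0, yet the merged graph keeps two distinct blank
nodes, so the locality of each description survives the merge.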

> 
>> Yes, you are absolutely right. And I was wrong, above. (I bow graciously and remove my hat.)  Though if you have both copies, which is what ‘share’ suggests, then your graph is still non-lean. It would be better to just keep one copy, and have a lean graph.
>>>  And if that blank node is Skolemized, then they can share the same Skolem URI.  Correct?
>> Yes, with the same comment about ‘share’.
>> Pat
>>> 
>>> David Booth
Received on Thursday, 6 December 2018 10:53:59 UTC