Re: Representing NULL in RDF from Hugh Glaser on 2013-06-04 (public-lod@w3.org from June 2013)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Tue, 4 Jun 2013 10:51:57 +0000
To: Jan Michelfeit <michelfeit.jan@gmail.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <14A1C9D5-EFDC-4362-8883-7A506969D7A5@soton.ac.uk>
Hi Jan,
On 4 Jun 2013, at 11:31, Jan Michelfeit <michelfeit.jan@gmail.com> wrote:

> Hi,
> 
>> NULL most often simply represents that the value is not known, in my experience
> 
> So another conclusion of this discussion can be that unknown is the most sensible default interpretation if the triple is not there and there is no indication of the other cases.
If what you are saying is that a NULL in the DB results in no triple, then yes.
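
A minimal sketch of that reading, assuming a row arrives as a Python dict and a NULL as None. The function name row_to_triples and the ex: prefix are made up for illustration and are not part of any real mapping tool:

```python
# Hypothetical row-to-triples mapping: a NULL (None) column emits no triple at all.
def row_to_triples(subject, row):
    return [
        (subject, f"ex:{column}", value)
        for column, value in row.items()
        if value is not None  # NULL in the DB -> the triple is simply absent
    ]

# The reviewedBy column is NULL, so only the title triple is generated.
triples = row_to_triples("ex:paper", {"title": "Some paper", "reviewedBy": None})
```

A consumer then sees no ex:reviewedBy triple for ex:paper and can only read that as "not stated".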
> 
>> I think that you have to ask exactly what is meant and then model it. ... the purpose of the whole exercise is to construct some RDF that is easy to query
> 
> My original motivation was actually not modelling such situation, but rather interpreting data from unknown sources.
> 
> An example: Let's have a paper ex:paper. Source A claims "ex:paper ex:reviewedBy ex:Hugh". Source B doesn't have any triple "ex:paper ex:reviewedBy *".
> Now I want to integrate the two sources. Shall the result be "the paper was reviewed by Hugh" or "we are not certain whether the paper's been reviewed because source B says it has not been".
This is not the RDF way.
Source B definitely does not say the paper has not been reviewed - it simply remains silent on the matter.
Source A says "the paper was reviewed by Hugh" - if you are including Source A, then your conclusion is that "the paper was reviewed by Hugh".
In fact, nothing (simple) that Source B says can contradict that.
And any queries of the two sources together will give you the same answer as just Source A.
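
To illustrate, here is a rough sketch that treats each source as a plain set of triples and merging as set union. The ex:title triple in Source B and the helper name reviewers are assumptions added only so that B has something to say:

```python
# Each source is modelled as a set of (subject, predicate, object) triples.
source_a = {("ex:paper", "ex:reviewedBy", "ex:Hugh")}
source_b = {("ex:paper", "ex:title", "Some paper")}  # assumed content; silent on ex:reviewedBy

def reviewers(graph, paper):
    """Return every object of ex:reviewedBy for the given paper."""
    return {o for (s, p, o) in graph if s == paper and p == "ex:reviewedBy"}

# Merging two RDF graphs is a union: B adds triples, it cannot take any of A's away.
merged = source_a | source_b
```

Asking for the reviewers of ex:paper against merged gives the same result as asking Source A alone.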
> 
>> it may be more that the subject of the row is having the property withheld than the value is a nonVisibleValue.
>> you may well find that there is another field in the DB that actually has the information already
> 
> Answers in this list have been helpful. The conclusion for me is:
> (1) Don't look just at triples alone, but traverse blank nodes [1], they may bear important information.
I don't quite see this in the context of your original message.
If you are telling the consumers of the RDF you generate that they should do this, then fair enough.
By the way, bnodes are also parts of triples, so there is nothing special about traversing them.
What you seem to be saying is that you should explore the graph around a resource to see if there is any important information, which is of course true.
> (2) Dependencies between properties should be considered.
> (3) Conflict resolution should also consider sets of values. In the example above, I would conclude "paper was reviewed"; if ex:reviewedBy was modelled with an RDF collection of reviewers, one empty and one non-empty, I would conclude "we don't know".
As I say, this is contrary to the RDF way, to put it mildly.
You should conclude that there is at least one reviewer.

I think there is a problem here which is common to the move from DB to RDF.
It is sometimes necessary to step back from the DB and ask what it is trying to represent at a more abstract level - a naive translation will just produce poor (that is, hard to consume) RDF.
In this case you seem to be trying to infer things (various things at different places) from the presence or otherwise of a value in a triple.
RDF just doesn't work like that, especially when combining sources, as you describe.
> 
>> I would always avoid bnodes if it is possible/sensible to do - generating a URI is not hard, and can be useful in the long run.
> 
> BNodes would actually be useful in my particular use case. They are the only thing which can distinguish an "entity" from a "structured attribute" if we don't know anything about the source.
But you can often make up a URI for the bnode.
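
One possible way to do that, sketched under the assumption that the thing behind the bnode has a stable key in the source (for example a table name and primary key). The base namespace and the function name mint_uri are hypothetical:

```python
from urllib.parse import quote

BASE = "http://example.org/id/"  # assumed namespace, replace with your own

def mint_uri(table, key):
    """Build a deterministic URI from a table name and a row key (both percent-encoded)."""
    return f"{BASE}{quote(table, safe='')}/{quote(str(key), safe='')}"

# The same row always yields the same URI, so other data can link to it later.
address_uri = mint_uri("address", 42)
```

Because the URI is derived from the key rather than generated at random, regenerating the RDF gives the same identifiers each time.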

Good fun!
Best
Hugh
> 
> Regards,
> Jan
> 
> [1] http://www.w3.org/Submission/CBD/
Received on Tuesday, 4 June 2013 10:53:38 UTC