Re: Representing NULL in RDF

On Jun 4, 2013, at 5:31 AM, Jan Michelfeit wrote:

> Hi,
>> NULL most often simply represents that the value is not known, in my experience
> So another conclusion of this discussion can be that unknown is the most sensible default interpretation if the triple is not there and there is no indication of the other cases.
>> I think that you have to ask exactly what is meant and then model it. ... the purpose of the whole exercise is to construct some RDF that is easy to query
> My original motivation was actually not modelling such situation, but rather interpreting data from unknown sources.
> An example: Let's have a paper ex:paper. Source A claims "ex:paper ex:reviewedBy ex:Hugh". Source B doesn't have any triple "ex:paper ex:reviewedBy *".
> Now I want to integrate the two sources. Shall the result be "the paper was reviewed by Hugh" or "we are not certain whether the paper's been reviewed because source B says it has not been".

Well, certainly not the second, as B does not say it has not been reviewed. B simply makes no assertion about the reviewing. In an open world, it is not correct to infer "source says not X" from "source does not say X". If you want to be able to say that a paper is not reviewed, then (since RDF does not provide you with a built-in "not") you have to find or invent a way to say that explicitly. You might, for example, use OWL-style reasoning and have a class of papers reviewed by Hugh (it would be the owl:hasValue ex:Hugh restriction on ex:reviewedBy), and then B can say that this paper is not in that class by asserting that it's in the complement class: (ex:paper rdf:type (owl:complementOf (owl:hasValue ex:reviewedBy ex:Hugh))). But admittedly this might be overkill for your example application.
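To make the three cases concrete, here is a rough sketch in plain Python, with triples held as (subject, predicate, object) tuples. The blank node labels _:notByHugh and _:byHugh and the helper names are made up for illustration, and the lookup is only a literal pattern match on the triples, not OWL reasoning.

```python
# Sketch only: triples are plain tuples, and the blank node labels
# (_:notByHugh, _:byHugh) are invented for this illustration.
# Source B explicitly places ex:paper in the complement of the
# "reviewed by Hugh" restriction class.
source_b_explicit = {
    ("ex:paper", "rdf:type", "_:notByHugh"),
    ("_:notByHugh", "owl:complementOf", "_:byHugh"),
    ("_:byHugh", "rdf:type", "owl:Restriction"),
    ("_:byHugh", "owl:onProperty", "ex:reviewedBy"),
    ("_:byHugh", "owl:hasValue", "ex:Hugh"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def explicitly_denied(graph, subject, prop, value):
    """True if the graph types subject with the complement of a
    hasValue restriction on (prop, value). Pattern match only."""
    for cls in objects(graph, subject, "rdf:type"):
        for restriction in objects(graph, cls, "owl:complementOf"):
            if (prop in objects(graph, restriction, "owl:onProperty")
                    and value in objects(graph, restriction, "owl:hasValue")):
                return True
    return False


def status(graph, subject, prop, value):
    """Three-way answer: a missing triple yields 'unknown', never 'denied'."""
    if (subject, prop, value) in graph:
        return "asserted"
    if explicitly_denied(graph, subject, prop, value):
        return "denied"
    return "unknown"
```

A source with no ex:reviewedBy triple at all comes out as "unknown"; only the explicit complement-class assertion comes out as "denied".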

As to how best to combine data from multiple sources even when they might disagree, this is a problem for everyone. But your example would seem to be straightforward in the RDF view of things, as your A and B don't actually disagree: A just provides some information that B is lacking. One expects such things to happen in an open world.
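A minimal sketch of that integration, again in plain Python with triples as tuples. The function names are hypothetical, and it assumes neither source contains blank nodes, so a simple set union is enough to merge them.

```python
# Sketch only: assumes no blank nodes in either source, so merging
# reduces to the union of the asserted triples.
source_a = {("ex:paper", "ex:reviewedBy", "ex:Hugh")}
source_b = set()  # B asserts nothing about ex:reviewedBy


def merge(*graphs):
    """Open-world merge: keep every triple asserted by any source."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def reviewers(graph, paper):
    """Return the asserted reviewers of paper in graph."""
    return {o for (s, p, o) in graph
            if s == paper and p == "ex:reviewedBy"}


merged = merge(source_a, source_b)
```

Here reviewers(merged, "ex:paper") contains ex:Hugh. B's silence adds nothing and removes nothing, so there is no conflict to resolve.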

>> it may be more that the subject of the row is having the property withheld than the value is a nonVisibleValue.
>> you may well find that there is another field in the DB that actually has the information already
> Answers in this list have been helpful. The conclusion for me is:
> (1) Don't look just at triples alone, but traverse blank nodes [1], they may bear important information.

Um...blank nodes are parts of triples. I'm not sure what you are intending to say here. 

> (2) Dependencies between properties should be considered.
> (3) Conflict resolution should also consider sets of values. In the example above, I would conclude "paper was reviewed"; if ex:reviewedBy was modelled with an RDF collection of reviewers, one empty and one non-empty, I would conclude "we don't know".

Why would you come to that conclusion? 

>> I would always avoid bnodes if it is possible/sensible to do - generating a URI is not hard, and can be useful in the long run.
> BNodes would actually be useful in my particular use case. They are the only thing which can distinguish an "entity" from a "structured attribute" if we don't know anything about the source.

Again, I don't know what you are talking about, but whatever it is, blank nodes don't sound like they have anything to do with it. They don't distinguish one kind of thing from another, for sure. 


> Regards,
> Jan
> [1]

IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile

Received on Wednesday, 5 June 2013 05:57:56 UTC