RE: Representing NULL in RDF

Hi,

You may not know a person's date/time of birth, but if they were born, it can still be a property of their linked data. If you know a person's age to a given level of accuracy at a specific time, you can derive bounds for their date/time of birth and attach a likelihood value to it. Similarly, once people die they have a date/time of death.
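
As a rough sketch of the bounds calculation (Python; the function names are made up for illustration, the age is assumed to be in whole years, and 29 February is treated as 28 February when shifting by years):

```python
from datetime import date, timedelta
from typing import Tuple


def shift_years(d: date, years: int) -> date:
    """Move d back by `years` years; 29 Feb falls back to 28 Feb (assumption)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_date_bounds(age_years: int, observed_on: date) -> Tuple[date, date]:
    """Earliest and latest birth dates consistent with a whole-year age."""
    # Born on this day exactly `age_years` years ago: only just reached the age.
    latest = shift_years(observed_on, age_years)
    # One day after the date that would already make them a year older.
    earliest = shift_years(observed_on, age_years + 1) + timedelta(days=1)
    return earliest, latest


earliest, latest = birth_date_bounds(40, date(2013, 6, 4))
print(earliest.isoformat(), latest.isoformat())
```

A likelihood could then be spread over that interval, but how to do so depends on what else is known about the person.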

There is a way of querying linked data to select/filter all living people who may be in a specific age range at a specific time. This can be done either with explicit null values or with the implicit null of no property being specified.
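
To illustrate the implicit-null variant, here is a small Python sketch rather than a real linked-data query. The Person record and function names are hypothetical, and reading a missing death date as "alive" is an assumption of the sketch, not something RDF semantics gives you:

```python
from datetime import date
from typing import NamedTuple, Optional


class Person(NamedTuple):
    name: str
    born_earliest: date           # lower bound on date of birth
    born_latest: date             # upper bound on date of birth
    died: Optional[date] = None   # None: no death-date property is present


def age_on(born: date, when: date) -> int:
    """Whole years elapsed between `born` and `when`."""
    before_birthday = (when.month, when.day) < (born.month, born.day)
    return when.year - born.year - int(before_birthday)


def may_be_living_in_age_range(p: Person, when: date, min_age: int, max_age: int) -> bool:
    """True if p is not known to be dead at `when` and could be in the age range."""
    if p.died is not None and p.died <= when:
        return False
    youngest = age_on(p.born_latest, when)
    oldest = age_on(p.born_earliest, when)
    # The possible ages [youngest, oldest] must overlap [min_age, max_age].
    return youngest <= max_age and oldest >= min_age


people = [
    Person("a", date(1970, 1, 1), date(1975, 12, 31)),
    Person("b", date(1970, 1, 1), date(1975, 12, 31), died=date(2000, 1, 1)),
]
when = date(2013, 6, 4)
print([p.name for p in people if may_be_living_in_age_range(p, when, 40, 45)])
```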

For entities that are defined as having specific attributes, I think it is better to have null values in the RDF/XML when these are unknown, as that makes the computation easier.

For triple-store RDF, I think Hugh is right.

Of course we have to deal with uncertainty and mess in data about people, as we often do not know things accurately or for sure! Sometimes we might discover conflicts that increase our uncertainty about specific attributes.

Regards,

Andy

________________________________________
From: Hugh Glaser [hg@ecs.soton.ac.uk]
Sent: 04 June 2013 10:35
To: Jan Michelfeit
Cc: <public-lod@w3.org>
Subject: Re: Representing NULL in RDF

If there is a "*standard or generally accepted*" way of doing things, then, as has been pointed out, it is to ignore it.
Or rather, the norm is that NULL (and "unknown" and anything else like it - I'll use NULL for shorthand) is ignored, and doesn't generate a triple.
In fact it is really important to do so, as NULL most often simply represents that the value is not known, in my experience.
Making a triple in such situations is one of the RDF101 basic mistakes, as I'm sure you know, since it causes all sorts of sensible queries to do very strange things.
For example, if the field is a person's age, then it would mean that a simple query asking for people of the same age as someone of unknown age would give you all the other people whose ages were not known.
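
A toy illustration of that effect, in plain Python rather than a real triple store (the data, the "NULL" placeholder and the function name are all invented for the sketch):

```python
# Each triple is (subject, predicate, object); all names are hypothetical.
NULL = "NULL"  # placeholder literal emitted for unknown ages (the mistake)

with_null_triples = {
    (":alice", ":age", "34"),
    (":bob", ":age", "34"),
    (":carol", ":age", NULL),
    (":dave", ":age", NULL),
}

# The same data with the unknown ages simply left out.
without_null_triples = {t for t in with_null_triples if t[2] != NULL}


def same_age_as(triples, person):
    """Return the other subjects that share an :age value with `person`."""
    ages = {o for s, p, o in triples if s == person and p == ":age"}
    return {s for s, p, o in triples if p == ":age" and o in ages and s != person}


# With NULL triples, Carol "has the same age" as everyone else of unknown age.
print(same_age_as(with_null_triples, ":carol"))
# Without them, Carol has no age triple, so nothing matches.
print(same_age_as(without_null_triples, ":carol"))
```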

If you are in a generic world where you cannot bring any extra information to the table, then this is all you can do.

Beyond that, I think that you have to ask exactly what is meant (as you do) and then model it.
Basically, is there something that is being said by the NULL, and if so, how should that be captured in RDF?
So your
> 4. The value is withheld, e.g., when the data consumer is not allowed to access it.

should be a "visibility" or "privacy" triple.
I think this may be what you are doing in (3) below, but I have some concerns about the way you do it there.
Similarly for others such as
> 2. The value is unknown, i.e., it should be there but we don't know it.
which is where you ask the question of whether you want to represent that someone's age is actually missing, with a triple.

You need to ask what the new property should be attached to.
It is an important question whether it should be "part of" the value itself.
So, for a "visibility" triple, it may be more that the subject of the row is having the property withheld than that the value is a nonVisibleValue.
It is the person's foaf:givenName that is not being recorded, not some property of a field from a DB.
There are patterns in various domains that try to tackle these sorts of problems - in programming languages it is similar to the problem of returning an exception instead of a value, and things like Union types can get used.
But remember that you want things to be easy to query for the most basic question, and it is likely that you want to simply have a triple that says
:foo foaf:givenName "Jan"
which is what a user expects.
That then allows
SELECT ?name WHERE { :foo foaf:givenName ?name }
In fact, if you have things like your :nullableValue construct, then you can't use predicates such as foaf:givenName at all, since the domain/range constraints are bad (I think).
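
A sketch of what I mean, again in plain Python with triples as tuples instead of a real store. The :withheldProperty predicate and the :bar resource are invented for illustration, not taken from any vocabulary:

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

triples: Set[Triple] = {
    # Known value: an ordinary triple, exactly what a user expects.
    (":foo", "foaf:givenName", "Jan"),
    # Withheld value: no value triple, just a statement about the subject.
    (":bar", ":withheldProperty", "foaf:givenName"),  # hypothetical predicate
}


def match(data: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# The basic question stays a one-pattern lookup.
names = {o for _, _, o in match(triples, s=":foo", p="foaf:givenName")}
# The extra information is just as easy to ask for.
withheld = {s for s, _, _ in match(triples, p=":withheldProperty", o="foaf:givenName")}
print(names, withheld)
```

The point is only that the value triple keeps its usual shape, and the "why is it missing" information sits alongside it on the subject.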

Of course you may well find that there is another field in the DB that actually has the information already, and is being transformed into RDF as well, in which case the NULL field can simply be discarded.

I think for these two I would just leave them without a triple:
> 1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.
> 3. The value doesn't exist, i.e. the property doesn't have a value (e.g. year of death for a person alive).

I don't think I would go off into RDFS and OWL specifically to capture things - it is likely that the DB is simply modelling things in an unclear way, and the challenge of transforming to RDF is to work out what the fuzziness was and shine a light on it.
Remember that the purpose of the whole exercise is to construct some RDF that is easy to query - or at least I hope that is the purpose!
So not having triples for things that don't have values is good.
And having triples that give more information about things is also good, as they are very easy to query.
In fact, using RDFS and OWL for what is likely to be simple stuff from a DB is only likely to provide checking at assertion time, and not make querying any easier - and since you are transforming from a DB, it is likely that the data you are transforming is well-formed.

Finally, I know this generates controversy, but I would always avoid bnodes if it is possible/sensible to do - generating a URI is not hard, and can be useful in the long run. In your example, you could just as easily say "Use a node to give more details about the questioned value."

Sorry, I've gone on a bit, but I just went with the flow!

Best
Hugh

On 3 Jun 2013, at 22:39, Jan Michelfeit <michelfeit.jan@gmail.com>
 wrote:

> Hi,
> thank you all for your answers.
>
>> ... One "represents" a null by failing to include the relationship
>> ... RDF semantics make no assumptions about what the absence of a proposition/statement means
>
> I agree. The question was actually about *distinguishing* between the mentioned cases.
>
>> From your suggestions and a quite comprehensive answer at SO [1], I see these solutions:
>
> (1) Use ontology to specify proper constraints. This may be cardinality of the questioned property or, as suggested by Phillip, assertion "that anything with a year of death is necessarily a dead person".
>
> (2) Use an RDF container and possibly rdf:nil (thanks to Barry and Robert for his example) .
>
> (3) Use a blank node to give more details about the questioned value. Example [2]:
>   :foo :aProp [a :nullableValue; rdf:value "value"] ;
>        :bProp [a :nullableValue; :reason :notAvailable ]
>
> Regards,
> Jan
>
> [1] http://stackoverflow.com/a/16889273/2032064
> [2] http://stackoverflow.com/a/16898786/2032064
>

Received on Tuesday, 4 June 2013 10:25:36 UTC