Re: representing null in semantic frameworks from Frank Manola on 2007-10-20 (semantic-web@w3.org from October 2007)

From: Frank Manola <fmanola@acm.org>
Date: Sat, 20 Oct 2007 16:44:12 -0400
To: Garret Wilson <garret@globalmentor.com>
Cc: Story Henry <henry.story@bblfish.net>, Semantic Web <semantic-web@w3.org>
Message-Id: <92D67566-0652-4C80-A072-5EC8F94F733F@acm.org>

Garret--

On Oct 20, 2007, at 4:01 PM, Garret Wilson wrote:

> Frank,
>
> Frank Manola wrote:
>>
>> Not exactly.  rdf:nil doesn't represent "no list at all", it  
>> represents the empty list (see RDF Semantics).
>
> Oops, my bad. The name threw me off, and I made an assumption... :)
>
>
>>  I'm not trying to play with words here.
>
> No, of course not---I understand exactly. I was mistaken.
>
>> I know what you're getting at, but a resource that represents no  
>> resource sounds a little odd.  I mean, it *is* a resource, right?   
>> It really needs to mean something other than "no resource".
>
> First, let me expand my disclaimer---I have no agenda regarding  
> null, or even well-thought opinions regarding null, other than I  
> know that if I represent an instance graph of, say, Java or  
> JavaScript objects then at some point I'll have to deal with them.  
> (Someone said offline that the very concept of null is fraught with  
> multiple meanings and hidden issues, and I suspect that's the  
> case.) Hence my request for any references of this discussion  
> having already taken place. If it hasn't, and people want to  
> discuss it here, that's great---I appreciate the input.
>
> As to what you say above: Perhaps null really is *not* a resource.  
> For example, IEEE 754 floating point numbers (which appear  
> throughout computing---even in RDF through the use of xsd:float as  
> a typed literal datatype) have the concept of a floating point value  
> that is *not* a number (which is why they are called NaNs). In  
> fact, NaN does not even equal itself. Perhaps it needs to be  
> something built into the very framework itself representing "no  
> resource at all". (I don't know; I'm contemplating.)

I understand, but notice that NaN isn't really an example of the kind  
of general null you're talking about (one that can be used with  
arbitrary properties, as in the case of relational nulls).  Rather,  
it's a *type-specific* value (specific to floating point numbers) with a  
type-specific meaning, for which things like comparisons, and what  
happens when NaNs are used in further operations and statistical  
functions, have been worked out.  It's much easier to have a sensible  
design discussion about using something like NaN (rather than, say,  
condition codes to indicate the results of operations) for floating  
point numbers alone than about a null that can be used with arbitrary  
properties (where the semantic nits that have to be dealt with may be  
very different from those for NaNs).
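
To make the "type-specific" point concrete, here is a minimal Python sketch (Python is just my choice for illustration, and its None is only a stand-in for a general null, not anything defined by RDF).  It shows that NaN is an ordinary float with defined comparison and propagation behavior, whereas a general null has no float behavior at all:

```python
import math

nan = float("nan")

# NaN is an ordinary value of the float type.
is_float = isinstance(nan, float)

# IEEE 754 defines comparisons: NaN is unordered and unequal, even to itself.
equals_itself = (nan == nan)
less_than_one = (nan < 1.0)

# It also defines propagation through further arithmetic.
propagated = nan + 1.0

# Aggregates have to choose a policy; the built-in sum simply propagates NaN,
# so a caller who wants to skip NaNs must say so explicitly.
scores = [3.0, nan, 5.0]
naive_total = sum(scores)
filtered_total = sum(x for x in scores if not math.isnan(x))


# A general-purpose null (Python's None, used here only as a stand-in) has no
# such worked-out behavior for floats: arithmetic on it is simply an error.
def add_one(value):
    return value + 1.0


try:
    add_one(None)
    none_arithmetic_fails = False
except TypeError:
    none_arithmetic_fails = True
```

Every question answered above for NaN (equality, ordering, arithmetic, aggregation) would have to be answered again, per type, for a null usable with arbitrary properties.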

>
>>
>>
>> Well, as a practical matter in language design, let's work out our  
>> use cases a little more carefully :-)  Clearly, a simple null as  
>> the value of, say, ex:score isn't going to represent "there was no  
>> score that week because there was a tornado that canceled the  
>> game" right?  That's an awful lot of meaning to stuff into one  
>> little null!  So first off we're thinking in terms of a separate  
>> property or properties that describe, say, *why* the score is what  
>> it is.  After all, if the game were postponed due to the next  
>> inning starting beyond curfew (happens in Boston anyway) the score  
>> might be 5-5 but not final (the game would be resumed later), so  
>> we'd want to indicate that somehow.  On the other hand,  
>> considering the game status after 1/2 inning, the score might be  
>> visiting team zero, home team "no score" (on the broadcasts they  
>> say "coming to bat").  This sort of sounds like a null, but it  
>> isn't a general one, but rather one specialized for the type (this  
>> could also be handled with some kind of "game status"  
>> information).  All the proposed uses of nulls that I'm familiar  
>> with have similar complexities that come out when you look at them  
>> more carefully.  My general preference would be to deal explicitly  
>> with the potential specializations, rather than lumping a lot of  
>> semantics into a general null.  Again, the question of what  
>> exactly would this null *mean* has to be answered.
>
> I agree, and in general "what null means" must be specified by the  
> ontology using it. In my experience null is used in certain  
> programming languages to mean various things because, in *very*  
> strongly typed languages, all your other choices are taken because  
> they are legitimate return values. Maybe that's not as useful in a  
> semantic framework---maybe you can (and should) always create some  
> value that will represent all the choices, even error conditions  
> (we don't have a score for the game; no reason was given). Maybe  
> the only value of null in a semantic framework is when a  
> programming language object instance graph is being represented.
>
> But I for one would like to see semantic frameworks be more  
> strongly typed. That's one reason why we all go to great lengths to  
> specify ontologies using OWL---so that machines can expect what  
> types of values certain properties can take. My first impression is  
> that it might be useful to have some means of representing "we  
> expected a value here, but we don't have one---even if we don't  
> know why."

I'd like to see semantic frameworks be more strongly typed too (that  
is, in the sense that there's some way of specifying all the  
information I have about the situation to be modeled;  not  
necessarily that I want things as rigid as is sometimes implied by  
"strongly typed").  But I don't necessarily see how a general null  
contributes to "strong typing": to be usable with arbitrary properties  
(taking values of arbitrary types), it might need to mean different  
things for different types.  That seems to me more likely to muddle  
things up.  Take the example you just cited.  Sure, it would be  
useful to represent "we expected a value here, but we don't have  
one---even if we don't know why."  But once again, look at the  
details.  You can always represent "we don't have a value" simply by  
not having one (no triple).  A program can look at the data and  
deduce something from the fact that there's no value.  If you mean to  
distinguish variants of that ("we *expected* a value but don't have  
one"), trying to use a null for this seems awfully "hacked-up" (for  
instance, what do you do about "we *didn't* expect a value but have  
one"?)  If you want to model expectations, whether or not we know why  
a value does or doesn't exist (or other things about the nature of  
the value, like it's only approximate), it seems to me that the  
"strongly typed" (in the sense of providing more complete semantics)  
way to handle that type of thing is to explicitly handle it, rather  
than attaching all kinds of implicit (and therefore not transferable  
to others without pre-agreement) semantics to a null value.
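
As a rough sketch of what I mean (plain Python tuples standing in for triples; ex:status, ex:statusReason, and the ex:Suspended / ex:Canceled / ex:Tornado values are hypothetical names I made up for illustration, not part of any existing vocabulary), "no value" is just the absence of an ex:score triple, and anything further we know is stated explicitly with its own property:

```python
# Each triple is (subject, predicate, object); all ex: names are hypothetical.
graph = {
    ("ex:game1", "ex:score", "5-5"),
    ("ex:game1", "ex:status", "ex:Suspended"),      # score exists but is not final
    ("ex:game2", "ex:status", "ex:Canceled"),       # no ex:score triple at all
    ("ex:game2", "ex:statusReason", "ex:Tornado"),  # the "why" is explicit
    ("ex:game3", "rdf:type", "ex:Game"),            # nothing known about the score
}


def objects(graph, subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def describe(graph, game):
    """Summarize what the graph says about a game's score, without any null."""
    score = objects(graph, game, "ex:score")
    status = objects(graph, game, "ex:status")
    reason = objects(graph, game, "ex:statusReason")

    # "We don't have a value" is deduced from the missing triple.
    text = "score " + score[0] if score else "no score recorded"
    if status:
        text += ", status " + status[0]
    if reason:
        text += ", reason " + reason[0]
    return text
```

Here describe(graph, "ex:game2") reports the missing score together with the explicitly stated status and reason, while describe(graph, "ex:game3") reports only that no score is recorded.  Nothing has to be read into a null.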

--Frank


>
>
>>
>> I don't specifically recall a lot of discussion about this in RDF  
>> Core (someone else might have a better recollection, and perhaps  
>> there were earlier discussions in developing the 1999 specs).   
>> However, allow me to point you to the endless discussions (and the  
>> associated complexity) caused by allowing nulls in the relational  
>> data model (of which RDF can be considered a specialization).   
>> Simply Google "relational null" and have a grand time!
>
> I shall---thanks for the pointer!
>
> Best,
>
> Garret
Received on Saturday, 20 October 2007 20:44:30 UTC