Re: Input sought on datatyping tradeoff

I'm coming to this very late, but would like to cast my vote in favour
of tidy literals. (Disclaimer: I'm an RDF novice, so I haven't yet had
time to immerse myself in all the ramifications of RDF model theory.)

If the W3C were to adopt untidy literals, as far as I can see that would
make an untidy-literal graph like

    <Jenny> <ageInYears> "10"

effectively non-ground, equivalent to one of the following tidy-literal
graphs:

 1. <Jenny> <ageInYears> "10"
    <ageInYears> <rdfs:range> _:xxx

 2. <Jenny> <ageInYears> _:a
    _:a _:yyy "10"
 
Yes, I know, you can't really do the latter in current RDF: you can have
blank nodes but not blank arcs. But I hope you see what I mean, at least
on an intuitive level.

Of course, the intuitive meaning of the <ageInYears> property differs
slightly between the two tidy-literal graphs above. The two graphs can
be paraphrased as:

 1a. <Jenny> has <ageInYears> "10"
     and <ageInYears> has an <rdfs:range>
     (i.e., there are some limitations on what its `value' can be)

 2a. <Jenny> has an <ageInYears>
     which has some property with `value' "10"

(In the first case the `value' of the <ageInYears> property is a literal
value, while in the second case the `value' of the property is a
resource - which might well represent a number.)

So to resolve uses of untidy literals, we need to include additional
triples in our graphs _and_ we need to choose between these two idioms
(or choose something else):

 I. <Jenny> <ageInYears> "10"
    <ageInYears> <rdfs:range> <xsd:decimal>

II. <Jenny> <ageInYears> _:a
    _:a <xsdr:decimal> "10"

In Test D, the property <ageInYears> appears in both of these idioms and
it seems rash to conclude that John and Jenny have the same age. It's
not impossible, but it seemingly requires you to bring a whole load
of datatyping baggage into the core of the model theory.

(If instead of the people's ages, you were asking about the `values' of
the <ageInYears> property for the nodes <Jenny> and <John>, then clearly
they are different, because one is a literal value, while the other is
a resource. That's probably not what was intended.)
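To make the parenthetical above concrete, here is a rough Python sketch. It
is a toy model only, not anything from the RDF drafts: the Literal and
BlankNode classes, the objects() helper, and the assumption that Test D puts
<Jenny> in idiom I and <John> in idiom II are all mine, for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str  # a literal is identified by its string alone


@dataclass(frozen=True)
class BlankNode:
    label: str  # a resource with no name of its own


# Assumed shape of Test D: one person per idiom (hypothetical reading).
graph = {
    ("Jenny", "ageInYears", Literal("10")),           # idiom I
    ("John", "ageInYears", BlankNode("a")),           # idiom II
    (BlankNode("a"), "xsdr:decimal", Literal("10")),
}


def objects(graph, subject, predicate):
    """Return every object of the given subject/predicate pair."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


jenny = objects(graph, "Jenny", "ageInYears")
john = objects(graph, "John", "ageInYears")

# The direct values differ in kind: a literal versus a resource.
same_direct_value = bool(jenny & john)
```

Comparing the direct values gets you nowhere here. To say anything about the
ages themselves you would have to interpret both idioms, which is exactly the
datatyping baggage I mean.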

In Test A, on the other hand, we can certainly say (given tidy literals)
that <Jenny> and <John> have the same value for an <ageInYears> property
(the literal value "10"). We can conclude that they share such a value,
but not that `Jenny and John have the same age', because (a) two
different ages might have the same representation as a literal with
respect to the <ageInYears> property, and (b) there's nothing in the
graph to say that a resource can only have one <ageInYears> property.
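As a sketch of the difference for Test A (again a toy Python model under my
own assumptions, not the model theory itself): I treat a tidy literal as one
node per string, and an untidy literal as a fresh node per occurrence. The
class names, the untidy() helper and the shape of the graphs are all
hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class TidyLiteral:
    text: str  # equal strings denote the same node


@dataclass(frozen=True)
class UntidyLiteral:
    text: str
    occurrence: int  # every written occurrence is a node of its own


_occurrences = count()


def untidy(text):
    """Create a new untidy literal node for one occurrence of text."""
    return UntidyLiteral(text, next(_occurrences))


def shared_values(graph, a, b, predicate):
    """Return the objects that a and b have in common for predicate."""
    va = {o for (s, p, o) in graph if s == a and p == predicate}
    vb = {o for (s, p, o) in graph if s == b and p == predicate}
    return va & vb


tidy_graph = {
    ("Jenny", "ageInYears", TidyLiteral("10")),
    ("John", "ageInYears", TidyLiteral("10")),
}
untidy_graph = {
    ("Jenny", "ageInYears", untidy("10")),
    ("John", "ageInYears", untidy("10")),
}

tidy_shared = shared_values(tidy_graph, "Jenny", "John", "ageInYears")
untidy_shared = shared_values(untidy_graph, "Jenny", "John", "ageInYears")
```

With tidy literals the two triples share a value; with untidy literals they
share nothing until some datatyping information tells you how to compare the
two occurrences. Note that even the tidy result says only that a value is
shared, not that the ages are equal.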

Similarly for Tests A2 and A3. In the latter, given tidy literals, we
can conclude that the resource <Jenny> has an <ageInYears> property with
a literal value that is the same as that of a <title> property of the
resource <Film>. We can't conclude much more than that. Using untidy
literals, we could conclude nothing about the values of the properties
without additional datatyping information.

I agree that

    <Jenny> <ageInYears> _:a
    _:a <xsdr:decimal> "10"

(along with its XML serialization) is a very reasonable way to express
numerical information (with tidy literals). But I don't think there's
any way in which it's reasonable to make

    <Jenny> <ageInYears> "10"

compatible with this. Of course, if one person uses

    <Jenny> <Age> _:a
    _:a <xsdr:decimal> "10"

and another person uses

    <Jenny> <DecimalAge> "10"

then that's fine - we can reconcile the two _different_ properties
without difficulty. There's nothing wrong, in the tidy-literal world,
with the second form - the datatype information is inherent in the
<DecimalAge> property itself. To write

    <Jenny> <DecimalAge> _:a
    _:a <xsdr:decimal> "10"

or 

    <Jenny> <Age> "10"

would of course be untrustworthy nonsense, given the agreed usage.
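The reconciliation of the two different properties might be sketched as
follows. This is a minimal Python illustration under assumptions of my own:
triples are plain tuples of strings, the ages() helper is hypothetical, and I
assume that <xsdr:decimal> relates a node to the lexical form of a decimal
number.

```python
from decimal import Decimal


def ages(graph, subject):
    """Collect decimal ages for subject, following each property's agreed usage."""
    found = set()
    for s, p, o in graph:
        if s != subject:
            continue
        if p == "DecimalAge":
            # The object is a literal; the datatype is inherent in the property.
            found.add(Decimal(o))
        elif p == "Age":
            # The object is a node; follow its xsdr:decimal arc to the literal.
            for s2, p2, o2 in graph:
                if s2 == o and p2 == "xsdr:decimal":
                    found.add(Decimal(o2))
    return found


first_person = {
    ("Jenny", "Age", "_:a"),
    ("_:a", "xsdr:decimal", "10"),
}
second_person = {
    ("Jenny", "DecimalAge", "10"),
}

ages_first = ages(first_person, "Jenny")
ages_second = ages(second_person, "Jenny")
```

Each property has exactly one reading, so the mapping is mechanical. The
trouble only starts when a single property is allowed to appear in both
forms.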

Sandy Nicholson

Received on Monday, 22 July 2002 09:54:39 UTC