Re: Input sought on datatyping tradeoff

[Drew McDermott]

> I know it is futile to make the point at this late date, but the whole
> farcical question stems from the fact that RDF (and XML, and SGML, the
> whole ridiculous lineage) have no syntax for string literals.  If
> everything is a string, then nothing is a string.  The problem could
> be solved very simply if every literal found in an RDF file belonged to
> at most one literal class (some apparent literals being ill-formed,
> and hence not belonging to any).  That would require strings to be
> indicated in some explicit way.  Hey, how about quotes?
>
> The test cases would all be handled thus:
>
>    Test A:
>
>       <Jenny> <ageInYears> "10" .
>       <John>  <ageInYears> "10" .
>
>    Should an RDF processor conclude that the value of the ageInYears
>    properties for Jenny and John are the same?
>
> Yes, because "10" would be an integer, without ambiguity.
>

Right, it means the decimal number 2.  Oops, didn't you know I meant binary?

The above notwithstanding, I think Drew has been saying the most cogent
things here.

[From another post of Drew's]

>    Now add a common super property for all the dc properties:
>
>       dc:property rdf:type        rdf:Property .
>       dc:title rdfs:subPropertyOf dc:property .
>       dc:date  rdfs:subPropertyOf dc:property .
>
>    This now entails:
>
>    [and here I edit a little, possibly completely missing Brian's point]
>
>       _:a dc:property "4th July" .
>       _:b dc:property "4th July" .

Yes, I thought of this and didn't know how to resolve it, but decided to
stir the pot by posting it anyway.  I had suggested that two literals that
are objects of different kinds of predicates should not be comparable.  Drew
apparently shoots that full of holes, but on a deeper level it is the same
issue - underspecified facts.  __We__ all know that those predicates are
essentially different, but DC doesn't provide enough information to infer that
based on the given triples.  Whether the predicates or the literals or both
are underspecified, the problem is about the same.
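To make the entailment concrete, here is a minimal sketch of the RDFS subPropertyOf rule (rdfs7: if p rdfs:subPropertyOf q and s p o, then s q o) applied to the quoted example. The helper `entail_subproperties` is hypothetical, and the dc:title / dc:date triples for _:a and _:b are an assumption: the quoted text only shows the entailed dc:property triples, not the facts they came from.

```python
# Hypothetical sketch of the RDFS subPropertyOf rule (rdfs7):
#   if  p rdfs:subPropertyOf q  and  s p o  then  s q o
# Triples are (subject, predicate, object) tuples; literals are bare strings.

SUBPROP = "rdfs:subPropertyOf"


def entail_subproperties(triples):
    """Apply rdfs7 repeatedly until no new triples appear (follows chains of subproperties)."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subprops = [(p, q) for (p, rel, q) in closure if rel == SUBPROP]
        for (s, pred, o) in list(closure):
            for (p, q) in subprops:
                if pred == p and (s, q, o) not in closure:
                    closure.add((s, q, o))
                    changed = True
    return closure


# ASSUMED input: the dc:title / dc:date triples for _:a and _:b are not in the
# quoted text; they are a guess at what the original test case contained.
triples = {
    ("dc:property", "rdf:type", "rdf:Property"),
    ("dc:title", SUBPROP, "dc:property"),
    ("dc:date", SUBPROP, "dc:property"),
    ("_:a", "dc:title", "4th July"),
    ("_:b", "dc:date", "4th July"),
}

entailed = entail_subproperties(triples)
for t in sorted(t for t in entailed if t[1] == "dc:property"):
    print(t)
# ('_:a', 'dc:property', '4th July')
# ('_:b', 'dc:property', '4th July')
```

Once both facts are lifted to dc:property, nothing in the triples records that one object was a title and the other a date, so any comparison made under dc:property sees two identical literals.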

I am sure that we do not want to mandate lots of triples to disambiguate
literals, so another solution is needed.  I agree with Jonathan that string
literals must compare equal (in the absence of other information).

It seems to me that we either need to use different lexical and value
comparisons (a la Borden), or better specificity in the literals (a la
McDermott) and very possibly both.  The question is, where should that
capability reside?  Certainly at the query level, wouldn't you say?  If so,
then rdfs should be able to supply enough information to support those kinds
of queries.
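A rough sketch of how "specificity in the literals" could combine with a separate value comparison. The syntax rules are assumptions for illustration, not anything RDF defines: the hypothetical `classify` helper looks at a literal's content (the text inside the N-Triples quotes), treats explicitly double-quoted content as a string, a bare run of decimal digits as an integer, and anything else as ill-formed, so each literal lands in at most one class.

```python
import re
from typing import Optional, Tuple

# ASSUMED syntax rules, for illustration only:
#   '"..."'  -> string (explicit quotes inside the literal content)
#   digits   -> integer (optional leading sign)
#   other    -> ill-formed, belongs to no class
INTEGER = re.compile(r"[+-]?[0-9]+")


def classify(content: str) -> Optional[Tuple[str, object]]:
    """Return (class_name, value) for a literal's content, or None if ill-formed."""
    if len(content) >= 2 and content.startswith('"') and content.endswith('"'):
        return ("string", content[1:-1])
    if INTEGER.fullmatch(content):
        return ("integer", int(content))
    return None


def lexical_equal(a: str, b: str) -> bool:
    """Lexical comparison: identical character sequences."""
    return a == b


def value_equal(a: str, b: str) -> bool:
    """Value comparison: same class and same value; ill-formed literals never match."""
    ca, cb = classify(a), classify(b)
    return ca is not None and ca == cb


print(value_equal("10", "10"))                # Test A: both are the integer 10
print(lexical_equal("10", "010"), value_equal("10", "010"))
print(value_equal('"10"', "10"))              # a string is not an integer
print(value_equal("4th July", "4th July"))    # unquoted text is ill-formed here
```

Under these rules "10" and "010" differ lexically but agree by value, while an unquoted date belongs to no class at all; treating ill-formed literals as never equal, even to themselves, is a design choice of this sketch.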

With this view, the kinds of questions asked by Brian should be replaced by
a different group of questions, along the lines of "what kinds of
comparisons in queries do we wish to support", and "should this support be
provided at the rdf, rdfs, or some other level?".  As part of the answers,
it might turn out that a particular convention should be added to rdf, such as
"use lexical identity by default, unless the value types can be determined".
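One possible reading of that convention, sketched under the assumption that value types, when they can be determined at all, come from schema knowledge about the predicates (something like rdfs:range) rather than from the literal itself. `RANGES`, `PARSERS` and `literals_match` are hypothetical names.

```python
from typing import Callable, Dict

# HYPOTHETICAL schema knowledge: which predicates have a known value type.
RANGES: Dict[str, str] = {"ageInYears": "integer"}

# HYPOTHETICAL parsers from lexical form to value, keyed by type name.
PARSERS: Dict[str, Callable[[str], object]] = {"integer": int}


def literals_match(pred_a: str, lit_a: str, pred_b: str, lit_b: str) -> bool:
    """Lexical identity by default; compare values only when both types are known and agree."""
    type_a, type_b = RANGES.get(pred_a), RANGES.get(pred_b)
    if type_a is not None and type_a == type_b and type_a in PARSERS:
        parse = PARSERS[type_a]
        try:
            return parse(lit_a) == parse(lit_b)
        except ValueError:
            pass  # value cannot be determined: fall back to lexical identity
    return lit_a == lit_b


print(literals_match("ageInYears", "10", "ageInYears", "010"))  # value comparison
print(literals_match("dc:title", "4th July", "dc:date", "4th July"))  # lexical default
print(literals_match("dc:title", "010", "dc:title", "10"))  # lexical default
```

Note that with no type information the dc:title and dc:date literals still compare equal, which matches the view that string literals compare equal in the absence of other information.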

Cheers,

Tom P

Received on Friday, 12 July 2002 21:05:39 UTC