
Re: Datatype question

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 25 Jun 2002 13:25:49 +0300
To: ext Geoff Chappell <geoff@sover.net>, RDF Interest <www-rdf-interest@w3.org>
Message-ID: <B93E1FDD.17531%patrick.stickler@nokia.com>

On 2002-06-25 13:22, "ext Geoff Chappell" <geoff@sover.net> wrote:

> ----- Original Message -----
> From: "Patrick Stickler" <patrick.stickler@nokia.com>
> To: "ext Geoff Chappell" <geoff@sover.net>; "RDF Interest"
> <www-rdf-interest@w3.org>
> Sent: Tuesday, June 25, 2002 3:10 AM
> Subject: Re: Datatype question
>> On 2002-06-24 17:07, "ext Geoff Chappell" <geoff@sover.net> wrote:
>>> ----- Original Message -----
>>> From: "Patrick Stickler" <patrick.stickler@nokia.com>
>>> To: "ext Geoff Chappell" <geoff@sover.net>; "RDF Interest"
>>> <www-rdf-interest@w3.org>
>>> Sent: Monday, June 24, 2002 9:03 AM
>>> Subject: Re: Datatype question
>>> [.......]
>>>>> I can see the value of the untidy literal approach to datatyping. I do
>>>>> think, though, there is a practical implementation advantage to tidy
>>>>> literals (which admittedly may not outweigh the cost of keeping them).
>>>> There is no such practical implementation advantage to tidy literals. You
>>>> can, in your implementation, employ tidy literal nodes in the triples
>>>> store, so long as you preserve the semantic untidiness. There are numerous
>>>> ways to optimize storage of untidy literals. So no worries there.
>>> It's not the storage I'm concerned about (because as you say that's easy
>>> to deal with). Saying "There is no such practical implementation advantage
>>> to tidy literals" is equivalent to saying there is no practical
>>> implementation advantage to using anything but bnodes/existential variables
>>> to identify resources - i.e. it disassociates nearly completely identity of
>>> the denoted object with its label and relies upon additional information to
>>> establish identity.
>>> It's one thing to say that multiple names may refer to the same
>>> object, it's something else entirely to say that the same name can refer
>>> to multiple objects.
>> Exactly. That's the point. Literals (IMO) are contextual labels. They
>> are interpreted within the context of some datatype.
>> Literals are not global constants. That is what URIrefs are for.
> Well they can be if they denote themselves; they only can't if we're using
> them as names to refer to other things. Don't misunderstand me, I can see
> the appeal of using them as names because that's clearly what we do in
> everyday use. But (IMO) it is a significant change to RDF - systems designed
> to use unambiguous names will have to be redesigned to deal with ambiguous
> ones. 

This is true. Though one could also argue that since RDF does not in fact
tell folks how to do datatyping, such systems are simply employing
proprietary interpretations on top of RDF -- and when standards evolve,
so must systems.

> And that's all being driven by the desire to support the inline idiom
> (age x "10"), right?

Which happens to be the most widely used and intuitive idiom
by far. So I consider its support in RDF datatyping to be an
absolute requirement (as do many others).

> insert another node and literals are just strings again
> instead of names. My original post was just exploring whether this one
> special, albeit common, case could be dealt with in other ways.
>> Do you really expect the literal "1984" in all the following cases to
>> refer to the same value?
>>    ABook title "1984" .        ("1984")
>>    OurTown population "1984" . (1984, decimal encoding)
>>    Widget productCode "1984" . (6532, hexadecimal encoding)
>>    Bob yearOfBirth "1984" .    (calendar year 1984)
>> i.e.
>>    title rdfs:range xsd:string .
>>    population rdfs:range xsd:integer .
>>    productCode rdfs:range xyz:hexInt .
>>    yearOfBirth rdfs:range xsd:gYear .
>>> It's the cost of "preserv[ing] the semantic untidiness"
>>> that I'm concerned about
>> Well, as Jeremy Carroll has so
>> accurately pointed out: there's untidiness in there somewhere.
>> I.e., those literals *do* mean different things. They are not
>> global constants.
>>> because in many implementations it results in
>>> cross-product behavior followed by functional equality testing to winnow
>>> the values.
>>> I'm sure it's not insurmountable but I think it's fair to say there
>>> will be a measurable cost.
>> I'm not sure I fully follow what you mean here.
> Sorry, I was rushing when I wrote it. My only point was that queries with
> multiple conditions are more efficient if those conditions have common
> bindings - e.g.  I'd rather be waiting for my system to process "{?a ?b ?c}
> and {?c ?d ?e}"  than "{?a ?b ?c} and {?d ?e ?f} and
> somefunc(?c)=somefunc(?d)". I realize that even in a tidy world we'd often
> end up with this pattern

I'd say it would be the rule, rather than the exception, with the rare
exceptions being (a) string datatyped values and (b) datatypes with only
canonical lexical representations, and only for equality comparisons.

Otherwise, you're going to have to rely on an RDF-external function.
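
A minimal sketch of the two query shapes contrasted above, assuming an
in-memory list of triples. The data, the function names, and the use of
int() as the RDF-external function are all illustrative assumptions, and
the shared variable is placed in the object position of both patterns so
that it can be a literal:

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, literal object); not from any real dataset.
ages = [("Bob", "age", "10"), ("Ann", "age", "010"), ("Cy", "age", "11")]
sizes = [("Dee", "shoeSize", "10"), ("Eve", "shoeSize", "11")]

def join_on_shared_binding(left, right):
    """Both patterns share one literal variable: build a hash index, probe it once per triple."""
    index = defaultdict(list)
    for triple in right:
        index[triple[2]].append(triple)
    return [(l, r) for l in left for r in index.get(l[2], [])]

def join_with_function(left, right, somefunc):
    """No shared variable: test somefunc(?c) == somefunc(?f) on every pair (cross product)."""
    return [(l, r) for l in left for r in right if somefunc(l[2]) == somefunc(r[2])]

by_label = join_on_shared_binding(ages, sizes)   # string identity only
by_value = join_with_function(ages, sizes, int)  # int() stands in for an RDF-external mapping
```

The label join misses Ann, whose "010" is a different string from "10",
while the value join finds her at the price of comparing every pair. An
implementation could presumably recover the indexed behavior by
canonicalizing lexical forms up front, which is only possible where the
datatype is known.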

> - i.e. some types of values would require case
> insensitive comparisons, canonicalized comparisons, etc. to be useful. Of
> course not all literals are representations of datatype values. Some are
> just strings. For example a description is just a description. Changes made
> to literals to support datatyping also of course have an effect upon these
> "plain" literals.

No. They are also datatype values. The datatype simply happens to correspond
to the set of Unicode strings. E.g. xsd:string.
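
As a rough sketch of that view, using the "1984" example quoted earlier:
each rdfs:range datatype supplies its own lexical-to-value mapping, and
xsd:string is just the mapping that returns the string itself. The table
names and the tuple used to model a gYear value are assumptions made for
illustration, and xyz:hexInt is the hypothetical datatype from the example:

```python
# Lexical-to-value mappings keyed by datatype name. xyz:hexInt is the
# hypothetical datatype from the example; the gYear value is modelled
# as a tagged tuple purely for illustration.
L2V = {
    "xsd:string": lambda lex: lex,
    "xsd:integer": lambda lex: int(lex, 10),
    "xyz:hexInt": lambda lex: int(lex, 16),
    "xsd:gYear": lambda lex: ("gYear", int(lex)),
}

# rdfs:range assertions from the example.
RANGE = {
    "title": "xsd:string",
    "population": "xsd:integer",
    "productCode": "xyz:hexInt",
    "yearOfBirth": "xsd:gYear",
}

def value_of(prop, literal):
    """Interpret a literal in the context of the property's range datatype."""
    return L2V[RANGE[prop]](literal)

# The same lexical form yields four different values.
values = {prop: value_of(prop, "1984") for prop in RANGE}
```

Here values["title"] stays the string "1984", while values["productCode"]
is the integer 6532, matching the hexadecimal reading in the example.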

>> As an aside, the WG should be publishing some detailed documents
>> about datatyping very soon, so rather than simply re-iterate
>> what is already explained in detail in the WD, and also that
>> which is currently under debate by the WG, I'll ask you to
>> wait just a bit for all the gory details.
> Yeah, I know I'm jumping the gun. Thanks for taking the time to reply
> anyway.

You're welcome. I, and many others, are very eager to see these
remaining points of debate resolved and a WD published, and
are working very hard to see that happen as soon as possible.

And of course, you can always have a look at the archives for
the Core WG mailing list for up to the moment thrills ;-)



Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Tuesday, 25 June 2002 06:21:21 UTC
