Re: Datatype question

On 2002-06-24 17:07, "ext Geoff Chappell" <geoff@sover.net> wrote:

> 
> ----- Original Message -----
> From: "Patrick Stickler" <patrick.stickler@nokia.com>
> To: "ext Geoff Chappell" <geoff@sover.net>; "RDF Interest"
> <www-rdf-interest@w3.org>
> Sent: Monday, June 24, 2002 9:03 AM
> Subject: Re: Datatype question
> 
> 
> [.......]
>>> I can see the value of the untidy literal approach to datatyping. I do
>>> think, though, there is a practical implementation advantage to tidy
>>> literals (which admittedly may not outweigh the cost of keeping them).
>> 
>> There is no such practical implementation advantage to tidy literals. You
>> can, in your implementation, employ tidy literal nodes in the triples
>> store, so long as you preserve the semantic untidiness. There are numerous
>> ways to optimize storage of untidy literals. So no worries there.
>> 
> 
> It's not the storage I'm concerned about (because, as you say, that's easy to
> deal with). Saying "There is no such practical implementation advantage to
> tidy literals" is equivalent to saying there is no practical implementation
> advantage to using anything but bnodes/existential variables to identify
> resources - i.e. it almost completely dissociates the identity of the denoted
> object from its label and relies upon additional information to establish
> identity.
>
> It's one thing to say that multiple names may refer to the same
> object, it's something else entirely to say that the same name can refer to
> multiple objects.


Exactly. That's the point. Literals (IMO) are contextual labels. They
are interpreted within the context of some datatype.

Literals are not global constants. That is what URIrefs are for.

Do you really expect the literal "1984" in all the following cases to
refer to the same value?

   ABook title "1984" .        ("1984")
   OurTown population "1984" . (1984, decimal encoding)
   Widget productCode "1984" . (6532, hexadecimal encoding)
   Bob yearOfBirth "1984" .    (calendar year 1984)

i.e.

   title rdfs:range xsd:string .
   population rdfs:range xsd:integer .
   productCode rdfs:range xyz:hexInt .
   yearOfBirth rdfs:range xsd:gYear .
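
To make that concrete, here is a rough Python sketch (mine, not from any
spec) of how the one lexical form "1984" comes out as four different
values once the range of each property is taken into account. The
mapping functions are simplified stand-ins, xyz:hexInt is the made-up
datatype from the example above, and the ("gYear", ...) tuple is just an
assumed way of keeping a calendar year distinct from a plain integer.

```python
# Hypothetical, simplified lexical-to-value mappings keyed by datatype name.
L2V = {
    "xsd:string": lambda s: s,
    "xsd:integer": lambda s: int(s, 10),
    "xyz:hexInt": lambda s: int(s, 16),        # made-up datatype from the example
    "xsd:gYear": lambda s: ("gYear", int(s)),  # tagged so a year is not a plain integer
}

# The rdfs:range assertions from the example.
RANGE = {
    "title": "xsd:string",
    "population": "xsd:integer",
    "productCode": "xyz:hexInt",
    "yearOfBirth": "xsd:gYear",
}

def value_of(prop, literal):
    """Interpret a literal in the context of the property's range datatype."""
    return L2V[RANGE[prop]](literal)

# Same lexical form, four different values.
values = {prop: value_of(prop, "1984") for prop in RANGE}
for prop, value in values.items():
    print(prop, repr(value))
```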

> It's the cost of "preserv[ing] the semantic untidiness"
> that I'm concerned about

Well, as Jeremy Carroll has so accurately pointed out, there's
untidiness in there somewhere. I.e., those literals *do* mean
different things. They are not global constants.

> because in many implementations it results in
> cross-product behavior followed by functional equality testing to winnow the
> values. 
>
> I'm sure it's not insurmountable but I think it's fair to say there
> will be a measurable cost.

I'm not sure I fully follow what you mean here.

The requirements/cost for comparing datatyped values will be the same
whether literals are tidy or untidy. This is because (a) lexical
forms are not constrained to be canonical. I.e. the integer value
5 can be represented by any number of lexical forms ("5", "05",
"+5", or, for a decimal datatype, "5.0", "5.0000...", etc.) and
thus simple string comparison can never establish that two values
are different: literals that are not string equal may still denote
the same value. And (b) value
comparisons must be made externally to RDF by applications which
have full knowledge of the datatypes in question. RDF does not
define datatypes. Datatypes are fully opaque at the RDF level.
All RDF can provide is an association between a lexical form
(literal) and the datatype according to which it should be
interpreted. And the definition of what a datatype is tells
us that for any given lexical form, for a particular datatype,
that lexical form maps to one and only one value.
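
A small Python illustration of point (a), offered only as a sketch: the
two helper functions are assumed, simplified stand-ins for the
lexical-to-value mappings of an integer and a decimal datatype (Python's
int() and Decimal() accept some forms that the real datatypes would not).

```python
from decimal import Decimal

def integer_value(lexical):
    # Simplified stand-in for an integer datatype's lexical-to-value mapping.
    return int(lexical, 10)

def decimal_value(lexical):
    # Simplified stand-in for a decimal datatype's lexical-to-value mapping.
    return Decimal(lexical)

a, b = "5", "05"

# The literals differ as strings...
string_equal = (a == b)                               # False
# ...but denote the same value once the datatype is applied.
value_equal = integer_value(a) == integer_value(b)    # True

# Same story for a decimal datatype with trailing zeros.
decimal_equal = decimal_value("5.0") == decimal_value("5.0000")  # True
```

Either way, tidy or untidy, the comparison has to go through the
datatype's mapping; string inequality alone settles nothing.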

What untidy literals do give us is an explicit denotation
of that single value to which a given lexical form maps
(even though we can't know exactly which value that is at the RDF
level). And thus, in conjunction with an external application
with full knowledge of datatypes, equivalence relations between
value denotations can be determined and expressed in terms of
the RDF graph. And since the same lexical form can map to different
values according to different datatypes (per the examples above),
that denotation cannot be tidy if RDF is to accommodate the
semantic untidiness which is inherent in literals.
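
As a sketch of what I mean (an assumed toy model, not how any particular
triple store does it): give every literal occurrence its own node, and
let a datatype-aware application outside RDF group those nodes into
equivalence classes. The "Acme headcount" statement and the node ids are
invented for illustration, and grouping by (datatype, value) is a
simplification that never equates values across datatypes.

```python
from collections import defaultdict

# Hypothetical range assertions and simplified lexical-to-value mappings.
RANGE = {
    "population": "xsd:integer",
    "headcount": "xsd:integer",   # invented property for this sketch
    "productCode": "xyz:hexInt",
}
L2V = {
    "xsd:integer": lambda s: int(s, 10),
    "xyz:hexInt": lambda s: int(s, 16),
}

# Untidy: each statement has its own literal node, even when the
# lexical form is identical: (subject, property, node id, lexical form).
graph = [
    ("OurTown", "population", "_:n1", "1984"),
    ("Acme", "headcount", "_:n2", "01984"),
    ("Widget", "productCode", "_:n3", "1984"),
]

def equivalence_classes(graph):
    """Group literal nodes by the value they denote under their datatype."""
    classes = defaultdict(set)
    for _subject, prop, node, lexical in graph:
        datatype = RANGE[prop]
        classes[(datatype, L2V[datatype](lexical))].add(node)
    return dict(classes)

classes = equivalence_classes(graph)
```

Here _:n1 and _:n2 end up in the same class despite different lexical
forms, while _:n3 stays apart despite sharing the lexical form "1984"
with _:n1.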

Now, we *could* say that insofar as RDF is concerned, all those
property values are just strings -- and leave it up to applications
to deduce what is meant by them. But that forgoes the ability
to define equivalence relations between actual values, since
the values may not have any denotation in the graph (as would
be the case with the inline idiom).

I'd rather have my RDF be explicit about the meaning of the literal
than leave it up to some external application to guess what
is meant (and possibly get it wrong). Of course, this view is
not necessarily shared by everyone.

As an aside, the WG should be publishing some detailed documents
about datatyping very soon, so rather than simply reiterate
what is already explained in detail in the WD, as well as what
is currently under debate by the WG, I'll ask you to
wait just a bit for all the gory details.

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Tuesday, 25 June 2002 03:05:35 UTC