Use case for tidy literal subjects (ignore if not interested)

During the datatyping followup after Friday's telecon, I made
the statement that if literals were subjects which took datatyping
properties, they must be untidy -- and I was told this was incorrect.

Here is a use case which illustrates that I was, in fact, correct:

I presume that literals can be subjects both in the graph and
that this is supported by the serialization (RDF/XML, N3, etc.)

Consider the following two graphs, defined separately, which are
subsequently merged, and all literal nodes are tidy and hence
the same.

  Bob age "30" .
  "30" rdf:dtype decimalInt .

and separately, elsewhere

  WidgetX gspec "30" .
  "30" rdf:dtype octalInt .

and then a query on the merge of the two graphs
that asks what Bob's age is

  Bob age ?y .

where ?y is bound to "30".

Hmmmm, says the app. I need a value, let's get the
datatyping information for that literal:

  ?y rdf:dtype ?x .

where ?x is bound both to decimalInt and octalInt.

Well, the app doesn't recognize the decimalInt datatype
but does know about octalInt, so it can use that to get
to a value, thus the pairing ("30", octalInt) = 36, so

  Bob's age = 36

Wrong!

--

Now, if literals were untidy, we'd get the following
similar, though more useful scenario

  Bob age _:1:"30" .
  _:1:"30" rdf:dtype decimalInt .

  WidgetX gspec _:2:"30" .
  _:2:"30" rdf:dtype octalInt .

and then a query that asks what Bob's age is

  Bob age ?y .

where ?y is bound to _:1"30".

Hmmmm, says the app. I need a value, let's get the
datatyping information for that literal:

  _:1:"30" rdf:dtype ?x .

where ?x is bound only to decimalInt.

Well, again, the app doesn't recognize the decimalInt datatype
so it exits with a message that it is unnable to determine the
actual value of Bob's age, but it was at least able to determine
that it is at least whatever value is denoted by the pairing
("30", decimalInt).

Now, that's both more useful *and* more reliable.

And of course, if the application does know what a
decimalInt datatype is, then it can get the value
reliably -- the correct value -- with no risk of
misinterpretation due to loss of datatyping context
because literals are tidy.

Eh?

--

Saying "this literal *might* be used to denote a decimalInt"
is virtually useless. So what. How does that help any application
that needs to know *what* it actually denotes in a given
context? It doesn't. It's useless knowledge in nearly all
cases.

Thus, I assert again, if literals are to be subjects that
bear datatyping -- or any other occurrence/context specific
knowledge -- then they *must* be untidy. If they are tidy,
then the *only* knowledge that can be attached to them is
trivial knowledge about the nature of the string itself, and
nothing about its meaning, use, significance, etc.

That's not saying that tidy literals can't, technically, be
subjects -- but if they are tidy, they have little to no
utility as subjects. It becomes simply an academic exercise
with little to no value for actual implementations, *and*
if it is not clear that all properties defined for a literal
are global, not specific to any context, logical errors
such as the first case above can still arise, thus the
anemic utility of tidy literal subjects must be made crystal
clear in some primer, spec, etc.

Hopefully the above clearly communicates what I was unnable
to communicate during the concall.

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Saturday, 16 February 2002 03:36:41 UTC