Re: suggestions for datatyping (long) from Pat Hayes on 2001-11-01 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 1 Nov 2001 11:00:29 -0600
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p0510100bb804d6c3dd39@[205.160.76.193]>
Sorry this response is late. I have been laid up with some kind of 
flu for a few days and am still only half functioning.

Pat
---------
>Pat Hayes wrote:
>>
>>  >
>>  >b) typing information can either be represented in an instance graph
>>  >only,
>>  >    in a schema graph only, or both.
>>
>>  Can you clarify this distinction? I wasn't aware that we had such a
>>  distinction in RDF (?)
>
>Recall the example from
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0343.html:
>
>               (John_Smith weight "160 1/8") goes together with a rule
>like
>                'if X is a person living in the US and (X weight Y),
>                 then Y is a "pieces-of-eight" number that gives weight
>in pounds'
>
>In this example, the information that "160 1/8" maps to an integer by
>means of some pieces-of-eight encoding is contained in a schema
>entirely. Part of this information can in principle appear in the
>instance graph.

OK, but I'm still confused. You referred in the previous message to a 
schema *graph* . Is that meant to be a kind of RDF graph? And what is 
an 'instance' graph?

>.....> >
>>  >My working understanding of the [XSD] document in terms of the current
>>  >model theory draft is that the elements of a lexical space are literal
>>  >values.
>>
>>  That is not mine. I would characterize literals as the lexical space
>>  and literal values as the value space. That is the working assumption
>>  behind the pfps/ph datatyping extension to the MT.
>
>Well, this is exactly what I don't like at all about it. By making
>literals Heroes with a Thousand Faces we make the life of a Desperate
>Perl Hacker (of the Life of Brian ;) who tries to model some domain
>quite tough. I think we can avoid this additional complexity.

Well, in fact I think it is easier to do things this way. It 
certainly keeps the graphs smaller, and they can be 'read' more 
naturally; all the datatyping information is in the triples somewhere 
(even if implicitly) and there is no need to use rdf:value, which 
embodies a use/mention confusion. If we could allow literals as 
subjects it would be consummately easy and transparent; a literal 
occurrence L would be of datatype T just in case  a triple of the form
L rdf:type T .
was in the rdfs closure of the graph.
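
A minimal sketch of that rule, under stated assumptions: triples are plain Python tuples, literals are allowed to appear as subjects (the hypothetical raised above, not current RDF), and only two closure rules (range and subclass) are applied. The names ex:weightInPounds and ex:wholePounds are invented for illustration, and a literal is identified by its lexical form rather than by occurrence, which is a simplification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A literal node, kept distinct from uriref strings."""
    lexical: str

RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"
RDFS_SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(graph):
    """Apply the range and subclass rules until no new triple appears."""
    closed = set(graph)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # (p rdfs:range C) and (s p o)  =>  (o rdf:type C)
                if p2 == RDFS_RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))
                # (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
                if p == RDF_TYPE and p2 == RDFS_SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
        if new <= closed:
            return closed
        closed |= new

def has_datatype(graph, literal, datatype):
    """True just in case (literal rdf:type datatype) is in the closure."""
    return (literal, RDF_TYPE, datatype) in rdfs_closure(graph)

# Hypothetical example graph.
graph = {
    ("ex:weightInPounds", RDFS_RANGE, "ex:wholePounds"),
    ("ex:wholePounds", RDFS_SUBCLASS, "xsd:integer"),
    ("ex:phayes", "ex:weightInPounds", Literal("165")),
}

print(has_datatype(graph, Literal("165"), "xsd:integer"))
```

Here the typing triple for the literal is never asserted directly; it only appears in the closure, via the range of the property and the subclass link.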

>...> >
>>  >Translated into RDF terms, a data value corresponds to a bNode in a
>>  >graph.
>>
>>  I disagree. That begs several important questions, but in any case a
>>  bNode can denote any kind of value. Why would we want to say that
>>  bNodes *are* values?
>
>Ok, more precisely, a data value maps to I(some bNode), nothing said
>about the reverse direction.

But what I find odd about this entire approach to treating literals 
is that the literal is already the label of a node in the graph, so 
why should we have to introduce another node to represent its value? 
When we label a node with a uriref, we don't feel obliged to 
introduce another node to stand for its value; that uriref node 
denotes the value already. Why not do the same for literals? That 
seems like the obvious naive approach, and indeed it works.

>>  >3.2 Datatyping: classes or mappings?
>>  >------------------------------------
>......
>>  >My feeling is that both views may be useful for representing typed
>>  >data (just as wave-particle dualism is helpful for explaining
>>  >different phenomena in physics ;). On the one hand, if data values do
>>  >not have fixed URI identifiers, we need a *mapping* that allows us to
>>  >identify resources as data values using their lexical representations.
>>  >On the other hand, for defining and restricting datatypes, the class
>>  >view is superior (although it looks like the class view is in
>>  >principle dispensable).
>>
>>  I think we can have both. We have a class/property distinction at the
>>  basis of RDFS, and it seems natural to map this entire discussion
>>  into that vocabulary. Data type mappings are rather like (the
>>  extensions of) properties assigning data values to lexical strings,
>>  and the ranges of these properties are the classifiers whose class
>>  extensions are the sets of data values themselves
>
>This is exactly what I think, too. I reckon it's worth it to demonstrate
>explicitly how the class/property distinction applies to datatyping just
>to clarify things.

Right, that is exactly how the model theory treats them also. The 
only requirement is that the classes that are datatypes have 
recognizable names, so that a reasoner (human or machine) knows to 
treat them in a particular way when it comes to literals. That could 
be done for example by having a class called rdfs:Datatype to which 
they must all belong.

>.....
>>  >it presumably does
>>  >not make sense to use it as object for properties like "age", "size",
>>  >"price", "weight", etc. In fact, such use would suggest that e.g. the
>>  >weight of a thing is a lexical token; typically, we'd like it to
>>  >denote some abstract entity that corresponds to say 5 pounds.
>>  >
>>  No, no. If I USE a literal as a value, I am not MENTIONING a lexical
>>  token; I am using the literal to indicate a literal value. So for
>>  example by writing
>>
>>  phayes weightAtAge50inPounds "165" .
>>
>>  I am saying that my weight was 165 pounds, not that it was a lexical item.
>
>To reiterate my point, with substantial mental effort we (actually, you
>and Peter P.-S.) can make the above statement work, i.e. to have some
>meaningful interpretation.

Working out the mathematical machinery of the model theory was some 
effort, but now that this has been done, the technique of using it to 
interpret pieces of RDF is easy and very intuitive.

>My point is that *clarity* is what matters
>first for the SW to take off.

I agree entirely. The semantic conditions used by the MT extension 
could hardly be clearer. In intuitive paraphrase, they are that 
datatypes are classes, and that if the value of a literal is known 
(using the ordinary rdfs inference machinery) to be in a class which 
is a datatype, then the literal must be interpreted using the 
conventions associated with that datatype. So for example if I know 
that the rdfs:range of aaa is (a subclass of) the datatype 
xsd:integer, then I know that if I see
bbb aaa "101001" .
then the value of aaa on bbb is a hundred and one thousand and one, 
and not a character string, or the 10th October this year, or 41.
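
As a rough sketch of that condition (not the model theory itself): the range of the property, followed up a subclass chain, selects the lexical-to-value mapping applied to the literal. The mapping table, the function name interpret, and the class ex:smallInteger are all assumptions made for illustration, and each class is given at most one superclass to keep the walk simple.

```python
# Hypothetical lexical-to-value mappings, keyed by datatype name.
DATATYPE_MAPPINGS = {
    "xsd:integer": int,
    "xsd:string": str,
}

def interpret(lexical, prop, ranges, subclass_of):
    """Return the value denoted by a literal used as the object of `prop`.

    Walks from the declared range of the property up the subclass chain
    until it reaches a class that is a known datatype.
    """
    cls = ranges.get(prop)
    seen = set()
    while cls is not None and cls not in seen:
        if cls in DATATYPE_MAPPINGS:
            return DATATYPE_MAPPINGS[cls](lexical)
        seen.add(cls)
        cls = subclass_of.get(cls)
    raise LookupError(f"no datatype known for objects of {prop}")

# The range of aaa is a (hypothetical) subclass of xsd:integer.
ranges = {"aaa": "ex:smallInteger"}
subclass_of = {"ex:smallInteger": "xsd:integer"}

value = interpret("101001", "aaa", ranges, subclass_of)
print(value)              # the integer 101001, not the string "101001"
print(int("101001", 2))   # the same characters read as binary give 41
```

If no datatype can be found for the property, the sketch raises an error rather than guessing a reading for the literal.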

>Recall the recent suggestion by Peter to
>give each and every XML document some meaningful semantic
>interpretation. This just doesn't work

Maybe not, but that seems irrelevant to the present point.

>, because developers would
>generate a lot of "meaning" which is in fact just gibberish. Same
>argument applies to the above. In order to make applications work,
>people who encode the data must cooperate.

All they have to do is name their datatype, and rdfs can do the rest. 
If they refuse to name their datatypes then there is indeed not much 
we can do for them by way of datatyping.

>Sorry about getting into
>rhetorics.
>
>In the above statement, the property weightAtAge50inPounds fulfills in
>fact two purposes at once:
>
>1) it tells us how to interpret the token "165"

Because we know that its range is a datatype, yes. But I would like 
us to have other ways to convey that information. This is not really 
to do with the property, but to do with what we can infer about its 
object. Unfortunately, since these objects cannot be subjects, it is 
rather hard to find any other way to make this kind of inference than 
through the range of the property, but that is merely an accident of 
this arbitrary syntactic restriction. If literals could be subjects 
there would be a rich variety of ways of stating the datatype of the 
literals.

>2) it establishes some relationship between the interpretation of this
>token with phayes.

Right. Notice however that the usual way to state a relationship 
between an interpretation of a token and something else is to simply 
use the token in an assertion. Which is exactly what I want to be 
able to do with literal tokens, just like urirefs.

>My suggestion is to separate these two purposes. To be even more
>human-friendly, you'd write:
>
>     phayes weightAtAge50inPoundsInDecimalEncodedByISO8601 "165"
>
>But that is far from being machine-friendly.

I agree. However, we could allow things like
phayes weightAtAge50 <xsaw:pounds#xsd:decimal "165"> .

xsaw is a hypothetical American Standard Weights data metatype, by the way.

>
>>  >In other words, for most meaningful representations, we can think of a
>>  >property whose objects are literals as a mapping that associates a
>>  >value space with some lexical space.
>>
>>  No, that is what the datatyping mapping does, not the property. It is
>>  LIKE a property, but it is not itself an RDF property. If we assume
>>  that, then we are begging the question, since we have simply
>>  described the datatyping in RDF; and then there is no datatyping as
>>  such.
>
>Perfect. So we can sit back and relax.

Well, but I think we are relaxing on false hopes, since in this case 
we simply do not have datatyping; what we have done is to trivialize 
the datatyping issue by making all literals be strings, and then 
talking about the datatypes as if they were ordinary properties of 
those strings. This is coherent, but it seems to me that it just 
abandons the whole idea of datatyping by failing to distinguish 
between datatyping and ordinary properties.

>Still, it's like saying that
>since ICEXT is an abbreviation that uses the extension of I(rdf:type),
>there are no classes and instances in RDFS.

No, it's not like that. Having or not having ICEXT in the MT is a 
stylistic matter in how the model theory is presented. It makes no 
difference to the RDF user. But what we are arguing about here is a 
real difference to the language itself.

>IMO, RDF and the MT draft
>have already got enough means to introduce other concepts like classes
>and datatyping elegantly, without much friction.
>
>>  >In yet other words, each
>>  >literal-valued property may be thought of (by convention) as a
>>  >"datatyping property" (also referred to as "interpretation property"
>>  >by TimBL).
>>  >
>>  >If <SUG2> turns out to be acceptable, the next thing I would suggest
>>  >to nail down is the nature of literals. A further proposal from my
>>  >side would therefore be
>>  >
>>  ><SUG3>: the interpretation of each literal symbol is fixed
>>  >         and is determined by its textual contents.
>>
>>  If we adopt this convention then there is no need to invoke any
>>  special treatment of datatyping in RDF itself, since all the
>>  datatyping is purely a lexical matter. (?) Seems to me that this
>>  trivialises the discussion.
>
>Yes, as does ICEXT. Basically, with <SUG3> we can build datatyping on
>top of RDF just by providing some standard interpretation for a bunch of
>properties, just as RDFS builds on RDF/MT. But that's great, isn't it?

Not if it involves complicating the language in what seem to me to be 
unnatural ways, and also if it provides no way to distinguish 
properties which an RDF processor would expect to call out to an 
API, and those which it would expect to find described in an RDF 
graph somewhere.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 1 November 2001 12:01:58 UTC