Re: PRIMER: draft data model section from Pat Hayes on 2001-10-22 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 22 Oct 2001 12:05:40 -0500
To: "Bill de hOra" <bdehora@interx.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101036b7f9edd13635@[205.160.76.193]>
>Hi Pat,
>
>lots of questions...
>
>
>Pat Hayes:
>>>
>You say there:
>"Clearly in RDF processing, literals are opaque to RDF (i.e. if a literal
>looks like a URI or more RDF, it is treated as a literal, not as a URI
>or more RDF). However an application using RDF triples and able to
>properly manipulate Dublin Core in your example would be entitled to
>draw the inference that the literal names an entity."
>
>I want to take issue with this, because to me it seems to contradict
>itself. As you say, to an RDF processor, literals are opaque. But you
>then say that an 'application using RDF triples' would be entitled to
>draw an inference which requires (?) NOT treating literals as opaque
>if they refer to the Dublin core. Is this application supposed to be
>an RDF application or not? If it is, then it is not entitled to make
>that inference, since there is nothing in the RDF that enables it to
>draw it. If on the other hand it is able to draw that conclusion,
>then it evidently is not using information (only) from the RDF
>triples, so why should an account of the RDF meanings of the triples
>be concerned with this information?
>>>
>
>When I say opaque, I mean a parser cannot treat the literal as anything
>other than a literal. What the application/engine/person does with the
>triple is its own business.

What about something that claims to be drawing RDF-valid conclusions 
from the RDF? That's not a parser, but it is (I presume) 'using' the 
content that can be expressed in the RDF. Contrast that with 
something like your examples below, where the engine is doing 
God-knows-what with the RDF. Now, I'm not saying that this last kind 
of usage should be deplored or forbidden or anything, but I am saying 
that it seems useful - I think essential - to draw a distinction 
between these two cases. One of them uses the RDF to mean exactly 
what the RDF spec says it means, the other just uses it in some 
private ad-hoc way. One conforms to the standard, one does not. If 
someone publishes some RDF and says it is RDF, then the first engine 
will draw conclusions that were intended by the publisher, and that 
is guaranteed by the RDF spec. They do not need to communicate with 
one another in any way beyond the RDF itself. There are no guarantees 
for the second case; what that engine makes of the published RDF 
might have nothing whatever to do with what the writer of the RDF 
intended by writing it, unless the latter has somehow explained to 
the reader what he intended to mean by it, in some other way.
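
To make the two cases concrete, here is a minimal sketch in Python. It is 
not any real RDF toolkit: the classes, the PEOPLE table and the example.org 
URIs are all hypothetical, invented for illustration. The first function 
stays within what the RDF itself says; the second relies on a private 
convention that the RDF does not carry.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class URIRef:
    uri: str

@dataclass(frozen=True)
class Literal:
    text: str

@dataclass(frozen=True)
class Triple:
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]

DC_CREATOR = URIRef("http://purl.org/dc/elements/1.1/creator")

def rdf_view(triple: Triple) -> Triple:
    """Case 1: use only what the RDF says; the literal stays opaque."""
    return triple

# Hypothetical private lookup table. Nothing in the published RDF supplies
# it; the writer and reader would have to agree on it by some other means.
PEOPLE = {"Bill de hOra": URIRef("http://example.org/people#bill")}

def lift_creator(triple: Triple) -> Optional[Triple]:
    """Case 2: a private convention that 'lifts' a creator literal to a URI."""
    if triple.predicate != DC_CREATOR or not isinstance(triple.obj, Literal):
        return None
    person = PEOPLE.get(triple.obj.text)
    if person is None:
        return None  # "it says it couldn't find anyone"
    return Triple(triple.subject, triple.predicate, person)

doc = URIRef("http://example.org/doc")
published = Triple(doc, DC_CREATOR, Literal("Bill de hOra"))
print(rdf_view(published).obj)      # still the literal
print(lift_creator(published).obj)  # a URI, courtesy of the private table
```

The lifted triple may well be what the publisher had in mind, but it comes 
from the PEOPLE table rather than from the RDF, so nothing in the RDF spec 
guarantees it.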


>I appreciate that a clean delineation
>between parsers and engines isn't always the case.

We agree.

>
>A strawman. With my A. Haxor hat on, I read the DC creator description
>in order to write some code for a dodgy KM module which tries to lift a
>literal from bytes/string/Unicode/whatever we decide literals are, into
>denoting a person that has a URI denoting them. So the code's job is to
>bind dc creator values to RDF resources. Someone might say 'Bill, what
>on earth is this code doing? You should get the assumptions out of the
>code and write them down'.

I would say, "Neat. But you aren't going to call this an RDF engine, I hope?"

>I could point to my code and say, 'But it
>works, the worst that happens is that it says it couldn't find anyone,
>and anyway, Dublin Core didn't write their assumptions down either, it's
>not my fault we use it'.
>
>Maybe that's not an RDF application, and that's a kind of realpolitik
>hacking we wish to discourage or someday make unnecessary, but I
>conjecture there will be lots of abductive code like this for some years to
>come for the express purpose of scraping over existing data: it's the
>stuff of business models. Old data never dies, it just gets integrated.

OK, but what worries me is what we, as Guardians of the RDF Faith, 
should be saying about it. If we say this is what RDF is all about, 
then there's no real point to RDF other than as a triples-sending 
protocol. (Yawn.) If we take the SW rhetoric even slightly seriously, 
then saying that something is RDF ought to mean more than that.

>
>Pat:
>>>
>I think this is particularly tricky as an example since the part of
>the Dublin core that you quote is written in English, so presumably
>is not fully understandable by any software yet written by anyone.
>>>
>
>I think any examples working out of natural language are tricky.

I agree, but one way to exorcise some of the trickiness is to 
emphasize that we are NOT saying that the RDF fully captures the 
meaning of English, that these are just rhetorical devices, etc.

>Are
>there better examples we could use for the Primer?

Well, are there examples that would have immediate appeal to the DPL?

>
>
>Pat:
>>>
>>Bill:
>>[Indeed most of the examples I've seen of literal values have literals
>>naming an entity in a common-sense way. Yes that's folkware semantics
>>but there's nothing wrong with common-sense just because computers don't
>>have any.
>
>There is if you are touting the formalism as something for computers
>to use. In fact it is dishonest and foolish, and would be valid
>grounds for having one's tenure case dismissed at any reputable
>university department.
>>>
>
>I'm tempted to quote Groucho Marx. I was referring to common-sense as an
>aid to following the example in the primer, not attributing magic powers
>to RDF.

That really wasn't clear, however; and that is the basic problem.

>  In the meantime someone's going to have to explain to me how we
>get from common-sense examples to RDF triples in this primer without
>some form of conjuring. There _is_ magic dust being sprinkled over
>Frank's examples and the ones in the M&S. What exactly is wrong with
>doing this as long as we point that out?

OK, but we should be more careful in pointing it out.

>
>
>Pat:
>>>
>>Bill:
>>In any case it's up to the modeller (in this case, the primer)
>>to make the semantics explicit. Surely that's what RDF is for?]
>>
>>Anyway. There is _nothing_ in RDF to make us think that when a property
>>has a literal value the property means the literal itself is the entity
>>and not what the value might denote
>
>True, but what it can denote is severely restricted. The model theory
>at present assumes that literals have a *fixed value*. So whatever a
>literal denotes, it has to denote it once and for all, globally. It
>can't denote one thing in the Dublin core and something else
>somewhere else.
>>>
>
>I'm not suggesting that a literal should denote different things.

But aren't you, implicitly? You objected initially to saying that a 
literal denotes a string, say, on the grounds that yer average hacker 
is going to want to use literals to refer to all sorts of things.  If 
that (literal-->whatever) mapping isn't called denotation, then we 
can say that literals denote strings (and strings can represent 
anything, but that's not RDF's business); if we say it is called 
denotation, then it's globally fixed and YAH is not free to make it be 
whatever he likes. In the first case you shouldn't have objected, 
seems to me.

>I'm
>suggesting it's reasonable for a parser to treat a literal as a blob and
>an application not to.

Say 'an RDF engine' and 'an application using RDF' and I will agree 
with you. But the RDF model theory specifies things about RDF that go 
beyond what a parser alone would do. It constrains the intended 
*meaning* of the notation, not just its syntax.

>As far as I'm concerned a parser's job is to get
>literals into working memory as literals; what's done in working memory
>with those literals, is another matter.

OK, but what's done in the name of RDF needs to be distinguished from 
what's done in general.

>Maybe they get served up to a
>person who realises the literal is actually someone they know, maybe
>there's a bug in the application code that gets the literal lifted,
>maybe someday someone will specify DC creator more formally that will
>allow machinery to infer that a literal hanging off dc:creator can stand
>for a person. Lifting literals via some kind of (I don't know,
>abductive?) inference is something people are going to want to do with
>RDF.

OK, but we have a responsibility to make as clear as possible which 
of these things that they are going to do constitute 'received RDF' 
and which do not. In particular, which things are guaranteed to 
preserve RDF meaning (i.e. are valid according to the RDF model theory) 
and which might not be. We need to do this so that we can state 
*semantic* guarantees for RDF users and toolbuilders.

>Pat:
>>>
>Now, that is a very simple and restrictive assumption, and Peter
>Patel-Schneider wants us to relax it to the extent of allowing
>datatype schemas for literals. The kind of examples he uses are where
>"070801" might mean either a date or a string or an integer, and one
>can determine from things like the rdfs:range information which
>datatype scheme is supposed to be applied to each literal occurrence.
>That is the scheme we have now worked out in full detail so that it
>COULD be incorporated into the MT for RDFS, if we want to do so,
>though right now it's just kind of waiting to be put there. But even
>with this extension, the range of things that a literal can refer to
>is somewhat restricted to be the kinds of things that one finds in
>datatyping schemes. I'm not sure if having a literal like "creator"
>denote a particular human being would count as a datatyping scheme,
>but it sure doesn't sound like one to me.
>>>
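
The datatyping idea quoted above can be sketched roughly as follows, in 
Python. This is only an illustration, not the scheme worked out for the MT: 
the ex: property names, the RANGES table, the YYMMDD reading of the date and 
the fallback to plain strings are all assumptions made for the example.

```python
from datetime import date, datetime

# Hypothetical rdfs:range declarations for three made-up properties.
RANGES = {
    "ex:shipped": "date",
    "ex:partCode": "string",
    "ex:count": "integer",
}

# One lexical-to-value mapping per datatype scheme.
# Assumption: dates are written as YYMMDD.
LEXICAL_TO_VALUE = {
    "string": lambda s: s,
    "integer": int,
    "date": lambda s: datetime.strptime(s, "%y%m%d").date(),
}

def literal_value(prop: str, lexical: str):
    """Pick the datatype scheme from the property's declared range."""
    # Assumption: with no declared range, the literal is just a string.
    datatype = RANGES.get(prop, "string")
    return LEXICAL_TO_VALUE[datatype](lexical)

for prop in ("ex:partCode", "ex:count", "ex:shipped"):
    print(prop, repr(literal_value(prop, "070801")))
```

The same characters come out as a string, an integer or a date depending on 
the range, but in every case the value is the kind of thing a datatype 
scheme can produce. None of these mappings yields a human being.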
>
>I'm confused here. Are you saying that you can't soundly get from a
>literal to a thing in the world?

I certainly am saying that. That is a universal fact about all of 
human language, one might call it the basic semiotic situation. There 
is no guaranteed sound way to get from ANY symbol to the world. 
That's why we put the symbols *in* the world, e.g. by wearing name tags 
at trade fairs, or writing company names on buildings, so we can get 
from the world to the symbols.

>If not, was there ever any point in
>basing DC creator on RDF?

Sure there was a point: it provides a way to securely publish your 
content. It may be impossible in principle to give someone a way to 
read your mind, but what you can do is publish what you mean in a way 
that guarantees that any misunderstandings don't matter. You have one 
interpretation in mind, and you write some RDF about it. I read it, 
and maybe I think of a different interpretation. But as long as we 
both use valid inference techniques, the differences don't matter, 
because anything entailed by that RDF is true in *all* 
interpretations.
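
As a rough illustration of "true in all interpretations", here is a small 
Python sketch. It is a toy, not the RDF model theory: it handles only ground 
triples over names (no literals, no blank nodes), the ex: and dc: names are 
just labels, and it checks every interpretation over a two-element domain 
only, which demonstrates the idea but proves nothing in general.

```python
from itertools import combinations, product

def satisfies(interp, graph):
    """True if every triple in the graph is true under the interpretation.

    interp is (denotes, extension): denotes maps each name to a domain
    element, extension maps each domain element to the set of pairs it
    relates when used as a property.
    """
    denotes, extension = interp
    return all(
        (denotes[s], denotes[o]) in extension[denotes[p]]
        for s, p, o in graph
    )

def interpretations(names, domain):
    """Enumerate every interpretation of the names over a finite domain."""
    pairs = list(product(domain, repeat=2))
    relations = [
        frozenset(c)
        for r in range(len(pairs) + 1)
        for c in combinations(pairs, r)
    ]
    for values in product(domain, repeat=len(names)):
        denotes = dict(zip(names, values))
        for rels in product(relations, repeat=len(domain)):
            yield denotes, dict(zip(domain, rels))

def entails_on(domain, graph, conclusion):
    """True if the conclusion holds wherever the graph holds, on this domain."""
    names = sorted({name for triple in graph | conclusion for name in triple})
    return all(
        satisfies(interp, conclusion)
        for interp in interpretations(names, domain)
        if satisfies(interp, graph)
    )

graph = {
    ("ex:doc", "dc:creator", "ex:pat"),
    ("ex:doc", "dc:title", "ex:t"),
}
print(entails_on([0, 1], graph, {("ex:doc", "dc:creator", "ex:pat")}))
print(entails_on([0, 1], graph, {("ex:pat", "dc:creator", "ex:doc")}))
```

The first conclusion is a subgraph of what was published, so it holds under 
every interpretation that makes the graph true, whichever one you or I had 
in mind. The second is not entailed: some interpretation satisfies the graph 
and falsifies it.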

>Pat:
>>>
>>In Frank's example, without clear semantics for 'creator' there isn't
>>enough information to make a decision on what's being implied.
>
>Right, and indeed that semantic content cannot be stated in RDF. I
>don't think we should pretend that it can if in fact it cannot.
>>>
>
>Does that invalidate Frank's and the M&S' examples? If it doesn't then I
>don't understand why and if it does how are we going to bootstrap this
>primer for oikes like myself?

It doesn't invalidate them, but it does illustrate why they make me 
squirm. It's OK to use examples like this as long as you are clear 
that you don't intend them to be taken literally; but there's a 
danger that they will be taken literally, so we really need to be 
careful when we are writing explanatory material.

>Pat:
>>>
>I think we should be very careful not to give the impression that we
>are saying that RDF can represent natural language meanings. It would
>be irresponsible, or just plain silly, to claim that
>existential-conjunctive logic can represent every meaning expressible
>in natural language. (If you think it can, then try publishing that
>claim in a refereed journal.)
>>>
>
>I know enough not to claim that. But that seems to make using any
>natural language example to pluck out a triple in the primer RDF snake
>oil.

It could be snake oil, or it could just be using an intuitive example 
in a primer to get an idea across. We need to make sure that it can 
only be read in the latter sense.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 22 October 2001 13:05:53 UTC