Re: One final step to datatyping convergence and closure? from Pat Hayes on 2002-02-09 (w3c-rdfcore-wg@w3.org from February 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Sat, 9 Feb 2002 01:28:45 -0600
To: Patrick Stickler <patrick.stickler@nokia.com>
Cc: RDF Core <w3c-rdfcore-wg@w3.org>
Message-Id: <p05101470b88a67023f56@[65.212.118.208]>
>I am very grateful to Pat for his execellent summary of the convergence
>proposal in terms of his proposed MT treatment and to Jeremy for his
>earlier distillation of it into specific issues. I feel that within
>these can be found an excellent solution to the datatyping issue.

In case anyone wonders what this is about, take a look at
http://www.coginst.uwf.edu/users/phayes/DatatypeSummary.html
A newer (slightly simpler) version is being produced at
http://www.coginst.uwf.edu/users/phayes/DatatypeSummary2.html

>The following (hopefully) explains why I feel it is in the users'
>best interest to adopt the convergence proposal without the datatype
>triple idiom detailed in section 3 of Pat's summary (which actually
>was not included in the original convergence proposal specifically
>for the reasons outlined below).
>
>I've tried to be relatively terse, which I think will be appreciated.
>If any particular statement is not clear, just ask and I will gladly
>expound upon it until it is clear or until you start throwing heavy
>objects ;-)

OK. Most of the following points I either disagree with, or think are 
just mistaken.  I don't follow the qname point and would like to have 
that expounded more fully.

I think there are some convincing reasons why datatype triples (ie 
using the datatype name as a property) have genuine utility, and 
allowing them is not a heavy burden either on users or implementers. 
(I have to say, the idea that RDF is *complicated* seems ludicrous, 
in a world with XSD, Java and DAML+OIL in it; Ive never met anyone 
who has expressed that view. Mostly it is seen as almost childishly 
oversimplified, in circles I move in.)

>As with my convergence proposal, please read the whole thing before
>responding/reacting to a single particular statement, as most are
>qualified/explained in subsequent paragraphs.
>
>OK, here goes...
>
>1. The datatype triple idiom is not needed, and has no actual utility.
>
>a) The idiom does not do anything that doublet idiom does not do,
>except allow multiple types to be explicitly defined for same bNode,

Well, same node, Doesnt have to be a bnode. But OK....

>yet:
>
>b) The ability to define multiple types for same bNode will, I
>think, rarely be useful even even more rarely used,

I disagree. I think it has several obvious uses, and will be widely 
used, largely because it one of the simplest idioms and Linux-grade 
robust.

>  and:
>
>c) The utility of such multiple type definition is an illusion
>since it doesn't remove any untidyness in value comparisons because
>doublet and global idioms still exist, and also there may be separate
>datatype triples which denote same value.

There MIGHT  be, of course; any two RDF nodes *might* co-refer. But 
the central point is that RDF provides no way to express this 
co-reference, which makes it impossible to translate some uses of 
datatyping triples into RDF at present. For example, consider

_:x ex:USdate "05-08-02" .
_:x ex:UKdate "08-05-02" .

which one would have to expand using two bnodes to avoid datatype clashes:

_:x1 rdf:value "05-08-02" .
_:x1 rdf:dtype ex:USdate .
_:x2 rdf:value "08-05-02" .
_:x2 rdf:dtype ex:UKdate .

Now, *how* do we say that _:x1 and _:x2 co-refer? There is no way to 
say this in RDF. So the datatype triple style enables us to express 
some content that cannot be expressed any other way in RDF.

>Thus, the datatype triple idiom actually does not offer any real utility.

I think that this ability to record information about entities using 
a variety of datatypes might be extremely useful when merging 
information from a number of different sources, and is not illusory. 
The point is not to eliminate value comparisons, but to provide a way 
to merge information from disparate sources without needing to worry 
about datatype consistency; without, in fact, needing to even 
consider it; since if one uses this style consistently in an 
application, clashing datatypes can be used with impunity since none 
of the scopes can possibly overlap. This is the only mode of literal 
use in RDF that can completely avoid all checking for datatype 
clashes in a completely open environment.
Imagine for example an HTML content scraper that records results 
initially by treating text fragments as literals, and has a variety 
of techniques for guessing datatype relations between the things it 
guesses exist and the test that refers to them. If it were obliged to 
use doublets, it would need to expend considerable work to keep track 
of  possible clashes, and would need to use some techniques external 
to the RDF triple store in order to keep track of co-references 
between sets of bnodes. All this is unnecessary if it uses datatype 
triples. It can even make up its own 'datatypes' as needed and treat 
them identically in the triples store.

>2. The inclusion of the idiom does not meet the "KISS" desire of users.
>
>a) Since not needed, it merely adds complexity both for users and
>implementors.
>
>The point of a standard such as RDF is to provide a consistent,
>explicit point of intersection between knowledge producers, knowledge
>consumers, and application implementors which facilitates efficient
>implementation and interoperability. Including needless components
>for the sake of style or preference which impact portability of
>knowledge and interoperability of systems, or needlessly complicate
>implementation or use of such knowledge are contrary the very point
>of such a standard.

What counts as necessary? (Are containers necessary? Is reification 
necessary?? Is negation necessary?) It is clear that the 'idiom' is 
found intuitive by many people, even in this working group; it arises 
naturally from established XML usages, as Sergey has noted. Why not 
allow people to use it, if they find it natural, it has clear use 
cases, and it comes virtually for free? (The work needed to recognize 
a datatyping triple is about identical to that needed to detect a 
doublet, and apart from having smaller scopes, they mean the same 
thing.)

>b) It is not as symmetrical with the global idiom, therefore harder
>for users to understand its relationship with the global idiom than
>is the doublet idiom.

I have no idea what this means. What sense of 'symmetrical' is being 
used here? The meaning of a datatype triple is not hard to grasp or 
difficult to work with. One can think about it simply in terms of a 
packaged doublet with a limited naming scope, and never make a 
mistake in usage. Even the MT (which most users will never read) 
states the truth-conditions in one small equation. Its simpler than 
most of RDFS.

>RDF is already widely percieved as "difficult to understand" and
>"difficult to use". The last thing we want to do is make it any
>more difficult by making the datatyping solution needlessly
>complicated.

See above. I should probably not comment on this further, for fear of 
giving offense.

>We have an opportunity to provide a solution based on two clearly
>and intuitively related idioms

I think that it is better to think about these as all variations on a 
theme - basically hanging datatyping information into a value triple 
in one way or another - than as a catalog of 'idioms'. We have been 
talking that way, but I think it makes things needlessly 
awkward-seeming, since one can grasp them all as variations on two 
basic ideas.

>which help users understand the
>relation between typed literals and the datatype that provides the
>context for that typing. The superfluous, more complex

It can hardly be called more complex; its about the simplest form one 
could imagine.

>datatype
>triple idiom undermines us providing the simpler, fully symmetrical
>solution.
>
>c) It requires schema definitions to use -- and thus it is not a
>schema-free local idiom, which was the whole point of providing a
>local/explicit idiom.

I think this is wrong on two counts. First, it doesnt *require* using 
a schema definition - that may be a problem with my exposition in the 
first draft. Second, what is this about local idiom being 
'schema-free'? Ive never heard of that idea in our discussion before, 
and I don't know what it means. Arent we talking about a schema 
language here?

>One must define each datatype as an rdfs:subPropertyOf rdf:value
>in order for the MT interpretation to work. Thus, the idiom does
>not meet the desiderada of either a local/explicit or global/implicit
>idiom, but is a kind of strange hybrid that needs both local
>definition and schema definition to work.

I think this is just plain wrong. There is a single, clear, notion of 
scope that handles all three idioms. The scope of a datatype is the 
'area' of the graph within  which it imposes an interpretation on 
literals. The scope of a datatype triple is the triple itself, the 
most local idiom possible. The scope of a doublet is a kind of 'star' 
of value triples attached to one node: all value triples with the 
same subject. The scope of a range datyping on P is a set of stars: 
all value triples in the graph whose subject is the object of a  P 
triple.
If any of these deserves to be called a 'hybrid', it would be the doublet case.

>
>3. The idiom forces the qname issue.
>
>The XML Schema community strongly dispute RDF qname practice as
>valid and an idiom that requires the use of qnames puts us deep in
>the middle of that issue -- which likely cannot be resolved within
>the boundries of our present charter.
>
>One has no choice but to use qnames to use the datatype triple
>idiom, whereas the other idioms work with full URIs and avoid
>this issue entirely.

I do not follow this. We refer to 'urirefs' and Ive always assumed 
that whatever they are, full URIs always count as urirefs. So it 
would seem to follow that one could use full URIs in this case as 
well. In fact, it seems to me that any uriref that can be used as a 
node label in RDF can also be used as an arc label. So why does one 
have 'no choice' about using qnames  here??

>Why exacerbate this issue needlessly?
>
>4. Its interactions with rdfs:subPropertyOf are not clear.

No, they are perfectly clear and unambiguous.

>
>a) Datatype properties only make sense with a bNode domain.

? Why? I see no problem in having a datatype triple (or a doublet, 
for that matter) with a uriref as subject.  None of the conditions 
even mention the nature of the subject node, in fact.

>We must
>either somehow restrict the use of datatype properties to bNode
>subjects or explain to users why they should do that, making more
>work for us and more for users to understand. If not restricted in
>this fashion, then:
>
>b) What does it mean to combine datatype semantics with "traditional"
>property semantics?

?? They *are* traditional property semantics, with some stuff added 
that only affects literals. All the 'traditional' rules apply just as 
before, unchanged.

>If some property such as ex:age is declared as
>a subproperty of xsd:integer, what does that mean?

It means that if xxx ex:age yyy then xxx xsd:integer yyy, which in 
turn means that xxx has to be an integer and yyy has to be a numeral. 
That's a very odd way to interpret a property called 'age', but 
that's what it means.

>Should it even
>be allowed (per (a) above)?

I don't think we can pass laws forbidding people from saying dumb things.

>Should users be advised against it,
>and why/how?

Again, why should we mention it particularly? I mean, we might also 
point out that it is probably a bad idea to say that ex:marriedTo is 
a subProperty of, say, ex:favoriteDogIs....

>Again, more questions to be answered and addressed by
>the MT, the primer, the spec, or elsewhere.

Nope.

>
>c) Cohabitation with doublet and global idioms not yet certain.
>
>See my comments in my reply to Pat's summary which detials a
>potential erroneous inference based on combined use of all three
>idioms (sorry, offline, but should be easy to find).

And see my reply explaining why it isn't an erroneous inference.

>Even if the MT could be re-re-re-vamped to address these issues,

The MT doesn't need to be re-vamped; there are no issues to address.

>that's unnecessary work, since we have a local idiom (which is a
>true local idiom not requiring any schema assertions to be properly
>interpreted) and the single percieved benefit of the datatype triple
>idiom is, as explained above, an illusion; so what's the point in
>continuing to try to make the idiom work (apart from just stylistic
>preference)?
>
>--
>
>Thus, the combined weight of all of these issues regarding the
>datatype triple idiom is, I feel, overwhelming, and we will be
>serving RDF users far better by giving them a solution based only
>on the symmetrical doublet and global idioms.
>
>If we take the original convergence proposal and Pat's MT for it,
>sans the datatype triple idiom, then we can be done with datatyping.
>I repeat, done. Whoohoo!

We can be done with datatyping without trashing that idiom, even 
bigger whoohoo.

>Please, please, please, let's drop this extra idiom and move on. OK?

Lets keep it and move on. Patrick doesn't have to use it if he 
doesn't want to. Sergey and I will use it, and I suspect that many 
other people will also use it.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Saturday, 9 February 2002 10:37:06 UTC