Re: One final step to datatyping convergence and closure? from Patrick Stickler on 2002-02-11 (w3c-rdfcore-wg@w3.org from February 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 11 Feb 2002 12:35:59 +0200
To: Pat Hayes <phayes@ai.uwf.edu>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B88D6B2F.DEA0%patrick.stickler@nokia.com>
On 2002-02-09 9:28, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:



> (I have to say, the idea that RDF is *complicated* seems ludicrous,
> in a world with XSD, Java and DAML+OIL in it; I've never met anyone
> who has expressed that view. Mostly it is seen as almost childishly
> oversimplified, in circles I move in.)

It has always been evident that we move in very different
circles... ;-)

> largely because it is one of the simplest idioms

Simplest? How? For whom?

> and Linux-grade 
> robust.

I don't see how any of the idioms are any more or less robust
than the others.

What is the basis of your ascribing this quality to one and not
the other idioms?

> There MIGHT  be, of course; any two RDF nodes *might* co-refer....
> 
> _:x1 rdf:value "05-08-02" .
> _:x1 rdf:dtype ex:USdate .
> _:x2 rdf:value "08-05-02" .
> _:x2 rdf:dtype ex:UKdate .
> 
> Now, *how* do we say that _:x1 and _:x2 co-refer? There is no way to
> say this in RDF. So the datatype triple style enables us to express
> some content that cannot be expressed any other way in RDF.

You missed my point entirely.

How do you say that _:x3 and _:x4 in the following co-refer?

  _:x3 ex:USdate "05-08-02" .
  _:x4 ex:UKdate "08-05-02" .

The whole point was that (a) in order to consistently address
equality of values you must have an application that supports
all of the datatypes in the graph and (b) if you have such an
application, why muck about with the idioms anyway, just use
the values! 
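
To make point (a) concrete, here is a minimal Python sketch. It assumes that ex:USdate means month-day-year and ex:UKdate means day-month-year, and that two-digit years follow Python's %y convention; the table and function names are hypothetical. The comparison only works because the application knows both datatypes, whichever idiom carried them.

```python
from datetime import date, datetime

# Hypothetical lexical-to-value mappings for the two example datatypes.
# Reading the two-digit year with %y is an assumption of this sketch.
LEXICAL_FORMATS = {
    "ex:USdate": "%m-%d-%y",  # month-day-year
    "ex:UKdate": "%d-%m-%y",  # day-month-year
}

def to_value(datatype: str, lexical: str) -> date:
    """Map a lexical form to its value; raises KeyError for an unsupported datatype."""
    return datetime.strptime(lexical, LEXICAL_FORMATS[datatype]).date()

# _:x3 ex:USdate "05-08-02" .   and   _:x4 ex:UKdate "08-05-02" .
x3 = to_value("ex:USdate", "05-08-02")
x4 = to_value("ex:UKdate", "08-05-02")

# Equality is decided on values, not on lexical forms or idioms.
same_value = x3 == x4
```

An application lacking either mapping cannot decide the question at all, no matter how the datatype is attached to the literal.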

Thus, the utility that you and others ascribe to the idiom
is, I assert, mostly an illusion, insofar as knowledge
interchange is concerned.

The only utility comes within the context of a datatype
aware application -- and in that context there are far
better representations than the datatype triple.

>> Thus, the datatype triple idiom actually does not offer any real utility.
> 
> I think that this ability to record information about entities using
> a variety of datatypes might be extremely useful when merging
> information from a number of different sources, and is not illusory.
> The point is not to eliminate value comparisons, but to provide a way
> to merge information from disparate sources without needing to worry
> about datatype consistency; without, in fact, needing to even
> consider it; 

And how, pray tell, can you achieve that merging without
value comparisons?! In order to merge, you must compare values.

> since if one uses this style consistently in an
> application, clashing datatypes can be used with impunity since none
> of the scopes can possibly overlap. This is the only mode of literal
> use in RDF that can completely avoid all checking for datatype
> clashes in a completely open environment.

Again, you are mixing implementation space and model space. You
are arguing that we should keep the datatype triple idiom because
it makes internal graph representation more economical or captures
that two lexical representations denote the same value.

Could you provide some examples, e.g. queries or similar, where
the datatype triple idiom affects either the expression of the
query or its accuracy if we are concerned with values, and not
the idioms themselves? I doubt it.

If we are going to argue idioms for the sake of implementational
benefit, then I would say that the URV idiom beats all of them
hands down -- so forget both the doublet and datatype triple
idioms and use URVs, which achieve maximal tidiness in the graph
and also completely avoid all datatype clashes, etc. etc.

> Imagine for example an HTML content scraper that records results
> initially by treating text fragments as literals, and has a variety
> of techniques for guessing datatype relations between the things it
> guesses exist and the text that refers to them. If it were obliged to
> use doublets, it would need to expend considerable work to keep track
> of  possible clashes,

How so? I can think of several ways to do this easily. The most obvious
is a typed node with a membership property (e.g. a type of container).

> and would need to use some techniques external
> to the RDF triple store in order to keep track of co-references
> between sets of bnodes.

Not at all. I wouldn't use the datatyping idioms for the
process-specific knowledge at all. Rather, I'd define an ontology
for the process that keeps track of the literals and the
possible interpretations that are suggested by various input content,
and then, once done with the scraping, analyze the
various possibilities and express the results in terms of
the datatyping idioms.

> All this is unnecessary if it uses datatype
> triples. It can even make up its own 'datatypes' as needed and treat
> them identically in the triples store.

Well, I don't really see much more utility or graph compression in

   _:x ex:datatype1 "foo" .
   _:x ex:datatype2 "foo" .
   _:x ex:datatype3 "foo" .
   _:x ex:datatype4 "foo" .

than in

   _:x rdf:value "foo" .
   _:x rdf:dtype ex:datatype1 .
   _:x rdf:dtype ex:datatype2 .
   _:x rdf:dtype ex:datatype3 .
   _:x rdf:dtype ex:datatype4 .

and in fact, I consider the latter to be more intuitive.
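
As a rough illustration that the two forms carry the same information, here is a minimal Python sketch that rewrites datatype triples as rdf:value / rdf:dtype statements. The function name is hypothetical, and the set of datatype properties is assumed to be known in advance.

```python
RDF_VALUE = "rdf:value"
RDF_DTYPE = "rdf:dtype"

def to_value_dtype_form(triples, datatype_properties):
    """Rewrite each datatype triple (s, dt, lit) as (s, rdf:value, lit) plus
    (s, rdf:dtype, dt). Other triples pass through; duplicates are dropped."""
    out = []

    def add(triple):
        if triple not in out:
            out.append(triple)

    for s, p, o in triples:
        if p in datatype_properties:
            add((s, RDF_VALUE, o))
            add((s, RDF_DTYPE, p))
        else:
            add((s, p, o))
    return out

datatypes = {"ex:datatype1", "ex:datatype2", "ex:datatype3", "ex:datatype4"}
datatype_triples = [("_:x", dt, "foo") for dt in sorted(datatypes)]
rewritten = to_value_dtype_form(datatype_triples, datatypes)
```

Four datatype triples become one rdf:value statement plus four rdf:dtype statements, so the difference in graph size is a single triple.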

The restriction of one rdf:value can be either a constraint of
the idiom (an addition to the present definition) or a constraint
of the scraper application -- and the multiple types may conflict
until the application decides which is/are correct for the literal
in question.


> What counts as necessary? (Are containers necessary?

Yes (even if the present treatment is not optimal)

> Is reification 
> necessary?? 

Yes

> Is negation necessary?

Some think so ;-)

> ) It is clear that the 'idiom' is
> found intuitive by many people, even in this working group; it arises
> naturally from established XML usages, as Sergey has noted.

Actually, I don't think that is an accurate statement, which I've
explained in earlier responses to Sergey on this point. It no more
reflects XML usage than any of the other idioms.

I could (just as wrongly) make the same claim for each of
the other idioms.

> Why not 
> allow people to use it, if they find it natural, it has clear use
> cases, and it comes virtually for free? (The work needed to recognize
> a datatyping triple is about identical to that needed to detect a
> doublet, and apart from having smaller scopes, they mean the same
> thing.)

Again, the issue is that if the doublet idiom does the job, we
do not need two idioms doing the same thing. That needlessly
increases the burden on both users and implementors.

>> b) It is not as symmetrical with the global idiom, therefore harder
>> for users to understand its relationship with the global idiom than
>> is the doublet idiom.
> 
> I have no idea what this means. What sense of 'symmetrical' is being
> used here? 

The fact that the doublet and global idioms are identical except
for the presence or absence of the rdf:dtype property. I.e. they
look similar both in the graph and in the XML serialization.
Their relationship is "visually" reinforced for the user.

> The meaning of a datatype triple is not hard to grasp or
> difficult to work with. One can think about it simply in terms of a
> packaged doublet with a limited naming scope, and never make a
> mistake in usage. Even the MT (which most users will never read)
> states the truth-conditions in one small equation. It's simpler than
> most of RDFS.

This may come as a surprise to you, Pat, but most users of RDF
will neither care to nor be able to read the MT. To say that something
is "easy to grasp" because of the MT, while perhaps true in the
circles you move in, has little to no weight in the circles I move in.

That's not saying the MT is not important, it is, but with all
due respect, what you perceive as easy, intuitive, or optimal
is not necessarily what the typical RDF user will find easy,
intuitive, or optimal. This is no insult to the "typical" RDF
user, but rather a compliment to you.

>> RDF is already widely perceived as "difficult to understand" and
>> "difficult to use". The last thing we want to do is make it any
>> more difficult by making the datatyping solution needlessly
>> complicated.
> 
> See above. I should probably not comment on this further, for fear of
> giving offense.

Likewise ;-)

>> We have an opportunity to provide a solution based on two clearly
>> and intuitively related idioms
> 
> I think that it is better to think about these as all variations on a
> theme - basically hanging datatyping information into a value triple
> in one way or another - than as a catalog of 'idioms'. We have been
> talking that way, but I think it makes things needlessly
> awkward-seeming, since one can grasp them all as variations on two
> basic ideas.

I agree, in that the idioms are just expressions of the same
underlying concept -- which has been expressed in the U and TDL
proposals for a long long time. That a literal within a datatype
context is a lexical form that denotes a value. That's crystal
clear. 

Changing "typed data literal pairing" to "value triple" does
not change the underlying idea.

And the separation of idioms from that core model was
a fundamental goal of the TDL proposal. Even though the
idioms are at a separate layer from the core model, that does
not mean we want a lot of them.

And I have argued consistently that the idioms are secondary to
the underlying model, and that we should have the absolute minimal
number of idioms.

>> which help users understand the
>> relation between typed literals and the datatype that provides the
>> context for that typing. The superfluous, more complex
> 
> It can hardly be called more complex; its about the simplest form one
> could imagine.

I am not speaking about form. I am speaking about the whole enchilada.
Understanding how the form relates to the datatyping model and results
in an interpretation providing a value. How an application (or user)
puts it all together and understands the sum total.

>> datatype
>> triple idiom undermines us providing the simpler, fully symmetrical
>> solution.
>> 
>> c) It requires schema definitions to use -- and thus it is not a
>> schema-free local idiom, which was the whole point of providing a
>> local/explicit idiom.
> 
> I think this is wrong on two counts. First, it doesn't *require* using
> a schema definition - that may be a problem with my exposition in the
> first draft. Second, what is this about local idiom being
> 'schema-free'? I've never heard of that idea in our discussion before,
> and I don't know what it means. Aren't we talking about a schema
> language here?

You should have a look at the desiderata, then.

It has been repeatedly stated that we must have an idiom that captures
the datatyping explicitly and which can be interpreted by an application
without any additional schema knowledge -- thus
the fact that there must be an rdfs:subPropertyOf relation defined for
*every* datatype "property" means that it is not a local/explicit idiom.
An application cannot differentiate between datatype properties and
non-datatype properties without it.

The datatype triple idiom is not safely recognizable by an application
without that extra schema knowledge.

>> One must define each datatype as an rdfs:subPropertyOf rdf:value
>> in order for the MT interpretation to work. Thus, the idiom does
>> not meet the desiderata of either a local/explicit or global/implicit
>> idiom, but is a kind of strange hybrid that needs both local
>> definition and schema definition to work.
> 
> I think this is just plain wrong.

Which? That the idiom does not work without the rdfs:subPropertyOf
statements or that it is not a true local idiom?

Both of those assertions, however, are correct.

> There is a single, clear, notion of
> scope that handles all three idioms. The scope of a datatype is the
> 'area' of the graph within  which it imposes an interpretation on
> literals. 

No disagreement there. That's simply an expression of the TDL concept.

> The scope of a datatype triple is the triple itself, the
> most local idiom possible.

Wrong. Without an rdfs:subPropertyOf rdf:value statement, you
cannot know that it is a datatype triple. Only the doublet
idiom has such a constrained scope.

Without knowledge that the datatype property in question is
a subproperty of rdf:value, it is just another property and
not a datatyping property.
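
A minimal Python sketch of the recognition problem as described above: the doublet test looks only at the node's own statements, while the datatype triple test needs the schema. The function names are hypothetical, and the check ignores the transitive closure of rdfs:subPropertyOf for brevity.

```python
RDF_VALUE = "rdf:value"
RDF_DTYPE = "rdf:dtype"
SUBPROPERTY_OF = "rdfs:subPropertyOf"

def is_doublet(graph, node):
    """A node is a doublet if it carries both rdf:value and rdf:dtype.
    Only the node's own statements are consulted."""
    predicates = {p for (s, p, o) in graph if s == node}
    return {RDF_VALUE, RDF_DTYPE} <= predicates

def is_datatype_triple(triple, schema):
    """A triple is a datatype triple only if the schema declares its
    property a direct subproperty of rdf:value (simplification: no closure)."""
    _, p, _ = triple
    return (p, SUBPROPERTY_OF, RDF_VALUE) in schema

doublet_graph = {
    ("_:x1", RDF_VALUE, "05-08-02"),
    ("_:x1", RDF_DTYPE, "ex:USdate"),
}
candidate = ("_:x3", "ex:USdate", "05-08-02")
schema = {("ex:USdate", SUBPROPERTY_OF, RDF_VALUE)}

recognized_locally = is_doublet(doublet_graph, "_:x1")
recognized_without_schema = is_datatype_triple(candidate, set())
recognized_with_schema = is_datatype_triple(candidate, schema)
```

With an empty schema the candidate is indistinguishable from any other property assertion.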

> If any of these deserves to be called a 'hybrid', it would be the doublet
> case.

I very much disagree. I.e.


                   Global      Doublet      Datatype Triple
                 --------------------------------------------
Local Typing                      +                 +
Schema Required       +                             +


Now, which one is the hybrid?

 
>> 
>> 3. The idiom forces the qname issue.
>> 
>> The XML Schema community strongly dispute RDF qname practice as
>> valid and an idiom that requires the use of qnames puts us deep in
>> the middle of that issue -- which likely cannot be resolved within
>> the boundaries of our present charter.
>> 
>> One has no choice but to use qnames to use the datatype triple
>> idiom, whereas the other idioms work with full URIs and avoid
>> this issue entirely.
> 
> I do not follow this. We refer to 'urirefs' and I've always assumed
> that whatever they are, full URIs always count as urirefs. So it
> would seem to follow that one could use full URIs in this case as
> well. In fact, it seems to me that any uriref that can be used as a
> node label in RDF can also be used as an arc label. So why does one
> have 'no choice' about using qnames  here??

Because, with the datatype triple idiom, one must use qnames in
the RDF/XML serialization, yet with global and doublet idioms
one may use only complete URIrefs in the RDF/XML serialization.
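
A minimal Python sketch of the serialization difference, using the standard library's ElementTree. The ex: namespace URI is hypothetical, and rdf:dtype is the property proposed in this thread. In RDF/XML a property is written as an element name, so the datatype in a datatype triple has to appear as a qname, whereas in a doublet it is the object of rdf:dtype and can be written as a full URI in an attribute value.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/types#"  # hypothetical namespace for ex:
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)

def new_description():
    """Create an rdf:RDF root holding one anonymous rdf:Description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    node = ET.SubElement(root, f"{{{RDF}}}Description")
    return root, node

# Datatype triple: the datatype is the property, hence an element name (qname).
root_t, node_t = new_description()
ET.SubElement(node_t, f"{{{EX}}}USdate").text = "05-08-02"
xml_triple = ET.tostring(root_t, encoding="unicode")

# Doublet: the datatype is a resource, hence a full URI in rdf:resource.
root_d, node_d = new_description()
ET.SubElement(node_d, f"{{{RDF}}}value").text = "05-08-02"
ET.SubElement(node_d, f"{{{RDF}}}dtype", {f"{{{RDF}}}resource": EX + "USdate"})
xml_doublet = ET.tostring(root_d, encoding="unicode")
```

Printing xml_triple and xml_doublet shows the datatype as an ex:USdate element in the first case and as a plain URI string in the second.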

The scope of the datatyping solution extends beyond just the
MT or the graph. It must be an optimal solution for the entire
scope of RDF, which includes XML serialization and other
usability issues.


>> Why exacerbate this issue needlessly?
>> 
>> 4. Its interactions with rdfs:subPropertyOf are not clear.
> 
> No, they are perfectly clear and unambiguous.

That is still not evident to me. And if not evident to me,
likely also not evident to a great many RDF users.

(I know that probably comes off sounding arrogant or
conceited, but I don't know how else to say it...)



> Again, why should we mention it particularly? I mean, we might also
> point out that it is probably a bad idea to say that ex:marriedTo is
> a subProperty of, say, ex:favoriteDogIs....

No. You are trivializing the issue. It appears logical, efficient,
and clever to get double duty out of a property by subclassing
it as a datatype property rather than defining a range -- whereas
your example above is just plain stupid.

>> Again, more questions to be answered and addressed by
>> the MT, the primer, the spec, or elsewhere.
> 
> Nope.

I disagree. Maybe not in the MT, if you are correct
and there really are no longer issues there, but
certainly *somewhere*. I don't expect we will leave
the users in the dark about such pitfalls.


>> See my comments in my reply to Pat's summary which details a
>> potential erroneous inference based on combined use of all three
>> idioms (sorry, offline, but should be easy to find).
> 
> And see my reply explaining why it isn't an erroneous inference.

Fair enough, though it is unexpected. I.e. the explicit
knowledge appears correct, and the reason why it is not
correct is hidden in the machinery in a way that it is
not hidden for the doublet idiom.

>> Please, please, please, let's drop this extra idiom and move on. OK?
> 
> Let's keep it and move on. Patrick doesn't have to use it if he
> doesn't want to. Sergey and I will use it, and I suspect that many
> other people will also use it.

Gee, and I thought we were thinking about the entire RDF
community... not just what we individual members would
personally like to use.

Patrick


--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 11 February 2002 06:54:04 UTC