RE: [RDFTM] Review of Survey Document from Uschold, Michael F on 2005-03-01 (public-swbp-wg@w3.org from March 2005)

From: Uschold, Michael F <michael.f.uschold@boeing.com>
Date: Tue, 1 Mar 2005 14:49:19 -0800
To: "Lars Marius Garshol" <larsga@ontopia.net>, <public-swbp-wg@w3.org>
Message-ID: <823043AB1B52784D97754D186877B6CF0583C2EE@xch-nw-12.nw.nos.boeing.com>
Well, that was darn fast!

It's a very tough job you have: the issues are many and complex, and
hard to put into a nice pretty picture/story.

On the whole, I confess that I did not look at any of the test cases. I
tried, but it was too daunting; my eyes kept glazing over.  It might be
helpful to talk through some key parts of the test cases, explaining the
important points, and then the rest of the code would be more manageable.

I look forward to seeing a future draft.

Mike



-----Original Message-----
From: Lars Marius Garshol [mailto:larsga@ontopia.net] 
Sent: Tuesday, March 01, 2005 1:53 PM
To: public-swbp-wg@w3.org
Subject: Re: [RDFTM] Review of Survey Document




I skip over most of your excellent comments for the simple reason that I
agree with nearly all of them. I'm just going to try to deal with your
questions here, and leave the discussion of where we take the document
for later, since I don't want to talk for the TF as a whole.

* Michael F. Uschold
| 
| 2.1 Translation Features
| Completeness: By the definition given, a complete translation need not
| be reversible. One can translate from source to target and lose no
| information one way, but lose information the other way.

That's true. This was meant as a criterion on RDF->TM proposals and
TM->RDF proposals in isolation from each other, so "reversible" here
means "reversible in theory given the right conversion in the opposite
direction".

I guess this is much less clear than it could be, in part because this
section was finished so late in the process. We should go back and make
this clearer.

| Perhaps you mean to say that complete means there is no loss either
| way?  That is only possible when the two formats are capable of
| representing the exact same set of concepts, i.e. have same expressive
| power.

That depends how you define these things, I guess. It's entirely
possible for RDF to express a "parenthood" relationship between three
people despite the fact that RDF only has binary relationships. The main
difference is really how much help each data model (or metamodel or
model, or ...) gives you in expressing what you want, and whether when
converting you can make the result of the conversion natural in the
target model.

| Yet RDF and TM both have some important differences. This means it is 
| inherently impossible to have complete translations. There WILL be 
| some loss in translation (or at least it seems that way to me).

I guess this relates to your comment regarding the Garshol proposal
below. I'll return to it there.
 
| Fidelity: there are two different notions here, both are important, 
| and you are only addressing one.
|
| 1. Naturalness: which seems to be inversely related to the need for
|    workarounds.
| 2. Accuracy: this is the most common meaning of 'fidelity' in my
|    experience
| 
| Accuracy is an important criterion referring to the correctness of the
| translation - you seem to be ignoring this.

This is an interesting comment, given that the criterion was called
"naturalness" for a while, and that some of the contributors found it
difficult to stop calling it that. Maybe we should revert to that term,
because that really is what is meant, and not accuracy/correctness.

We actually had a "correctness" criterion for a while, but dropped it in
favour of just discussing correctness while explaining the proposals.

English is not my native language, so I'm leaving the issue of which
term is best for what concept for others.
  
| 3.1 Moore Proposal
| Overall, good description, I did not follow all the details.
| 
| I had difficulty understanding the essential nature of and difference 
| between 'modeling the model' and 'mapping the model'.  Is the 
| distinction one between syntax and semantics? You don't quite say 
| that, but you do say the one is essentially a semantic approach.

That's an interesting comment, and I guess you are right that we should
explain it better. 

Basically the difference is that if a topic map has an association of
type 'parenthood' containing three roles (one of type 'father', one
'mother', and one 'child') then when "modeling the model" this will
actually be expressed that way in RDF.  That is, there will be one
resource for the association of type 'association', with a property
'tm:type' for giving the type, and another property 'tm:roles' for the
roles, and role resources for each of the roles, etc.

When "mapping the model" one would instead have a 'parenthood' resource
with three properties ('father', 'mother', and 'child') relating it to
the resources involved (each presumably of type 'person').

I hope this explains the difference. 
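
To make the contrast concrete, here is a minimal sketch in Python that
treats triples as plain tuples. It is an illustration under stated
assumptions, not the vocabulary of any of the surveyed proposals: the
'tm:type' and 'tm:roles' properties come from the description above,
while 'tm:player', the identifiers ('assoc1', 'person-a', ...) and both
function names are hypothetical.

```python
# Illustrative only: a triple is a plain (subject, predicate, object) tuple.
# 'tm:player' and all identifiers below are hypothetical names.

def model_the_model(assoc_id, assoc_type, roles):
    """Describe the topic map structure itself in RDF: one resource for
    the association plus one resource per role."""
    triples = [
        (assoc_id, "rdf:type", "tm:association"),
        (assoc_id, "tm:type", assoc_type),
    ]
    for i, (role_type, player) in enumerate(sorted(roles.items()), start=1):
        role_id = f"{assoc_id}-role{i}"
        triples += [
            (assoc_id, "tm:roles", role_id),   # link association to role
            (role_id, "tm:type", role_type),   # e.g. 'father'
            (role_id, "tm:player", player),    # who plays the role
        ]
    return triples


def map_the_model(assoc_id, assoc_type, roles):
    """Express the same information directly: one typed resource with
    one property per role."""
    triples = [(assoc_id, "rdf:type", assoc_type)]
    triples += [(assoc_id, role_type, player)
                for role_type, player in sorted(roles.items())]
    return triples


roles = {"father": "person-a", "mother": "person-b", "child": "person-c"}
modeled = model_the_model("assoc1", "parenthood", roles)
mapped = map_the_model("assoc1", "parenthood", roles)

# Compare the number of statements each approach needs.
print(len(modeled), len(mapped))
```

In this sketch the "modeling the model" form needs four resources (the
association and three roles) and noticeably more statements than the
"mapping the model" form, which is the kind of growth that compounds on
a round trip.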
 
| 3.3 The O...y Proposal
| 
| Excellent observation about the impact of syntactic presentation, 
| comparing to "3rd RDF basic abbreviated form"

I find it very interesting that in the two reviews we've had so far, one
reviewer wants this statement removed, and the other thinks it is
excellent. I hope we get an odd number of reviews in the end. :-)
 
| I'm amazed that you say the translations are more or less complete. 
| How can that be, when there are so many things that RDF has that TMs 
| lack and vice versa (from section 2.2) how are containers handled? If 
| you throw them away when translating to TM, you can't get them back, 
| so this is incomplete. Ditto for the many other 'issues' in section 
| 2.2.

There are ways around these issues, such as creating TM vocabulary for
expressing containers in TMs. However, I'm not sure the statement that
these proposals are complete is correct, so we should revisit that.
 
| At some point, it will be necessary to analyze why there is such an 
| explosion of new statements (1 to 26 after a round-trip translation).

This follows more or less directly from the "modelling the model"
approach. The ternary association example above turns into four
resources with lots of properties. Going back to topic maps each
statement would most likely become a separate association in the
"modelling the model" approach, and so the 1 association turned into 3
associations...

| Is there reason to hope that new approaches could do better, and still
| achieve semantically accurate translation?

The "mapping the model" approaches all do better. In fact, they are
quite close to full naturalness (naturality, fideliousness, ...). I
would also question whether the "modelling the model" approaches
actually do produce a semantically accurate translation.
 
| It would be helpful to give an explanation with an example 
| illustrating why the extra statements are added.

That's effectively what the test cases do, and I think the reason this
is less than obvious is that they are rather big (and so require a lot
of concentrated reading) and that the evaluated "modeling the model"
proposals use a model of topic maps (PMTM4) that is very far removed
from most people's understanding of topic maps.

I tried to argue that Garshol02, which is a "modeling the model"
approach based on TMDM (the example above is roughly similar to it),
should be covered in order to avoid this problem. I didn't win that
argument for various reasons (getting close to deadline, lack of detail
in the proposal, etc), and since the proposal was written by me, I found
it difficult to argue very hard.

It would be useful to get feedback on whether this would help, or
whether this could be addressed in a way that would require less new
text.
 
| 3.4 Garshol proposal
| RDF2TM mapping. The extra information seems a bit over the top, one 
| has to do a lot of manual work annotating the RDF specifically for the
| purpose of mapping it to TM.

That's true. We could debate whether one annotation per property in the
RDF vocabulary is a "lot", but compared to the "modeling the model"
approaches, which require no work, it certainly is more work.

| This is unsatisfactory.

So far the alternative seems to be the "graph bloat" incurred by the
"modeling the model" approach. In theory the mapping information could
be intuited by introspecting the RDF to work out the semantics, but in
practice nobody knows of an effective way of doing this. 

I think you are entirely correct, however, that this is the biggest
problem with the proposal discussed in section 3.4, and in practice with
the RDF/TM interoperability enterprise as a whole. In fact, this is the
problem the Unibo proposal meant to address, as far as I've understood,
by falling back to "modeling the model" when mapping information is not
available.

(I think your comment regarding completeness above is related to this.
If humans annotate the vocabularies to tell us what the mapping is we
can be complete. If they don't we can be complete, but not natural. Or
so it seems right now, anyway.)

| In other words, the translation is human-assisted which may not be 
| practical in many cases.  This point is hinted at, but not stressed 
| enough, IMHO.

I agree.
 
| What is the import of the fact that none of the approaches discuss how
| to represent RDF containers and collections, language tags, XML and
| typed literals?  Does this mean they are really hard? Does it matter?

I think it means that these features are the "corner cases" of the
model, the rarely-used and little-loved features of disputable utility
which all models seem to acquire somehow. (XML has entities and
notations and whatnot, topic maps have variant names...)

I also think the authors of these proposals probably found it better to
focus on the heart of the problem (which was unsolved when they did
their work) and leave these corner cases to be mopped up later once the
main work was done.

And, no, I don't think these really are hard. XML and typed literals are
effectively solved by the latest TM standard revision, containers and
collections just require some TM vocabulary, and language tags map
directly to scope in topic maps.

The real difficulty is working out what

  (x, y, "z")

means in topic maps. Is it a name or a property? And once you've solved
that, what does

  (x, w, v)

mean in topic maps? (Occurrence? Association? Subject indicator?) We can
solve this if we get humans to annotate the properties, but if that is
too much work we've got a problem that's much harder than the "corner
cases". (And it may be too much work in some cases.)
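
As a rough sketch of what "getting humans to annotate the properties"
amounts to, the Python below looks up each RDF property in a
hand-written table to decide which topic map construct a statement
becomes. Everything here is an assumption made for illustration: the
'ex:' property names, the ANNOTATIONS table, the classify function and
the label strings are hypothetical, and reading "property" for a
literal-valued statement as an occurrence is my simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an object as an RDF literal rather than a resource."""
    value: str


# Hypothetical human-supplied annotations: one entry per RDF property.
ANNOTATIONS = {
    "ex:name": "name",
    "ex:description": "occurrence",
    "ex:homepage": "occurrence",
    "ex:worksFor": "association",
    "ex:definedBy": "subject-indicator",
}

# Constructs that make sense for a literal-valued statement (assumption).
LITERAL_KINDS = {"name", "occurrence"}


def classify(triple, annotations=ANNOTATIONS):
    """Return the topic map construct for a triple, or 'unmapped' when
    nobody has annotated the property."""
    _subject, prop, obj = triple
    kind = annotations.get(prop)
    if kind is None:
        return "unmapped"  # no annotation: the hard case
    if isinstance(obj, Literal) and kind not in LITERAL_KINDS:
        raise ValueError(f"{prop} is annotated as {kind} but has a literal value")
    return kind


print(classify(("x", "ex:name", Literal("z"))))   # annotated literal
print(classify(("x", "ex:worksFor", "v")))        # annotated resource
print(classify(("x", "ex:unknown", "v")))         # nobody annotated this
```

The lookup itself is trivial; the cost is entirely in producing the
table, and the 'unmapped' branch is where an approach would have to
fall back to something like "modeling the model".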

The heart of the problem can actually be stated in two short
sentences:

 - when is an RDF literal a name?

 - when is an RDF resource an information resource?

If there were a defined way of answering these (without having to
annotate the RDF vocabularies) the problem would effectively be solved.
However, RDF doesn't do this, although IMHO there would be great
benefits to doing so, and solving RDF/TM interoperability would only be
one.

(The TM->RDF conversion challenge remains, but that is actually
easier.)

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >
Received on Tuesday, 1 March 2005 23:28:40 UTC