Re: [RDFTM] Review of Survey Document

I'm skipping over most of your excellent comments for the simple
reason that I agree with nearly all of them. I'm just going to try to
deal with your questions here, and leave the discussion of where we
take the document for later, since I don't want to speak for the TF as
a whole.

* Michael F. Uschold
| 
| 2.1 Translation Features
| Completeness: By the definition given, a complete translation need
| not be reversible. One can translate from source to target and lose
| no information one way, but lose information the other way.

That's true. This was meant as a criterion on RDF->TM proposals and
TM->RDF proposals in isolation from each other, so "reversible" here
means "reversible in theory given the right conversion in the opposite
direction".
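Read that way, the criterion amounts to a round-trip test. The following is a minimal Python sketch, not taken from any of the proposals. It assumes an RDF graph is represented as a frozenset of (subject, predicate, object) strings, and the conversion functions (`to_pairs`, `drop_predicates`, and so on) are hypothetical toy stand-ins for real RDF->TM and TM->RDF conversions.

```python
from typing import Callable, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def round_trips(graph: Graph,
                forward: Callable[[Graph], object],
                backward: Callable[[object], Graph]) -> bool:
    """True if converting forward and then backward gives back the same graph."""
    return backward(forward(graph)) == graph


# Hypothetical lossless pair: the shape changes, but nothing is discarded.
def to_pairs(graph: Graph) -> List[Tuple[str, Tuple[str, str]]]:
    return [(s, (p, o)) for s, p, o in graph]


def from_pairs(pairs: List[Tuple[str, Tuple[str, str]]]) -> Graph:
    return frozenset((s, p, o) for s, (p, o) in pairs)


# Hypothetical lossy pair: predicates are dropped, so the best the
# reverse conversion can do is guess a generic one.
def drop_predicates(graph: Graph) -> List[Tuple[str, str]]:
    return [(s, o) for s, _, o in graph]


def guess_predicates(pairs: List[Tuple[str, str]]) -> Graph:
    return frozenset((s, "ex:related", o) for s, o in pairs)


graph = frozenset({("ex:anne", "ex:mother", "ex:mary")})
print(round_trips(graph, to_pairs, from_pairs))               # True
print(round_trips(graph, drop_predicates, guess_predicates))  # False
```

Note that a test like this can only show that one particular forward conversion loses information for one particular input. The criterion as described asks whether *some* reverse conversion could undo the forward one, which no finite test can settle.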

I guess this is much less clear than it could be, in part because this
section was finished so late in the process. We should go back and
make this clearer.

| Perhaps you mean to say that complete means there is no loss either
| way?  That is only possible when the two formats are capable of
| representing the exact same set of concepts, i.e. have same
| expressive power.

That depends on how you define these things, I guess. It's entirely
possible for RDF to express a "parenthood" relationship between three
people despite the fact that RDF only has binary relationships. The
main difference is really how much help each data model (or metamodel
or model, or ...) gives you in expressing what you want, and whether
when converting you can make the result of the conversion natural in
the target model.

| Yet RDF and TM have some important differences. This means it
| is inherently impossible to have complete translations. There WILL
| be some loss in translation (or at least it seems that way to me).

I guess this relates to your comment regarding the Garshol proposal
below. I'll return to it there.
 
| Fidelity: there are two different notions here, both are important,
| and you are only addressing one.
|
| 1. Naturalness: which seems to be inversely related to the need for
| workarounds.
| 2. Accuracy: this is the most common meaning of 'fidelity' in my
| experience
| 
| Accuracy is an important criterion referring to the correctness of
| the translation - you seem to be ignoring this.

This is an interesting comment, given that the criterion was called
"naturalness" for a while, and that some of the contributors found it
difficult to stop calling it that. Maybe we should revert to that
term, because that really is what is meant, and not accuracy/
correctness.

We actually had a "correctness" criterion for a while, but dropped it
in favour of just discussing correctness while explaining the
proposals.

English is not my native language, so I'm leaving the issue of which
term is best for what concept for others.
  
| 3.1 Moore Proposal
| Overall, good description, I did not follow all the details.
| 
| I had difficulty understanding the essential nature of and
| difference between 'modeling the model' and 'mapping the model'.  Is
| the distinction one between syntax and semantics? You don't quite
| say that, but you do say the one is essentially a semantic approach.

That's an interesting comment, and I guess you are right that we
should explain it better. 

Basically the difference is that if a topic map has an association of
type 'parenthood' containing three roles (one of type 'father', one
'mother', and one 'child'), then when "modeling the model" this will
actually be expressed that way in RDF. That is, there will be one
resource of type 'association' for the association, with a property
'tm:type' giving its type and another property 'tm:roles' linking it
to its roles, plus a resource for each of the roles, etc.

When "mapping the model" one would instead have a 'parenthood'
resource with three properties ('father', 'mother', and 'child')
relating it to the resources involved (each presumably of type
'person').
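A minimal Python sketch of the two shapes, for illustration only. Triples are plain (subject, predicate, object) tuples. 'tm:type' and 'tm:roles' are the properties named above, while 'tm:association', 'tm:role', 'tm:player' and the 'ex:' identifiers are hypothetical names chosen for this sketch, not taken from any of the proposals.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
Roles = List[Tuple[str, str]]  # (role type, player) pairs

# The ternary association from the example above.
PARENTHOOD: Roles = [("father", "ex:john"), ("mother", "ex:mary"),
                     ("child", "ex:anne")]


def model_the_model(assoc: str, assoc_type: str, roles: Roles) -> List[Triple]:
    """Describe the topic map constructs themselves as RDF resources."""
    triples = [(assoc, "rdf:type", "tm:association"),
               (assoc, "tm:type", assoc_type)]
    for i, (role_type, player) in enumerate(roles, start=1):
        role = f"{assoc}-role{i}"  # one extra resource per role
        triples += [(assoc, "tm:roles", role),
                    (role, "rdf:type", "tm:role"),
                    (role, "tm:type", role_type),
                    (role, "tm:player", player)]
    return triples


def map_the_model(assoc: str, assoc_type: str, roles: Roles) -> List[Triple]:
    """Use the association and role types directly as RDF vocabulary."""
    return [(assoc, "rdf:type", assoc_type)] + [
        (assoc, role_type, player) for role_type, player in roles]


for t in map_the_model("ex:p1", "parenthood", PARENTHOOD):
    print(t)
```

For this single association the first function produces 14 triples spread over four resources (the association plus one per role), while the second produces 4 triples about a single 'parenthood' resource.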

I hope this explains the difference. 
 
| 3.3 The O...y Proposal
| 
| Excellent observation about the impact of syntactic presentation,
| comparing to "3rd RDF basic abbreviated form"

I find it very interesting that in the two reviews we've had so far,
one reviewer wants this statement removed, and the other thinks it is
excellent. I hope we get an odd number of reviews in the end. :-)
 
| I'm amazed that you say the translations are more or less complete.
| How can that be, when there are so many things that RDF has that TMs
| lack and vice versa (from section 2.2)? How are containers handled?
| If you throw them away when translating to TM, you can't get them
| back, so this is incomplete. Ditto for the many other 'issues' in
| section 2.2.

There are ways around these issues, such as creating TM vocabulary for
expressing containers in TMs. However, I'm not sure the statement that
these proposals are complete is correct, so we should revisit that.
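As a rough illustration of the kind of TM vocabulary that would be needed, here is a Python sketch. It assumes an RDF container is given as 'rdf:type' and 'rdf:_n' membership triples. Reusing each 'rdf:_n' property as an association type, and the role types 'tmx:container' and 'tmx:member', are hypothetical choices made for this sketch, not something any of the surveyed proposals defines.

```python
import re
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
CONTAINER_TYPES = {"rdf:Bag", "rdf:Seq", "rdf:Alt"}
MEMBERSHIP = re.compile(r"^rdf:_[1-9][0-9]*$")


def container_to_tm(triples: Set[Triple]) -> Tuple[Dict[str, str], List[dict]]:
    """Map container typing and membership triples to topic map constructs."""
    instance_of: Dict[str, str] = {}  # container topic -> container type topic
    associations: List[dict] = []
    for s, p, o in triples:
        if p == "rdf:type" and o in CONTAINER_TYPES:
            instance_of[s] = o
        elif MEMBERSHIP.match(p):
            # The membership property (and so the position) becomes the
            # association type, so the order survives the conversion.
            associations.append({"type": p,
                                 "roles": {"tmx:container": s, "tmx:member": o}})
    return instance_of, associations


def tm_to_container(instance_of: Dict[str, str],
                    associations: List[dict]) -> Set[Triple]:
    """Rebuild the RDF container triples from the topic map constructs."""
    triples = {(s, "rdf:type", t) for s, t in instance_of.items()}
    triples |= {(a["roles"]["tmx:container"], a["type"], a["roles"]["tmx:member"])
                for a in associations}
    return triples


seq = {("ex:list", "rdf:type", "rdf:Seq"),
       ("ex:list", "rdf:_1", "ex:first"),
       ("ex:list", "rdf:_2", "ex:second")}
print(tm_to_container(*container_to_tm(seq)) == seq)  # True
```

The point is only that containers need not be thrown away: with some agreed TM vocabulary the round trip can preserve them.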
 
| At some point, it will be necessary to analyze why there is such an
| explosion of new statements (1 to 26 after a round-trip translation).

This follows more or less directly from the "modelling the model"
approach. The ternary association example above turns into four
resources with lots of properties. Going back to topic maps each
statement would most likely become a separate association in the
"modelling the model" approach, and so the 1 association turned into 3
associations...
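To make the arithmetic concrete, here is a small Python sketch. It assumes a "modeling the model" RDF form like the one described for the parenthood example, plus a deliberately naive TM conversion that turns every statement into a binary association. Both are hypothetical illustrations rather than the evaluated proposals, and the 1-to-26 figure comes from their test cases, not from this code.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def model_the_model(assoc: str, assoc_type: str,
                    roles: List[Tuple[str, str]]) -> List[Triple]:
    """RDF description of one TM association: one resource per construct."""
    triples = [(assoc, "rdf:type", "tm:association"),
               (assoc, "tm:type", assoc_type)]
    for i, (role_type, player) in enumerate(roles, start=1):
        role = f"{assoc}-role{i}"
        triples += [(assoc, "tm:roles", role), (role, "rdf:type", "tm:role"),
                    (role, "tm:type", role_type), (role, "tm:player", player)]
    return triples


def naive_rdf_to_tm(triples: List[Triple]) -> List[dict]:
    """Generic RDF->TM step: each statement becomes its own binary association."""
    return [{"type": p, "roles": [("subject", s), ("object", o)]}
            for s, p, o in triples]


roles = [("father", "ex:john"), ("mother", "ex:mary"), ("child", "ex:anne")]
rdf = model_the_model("ex:p1", "parenthood", roles)
tm = naive_rdf_to_tm(rdf)
print(len(rdf), len(tm))  # 14 14
```

Every construct in the source becomes a resource, every link between constructs becomes a statement, and a generic conversion back then produces one association per statement.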

| Is there reason to hope that new approaches could do better, and
| still achieve semantically accurate translation?

The "mapping the model" approaches all do better. In fact, they are
quite close to full naturalness (naturality, fideliousness, ...). I
would also question whether the "modelling the model" approaches
actually do produce a semantically accurate translation.
 
| It would be helpful to give an explanation with an example
| illustrating why the extra statements are added.

That's effectively what the test cases do, and I think the reason this
is less than obvious is that they are rather big (and so require a lot
of concentrated reading) and that the evaluated "modeling the model"
proposals use a model of topic maps (PMTM4) that is very far removed
from most people's understanding of topic maps.

I tried to argue that Garshol02, which is a "modeling the model"
approach based on TMDM (the example above is roughly similar to it),
should be covered in order to avoid this problem. I didn't win that
argument for various reasons (getting close to deadline, lack of
detail in the proposal, etc), and since the proposal was written by
me, I found it difficult to argue very hard.

It would be useful to get feedback on whether this would help, or
whether this could be addressed in a way that would require less new
text.
 
| 3.4 Garshol proposal
| RDF2TM mapping. The extra information seems a bit over the top, one
| has to do a lot of manual work annotating the RDF specifically for
| the purpose of mapping it to TM. 

That's true. We could debate whether one annotation per property in
the RDF vocabulary is a "lot", but compared to the "modeling the
model" approaches, which require no work, it certainly is more work.
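For readers unfamiliar with the proposal, this Python sketch shows the general shape of annotation-driven conversion: one mapping annotation per RDF property decides which topic map construct its statements become. The annotation values ('name', 'occurrence', 'association') and the role types are hypothetical placeholders, not the proposal's actual vocabulary. The FOAF property names are used only as familiar examples.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]

# Hypothetical annotations: one entry per property in the RDF vocabulary.
ANNOTATIONS: Dict[str, str] = {
    "foaf:name": "name",
    "foaf:homepage": "occurrence",
    "foaf:knows": "association",
}


def convert(triple: Triple, annotations: Dict[str, str]) -> dict:
    """Turn one RDF statement into the TM construct its property is annotated with."""
    s, p, o = triple
    kind = annotations.get(p)
    if kind == "name":
        return {"construct": "name", "topic": s, "value": o}
    if kind == "occurrence":
        return {"construct": "occurrence", "topic": s, "type": p, "value": o}
    if kind == "association":
        return {"construct": "association", "type": p,
                "roles": [("subject", s), ("object", o)]}
    # Unannotated (or unrecognized) properties cannot be converted naturally.
    raise KeyError(f"no mapping annotation for {p}")


print(convert(("ex:lmg", "foaf:name", "Lars Marius Garshol"), ANNOTATIONS))
```

The manual work under discussion is filling in a table like `ANNOTATIONS` for every property of every vocabulary one wants to convert.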

| This is unsatisfactory.

So far the alternative seems to be the "graph bloat" incurred by the
"modeling the model" approach. In theory the mapping information could
be intuited by introspecting the RDF to work out the semantics, but in
practice nobody knows of an effective way of doing this. 

I think you are entirely correct, however, that this is the biggest
problem with the proposal discussed in section 3.4, and in
practice with the RDF/TM interoperability enterprise as a whole. In
fact, this is the problem the Unibo proposal was meant to address, as
far as I understand it, by falling back to "modeling the model" when
mapping information is not available.
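A minimal Python sketch of such a fallback, assuming it works roughly as described above: use a property's mapping annotation when one exists, and otherwise describe the RDF statement itself in the topic map. This is a guess at the general shape, not the Unibo proposal's actual rules. The annotation values and role types are hypothetical; only the names rdf:Statement, rdf:subject, rdf:predicate and rdf:object are borrowed from RDF's own reification vocabulary.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]


def convert_with_fallback(triple: Triple, annotations: Dict[str, str]) -> dict:
    """Natural conversion when the property is annotated, generic otherwise."""
    s, p, o = triple
    kind = annotations.get(p)
    if kind == "name":
        return {"construct": "name", "topic": s, "value": o}
    if kind == "occurrence":
        return {"construct": "occurrence", "topic": s, "type": p, "value": o}
    if kind == "association":
        return {"construct": "association", "type": p,
                "roles": [("subject", s), ("object", o)]}
    # Fallback ("modeling the model"): keep every part of the statement, so
    # nothing is lost, but the result is not a natural topic map.
    # (A literal object would need extra handling, since literals cannot
    # play roles; that case is ignored here.)
    return {"construct": "association", "type": "rdf:Statement",
            "roles": [("rdf:subject", s), ("rdf:predicate", p), ("rdf:object", o)]}


annotations = {"foaf:name": "name"}
print(convert_with_fallback(("ex:lmg", "foaf:name", "Lars"), annotations)["construct"])  # name
print(convert_with_fallback(("ex:lmg", "ex:likes", "ex:tm"), annotations)["type"])  # rdf:Statement
```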

(I think your comment regarding completeness above is related to
this. If humans annotate the vocabularies to tell us what the mapping
is, we can be complete. If they don't, we can still be complete, but
not natural. Or so it seems right now, anyway.)

| In other words, the translation is human-assisted which may not be
| practical in many cases.  This point is hinted at, but not stressed
| enough, IMHO.

I agree.
 
| What is the import of the fact that none of the approaches discuss
| how to represent RDF containers and collections, language tags, XML
| and typed literals?  Does this mean they are really hard? Does it
| matter?

I think it means that these features are the "corner cases" of the
model, the rarely-used and little-loved features of disputable utility
which all models seem to acquire somehow. (XML has entities and
notations and whatnot; topic maps have variant names...)

I also think the authors of these proposals probably found it better
to focus on the heart of the problem (which was unsolved when they did
their work) and leave these corner cases to be mopped up later once
the main work was done.

And, no, I don't think these really are hard. XML literals and typed
literals are effectively solved by the latest revision of the TM
standard, containers and collections just require some TM vocabulary,
and language tags map directly to scope in topic maps.
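A small Python sketch of the language-tag and typed-literal cases, assuming a simple in-memory representation of RDF literals and of TM names and occurrences. The 'lang:' prefix for the language topic is a hypothetical placeholder; a real conversion would use whatever published subject identifier the TM side agrees on for each language.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None      # language tag, e.g. "no" or "en"
    datatype: Optional[str] = None  # datatype URI for typed literals


@dataclass(frozen=True)
class Name:
    topic: str
    value: str
    scope: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Occurrence:
    topic: str
    occ_type: str
    value: str
    datatype: str
    scope: FrozenSet[str] = frozenset()


def language_scope(lit: Literal) -> FrozenSet[str]:
    """The language tag, if any, becomes a single theme in the scope."""
    return frozenset({f"lang:{lit.lang.lower()}"}) if lit.lang else frozenset()


def literal_to_name(topic: str, lit: Literal) -> Name:
    return Name(topic, lit.lexical, language_scope(lit))


def literal_to_occurrence(topic: str, prop: str, lit: Literal) -> Occurrence:
    # Typed literals keep their datatype; untyped ones default to xsd:string.
    return Occurrence(topic, prop, lit.lexical, lit.datatype or XSD_STRING,
                      language_scope(lit))


print(literal_to_name("ex:norway", Literal("Norge", lang="no")))
```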

The real difficulty is working out what

  (x, y, "z")

means in topic maps. Is it a name or a property? And once you've
solved that, what does

  (x, w, v)

mean in topic maps? (Occurrence? Association? Subject indicator?) We
can solve this if we get humans to annotate the properties, but if
that is too much work we've got a problem that's much harder than the
"corner cases". (And it may be too much work in some cases.)

The heart of the problem can actually be stated in two short
sentences:

 - when is an RDF literal a name?

 - when is an RDF resource an information resource?

If there were a defined way of answering these (without having to
annotate the RDF vocabularies) the problem would effectively be
solved. However, RDF doesn't provide one, although IMHO there would be
great benefits to doing so, and solving RDF/TM interoperability would
be only one of them.
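The following Python sketch shows why those two answers would be enough for the two cases above. It classifies a statement using two oracle functions, `is_name` and `is_information_resource`. Both are hypothetical: RDF defines no such tests, which is exactly the point being made, so in the sketch they are simply passed in, with toy stand-ins for the example. Subject indicators and other finer distinctions are left out.

```python
from typing import Callable, Tuple, Union


class Lit(str):
    """Marks an RDF literal; plain str values are resource identifiers."""


def classify(triple: Tuple[str, str, Union[str, Lit]],
             is_name: Callable[[str], bool],
             is_information_resource: Callable[[str], bool]) -> str:
    """Decide which TM construct an RDF statement should become."""
    s, p, o = triple
    if isinstance(o, Lit):
        # (x, y, "z"): a name, or else an occurrence holding the literal value.
        return "name" if is_name(p) else "occurrence"
    # (x, w, v): an occurrence pointing at v if v is an information
    # resource, otherwise an association between x and v.
    return "occurrence" if is_information_resource(o) else "association"


def toy_is_name(prop: str) -> bool:
    # Toy stand-in: treat only foaf:name as a naming property.
    return prop == "foaf:name"


def toy_is_information_resource(res: str) -> bool:
    # Toy stand-in: treat http: URIs as documents.
    return res.startswith("http://")


for t in [("ex:lmg", "foaf:name", Lit("Lars Marius Garshol")),
          ("ex:lmg", "foaf:homepage", "http://www.garshol.priv.no"),
          ("ex:lmg", "foaf:knows", "ex:someone")]:
    print(classify(t, toy_is_name, toy_is_information_resource))
```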

(The TM->RDF conversion challenge remains, but that is actually
easier.)

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

Received on Tuesday, 1 March 2005 21:57:37 UTC