- From: Lars Marius Garshol <larsga@ontopia.net>
- Date: Tue, 01 Mar 2005 22:52:54 +0100
- To: <public-swbp-wg@w3.org>
I skip over most of your excellent comments for the simple reason that I agree with nearly all of them. I'm just going to try to deal with your questions here, and leave the discussion of where we take the document for later, since I don't want to talk for the TF as a whole.

* Michael F. Uschold
|
| 2.1 Translation Features
| Completeness: By the definition given, a complete translation need
| not be reversible. One can translate from source to target and lose
| no information one way, but lose information the other way.

That's true. This was meant as a criterion on RDF->TM proposals and TM->RDF proposals in isolation from each other, so "reversible" here means "reversible in theory given the right conversion in the opposite direction". I guess this is much less clear than it could be, in part because this section was finished so late in the process. We should go back and make this clearer.

| Perhaps you mean to say that complete means there is no loss either
| way? That is only possible when the two formats are capable of
| representing the exact same set of concepts, i.e. have same
| expressive power.

That depends on how you define these things, I guess. It's entirely possible for RDF to express a "parenthood" relationship between three people despite the fact that RDF only has binary relationships. The main difference is really how much help each data model (or metamodel or model, or ...) gives you in expressing what you want, and whether when converting you can make the result of the conversion natural in the target model.

| Yet RDF and TM both have some important differences. This means it
| is inherently impossible to have complete translations. There WILL
| be some loss in translation (or at least it seems that way to me).

I guess this relates to your comment regarding the Garshol proposal below. I'll return to it there.

| Fidelity: there are two different notions here, both are important,
| and you are only addressing one.
|
| 1. Naturalness: which seems to be inversely related to the need for
|    workarounds.
| 2. Accuracy: this is the most common meaning of 'fidelity' in my
|    experience
|
| Accuracy is an important criteria referring to the correctness of
| the translation - you seem to be ignoring this.

This is an interesting comment, given that the criterion was called "naturalness" for a while, and that some of the contributors found it difficult to stop calling it that. Maybe we should revert to that term, because that really is what is meant, and not accuracy/correctness. We actually had a "correctness" criterion for a while, but dropped it in favour of just discussing correctness while explaining the proposals.

English is not my native language, so I'm leaving the issue of which term is best for what concept for others.

| 3.1 Moore Proposal
| Overall, good description, I did not follow all the details.
|
| I had difficulty understanding the essential nature of and
| difference between 'modeling the model' and 'mapping the model'. Is
| the distinction one between syntax and semantics? You don't quite
| say that, but you do say the one is essentially a semantic approach.

That's an interesting comment, and I guess you are right that we should explain it better. Basically the difference is that if a topic map has an association of type 'parenthood' containing three roles (one of type 'father', one 'mother', and one 'child') then when "modeling the model" this will actually be expressed that way in RDF. That is, there will be one resource for the association of type 'association', with a property 'tm:type' for giving the type, and another property 'tm:roles' for the roles, and role resources for each of the roles, etc.

When "mapping the model" one would instead have a 'parenthood' resource with three properties ('father', 'mother', and 'child') relating it to the resources involved (each presumably of type 'person').

I hope this explains the difference.
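
To make the contrast concrete, here is a minimal Python sketch of the parenthood example, with RDF statements written as plain (subject, predicate, object) tuples. Only 'tm:type' and 'tm:roles' come from the description above; 'tm:Association', 'tm:Role', 'tm:player', the 'ex:' identifiers, and the choice to express 'parenthood' as an rdf:type are assumptions made for illustration, and none of this is taken from the evaluated proposals.

```python
# Illustrative only: statements are plain (subject, predicate, object) tuples.
# tm:type and tm:roles follow the prose; all other names are hypothetical.

ROLES = {"father": "ex:person1", "mother": "ex:person2", "child": "ex:person3"}


def model_the_model(assoc_id, assoc_type, roles):
    """Describe the topic map structure itself: one resource for the
    association plus one resource for each role."""
    triples = [
        (assoc_id, "rdf:type", "tm:Association"),
        (assoc_id, "tm:type", assoc_type),
    ]
    for i, (role_type, player) in enumerate(sorted(roles.items())):
        role_id = f"{assoc_id}-role{i}"
        triples += [
            (assoc_id, "tm:roles", role_id),
            (role_id, "rdf:type", "tm:Role"),
            (role_id, "tm:type", role_type),
            (role_id, "tm:player", player),
        ]
    return triples


def map_the_model(assoc_id, assoc_type, roles):
    """Express the relationship directly: one resource whose properties
    are the role types and whose values are the role players."""
    triples = [(assoc_id, "rdf:type", assoc_type)]
    triples += [(assoc_id, role_type, player)
                for role_type, player in sorted(roles.items())]
    return triples


modeled = model_the_model("ex:assoc1", "parenthood", ROLES)
mapped = map_the_model("ex:assoc1", "parenthood", ROLES)

# Number of distinct resources that appear as subjects in the modeled form.
modeled_subjects = {s for s, _, _ in modeled}

print(len(modeled), len(mapped), len(modeled_subjects))  # 14 4 4
```

Under these assumptions the same association costs 14 statements spread over four resources in the first style and 4 statements on a single resource in the second, which is the kind of difference the naturalness criterion is meant to capture.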
| 3.3 The O...y Proposal
|
| Excellent observation about the impact of syntactic presentation,
| comparing to "3rd RDF basic abbreviated form"

I find it very interesting that in the two reviews we've had so far, one reviewer wants this statement removed, and the other thinks it is excellent. I hope we get an odd number of reviews in the end. :-)

| I'm amazed that you say the translations are more or less complete.
| How can that be, when there are so many things that RDF has that TMs
| lack and vice versa (from section 2.2) how are containers handled?
| If you throw them away when translating to TM, you can't get them
| back, so this is incomplete. Ditto for the many other 'issues' in
| section 2.2.

There are ways around these issues, such as creating TM vocabulary for expressing containers in TMs. However, I'm not sure the statement that these proposals are complete is correct, so we should revisit that.

| At some point, it will be necessary to analyze why there is such an
| explosion of new statements (1 to 26 after a round-trip translation).

This follows more or less directly from the "modeling the model" approach. The ternary association example above turns into four resources with lots of properties. Going back to topic maps, each statement would most likely become a separate association in the "modeling the model" approach, and so the 1 association turned into 3 associations...

| Is there reason to hope that new approaches could do better, and
| still achieve semantically accurate translation?

The "mapping the model" approaches all do better. In fact, they are quite close to full naturalness (naturality, fideliousness, ...). I would also question whether the "modeling the model" approaches actually do produce a semantically accurate translation.

| It would be helpful to give an explanation with an example
| illustrating why the extra statements are added.
That's effectively what the test cases do, and I think the reason this is less than obvious is that they are rather big (and so require a lot of concentrated reading) and that the evaluated "modeling the model" proposals use a model of topic maps (PMTM4) that is very far removed from most people's understanding of topic maps.

I tried to argue that Garshol02, which is a "modeling the model" approach based on TMDM (the example above is roughly similar to it), should be covered in order to avoid this problem. I didn't win that argument for various reasons (getting close to deadline, lack of detail in the proposal, etc), and since the proposal was written by me, I found it difficult to argue very hard.

It would be useful to get feedback on whether this would help, or whether this could be addressed in a way that would require less new text.

| 3.4 Garshol proposal
| RDF2TM mapping. The extra information seems a bit over the top, one
| has to do a lot of manual work annotating the RDF specifically for
| the purpose of mapping it to TM.

That's true. We could debate whether one annotation per property in the RDF vocabulary is a "lot", but compared to the "modeling the model" approaches which require no work it certainly is more work.

| This is unsatisfactory.

So far the alternative seems to be the "graph bloat" incurred by the "modeling the model" approach. In theory the mapping information could be intuited by introspecting the RDF to work out the semantics, but in practice nobody knows of an effective way of doing this.

I think you are entirely correct, however, that this is the biggest problem with the proposal discussed in section 3.4, and in practice with the RDF/TM interoperability enterprise as a whole. In fact, this is the problem the Unibo proposal meant to address, as far as I've understood, by falling back to "modeling the model" when mapping information is not available.

(I think your comment regarding completeness above is related to this. If humans annotate the vocabularies to tell us what the mapping is we can be complete. If they don't we can be complete, but not natural. Or so it seems right now, anyway.)

| In other words, the translation is human-assisted which may not be
| practical in many cases. This point is hinted at, but not stressed
| enough, IMHO.

I agree.

| What is the import of the fact that none of the approaches discuss
| how to represent RDF containers and collections, language tags, XML
| and typed literals? Does this mean they are really hard? Does it
| matter?

I think it means that these features are the "corner cases" of the model, the rarely-used and little-loved features of disputable utility which all models seem to acquire somehow. (XML has entities and notations and whatnot, topic maps variant names...) I also think the authors of these proposals probably found it better to focus on the heart of the problem (which was unsolved when they did their work) and leave these corner cases to be mopped up later once the main work was done.

And, no, I don't think these really are hard. XML and typed literals are effectively solved by the latest TM standard revision, containers and collections just require some TM vocabulary, and language tags map directly to scope in topic maps.

The real difficulty is working out what (x, y, "z") means in topic maps. Is it a name or a property? And once you've solved that, what does (x, w, v) mean in topic maps? (Occurrence? Association? Subject indicator?) We can solve this if we get humans to annotate the properties, but if that is too much work we've got a problem that's much harder than the "corner cases". (And it may be too much work in some cases.)

The heart of the problem can actually be stated in two short sentences:

 - when is an RDF literal a name?
 - when is an RDF resource an information resource?

If there were a defined way of answering these (without having to annotate the RDF vocabularies) the problem would effectively be solved.
However, RDF doesn't do this, although IMHO there would be great benefits to doing so, and solving RDF/TM interoperability would only be one. (The TM->RDF conversion challenge remains, but that is actually easier.)

--
Lars Marius Garshol, Ontopian <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50 <URL: http://www.garshol.priv.no >
Received on Tuesday, 1 March 2005 21:57:37 UTC