- From: Uschold, Michael F <michael.f.uschold@boeing.com>
- Date: Tue, 1 Mar 2005 12:16:27 -0800
- To: <public-swbp-wg@w3.org>
GENERAL COMMENTS:

My first impression is that this is a very well thought out and well structured document. The problem is approached in a good, systematic manner, and I'm glad to see specific criteria by which all approaches are judged. There is a LOT of content; an impressive amount of work has been done to lay the foundation for an eventual RDF/TM proposal. I am broadly familiar with TMs, but not with the gory details.

After reading it through, I realize that some things could be improved in terms of the document structure, though most of the content is fine. When publishing a survey, there is always a question of whether to arrange the content by specific issues (as you do in section 4) or by published approaches or systems (as you do in section 3). You have done both, which has pros and cons. The major con of the current draft (IMHO) is that the good analysis is so far removed from the initial requirements. Also, the major existing systems are not directly assessed against the major requirements (what you are calling 'issues') until the end of the document.

After listing the issues in section 2, I thought they would form an excellent structuring device for the rest of the paper. The issues are in many ways requirements, and thus can be regarded as highly specific criteria for assessing the suitability of each approach. You do this in section 4, but it would be better to weave it through the discussion of the approaches. It would be good to have a big summary table with all the issues and approaches, and a tick or some other indicator of how well a given approach works for a given issue/requirement.

Of course, it is much easier for me to say this than it is for the authors to do a major re-org of the document. Given that the purpose of this document is ultimately to lay the groundwork for a proposal for RDF/TM interoperability, it is not actually necessary to have specific detailed sections on each of the existing approaches.
It would suffice to have brief descriptions of each. The details for each approach could be introduced in the analysis section, which considers each major requirement one by one (as is currently done in section 4). The main document would then consist of:

1. Introduction (as now).
2. Requirements, Issues and Evaluation Criteria (keep as is, but add something about what end users require from translated representations, e.g. querying? data translation? semantic integration?).
3. Existing Translation Approaches (much shorter than the current section).
4. Analysis: an elaboration of section 4 that pulls in much of the detail currently in section 3, but is always focused on particular requirements, issues and/or criteria.
5. Conclusion: this needs to be spruced up; it is too brief and says little. It might be good to summarize next steps here.

The broad criteria of naturalness and completeness are also important and worth keeping.

It would be good to state the assumptions about the reader. What is the reader assumed to know? Not many people will be very familiar with BOTH TMs and RDF, so pointers to simple tutorial material on each would be appropriate. In addition, it might be helpful to have a 1-2 page introduction to each in an appendix.

SPECIFIC COMMENTS:

1. Introduction

1.1 Background
Very good introduction. Grammatical quibble: "Topic Maps is a model" has an apparent number disagreement. This can be fixed by saying something like "Topic Maps provide a model" or "at the heart of TMs is a model". This issue arises in several places.

1.2 Good.

2.1 Translation Features
Completeness: By the definition given, a complete translation need not be reversible. One can translate from source to target and lose no information one way, but lose information the other way. Perhaps you mean that a complete translation loses no information in either direction? That is only possible when the two formats can represent exactly the same set of concepts, i.e. have the same expressive power.
Yet RDF and TM have some important differences. This means it is inherently impossible to have complete translations; there WILL be some loss in translation (or at least it seems that way to me).

Fidelity: There are two different notions here. Both are important, and you are only addressing one:
1. Naturalness, which seems to be inversely related to the need for workarounds.
2. Accuracy, which is the most common meaning of 'fidelity' in my experience.
Accuracy is an important criterion referring to the correctness of the translation, and you seem to be ignoring it. These criteria are not the same: you can have a perfectly accurate translation that is not very natural. If you really just mean naturalness, then perhaps use that word, not 'fidelity', which suggests accuracy.

More importantly, what is the practical import of 'unnatural' translations? If they are intended for human consumption, then they will be much harder to read. If not, then what does it matter? If the information is correctly translated, and queries are correctly answered, who cares if the translation is 'unnatural'? Might it have an impact on query response times? What other consequences of being unnatural are there? By itself, unnaturalness may not pose any real problems. You address these questions much later in the document; that discussion needs to be moved forward.

2.2 Major Issues
It seemed surprising to talk about issues first. I would have expected you to present the requirements first. Some may be easy to meet; the ones that are hard to meet are the 'issues'. Perhaps by 'issues' you really do mean requirements. If so, perhaps that could be made explicit? To some extent, this is just a minor terminology point: you are calling 'requirements' 'issues'. But there is more. The requirements fall into 3 categories:
1. General: naturalness, completeness.
2. Specific language translation capabilities.
3. End user requirements: what will the translators be used for? Querying? Data translation? Semantic integration?
You talk about this much later in the document; it needs to be brought forward into this section.

2.2.1 TM issues
In what sense is identity a TM issue? It seems to be an issue for translating from TM to RDF. One can argue that this is an 'issue' for RDF, because RDF cannot do these things. It is also an 'issue' for people interested in using RDF when they need to translate from TMs. Indeed, most of the "TM Issues" you talk about are things that RDF cannot do, and vice versa. You might change the section headings to reflect this, since 'issue' is a bit ambiguous. For example, what you call "TM/RDF Issues" might be called "Issues in translating from TM/RDF to RDF/TM". Better still, call them requirements?

3.1 Moore Proposal
Overall, a good description, although I did not follow all the details. I had difficulty understanding the essential nature of, and the difference between, 'modeling the model' and 'mapping the model'. Is the distinction one between syntax and semantics? You don't quite say that, but you do say that one of them is essentially a semantic approach. I would like to see this distinction explained better. It is used as a basis of comparison for all approaches, so it is important. For me, the terms were unhelpful; I could not adequately relate their meaning to the words in the terms. It would be fine to think of other terms that work better; there is no need to be tied to terms from old papers if they are not helpful. On the other hand, if this is just me, and most readers are likely to find the terms helpful, then they are fine. Later on, you do use different terms. I think the document would read better if you got the terminology straight at the beginning and used it consistently throughout.

3.2 Stanford Proposal
Good description, with good attention to detail. I did not follow all the details.
3.3 The O...y Proposal
Excellent observation about the impact of syntactic presentation, comparing to "3rd RDF basic abbreviated form".

I'm amazed that you say the translations are more or less complete. How can that be, when there are so many things that RDF has that TMs lack, and vice versa (from section 2.2)? How are containers handled? If you throw them away when translating to TM, you can't get them back, so the translation is incomplete. Ditto for the many other 'issues' in section 2.2. I expected that the completeness assessment for each approach would discuss these issues at length; indeed, that would be a major part of the discussion of the adequacy of the different approaches.

In fact, one good way to structure the whole document would be to start with the requirements for a good translation, note which are easy and which are challenging, and then, for each challenging one, note some possible ways to approach it. You kind of do this in section 4. I think it would be better to move much of the content of section 4 to the front of the document.

At some point, it will be necessary to analyze why there is such an explosion of new statements (1 to 26 after a round-trip translation). Is there reason to hope that new approaches could do better and still achieve semantically accurate translation? It would be helpful to give an explanation, with an example illustrating why the extra statements are added.

3.4 Garshol proposal
In many ways, this approach seems the most advanced. He gives more attention and clearer thinking to his approach than the others do.
RDF2TM mapping: The extra information seems a bit over the top; one has to do a lot of manual work annotating the RDF specifically for the purpose of mapping it to TM. This is unsatisfactory. In other words, the translation is human-assisted, which may not be practical in many cases. This point is hinted at, but not stressed enough, IMHO.

3.5 Unibo proposal
"is alone is" should be "is alone in".

4. Analysis
Overall, a good section with good analysis. My only problem is that this material seems like it should have come much sooner, as per my earlier comment. What is the import of the fact that none of the approaches discuss how to represent RDF containers and collections, language tags, XML literals and typed literals? Does this mean they are really hard? Does it matter?
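To make the completeness and container points concrete, here is a minimal Python sketch under deliberately simplified, hypothetical assumptions: an ordered RDF container (such as rdf:Seq) is modeled as an ordered list of members, and a hypothetical TM encoding records only an unordered set of members. The functions rdf_to_tm and tm_to_rdf are illustrative only and are not the mapping used by any of the surveyed proposals. Under these assumptions, going TM to RDF and back loses nothing, while going RDF to TM and back loses the member order, which is why a translation that is complete in one direction need not be reversible.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical, simplified models used only for illustration:
# an ordered RDF container as (container id, ordered member list),
# its hypothetical TM counterpart as (topic id, unordered set of members).
RdfContainer = Tuple[str, List[str]]
TmMembers = Tuple[str, FrozenSet[str]]


def rdf_to_tm(container: RdfContainer) -> TmMembers:
    """Hypothetical mapping: keeps every member but discards their order."""
    ident, members = container
    return ident, frozenset(members)


def tm_to_rdf(topic: TmMembers) -> RdfContainer:
    """Hypothetical inverse: no order survives, so an arbitrary (sorted) one is chosen."""
    ident, members = topic
    return ident, sorted(members)


# TM -> RDF -> TM: the set of members comes back unchanged.
tm_side: TmMembers = ("authors", frozenset({"a", "b", "c"}))
print(rdf_to_tm(tm_to_rdf(tm_side)) == tm_side)  # True

# RDF -> TM -> RDF: the original order ["c", "a", "b"] cannot be recovered.
rdf_side: RdfContainer = ("authors", ["c", "a", "b"])
print(tm_to_rdf(rdf_to_tm(rdf_side)))  # ('authors', ['a', 'b', 'c'])
print(tm_to_rdf(rdf_to_tm(rdf_side)) == rdf_side)  # False
```

Whether a given proposal actually loses container order depends on how it maps containers, which is exactly the question the survey leaves open.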
Received on Tuesday, 1 March 2005 20:17:01 UTC