Re: Review of RDFTM Survey from Natasha Noy on 2005-02-28 (public-swbp-wg@w3.org from February 2005)

From: Natasha Noy <noy@smi.stanford.edu>
Date: Mon, 28 Feb 2005 14:35:13 -0800
To: swbp <public-swbp-wg@w3.org>
Cc: Fabio Vitali <fabio@cs.unibo.it>, David Wood <dwood@tucanatech.com>, Steve Pepper <pepper@ontopia.net>, Michael F Uschold <michael.f.uschold@boeing.com>
Message-Id: <67df9e0c64d12a79ae84c83f3d4893bc@smi.stanford.edu>
My review below is fairly high-level since I didn't have time to go 
into the details of translation and test cases before the f2f. Also, I 
know very little about Topic Maps (part of the reason I volunteered to 
review was to learn more), hence I couldn't verify many of the details. 
So, with that caveat, here are my comments.

I was very impressed with the depth and comprehensiveness of the 
discussion. It seemed almost a waste to have this as a WG note, not a 
journal paper :) However, I have (quite a few) comments and concerns. 
FWIW....

First, I felt a bit odd with the fact that the document is too much of 
a literature review to be a W3C working note (at least to my gut 
feeling). Many passages in the document are along the lines of "[XYZ 
2004] starts the paper by reviewing other approaches and then goes on 
to do x", "he doesn't present a diagram of this in the paper, but", 
etc. I almost had the feeling of "who cares?" It seemed to me that the 
note should focus more on the approaches themselves than on the papers 
that represent the approaches. It also seems a bit subjective and I 
would be interested to know what other reviewers think (I know that 
Mike Uschold is also reviewing this, in addition to me and David). This 
issue of being objective and non-judgmental has been raised before 
vis-a-vis other WG notes. Since the WG documents are often scrutinized 
for objectivity and can be treated by some as endorsements, I would be 
a bit more careful about this.

I would have also liked a better indication of why the specific 
approaches that you reviewed were chosen. For instance, the first one, 
by Moore, seems rather immature -- why even include it at all? This is 
not a journal paper that should cover everything possible after all. 
There are several other approaches that you omit for being immature; 
why not this one? Some more detailed criteria on why specific approaches 
were chosen for detailed discussion would have been helpful.

On the overall structure, I think it would have helped a lot if the 
document started with some discussion of what the goal of such RDF/TM 
integration/translation is. Then the discussion of pros and cons of 
different approaches would feel more grounded. This information is 
technically there, in the three bullet points in section 4.1 on the 
consequences for reduced interoperability. But until this point (i.e., 
through most of the document), it is unclear that what you are looking 
for is the ability to merge the data, vocabulary conformance, and the 
ability to write queries against the target model. I think this 
discussion should come at the very beginning of the document, and in 
some detail. Otherwise, the qualitative good/bad judgments that the 
document makes are somewhat puzzling since it is unclear what the goal 
is. Similarly, you give a lot of importance to the quantitative 
measure: how many RDF or TM statements you get. Without some goal of 
what you are trying to achieve, simply saying that "more is worse" 
seems somewhat unsubstantiated.

Another high-level point. I think the discussion would have been much 
easier to read (in particular to someone not very steeped in at least 
one of TM/RDF, such as myself) if the major issues section (2.2) was 
more detailed. Again, you come back to this discussion in the end, 
explaining what is difficult about these issues, but perhaps you could 
spend more time before moving on to specific approaches to explain what 
the difficulties are. This would basically mean moving a lot of text 
from section 4 to section 2, and rewording it slightly.

I really like the idea of test cases and this certainly makes the 
document very practical and useful. I would have liked to see more 
discussion though of why these particular test cases: presumably they 
should cover most of the major issues, no? Do they? Also, is there any 
reason why the test case for RDF2TM translation is different from the 
one for TM2RDF? Why not have both of them represent the Puccini opera? 
Then you can see the round-tripping (or lack of it) much more clearly.
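
To make the round-tripping point concrete, here is a minimal sketch in 
Python of the kind of check a shared test case would allow. Everything 
in it is an assumption made for illustration: the translators are toy 
functions, not any of the surveyed approaches, the ex: names are 
invented, and RDF and Topic Map data are reduced to plain tuples.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def round_trip_report(source: Graph,
                      forward: Callable[[Graph], frozenset],
                      backward: Callable[[frozenset], Graph]) -> Dict[str, Graph]:
    """Translate forward and back, then report statements lost or added."""
    result = backward(forward(source))
    return {"lost": source - result, "added": result - source}


def toy_rdf2tm(graph: Graph) -> frozenset:
    # Hypothetical: every triple becomes a binary association with two roles.
    return frozenset((p, ("subject", s), ("object", o)) for s, p, o in graph)


def lossy_rdf2tm(graph: Graph) -> frozenset:
    # Hypothetical: same as above, but silently drops literal-valued statements.
    return frozenset((p, ("subject", s), ("object", o))
                     for s, p, o in graph if not o.startswith('"'))


def toy_tm2rdf(topic_map: frozenset) -> Graph:
    # Hypothetical inverse: read the two role players back out as a triple.
    return frozenset((subj[1], assoc_type, obj[1])
                     for assoc_type, subj, obj in topic_map)


# One source (a tiny Puccini example) used for both directions.
puccini: Graph = frozenset({
    ("ex:puccini", "ex:composed", "ex:tosca"),
    ("ex:puccini", "ex:name", '"Giacomo Puccini"'),
})

clean = round_trip_report(puccini, toy_rdf2tm, toy_tm2rdf)
lossy = round_trip_report(puccini, lossy_rdf2tm, toy_tm2rdf)
print(clean)
print(lossy)
```

With a single source for both directions, whatever ends up in "lost" or 
"added" points directly at what a given approach fails to preserve.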

Then, when discussing each of the approaches, I would simply focus on 
the gist of what they do (regardless of who they reference and which 
part of the paper they do this in), and, more specifically, how they 
address the major issues you identified in the beginning. I think 
putting the whole discussion at the end gets the reader lost and, 
frankly, I forgot what the approaches were by the time I got to section 
4. At the same time, at the end of each subsection of section 3, I was 
left wondering how this particular approach addresses those 
difficult issues. Also, when reading descriptions of each of the 
approaches, I was really lacking small snippets of TM/RDF code 
illustrating the examples, before the test cases. This is most probably 
largely due to my very limited familiarity with Topic Maps, but I think 
such snippets would have been useful.

Finally, can the WG notes refer to results from proprietary products 
(such as some of the solutions that are only hinted at in the papers 
and are implemented inside Ontopia products)?

A number of more specific points. Again, I haven't checked the 
intricacies of the translations themselves.

Before you use RDF2TM and TM2RDF for the first time, perhaps spell them 
out?

In section 1.2, I would remove the information on which other works 
each of the works references. This doesn't seem to be salient enough 
information to include in a 2-sentence description of the 
approach.

Your definition of completeness in section 2.1 seems to imply that an 
approach cannot be complete if it is only one-way, when you say that "A 
complete translation will by definition be reversible". I don't think 
you mean the two-way requirement here (i.e., RDF2TM translation can be 
complete by itself), but perhaps you could make it clearer.

In section 3.3.1, the paragraph that starts with "Interestingly, what 
appears to be a very opaque RDF" seems very subjective. I would suggest 
removing it.

In section 3.4, you often mention what Garshol discusses, but this 
sounds a lot more like a review of the paper than of a particular 
approach. The fact that the author raises some difficult issues in a 
paper, but his approach doesn't solve them, is probably out of scope for 
a WG note. Similarly, examples of translations for n-ary associations 
that Gentilucci proposes and then rejects as clearly wrong probably 
don't belong in this note either.

I would be careful about statements like the following in section 3.6: 
"superset of the most popular proposed semantic web metamodels (viz 
XML, RDF, and Topic Maps)". I doubt we want any document coming out of 
this WG to refer to XML as a semantic web metamodel, do we? Another 
contentious passage:
"... is difficult because of RDFs "more primitive nature"".
I know that "primitive" is a quote, but again, do we want these 
statements in the WG document? In fact, the note has lots of things 
like this that make it sound more colloquial than it really should 
(references to "before breakfast" and as an "evening's exercise" are 
another example.) I would suggest tightening things up a bit.

A naive question: RDF has metamodeling capabilities: a class can be an 
instance of another class. Is a similar thing available in Topic Maps? 
If not, perhaps it is another issue to consider?
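
To illustrate what I mean on the RDF side, here is a minimal sketch in 
Python. The ex: names are made up, and the triples are plain tuples 
rather than data handled by any real RDF library.

```python
RDF_TYPE = "rdf:type"

# Hypothetical data: Eagle is a class (Harry is an instance of it)
# and at the same time an instance of another class, Species.
triples = {
    ("ex:Harry", RDF_TYPE, "ex:Eagle"),
    ("ex:Eagle", RDF_TYPE, "ex:Species"),
}


def classes_used_as_instances(graph):
    """Return resources that appear both as a class and as an instance."""
    classes = {o for s, p, o in graph if p == RDF_TYPE}
    instances = {s for s, p, o in graph if p == RDF_TYPE}
    return classes & instances


metaclass_members = classes_used_as_instances(triples)
print(metaclass_members)
```

A translation test case containing a resource like ex:Eagle would show 
whether an approach keeps both of its roles on the Topic Maps side.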

Minor point: where available, it would have been helpful to have URLs 
for references, particularly for the ones that are solely web 
documents, such as various reports from Ontopia.

Natasha
Received on Monday, 28 February 2005 22:54:59 UTC