[RDFTM] Review of Survey Document from Uschold, Michael F on 2005-03-01 (public-swbp-wg@w3.org from March 2005)

From: Uschold, Michael F <michael.f.uschold@boeing.com>
Date: Tue, 1 Mar 2005 12:16:27 -0800
To: <public-swbp-wg@w3.org>
Message-ID: <823043AB1B52784D97754D186877B6CF0583C2D7@xch-nw-12.nw.nos.boeing.com>
GENERAL COMMENTS: 
My first impression is that this is a very well thought out and
structured document. The problem is approached in a good systematic
manner. I'm glad to see specific criteria by which all approaches are
judged. There is a LOT of content, and an impressive amount of work has been
done to lay the foundation for coming up with an eventual RDF/TM
proposal.

I am broadly familiar with TMs, but not with the gory details.

After reading it through, I realize there are some things that could be
improved, in terms of the document structure, though most of the content
is fine. When publishing a survey, there is always a question of
whether to arrange the content according to specific issues (as you do
in section 4) or by published approaches or systems (as you do in section
3). You have done both, which has pros and cons. The major con of the
current draft (IMHO) is that the good analysis is so far removed from
the initial requirements. Also, the major existing systems are not
directly assessed against the major requirements (what you are calling
'issues') until the end of the document.

After listing the issues in section 2, I thought they would form an
excellent structuring device for the rest of the paper. The issues are
in many ways, requirements, and thus can be regarded as highly specific
criteria for assessing the suitability of each approach. You do this in
section 4, but it would be better to weave this assessment through the
discussion of the approaches. It would be good to have a big summary table with all
the issues and approaches and a tick or some indicator of how well a
given approach works for a given issue/requirement.

Of course, it is much easier for me to say this than it is for the
authors to do a major re-org of the document. Given that the purpose of
this document is ultimately to lay the groundwork for coming up with a
proposal for RDF/TM interoperability, it is not actually necessary to
have specific detailed sections on each of the existing approaches.  It
would suffice to have brief descriptions of each.  The details for each
approach could be introduced in the analysis section which considers
each major requirement, one by one (which is currently done in section
4).  Then the main document would consist of:
	1.	Introduction (as now).
	2.	Requirements, Issues and Evaluation Criteria (keep as
is, but add something about what end users require from translated
representations, e.g. querying, data translation, semantic
integration?).
	3.	Existing Translation Approaches (much shorter than the
current section).
	4.	Analysis: an elaboration of section 4 which pulls in
much of the detail currently in section 3, but is always focused on
particular requirements, issues and/or criteria.
	5.	Conclusion (needs to be spruced up; it is too brief and says
little). It might be good to summarize next steps here.

The broad criteria of naturalness and completeness are also important
and worth keeping.

It would be good to state your assumptions/expectations about the reader.
What are readers assumed to know? Not that many people will be very
familiar with BOTH TMs and RDF. So pointers to simple tutorial material
on each would be appropriate. In addition, it might be helpful to have a
1-2 page introduction to each in an appendix.

SPECIFIC COMMENTS: 


1. Introduction


1.1	Background
Very good introduction

Grammatical quibble: "Topic Maps is a model" has apparent number
disagreement. Can fix by saying something like: "Topic Maps provide a
model"  or "at the heart of TMs is a model". This issue arises in
several places.

1.2	good

2.1 Translation Features
Completeness: By the definition given, a complete translation need not
be reversible. One can translate from source to target and lose no
information one way, but lose information the other way. Perhaps you
mean to say that complete means there is no loss either way? That is
only possible when the two formats are capable of representing exactly the
same set of concepts, i.e. have the same expressive power. Yet RDF and TMs
have some important differences in what they can express. This means it is
inherently impossible to have translations that are complete in both
directions. There WILL be some loss in translation (or at least it seems
that way to me).
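To make the one-way point concrete, here is a minimal sketch. It assumes hypothetical toy models (an ordered "RDF-style" container and an unordered "TM-style" association); these are not the actual RDF or Topic Maps data models, nor the mapping of any proposal in the survey. Going TM to RDF and back loses nothing, while going RDF to TM and back loses the member order.

```python
# Hypothetical toy models for illustration only; these are NOT the RDF or
# Topic Maps data models, nor the mapping used by any surveyed proposal.
from typing import FrozenSet, Tuple

RdfSeq = Tuple[str, Tuple[str, ...]]    # (container name, members in order)
TmAssoc = Tuple[str, FrozenSet[str]]    # (association name, unordered players)


def rdf_to_tm(seq: RdfSeq) -> TmAssoc:
    """Map an ordered container to an association; member order is dropped."""
    name, members = seq
    return (name, frozenset(members))


def tm_to_rdf(assoc: TmAssoc) -> RdfSeq:
    """Map an association to a container; with no order to preserve,
    an arbitrary one (sorted) is chosen."""
    name, players = assoc
    return (name, tuple(sorted(players)))


tm_assoc: TmAssoc = ("team", frozenset({"gamma", "alpha", "beta"}))
rdf_seq: RdfSeq = ("team", ("gamma", "alpha", "beta"))

# TM -> RDF -> TM: nothing is lost in this direction.
print(rdf_to_tm(tm_to_rdf(tm_assoc)) == tm_assoc)  # True
# RDF -> TM -> RDF: the original member order cannot be recovered.
print(tm_to_rdf(rdf_to_tm(rdf_seq)) == rdf_seq)    # False
```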

Fidelity: there are two different notions here, both are important, and
you are only addressing one. 
	1.	Naturalness: which seems to be inversely related to the
need for workarounds.
	2.	Accuracy: this is the most common meaning of 'fidelity'
in my experience.

Accuracy is an important criterion referring to the correctness of the
translation; you seem to be ignoring this.
 
These criteria are not the same. You can have a perfectly accurate
translation that is not very natural. If you really just mean
naturalness, then perhaps use that word, not 'fidelity' which suggests
accuracy.

More importantly, what is the practical import of 'unnatural'
translations? If they are intended for human consumption, then they will
be much harder to read. If not, then what does it matter? If the
information is correctly translated, and queries are correctly answered,
who cares if the translation is 'unnatural'? Might it have an impact on
query response times? What other consequences of being unnatural are
there? By itself, it may not pose any real problems.

You address these questions much later on in the document; that
discussion needs to be moved forward.

2.2 Major Issues
It seemed surprising to talk about issues first. I would have expected
you to present the requirements first. Some may be easy to meet. The
ones that are hard to meet are the 'issues'. Perhaps by 'issues' you
really do mean requirements.  If so, perhaps that could be made
explicit?

To some extent, this is just a minor terminology point, you are calling
'requirements' 'issues'. But there is more. The requirements fall into 3
categories.
1. general: naturalness, completeness
2. specific language translation capabilities
3. end user requirements: what will the translators be used for?
querying? data translation? semantic integration? You talk about this
much later in the document; it needs to be brought forward into this
section.

2.2.1 TM issues
In what sense is identity a TM issue? It seems to be an issue for
translating from TM to RDF. One can argue that this is an 'issue' for
RDF because RDF cannot do these things. It is also an 'issue' for people
interested in using RDF when they need to translate from TMs.

Indeed, most of the "TM Issues" you talk about are things that RDF cannot
do, and vice versa.
You might change the section headings to reflect this, since 'issue' is
a bit ambiguous.
E.g. what you call "TM/RDF Issues" might be called "Issues in
translating from TM/RDF to RDF/TM". Better still, call them requirements?


3.1 Moore Proposal
Overall, good description, I did not follow all the details.

I had difficulty understanding the essential nature of and difference
between 'modeling the model' and 'mapping the model'.  Is the
distinction one between syntax and semantics? You don't quite say that,
but you do say that one of them is essentially a semantic approach.

I would like to see this distinction explained better. It is used as a
basis of comparison for all approaches, so it is important.  For me, the
terms were unhelpful, I could not adequately relate the meaning to the
words in the term. It would be fine to think of other terms that worked
better; there is no need to be tied to terms from old papers if they are not
helpful. On the other hand, if this is just me, and if most readers are
likely to find the terms helpful in understanding their meaning, then
they are fine. 

Later on, you do use different terms.  I think the document would read
better if you got the terminology straight in the beginning and used it
consistently throughout.

3.2 Stanford Proposal
Good description, good attention to detail. I did not follow all the
details.

3.3 The O...y Proposal

Excellent observation about the impact of syntactic presentation,
comparing to "3rd RDF basic abbreviated form"

I'm amazed that you say the translations are more or less complete. How
can that be, when there are so many things that RDF has that TMs lack
and vice versa (see section 2.2)? For example, how are containers handled? If you
throw them away when translating to TM, you can't get them back, so this
is incomplete. Ditto for the many other 'issues' in section 2.2.

I expected that in the completeness assessment for each approach, there
would be much discussion of these issues; indeed that would be a major
part of the discussion on the adequacy of the different approaches.
In fact, one good way to structure the whole document would be to start
with the requirements for a good translation, note which are easy, and
which are challenging, then for each challenging one, note some possible
ways to approach them. You kind of do this in section 4. I think it
would be better to move much of the content of section 4 to the front of
the document.  

At some point, it will be necessary to analyze why there is such an
explosion of new statements (1 to 26 after a round-trip translation). Is
there reason to hope that new approaches could do better, and still
achieve semantically accurate translation?

It would be helpful to give an explanation with an example illustrating
why the extra statements are added. 

3.4 Garshol proposal
In many ways, this approach seems the most advanced. He gives more
attention and clear thinking to his approach than the others do.

RDF2TM mapping. The extra information seems a bit over the top; one has
to do a lot of manual work annotating the RDF specifically for the
purpose of mapping it to TM. This is unsatisfactory. In other words, the
translation is human-assisted which may not be practical in many cases.
This point is hinted at, but not stressed enough, IMHO.

3.5 Unibo proposal
"is alone is" should be "is alone in"

4. Analysis
Overall, good section, good analysis.

My only problem is that this stuff seems like it should have come much
sooner, as per my prior comment.  

What is the import of the fact that none of the approaches discuss how
to represent RDF containers and collections, language tags, XML and
typed literals?  Does this mean they are really hard? Does it matter?
Received on Tuesday, 1 March 2005 20:17:01 UTC