[RDFTM] Comments on draft survey

I've just read through the latest version of the survey; my comments
are below. Generally, I think the document is too long. The
discussions of the five proposals would probably benefit from cutting
some of the author rationales and stating the conversion rules more
systematically and briefly (if possible), and they should also be more
complete.


--- 1.2 Overview of proposals

In the descriptions of the five proposals a substantial percentage of
the verbiage is devoted to which conference each was presented at.
This doesn't really seem like it is adding anything. The dates may be
useful, but I suggest removing mentions of the actual conferences.
(This information should be in the bibliography anyway.)

The section on /Garshol/ is a bit clunky. Maybe best to move the
discussion of [Garshol02] down together with the others which are not
examined in detail? (The text about it can be kept as it is.)

/Garshol/ is actually implemented twice: once in OKS and once in
tmapi-utils[1]. (The latter implementation is not complete yet, but is
likely to be so before this survey is finished.)

Kaminski's thesis should definitely be mentioned here.


--- 2.1 Basic features ...

I spent a lot of time writing a reasoned critique of this list of
criteria[2].  Was this ever received? I have received no response to
it, and the most important suggestion in it does not seem to have been
followed. (For the record: all of the comments still seem to apply.)


--- 3.1.1

The description of the author's reasoning is very clear, but having
read the section I am really not much wiser as to how the proposal
works. This really seems to be the wrong way around; surely Moore's
proposal is more interesting than the reasoning behind it? Further,
the RDF->TM and TM->RDF proposals are not clearly distinguished from
one another (see [2]), which makes the confusion worse.

I realize that the ultimate source of confusion here may be Moore, but
I still think this section would benefit from focusing much more on
the proposed conversions. I also think it would help to distinguish
what must effectively be two independent proposals from one another,
as advocated in [2].


--- 3.1.2

  Reversibility: Neither approach is reversible. In the case of the
  "modelling" approach, the assumption is that one is working in one
  domain or the other, but not in both. In the case of the "mapping"
  approach, the fact that a statement maps to a single association
  whereas an association maps to two statements shows that
  translations cannot be reversed.

Are there not really four approaches? Two modelling approaches (one
for each direction) and two mapping approaches? To judge reversibility
probably each should be examined individually. (Or, alternatively, we
could follow Moore's deprecation of the modelling approaches and
consider only the mapping approaches.)

That a statement maps to a single association would be reversible if
/Garshol/ or /UniBo/ were used to do the reverse mapping. I.e., it *is*
reversible.

That an association maps to two statements would actually also be
reversible with /Garshol/ (and perhaps also /UniBo/). (The two
statements would create two associations, which would merge into one.)

However, since constructs not discussed would necessarily be lost, the
conversion as a whole would not be reversible.


The discussion of the fidelity of the mapping approach seems to me
further proof that the argument in [2] is correct. Given that, I
would advocate a new structure for 3.1 as follows:

  3.1 Moore
  3.1.1 General discussion (very similar to current 3.1.1)
  3.1.2 RDF->TM proposal
  3.1.2.1 Description (mostly lifted from 3.1.1, but extended if possible)
  3.1.2.2 Analysis (half of 3.1.2, essentially)
  3.1.2.3 Test case
  3.1.3 TM->RDF proposal
  3.1.3.1 Description (mostly lifted from 3.1.1, but extended if possible)
  3.1.3.2 Analysis (half of 3.1.2, essentially)
  3.1.3.3 Test case

Some of the sections could perhaps be merged, but I broke them up for
clarity here. Note that this is likely to be much less work than it
would seem, given that the only new text required is more description
of the two conversions, and that would IMHO be required anyway.


--- 3.1.3.2

The last paragraph here seems like it was left over from the previous
version during editing.


--- 3.2.1

The term "bijective" used by Lacher & Decker may actually be quite
useful for us in our discussion of reversibility. See this Wikipedia
article:

  <URL: http://en.wikipedia.org/wiki/Bijection >

I think what we want are injective proposals, but not necessarily
bijective ones. Moore's proposals (by section 3.1.3) clearly are not
injective. (This is obvious as all TMs containing two topics and one
association between them would map to the same RDF model, regardless
of any differences in topic names and occurrences.)
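
To make the injectivity point concrete, here is a minimal Python
sketch. The toy topic map structure, the Puccini/Tosca data and the
function lossy_tm_to_rdf are purely illustrative assumptions of mine,
not Moore's actual rules; the only thing carried over from the
discussion above is that names and occurrences are dropped and only
the association is converted.

```python
def lossy_tm_to_rdf(tm):
    """Hypothetical conversion that keeps associations only.

    Each binary association becomes one (subject, property, object)
    triple; topic names and occurrences are silently dropped.
    """
    return {
        (assoc["players"][0], assoc["type"], assoc["players"][1])
        for assoc in tm["associations"]
    }


def make_tm(composer_name, opera_name):
    """Build a toy topic map: two topics and one association."""
    return {
        "topics": {
            "puccini": {"names": [composer_name]},
            "tosca": {"names": [opera_name]},
        },
        "associations": [
            {"type": "composed-by", "players": ["tosca", "puccini"]},
        ],
    }


# Two topic maps that differ only in their topic names.
tm_a = make_tm("Puccini", "Tosca")
tm_b = make_tm("Giacomo Puccini", "Tosca (opera)")

# Different inputs, identical output: the mapping is not injective,
# so no reverse mapping can recover the original topic map.
assert tm_a != tm_b
assert lossy_tm_to_rdf(tm_a) == lossy_tm_to_rdf(tm_b)
```

An injective mapping would have to produce different RDF for tm_a and
tm_b; it would not need to reach every possible RDF model, which is
why bijectivity seems more than we need.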

In PMTM4 strings are not part of the model, either, if I remember
correctly. 


--- 3.2.2

  Reversibility: The transformation is theoretically reversible but
  this is of academic interest since the proposal only covers one
  direction.

I may be sounding like a broken record by now, but it really is
interesting to us if the proposal is reversible. Had it had fidelity,
reversibility and completeness would have been the next things to
consider.

Regarding correctness I don't quite know what to say. It does seem to
model PMTM4 correctly, but I would say that PMTM4 does not model topic
maps correctly, and so it would be difficult to consider this proposal
entirely correct, without this necessarily being the fault of the
authors.

  Fidelity: ... an information content ...

Maybe just "information" would be better?


--- 3.3.1

Does /Ogievetsky/ really require the use of XSLT, or is it just that
the proposal is implemented in XSLT? (This is in part answered later,
but it seems strange to state it in this way here.)

Does /Ogievetsky/ use the same PMTM4 version as /Stanford/?

In the composed-by example (which I agree is only readable in RDF/XML
form :-) it's not clear to me why the rtm:member node is there. The
prose above mentions the existence of this node in passing, but does
not say anything about the rationale for it, either. To me it seems
like pure graph bloat, but presumably there was a reason for it?

  This translates to a topic map consisting of six TAOs (five topics
  and one association), which in turn translates back to RDF as a set
  of no less than [@@fixme] RDF statements. "Obviously we accumulated
  a lot of semantic luggage during our roundtrip" is Ogievetsky's
  laconic comment.

*LOL*

   In addition, a brief comparison is made with a tolog-like query
   language.

The language in question looks an awful lot like early versions of
RDQL. Could that be what it is?


--- 3.3.2

Same comment on reversibility here as elsewhere.

/Ogievetsky/ seems to me for the most part correct, except that in
associations different properties are used for topics depending on
their identity. Maybe this isn't incorrect in a strict sense, but it
certainly seems highly questionable, since the association should be
the same in either case. (Neither TMDM nor PMTM4 would have different
association structures in this case, and XTM makes it clear that the
two cases are equivalent.)


--- 3.4.1

The term "subject address" is used in places, instead of the correct
"subject locator".

I don't think the TM->RDF proposal is described in sufficient depth
here. Given the lack of usable proposals for this direction, that
seems a serious shortcoming. (The UniBo proposal in this direction is
discussed in great detail, and I think the discussion in Garshol03a
would be valuable for comparison.)


--- 3.4.2

Probably the analysis should be extended somewhat to make it clearer
what, precisely, the failings of the proposal are. (This does not seem
to be as important in 3.1-3.3, where the failings are more obvious.)


--- 3.5.1

This mixes the RDF->TM and TM->RDF conversions together, making the
discussion rather difficult to follow. It would be easier if they were
separated. The RDF->TM conversion also seems to have received less
attention than it should.

  The default behaviour in the Unibo proposal is to equate subject
  addresses with resource URIs and to represent subject identifiers
  using the RDFS property isDefinedBy. Topics that have no subject
  address are translated to blank nodes whose ID is generated from the
  topic's base name.

This seems problematic, since the property in an RDF statement is
required to have a URI.

  The Unibo proposal is alone is assuming a fundamental equivalence of
  semantics between base names and the rdfs:label

Specific mappings: this section seems the most important, but is too
brief for me to be able to understand it.


--- 3.5.2

Regarding reversibility:

   The proposal permits a high degree of reversibility, but the result
   of a round-trip may not be the same as the starting point.

I would claim that that fails the test of reversibility.

   For example, using the generic mappings, most RDF statements would
   be converted to typed associations with untyped roles [...]

This seems a classic example of a failure in reversibility, since the
information about which topic was the subject and which the object is
lost. 
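
A small Python sketch of why untyped roles break the round trip. The
data structures and function names are hypothetical and are not taken
from the Unibo proposal; the one assumption is that an association
with untyped roles carries no more than its type and an unordered set
of players.

```python
def statement_to_association(statement):
    """Convert an RDF statement to a typed association, untyped roles."""
    subject, prop, obj = statement
    # With no role types, nothing records which player was the subject.
    return (prop, frozenset([subject, obj]))


def association_to_statements(association):
    """Return every statement that could have produced the association."""
    prop, players = association
    ordered = sorted(players)
    a, b = ordered[0], ordered[-1]
    return {(a, prop, b), (b, prop, a)}


forward = ("tosca", "composed-by", "puccini")
backward = ("puccini", "composed-by", "tosca")

# Two statements with opposite directions collapse into one association.
assert statement_to_association(forward) == statement_to_association(backward)

# Going back, both directions are equally plausible candidates.
candidates = association_to_statements(statement_to_association(forward))
assert candidates == {forward, backward}
```

Typing the roles (one role type for the subject, one for the object)
would be one way to keep the direction, at the cost of needing that
information from somewhere.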


Regarding correctness: I have earlier pointed out errors in the
mapping which are not mentioned here. Nor do the things I considered
errors seem to be mentioned in 3.5.1, for what reason I don't know.
I've also pointed out another problem above, and I could list more if
desired.


--- 4

What is the purpose of this section? It seems to be about to turn into
a sixth proposal rather than a survey. Maybe it should revert to being
a survey?


--- 4.1

  Semantic mappings have much higher fidelity but suffer from the
  disadvantage of tending to be less complete and requiring additional
  information that is not normally present in the source document.

I would say that the disadvantage of semantic mappings is that making
them complete is harder. (The point about additional information still
applies, of course.)


--- 4.2

I think a set of requirements, based on the evaluation criteria, would
be useful. Or even a set of requirements that is then used to evaluate
the proposals.


--- 4.2.1

The problem with rdfs:isDefinedBy has been pointed out earlier.

Otherwise it mostly seems OK, but inserting rdf:type statements when
going TM->RDF seems questionable.

Regarding owl:sameAs and RDF->TM: why not just merge? (Actually, there
are more OWL properties with the same semantics.)
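
What I mean by "just merge" can be sketched roughly as follows, in
Python. This is only an illustration under my own assumptions:
statements are plain (subject, property, object) tuples with URI
objects, merge_same_as is a made-up name, and each merged group of
URIs is taken to become the identifiers of a single topic.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def merge_same_as(triples):
    """Merge nodes linked by owl:sameAs, using a simple union-find.

    Returns (identifiers, merged), where identifiers maps each merged
    node to the set of URIs it stands for, and merged is the set of
    remaining statements rewritten to use the merged nodes.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    # First pass: union the two ends of every owl:sameAs statement.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Deterministic choice: the smaller URI becomes the root.
                parent[max(root_s, root_o)] = min(root_s, root_o)

    # Second pass: collect identifiers and rewrite other statements.
    identifiers = {}
    merged = set()
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            for node in (s, o):
                identifiers.setdefault(find(node), set()).add(node)
        else:
            merged.add((find(s), p, find(o)))
    return identifiers, merged


triples = [
    ("http://a.example/x", OWL_SAME_AS, "http://b.example/x"),
    ("http://b.example/x", "http://example.org/p", "http://c.example/y"),
]
identifiers, merged = merge_same_as(triples)
assert identifiers == {
    "http://a.example/x": {"http://a.example/x", "http://b.example/x"},
}
assert merged == {
    ("http://a.example/x", "http://example.org/p", "http://c.example/y"),
}
```

The owl:sameAs statements themselves disappear from the output, which
is the point: the identity information ends up on the topic rather
than in an association.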


--- 4.2.2

  The semantic equivalence between topic names and the rdfs:label
  property is fairly obvious.

They are not equivalent. rdfs:label implies a topic name, but the
inverse does not hold.

TM->RDF: I prefer Garshol03a.

RDF->TM: What about properties that have literal values, yet are *not*
         subproperties of rdfs:label? Very few vocabularies bother
         spelling this out, and the same goes for instance data.

         The proposed approach is semantically more "correct" than
         Garshol03, but I think Garshol03 is more likely to actually
         give the correct results.

The draft says:

  In a semantic mapping there are two approaches that can be taken to
  handling variant names: reification and complex objects.

True, but faced with this choice I think many users will opt for the
third alternative: ignore variants altogether. Reification and complex
objects are both painful and heavy-weight. However, RDF
implementations with optimized support for reification will handle the
former more gracefully. I think handling this is likely to involve
significant pain, no matter what we do.

Anyway, I don't think recommendations belong in the survey.


[1] <URL: http://tmapi-utils.sourceforge.net/ >
[2] <URL: http://lists.w3.org/Archives/Public/public-swbp-wg/2005Feb/0089.html >

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

Received on Friday, 18 February 2005 08:34:40 UTC