- From: Lars Marius Garshol <larsga@ontopia.net>
- Date: Mon, 24 Apr 2006 10:29:12 +0200
- To: SWBPD list <public-swbp-wg@w3.org>
These are my comments on http://www.ontopia.net/work/guidelines.html,
version of March 20, 2006. The comments were written for the RDFTM
editors rather than the list, so parts may be a bit obscure unless
you know all the detail.
Generally it seems that we've reached the point where probably most
of the conversion rules are the way they should be, and what remains
now is to work out what kind of document we want to produce, firm up
the rules, and find a formalism for expressing them.
I think we probably are quite close to the goal, provided we don't
create too much work for ourselves.
--- What is this document?
I should start with noting that I find the title and abstract of the
document misleading. These are not guidelines that authors can use
for interoperability; this is a specification for a mapping between
the two data models. I know it has been described as guidelines ever
since the charter was written, but that doesn't make this a
guideline document. I'm not sure what to do about this, to be honest,
and the problem recurs throughout.
Throughout I would much prefer to see "this document" rather than
"these guidelines", as the only thing that resembles guidelines in
the document is 5-6 bullet points in 6.2. (I'll get back to why the
count is 5-6, and not 10.)
--- Tutorialism
The document goes to great lengths to include a lot of tutorial
material (explaining the two models, explaining the conversion
procedure, explaining the document itself, examples, ...). This
greatly increases the amount of work the TF has to do, and the more
of it we can cut, the better off we are, I think. We have tight
deadlines and very limited resources, and so I think it's better if
we make the document as minimal as possible. Explanations and
tutorials can always be written after the fact as conference papers
etc., and reach a greater audience that way. We also reduce the risk of
inconsistencies (at the moment there are lots of them, and with the
current format it will be an enormous task to make sure there are
none left when we finish).
IMHO it's especially important to cut all the examples. They
represent a huge rewriting burden as we move forward, and if we write
the conversion rules precisely then there is no need for the examples
anyway.
--- The conversion rules
These are generally too vague. We really need *some* kind of
formalism to clear this up, and make it easier to read. I think what
we should do is come up with a simple convention for addressing into
the TMDM, and switch over to a style like (this is the "More
generally" rule from 3.3):
To convert a topic item t to a resource:
  If t[subject locators] is not empty:
    choose an element r in t[subject locators] at random, and make it
    the resource. Add the following statement:
      (r, rdf:type, rdftm:InformationResource)
    For each l in t[subject locators] where l is not r add:
      (r, owl:sameAs, l)
    For each l in t[subject identifiers] add:
      (r, rdftm:subject-identifier, l)
And so on. This is pretty rough, and can definitely be improved on,
but it should do the trick of being specific enough, without being so
much of a formalism that it makes the document hard to read.
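As an illustration only, here is how the rule above might look as a
small Python function. This is a sketch, not part of the proposal: the
dict-of-sets representation of a topic item and the function name are
assumptions made for the example, the rdftm prefix is assumed for the
subject-identifier property, and "at random" is replaced by a
deterministic choice so the result is repeatable.

```python
RDF_TYPE = "rdf:type"
OWL_SAME_AS = "owl:sameAs"
INFORMATION_RESOURCE = "rdftm:InformationResource"
SUBJECT_IDENTIFIER = "rdftm:subject-identifier"  # assumed prefix


def topic_to_resource(topic):
    """Return (resource, triples) for a topic that has subject locators.

    `topic` is a hypothetical stand-in for a TMDM topic item: a dict
    with the keys "subject locators" and "subject identifiers", each
    mapping to a set of IRI strings.
    """
    locators = sorted(topic["subject locators"])
    if not locators:
        raise ValueError("sketch only covers topics with subject locators")
    # The rule says "at random"; the smallest IRI is used here instead
    # so that the output is deterministic.
    r = locators[0]
    triples = {(r, RDF_TYPE, INFORMATION_RESOURCE)}
    for l in locators[1:]:
        triples.add((r, OWL_SAME_AS, l))
    for i in topic["subject identifiers"]:
        triples.add((r, SUBJECT_IDENTIFIER, i))
    return r, triples
```

Writing the rule this way makes the open questions visible, for
instance what should happen when there are no subject locators at all.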
--- Untyped whatnots
Throughout the document there are lots of references to untyped
names, occurrences, and associations. These can all be removed, since
there are no such things in Topic Maps. This means that there's no
need to cater for these issues in any way, since they quite simply do
not occur. I've pointed this out several times before, so I'm a bit
puzzled as to why references to untyped constructs remain.
--- 1.2
Para 1 talks about "authors" and "documents". I think we should lose
both terms. Instances of Topic Maps and RDF are not necessarily
documents, and the process by which they are created is rarely
"authoring". I think we really mean to address owners of Topic Maps
and/or RDF information.
Para 1 also says the authors are ensured "a high level of
interoperability", which is very vague. I think we can say what we
ensure them: the conversion of data between the two models.
Para 2 I think does not discuss the purpose of the document at all.
It probably belongs in 1.1.
I've lots more to say about this section, but it's really all to do
with what the document actually *is*. The details are probably not
worth writing down before we've resolved the issue of what it is we
are creating.
Para 5 probably shouldn't list the namespaces, nor describe them too
much, as the result was pretty confusing for me. There should be a
complete list of all defined/used properties in an appendix
somewhere, and a reference to it here. (Producing the complete list
is tricky; I'll return to that.)
It would probably be beneficial to split 1.2 into: "Goals" and
"Approach", or something like it, where "Goals" describes what we're
trying to do, and "Approach" covers how we do it.
--- 1.3
I think we should lose the "willingness" point in the first sentence;
it goes without saying, and sounds very strange. The "authors" and
"documents" thing recurs.
The "creators of tools" are really implementors of the nameless
translation mechanism specified in this document, are they not?
The point about "people who seek assurance" I would delete. Yes, that
is part of Ontopia's reason for participating in this work, but it's
not really appropriate for the W3C to be saying this kind of thing.
The second para is better lost, IMHO.
--- 1.4
I think the prose repeat of the table of contents is better omitted.
The remainder could beneficially be turned into a "Notation" or
"Conventions" section.
--- 1.5
I think this should be deleted. The namespace URIs can be given in a
table in the new 1.4, and the acronyms are better expanded on first
use, anyway.
--- 2
This is so closely related to 1.2 that it might be better
incorporated there.
--- 2.1
Point 1: It should be made clear that this is *after* conversion.
Point 5: Advice? Shouldn't that be guidelines?
Point 7: We violate this point.
--- 3
The title is misleading, and also in conflict with the first para.
The first para is also misleading, I think, but in a different way. I
suspect this is just incomplete editing.
The second para says "guidance consists ... in asserting
properties ... to be ... or to have ...". This is 100% RDF-centric,
but shouldn't be. It's probably better to generalize and be less
specific, by saying something like "guidance consists in annotation
of ontology terms using, for the most part, the rdftm vocabulary".
--- 3.1
I think we should cut most of this, maybe even all of it.
Nit: the TM2RDF list seems very incomplete compared with the RDF2TM one.
--- 3.2
I think we should cut this.
Nits: the first para oversimplifies.
--- 3.3
Again: I would cut most of this down to the big gray box.
Para 4 says the document "advocates" a specific solution, but in fact
it specifies one (or should).
"The rules for translating identity are ...". I'm not sure this is
the best way to describe this. These are actually the rules for
converting between topics and x (where x is the missing term for "RDF
nodes that are not literals"). I know this sounds horribly pedantic,
but this is important, because we'll want to refer to these specific
rules from pretty much everywhere in the rest of the document.
TM2RDF: "When a topic": I would just lose this. Instead, make it an
error in the steps that produce statements to use typing topics that
have no subject identifier. Much simpler, and the result is the same.
TM2RDF: The main rule can be simplified quite a lot. Note that it
should be possible to retain item identifiers (as item identifiers).
Otherwise TM2RDF translation will be once-only, and thus pretty much
worthless in real life.
TM2RDF: The rules say "(e.g., through the type of the resource being
made an subclass of this class)", but this doesn't work. It's
entirely possible for the resource to be an instance of a *supertype*
of rdftm:InformationResource, and we can't know whether this is the
case or not. Also, we have to say something definite about what to do
here. I suggest simply cutting this.
The note about "In the examples below" I can't make any sense of. I'm
not sure whose fault that is.
Example 5 is a bit odd. Are item identifiers discarded or not?
The RDF2TM part is not really very clear. Also, the first bullet
point could be replaced with a statement in the OWL ontology for the
RDFTM vocabulary that says that the rdf:Property class and
rdftm:InformationResource are mutually exclusive. (I forget the exact
property used for this in OWL.) We need to discuss formalism issues
to really settle that point.
--- What vocabulary do we use?
Each subsection of section 3 that has an RDF2TM/TM2RDF box contains a
list of the vocabulary terms used in that section. I think that's way
too repetitious, and that we should instead do this for the entire
guidance vocabulary in a single list. We're just going to mess up
otherwise, and n separate lists of terms add nothing useful, anyway.
However, there is a deeper issue here, since some of the translation
rules depend on the types of resources in RDF. The question is: how
do you judge whether X is an instance of Y or not *when doing a
conversion*? According to the RDF semantics X is an instance of Y in
the following model:
  (X, _foo, Z)
  (_foo, rdfs:domain, Y)
If you use OWL this can be about as complicated as you wish. We need
to come up with a story on this point.
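To make the problem concrete, here is a minimal Python sketch of how
types can follow from rdfs:domain and rdfs:range statements without
any explicit rdf:type statement. It applies only the two RDFS rules
for domain (types the subject) and range (types the object), once, to
a set of plain tuples; the function name and the tuple representation
are assumptions for the example, and a real converter would have to
decide how much more inference than this to perform.

```python
def inferred_types(triples):
    """Return {(node, class)} from explicit types plus domain/range rules.

    `triples` is a set of (subject, predicate, object) string tuples.
    Only one pass is made; subclass and subproperty reasoning are
    deliberately left out of this sketch.
    """
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                types.add((s, cls))  # domain: the subject is typed
        for prop, cls in ranges:
            if p == prop:
                types.add((o, cls))  # range: the object is typed
    return types


model = {("X", "_foo", "Z"), ("_foo", "rdfs:domain", "Y")}
print(inferred_types(model))
```

A conversion rule that tests "is X an instance of Y" gives different
answers depending on whether a step like this is run first, which is
exactly what we need to pin down.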
--- 3.4
Let's not repeat built-in guidance here. Let's list all the built-in
guidance somewhere, and be done with it.
TM2RDF: Untyped names don't exist.
--- 3.4.1
Para 1 is tutorialism.
TM2RDF: This should be made more precise, and easier to understand.
Also, do we really want the rdftm:variant-scope property? I think I
know why it's there, but it's really hard to make sure.
RDF2TM: Here it seems to be rdftm:scope, which is inconsistent with
example 19.
--- 3.5
Para 1 is tutorialism, and untyped occurrences don't exist (para 3 +
TM2RDF box).
TM2RDF: "The value of the occurrence...": This oversimplifies a bit.
This would be much easier if written more formally, as I suggest
above. It would then run as:
An occurrence item o is converted into (with implicit
"topic-to-X-conversion"):
  (o[parent], o[type], o[value]) /* well, not quite */
  (o[type], rdf:type, rdftm:OccurrenceProperty)
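Purely as a sketch, the same rule in Python could look like the
following. The dict representation of an occurrence item and the
topic_to_node callback (standing in for the implicit topic-to-X
conversion) are assumptions for the example. The value is passed
through unchanged, which is the simplification flagged as "not quite"
above.

```python
def occurrence_to_triples(occurrence, topic_to_node):
    """Convert a hypothetical occurrence item into two statements.

    `occurrence` is a dict with the keys "parent", "type" and "value".
    `topic_to_node` maps a topic to the RDF node that represents it.
    """
    subject = topic_to_node(occurrence["parent"])
    prop = topic_to_node(occurrence["type"])
    return {
        # The value is copied as-is; datatype handling is not covered.
        (subject, prop, occurrence["value"]),
        (prop, "rdf:type", "rdftm:OccurrenceProperty"),
    }
```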
--- 3.6
All of this (up to 3.6.1) is tutorialism.
--- 3.6.1
Everything up to the TM2RDF box is tutorialism. The reference to
untyped associations should go, both before the box and inside it.
TM2RDF: I've noted that this is incomplete, but can't remember what I
was referring to. We should rewrite this with the formalism, anyway,
so it's not that important.
Example 24: The two lines of guidance are very confusing. What are they?
RDF2TM: The point about "inverse statements" is not necessary, since
duplicates in TMDM are not possible, anyway.
--- 3.6.1.1 and 3.6.1.2
We should be able to lose these two sections completely, and instead
replace them with built-in guidance.
--- 3.6.2
If we purge the document of tutorialism we can merge this with 3.6.1.
As it is, this is just far more voluminous than what is
really needed. For this reason I haven't reviewed it properly; it's
just obviously too much.
--- 3.6.3
The same applies here.
--- 3.7
Para 1 is tutorialism. Para 2 is tutorialism; replace with a
constraint on RDF2TM name conversion.
TM2RDF: This is fine for us, but needs to be formalized for when we
go real. Likewise for RDF2TM.
--- 3.8 and 3.9
The overall approach seems fine, as far as I'm able to follow it.
However, these should be simplified, and then worked into the name,
occurrence, and association sections. As the document stands now it's
much harder for the reader to see how this fits together. This of
course implies that it's much harder for us to make sure that it
actually does fit together, too. The relationship between scope and
reification isn't really described anywhere now, and that definitely
needs to be taken care of.
--- 3.10
I don't think we should have this section at all. The document is
already way too long.
--- 4
This doesn't really distinguish between providing guidance for
conversion and guidelines for how to structure your information so
that you won't run into trouble when you want to convert it. This
really needs to be reconsidered, I think.
--- 5
We should make an OWL ontology for the rdftm vocabulary. That could
serve as both definition of the terms and a complete enumeration of
them, as well as the built-in guidance. Point 4 in 5.3 is very odd,
and needs to be looked at more closely.
--- 6.2
Point 1: The first point is just an error; should we list it?
Points 3-5: The three untyped points should go.
Point 6: Reified roles work in n-aries.
Point 7: Reified TMs might be made to work.
Point 8: Topics that have no identifiers cannot occur, so that point
can go.
Point 9: This is true, but not really an "unsupported construct".
Point 10: This is again just an error.
--- 8
LTM is now in version 1.3.
TMDM should be referenced as ISO 13250-2, and editors should not be
included when referencing ISO standards.
--
Lars Marius Garshol, Ontopian http://www.ontopia.net
+47 98 21 55 50 http://www.garshol.priv.no
Received on Monday, 24 April 2006 08:29:38 UTC