[RDFTM] Comments on the guidelines from Lars Marius Garshol on 2006-04-24 (public-swbp-wg@w3.org from April 2006)

From: Lars Marius Garshol <larsga@ontopia.net>
Date: Mon, 24 Apr 2006 10:29:12 +0200
To: SWBPD list <public-swbp-wg@w3.org>
Message-Id: <908FCBB7-B8F7-4AD9-9FF1-DE40B9B80F35@ontopia.net>
These are my comments on http://www.ontopia.net/work/guidelines.html,
version of March 20, 2006. The comments were written for the RDFTM  
editors rather than the list, so parts may be a bit obscure unless  
you know all the detail.


Generally it seems that we've reached the point where probably most  
of the conversion rules are the way they should be, and what remains  
now is to work out what kind of document we want to produce, firm up  
the rules, and find a formalism for expressing them.

I think we probably are quite close to the goal, provided we don't  
create too much work for ourselves.

--- What is this document?

I should start with noting that I find the title and abstract of the  
document misleading. These are not guidelines that authors can use for
interoperability; this is a specification for a mapping between the
two data models. I know it has been described as guidelines ever
since the charter was written, but that doesn't make this a
guideline document. I'm not sure what to do about this, to be honest,  
and the problem recurs throughout.

Throughout I would much prefer to see "this document" rather than  
"these guidelines", as the only thing that resembles guidelines in  
the document is 5-6 bullet points in 6.2. (I'll get back to why the  
count is 5-6, and not 10.)

--- Tutorialism

The document goes to great lengths to include a lot of tutorial  
material (explaining the two models, explaining the conversion  
procedure, explaining the document itself, examples, ...). This  
greatly increases the amount of work the TF has to do, and the more  
of it we can cut, the better off we are, I think. We have tight  
deadlines and very limited resources, and so I think it's better if  
we make the document as minimal as possible. Explanations and  
tutorials can always be written after the fact as conference papers  
etc and reach a greater audience that way. We also reduce the risk of  
inconsistencies (at the moment there are lots of them, and with the  
current format it will be an enormous task to make sure there are  
none left when we finish).

IMHO it's especially important to cut all the examples. They  
represent a huge rewriting burden as we move forward, and if we write  
the conversion rules precisely then there is no need for the examples  
anyway.

--- The conversion rules

These are generally too vague. We really need *some* kind of  
formalism to clear this up, and make it easier to read. I think what  
we should do is come up with a simple convention for addressing into  
the TMDM, and switch over to a style like (this is the "More  
generally" rule from 3.3):

To convert a topic item t to a resource:

If t[subject locators] is not empty:
   choose an element r in t[subject locators] at random, and make it the
   resource. Add the following statement:
     (r, rdf:type, rdftm:InformationResource)

   For each l in t[subject locators] where l is not r add:
     (r, owl:sameAs, l)

   For each l in t[subject identifiers] add:
     (r, rdftm:subject-identifier, l)

And so on. This is pretty rough, and can definitely be improved on,  
but it should do the trick of being specific enough, without being so  
much of a formalism that it makes the document hard to read.
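
For illustration, the subject-locator branch of the rule above could be
sketched in Python roughly as follows. This is only a sketch under
assumptions: the Topic class and the topic_to_resource function are
hypothetical names, statements are plain string tuples, the property is
written with the rdftm prefix, and min() stands in for the "at random"
choice so that the result is repeatable.

```python
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"
OWL_SAME_AS = "owl:sameAs"
INFORMATION_RESOURCE = "rdftm:InformationResource"
SUBJECT_IDENTIFIER = "rdftm:subject-identifier"


@dataclass
class Topic:
    # Hypothetical stand-in for a TMDM topic item; only the two
    # properties used by the rule are modelled.
    subject_locators: list = field(default_factory=list)
    subject_identifiers: list = field(default_factory=list)


def topic_to_resource(t):
    """Return (resource, statements) for the subject-locator branch only."""
    if not t.subject_locators:
        # The remaining branches ("And so on") are not sketched here.
        raise NotImplementedError("only the subject-locator branch is sketched")
    # The rule says "at random"; min() is an assumption made for repeatability.
    r = min(t.subject_locators)
    statements = {(r, RDF_TYPE, INFORMATION_RESOURCE)}
    for loc in t.subject_locators:
        if loc != r:
            statements.add((r, OWL_SAME_AS, loc))
    for ident in t.subject_identifiers:
        statements.add((r, SUBJECT_IDENTIFIER, ident))
    return r, statements


topic = Topic(
    subject_locators=["http://example.org/b", "http://example.org/a"],
    subject_identifiers=["http://example.org/id"],
)
resource, statements = topic_to_resource(topic)
```

Using a set for the statements means a repeated locator or identifier
cannot produce duplicate statements.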

--- Untyped whatnots

Throughout the document there are lots of references to untyped  
names, occurrences, and associations. These can all be removed, since  
there are no such things in Topic Maps. This means that there's no  
need to cater for these issues in any way, since they quite simply do  
not occur. I've pointed this out several times before, so I'm a bit  
puzzled as to why references to untyped constructs remain.

--- 1.2

Para 1 talks about "authors" and "documents". I think we should lose  
both terms. Instances of Topic Maps and RDF are not necessarily  
documents, and the process by which they are created is rarely  
"authoring". I think we really mean to address owners of Topic Maps  
and/or RDF information.

Para 1 also says the authors are ensured "a high level of  
interoperability", which is very vague. I think we can say what we  
ensure them: the conversion of data between the two models.

Para 2 I think does not discuss the purpose of the document at all.  
It probably belongs in 1.1.

I've lots more to say about this section, but it's really all to do  
with what the document actually *is*. The details are probably not  
worth writing down before we've resolved the issue of what it is we  
are creating.

Para 5 probably shouldn't list the namespaces, nor describe them too  
much, as the result was pretty confusing for me. There should be a  
complete list of all defined/used properties in an appendix  
somewhere, and a reference to it here. (Producing the complete list  
is tricky; I'll return to that.)

It would probably be beneficial to split 1.2 into: "Goals" and  
"Approach", or something like it, where "Goals" describes what we're  
trying to do, and "Approach" covers how we do it.

--- 1.3

I think we should lose the "willingness" point in the first sentence;  
it goes without saying, and sounds very strange. The "authors" and  
"documents" thing recurs.

The "creators of tools" are really implementors of the nameless  
translation mechanism specified in this document, are they not?

The point about "people who seek assurance" I would delete. Yes, that  
is part of Ontopia's reason for participating in this work, but it's  
not really appropriate for the W3C to be saying this kind of thing.

The second para is better lost, IMHO.

--- 1.4

I think the prose repeat of the table of contents is better omitted.  
The remainder could beneficially be turned into a "Notation" or  
"Conventions" section.

--- 1.5

I think this should be deleted. The namespace URIs can be given in a  
table in the new 1.4, and the acronyms are better expanded on first  
use, anyway.

--- 2

This is so closely related to 1.2 that it might be better  
incorporated there.

--- 2.1

Point 1: It should be made clear that this is *after* conversion.

Point 5: Advice? Shouldn't that be guidelines?

Point 7: We violate this point.

--- 3

The title is misleading, and also in conflict with the first para.  
The first para is also misleading, I think, but in a different way. I  
suspect this is just incomplete editing.

The second para says "guidance consists ... in asserting  
properties ... to be ... or to have ...". This is 100% RDF-centric,  
but shouldn't be. It's probably better to generalize and be less  
specific, by saying something like "guidance consists in annotation  
of ontology terms using, for the most part, the rdftm vocabulary".

--- 3.1

I think we should cut most of this, maybe even all of it.

Nit: the TM2RDF list seems very incomplete compared with the RDF2TM one.

--- 3.2

I think we should cut this.

Nits: the first para oversimplifies.

--- 3.3

Again: I would cut most of this down to the big gray box.

Para 4 says the document "advocates" a specific solution, but in fact  
it specifies one (or should).

"The rules for translating identity are ...". I'm not sure this is  
the best way to describe this. These are actually the rules for  
converting between topics and x (where x is the missing term for "RDF  
nodes that are not literals"). I know this sounds horribly pedantic,  
but this is important, because we'll want to refer to these specific  
rules from pretty much everywhere in the rest of the document.

TM2RDF: "When a topic": I would just lose this. Instead, make it an  
error in the steps that produce statements to use typing topics that  
have no subject identifier. Much simpler, and the result is the same.

TM2RDF: The main rule can be simplified quite a lot. Note that it  
should be possible to retain item identifiers (as item identifiers).  
Otherwise TM2RDF translation will be once-only, and thus pretty much  
worthless in real life.

TM2RDF: The rules say "(e.g., through the type of the resource being  
made an subclass of this class)", but this doesn't work. It's  
entirely possible for the resource to be an instance of a *supertype*  
of rdftm:InformationResource, and we can't know whether this is the  
case or not. Also, we have to say something definite about what to do  
here. I suggest simply cutting this.

The note about "In the examples below" I can't make any sense of. I'm  
not sure whose fault that is.

Example 5 is a bit odd. Are item identifiers discarded or not?

The RDF2TM part is not really very clear. Also, the first bullet  
point could be replaced with a statement in the OWL ontology for the  
RDFTM vocabulary that says that the rdf:Property class and  
rdftm:InformationResource are mutually exclusive. (I forget the exact  
property used for this in OWL.) We need to discuss formalism issues  
to really settle that point.

--- What vocabulary do we use?

Each subsection of section 3 that has an RDF2TM/TM2RDF box contains a  
list of the vocabulary terms used in that section. I think that's way  
too repetitious, and that we should instead do this for the entire  
guidance vocabulary in a single list. We're just going to mess up  
otherwise, and n separate lists of terms add nothing useful, anyway.

However, there is a deeper issue here, since some of the translation  
rules depend on the types of resources in RDF. The question is: how  
do you judge whether X is an instance of Y or not *when doing a  
conversion*? According to the RDF semantics X is an instance of Y in  
the following model:

   (X, _foo, Z)
   (_foo, rdfs:domain, Y)

If you use OWL this can be about as complicated as you wish. We need  
to come up with a story on this point.
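
To make the problem concrete: under RDFS entailment, rdfs:domain types
the subject of a statement and rdfs:range types its object, so a
converter that only looks at explicit rdf:type statements will miss
instances. A minimal Python sketch of that inference follows. The
function name inferred_types and the tuple representation are
assumptions made for illustration, and it deliberately ignores
subclass, subproperty, and OWL reasoning.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def inferred_types(triples):
    """Return the rdf:type statements entailed by rdfs:domain and rdfs:range."""
    domains = {}
    ranges = {}
    # First pass: collect the declared domain and range of each property.
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    # Second pass: type subjects by domain and objects by range.
    result = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            result.add((s, RDF_TYPE, cls))
        for cls in ranges.get(p, ()):
            result.add((o, RDF_TYPE, cls))
    return result


model = {("X", "_foo", "Z"), ("_foo", "rdfs:domain", "Y")}
types = inferred_types(model)
```

Even this small amount of inference has to be specified if the
conversion rules are to give the same result in every implementation.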

--- 3.4

Let's not repeat built-in guidance here. Let's list all the built-in  
guidance somewhere, and be done with it.

TM2RDF: Untyped names don't exist.

--- 3.4.1

Para 1 is tutorialism.

TM2RDF: This should be made more precise, and easier to understand.  
Also, do we really want the rdftm:variant-scope property? I think I  
know why it's there, but it's really hard to make sure.

RDF2TM: Here it seems to be rdftm:scope, which is inconsistent with  
example 19.

--- 3.5

Para 1 is tutorialism, and untyped occurrences don't exist (para 3 +  
TM2RDF box).

TM2RDF: "The value of the occurrence...": This oversimplifies a bit.  
This would be much easier if written more formally, as I suggest  
above. It would then run as:

An occurrence item o is converted into (with implicit "topic-to-X- 
conversion"):

   (o[parent], o[type], o[value]) /* well, not quite */
   (o[type], rdf:type, rdftm:OccurrenceProperty)
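
As a rough Python rendering of the two statements above (a sketch only:
the Occurrence class and the function name are hypothetical, the
parent and type are assumed to have been converted to resources
already, and the "not quite" caveat about the value is not handled):

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    # Hypothetical stand-in for a TMDM occurrence item.
    parent: str  # resource already produced for the parent topic
    type: str    # resource already produced for the typing topic
    value: str   # value used as-is; datatype handling is left out


def occurrence_to_statements(o):
    """Return the two statements produced for one occurrence item."""
    return {
        (o.parent, o.type, o.value),
        (o.type, "rdf:type", "rdftm:OccurrenceProperty"),
    }


occ = Occurrence("http://example.org/puccini",
                 "http://example.org/date-of-birth",
                 "1858-12-22")
statements = occurrence_to_statements(occ)
```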

--- 3.6

All of this (up to 3.6.1) is tutorialism.

--- 3.6.1

Everything up to the TM2RDF box is tutorialism. The reference to  
untyped associations should go, both before the box and inside it.

TM2RDF: I've noted that this is incomplete, but can't remember what I  
was referring to. We should rewrite this with the formalism, anyway,  
so it's not that important.

Example 24: The two lines of guidance are very confusing. What are they?

RDF2TM: The point about "inverse statements" is not necessary, since  
duplicates in TMDM are not possible, anyway.

--- 3.6.1.1 and 3.6.1.2

We should be able to lose these two sections completely, and instead  
replace them with built-in guidance.

--- 3.6.2

If we purge the document of tutorialism we can merge this with 3.6.1.  
As it is, this is just enormously more voluminous than what is  
really needed. For this reason I haven't reviewed it properly; it's  
just obviously too much.

--- 3.6.3

The same applies here.

--- 3.7

Para 1 is tutorialism. Para 2 is tutorialism; replace with a  
constraint on RDF2TM name conversion.

TM2RDF: This is fine for us, but needs to be formalized for when we  
go real. Likewise for RDF2TM.

--- 3.8 and 3.9

The overall approach seems fine, as far as I'm able to follow it.  
However, these should be simplified, and then worked into the name,  
occurrence, and association sections. As the document stands now it's  
much harder for the reader to see how this fits together. This of  
course implies that it's much harder for us to make sure that it  
actually does fit together, too. The relationship between scope and  
reification isn't really described anywhere now, and that definitely  
needs to be taken care of.

--- 3.10

I don't think we should have this section at all. The document is  
already way too long.

--- 4

This doesn't really distinguish between providing guidance for  
conversion and guidelines for how to structure your information so  
that you won't run into trouble when you want to convert it. This  
really needs to be reconsidered, I think.

--- 5

We should make an OWL ontology for the rdftm vocabulary. That could  
serve as both definition of the terms and a complete enumeration of  
them, as well as the built-in guidance. Point 4 in 5.3 is very odd,  
and needs to be looked at more closely.

--- 6.2

Point 1: The first point is just an error; should we list it?

Points 3-5: The three untyped points should go.

Point 6: Reified roles work in n-aries.

Point 7: Reified TMs might be made to work.

Point 8: Topics that have no identifiers cannot occur, so that point  
can go.

Point 9: This is true, but not really an "unsupported construct".

Point 10: This is again just an error.

--- 8

LTM is now in version 1.3.

TMDM should be referenced as ISO 13250-2, and editors should not be  
included when referencing ISO standards.

--
Lars Marius Garshol, Ontopian               http://www.ontopia.net
+47 98 21 55 50                             http://www.garshol.priv.no
Received on Monday, 24 April 2006 08:29:38 UTC