RE: [ALL] Review requested for "Cool URIs"


Please find my review comments below (enclosed in the "pythonic" comment triple quotes).

Review of the document "Cool URIs for the Semantic Web" [1],
elaborated on behalf of the W3C SWD WG 

General remarks

The general structure of the document is good. It also provides reasonable solutions for many practical scenarios in the scope of (URI) naming strategies on the Semantic Web. The presented running examples facilitate easy and enjoyable reading of the document.

However, there is one rather conceptual issue, relevant to realistic scenarios, which is not covered by the document at all. Maybe a solution to the following problem is implicitly "entailed" by the document, but if so, it is not clear from the presented strategies how to actually deal with such situations.

Consider, for instance, the Gene Ontology (GO in the following text), which is updated monthly in many formats, RDF/XML being one of them [2]. According to requirement number 1 of the document under review, every resource identifiable by a URI should be on the web. But it is not clear how to actually publish all the resources GO contains using any of the strategies presented in the document. As stated in the Conclusion frame in Section 4.3, the hash URI strategy is not appropriate for GO, since its RDF/XML serialisation is currently about 30MB. It is definitely neither "rather small" nor "stable" (changes may occur every month). So, should we use the 303 URI approach? Possibly yes, but this would mean that we should establish and regularly maintain tens of thousands of different URIs, one for every resource represented in GO, each of these URIs representing a generally very small piece of information (since the GO descriptions are usually rather shallow). It is questionable whether such an approach is either reasonable or practical... Maybe we could combine both approaches, which is one of the possibilities mentioned in Section 4.3 as well. However, it is not clear from the document how we should actually combine the two approaches in order to deal optimally with situations similar to this "GO problem".
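
To make the 303 question a bit more concrete, here is a minimal sketch of what serving GO terms via 303 redirects might look like if a single pattern rule were used instead of per-term configuration. Everything in it is an assumption made for illustration: the /go/id/, /go/data/ and /go/page/ paths, the function name resolve, and the very simplified handling of the Accept header. Neither GO nor the document under review prescribes any of this, and the sketch says nothing about who generates and maintains the per-term RDF and HTML documents every month.

```python
import re

# Hypothetical URI layout: /go/id/GO_0008150 names the term itself.
# GO identifiers consist of the prefix "GO" followed by seven digits.
TERM_PATTERN = re.compile(r"^/go/id/(GO_\d{7})$")


def resolve(path, accept="application/rdf+xml"):
    """Return (status, headers) for a request to a term URI.

    A term URI is answered with 303 See Other pointing at a document
    about the term; anything else is answered with 404.
    """
    match = TERM_PATTERN.match(path)
    if match is None:
        return 404, {}
    term = match.group(1)
    # Very simplified content negotiation: RDF if requested, HTML otherwise.
    if "application/rdf+xml" in accept:
        location = f"/go/data/{term}.rdf"
    else:
        location = f"/go/page/{term}.html"
    return 303, {"Location": location, "Vary": "Accept"}


if __name__ == "__main__":
    print(resolve("/go/id/GO_0008150"))
    print(resolve("/go/id/GO_0008150", accept="text/html"))
```

Even under these assumptions, each redirect target is still a very small document, so the concern about reasonableness and practicality remains open.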

In conclusion, there may be two possible solutions - either allow and discuss "off-line" exceptions to requirement number 1 (be on the web), or propose a reasonable way of combining the two approaches and give corresponding examples that cover the aforementioned problem. The former does not seem very systematic and would possibly also require changing the "attitude" of the whole document substantially, so the latter seems more appropriate.

Specific remarks

- The first paragraph of Section 1 could be a little more descriptive when mentioning the challenge of "...distributed modelling of the world with a shared data model...". If part of the intended audience is not expected to be fully familiar with the Semantic Web principles, the meaning of this phrase could be explained in a little more detail.

Minor remarks

- The note about the public archive in the fifth paragraph of the "Status of this document" section is perhaps not needed - a reader who is familiar with the W3C mailing lists will know that, and others will be notified anyway when sending an e-mail there.

- Maybe it would be better to change the font in the example statements presented in the second paragraph of Section 1 - for instance put "subjects" and "objects" in bold face and "predicates" in italic. This could improve the readability of this piece in line with the intended impact.

- The remark on the possible MediaWiki adoption for Wikipedia at the end of Section 5 is perhaps not completely appropriate until the software has actually been adopted.

- The following comment about the last note in Section 6.2 about personal URIs is rather "philosophical". Even though the note comes from a strongly backed opinion;), it is really questionable how to actually establish such personal URIs in an ideal, practical and systematic way. Imagine creating a URI for John Smith in a company XYZ - we can use the company's dedicated namespace to distinguish this John Smith from other John Smiths around the world. But what if there are more John Smiths in the company? We can perhaps use a namespace or URI prefix according to the departments these guys work in. But what if they work in the same department? So maybe we can use a time-stamp or a corresponding slashed namespace derived from the date when they joined the company. Etc... Such situations may of course occur only rarely in practice for persons, but we should have a recommended way to solve them, since similar problems may also be encountered with names of products, services and so on. Thus, the document should either propose a recommended way of dealing with such problems, or be a little more "careful" and avoid such potentially questionable strict statements.
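
The John Smith example above can be played through in a few lines. This is only a sketch of the naming scheme described in the comment, not a recommendation: the namespace http://xyz.example/people, the function name person_uri and the path layout are all made up for illustration. It shows that even with department and joining date added, two people can still end up with the same URI.

```python
from datetime import date


def person_uri(base, name, department=None, joined=None):
    """Build a person URI from increasingly specific path segments.

    Hypothetical scheme: <base>/<department>/<joining date>/<name>.
    """
    segments = []
    if department:
        segments.append(department.lower())
    if joined:
        segments.append(joined.isoformat())
    segments.append(name.lower().replace(" ", "-"))
    return base.rstrip("/") + "/" + "/".join(segments)


base = "http://xyz.example/people"  # hypothetical company namespace

# Two different John Smiths, same department, same joining date.
first = person_uri(base, "John Smith", "Sales", date(2007, 9, 3))
second = person_uri(base, "John Smith", "Sales", date(2007, 9, 3))

print(first)
print(first == second)  # the scheme cannot tell the two apart
```

Each extra segment only makes a clash less likely; none of them rules it out, which is why a recommended way of handling such cases would be useful.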

Language remarks

- Though the overall language quality of the document is quite high, it would be good to have a native speaker do some spell-checking and language clean-up. Just to mention a few things noticed at first sight even by a non-native reviewer - the fourth paragraph of Section 1 is inconsistent in the tense used (from the sentence "In the remainder of this paper..." on, future and present tense are mixed); the third paragraph of Section 6.1 contains a typo ("resourcse").




I'm not sure if we are supposed to make an explicit distinction between the obligatory and the recommendation character of particular comments. But this can surely be discussed in the upcoming telco anyway.


Vit Novacek

Semantic Information Systems and Language Engineering Group (SmILE)
Digital Enterprise Research Institute (DERI)
National University of Ireland, Galway
Tel: +353 91 495738

Received on Sunday, 23 September 2007 13:42:55 UTC