Re: URI note snapshot available from Booth, David (HP Software - Boston) on 2008-04-29 (public-semweb-lifesci@w3.org from April 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 29 Apr 2008 03:53:11 +0000
To: Jonathan Rees <jar@creativecommons.org>
CC: "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <184112FE564ADF4F8F9C3FA01AE50009FCF22AD65F@G1W0486.americas.hpqcorp.net>

Hi Jonathan,

Comments on
http://www.w3.org/2001/sw/hcls/notes/uris/
version of 28 April 2008 11:57 -0400.

General thoughts:

- It's been a while since I read a draft, but this looks like great progress.

- Overall it feels heavy on the rationale and light on getting to the point of what to do. The rationale is helpful, but it does bog things down a bit. Maybe a summary up front along the lines of EricP's Quick Tips would help:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2008Apr/att-0075/QuickTips.html

- The discussion hardly mentions URIs. It's good that the note points out that the problem of naming is broader than URIs alone, but I think a bit more emphasis specifically on URIs would help, because that's the purpose of the advice.

- I think it would be good to give more prominence to the idea that a URI definition should clearly indicate its change policy. This is really more fundamental than just saying "don't change the URI definition", since it would be okay to change a definition if it clearly indicates that it is unstable, released only for testing, and subject to change. For example, the FOAF documentation specifically states:
http://xmlns.com/foaf/spec/#sec-evolution
"we do not update the namespace URI as the vocabulary matures", so users can set expectations accordingly.

Specific comments, in document sequence:

1. I like the real naming examples in the intro, but I think it could be shortened. The explanation of the Linnaean system introduces more detail than needed to get the point across.

2. I think these sentences could be dropped without loss:
[[
Several innovations of technique come with URI-based naming: Systems of schemes and registries, network protocols such as the domain name system (DNS) and their associated oversight organizations, techniques for assigning globally unique names, and the behaviour of network-based protocols for communication keyed by URIs. However, successful use of these techniques for naming in science depends, as in the case of Linnaean system, on additional factors such as clear documentation and how well naming and documenting fit in to the practice of doing science.
]]

3. I don't see how the scanner example illustrates this point: "The second lesson is that how we name things matters."

4. I suggest rewording "the consequences of mistakes such as these will be more severe" as "it may be more difficult to diagnose such problems"

5. I think the section on "Capturing context using global names" can be substantially shortened.

6. When mentioning "URI owner" it would be good to reference:
http://www.w3.org/TR/webarch/#uri-ownership

7. This statement is problematic:
[[
Even if the URI owner has not made any clear statement about the URI's meaning, a community may still establish a meaning for a URI through use. As participant in such communities, it would be wise for a URI owner to respect that meaning, as contradictory statements would probably be ignored.
]]
This seems to imply that it is okay to squat on other people's URIs, and emphatically it is *not* okay to do so. I think it would be best to just delete this statement. The previous paragraph already admonishes against a URI owner changing the URI definition from its accepted usage.

8. These paragraphs are problematic too:
[[
A naming system that has an associated protocol relates to the protocols only in that the protocol provides what can be construed as a standard catalog or dictionary that aids in the understanding of the names. Regardless of whether or how the naming system exploits a technical apparatus such as the Web, meanings of names are not hostage to mistakes or technical or administrative failures, because the meaning of a name is infused in all communication that uses the name, and the name's documentation is only one such communication. This is easy to see in the case of the Linnaean system, which is universally understood to be based on primary literature, not catalogs. Only recently has it had comprehensive catalogs at all, and even these are considered secondary sources subject to verification. However, even a naming system such as GenBank [citation] that is very closely associated with a web-accessible source of primary documentation is ultimately based on what its names (accession numbers) are believed to mean, not on what the database says. If GenBank were to become corrupt or drop off the face of the earth, the community would scramble to create an alternative source for the retrieval of sequence information associated with the accession numbers, because so many scientific communications depend on the accession numbers to name the information that the records carry. As with any naming system, GenBank's technical infrastructure is a community trust, not an authority.

A naming system that has an associated protocol relates to the protocols only in that the protocol documentation and/or specific documentation received using the protocol help us understand what names mean. Regardless of whether or how the naming system exploits a technical apparatus such as the Web, meanings of names are not hostage to mistakes or technical or administrative failures, because meaning takes root in a different arena: from [flushed: a recognized initial communication and followed by] meaningful use in communication.
]]

First of all, the Linnaean system seems to illustrate the opposite of what you said it illustrates: the meaning of a Linnaean term is universally understood to be based on the authority of its initial publication, *not* on how the community uses the term in catalogs.

It seems like the main point you are trying to make is: a name definition, once published and adopted by the community, should be unchangeable. I think that point is good. The problem comes up when there is any suggestion that the community's usage of the term defines the term's meaning. That indeed is the way natural language works, and it is okay for humans, but it is not okay for the Semantic Web:

- Who defines "the community"? Different "communities" often come up with different definitions for the same term. With URIs this becomes URI collision
http://www.w3.org/TR/webarch/#URI-collision
and it is harmful, as you explain in your discussion of polysemy.

- Usage-based definitions can drift over time, again leading to URI collision (or polysemy). This is exactly what happens in natural language.

9. These look very good:
[[
1. [581 JAR:] Is available documentation about the use of the URI sufficiently clear and unambiguous to guide effective use?
2. Will [was: Does] the documentation remain faithful to the meaning of the URI? [was: over time]
3. Is documentation available when needed? [maybe: will it be available?]
4. Is documentation available to computational agents via a well-known protocol and in a form that is useful to them?
]]

However, #2 is phrased in a way that suggests that the meaning of the URI can be independent of its published definition. Also, I think the word "definition" is more to the point than "documentation", though documentation beyond the definition (such as usage examples) would be good to suggest also. You might consider rewording these to:
[[
1. Is the URI definition sufficiently clear and unambiguous to guide effective use? Usage examples can also help.
2. Will the URI definition remain unchanged?
3. Will the URI definition be available when needed?
4. Is the URI definition available to computational agents via a well-known protocol and in a form that is useful to them?
]]

10. In the section on "Polysemy (one name, many meanings)", you might say straight out: "Never use the same URI to denote both a Web page or Web site and something else."

11. Regarding these sentences:
[[
It is tempting to assume that a successful request for a document (one that elicits an HTTP "200 OK" response) tells us what the URI denotes - that it denotes the response to the request. However, making a similar assumption about the URI following a change to the document would result in a polysemy because then the URI would seem to denote two different responses.[jar check] If the HTTP responses vary over time, the URI, asumming no server error, denotes not a single unchanging document, but rather a draft series, the changing output of an instrument, a blog, the changing bylaws of an organization, or an otherwise evolving entity. Here, the publisher who wishes to avoid the risk of polysemy would make clear via documentation that the URI denotes a changing thing.
]]
I think you should be careful not to imply that the URI might legitimately denote the response itself, since that would be an extremely rare case, even for a Web page that never changes. Also, I think the best advice to publishers is that they should always be clear about their change policy, rather than stating it only if the document may change or only if the document will not change.

How about this wording instead:
[[
When a request for a document yields an HTTP "200 OK" response, it might be tempting to assume that the URI denotes a document only in its current state. However, making a similar assumption about the URI following a change to the document would result in a polysemy because then the URI would seem to denote a document with two different contents. In a situation like this the URI should instead be taken to denote not a single unchanging document, but a draft series, the changing output of an instrument, a blog, the changing bylaws of an organization, or an otherwise evolving entity. To avoid such misunderstandings, a document should clearly state its change policy.
]]

12. The section on Polysemy should mention that in the AWWW this is called "URI collision":
http://www.w3.org/TR/webarch/#URI-collision

13. The section on synonymy should mention that in the AWWW this is called "URI aliasing":
http://www.w3.org/TR/webarch/#uri-aliases

14. Word missing in this sentence? "URI schemes that lack protocol association, or that are explicit in making protocol association advisory instead of central, might be seen as preferable to those that do for the purpose of naming."

15. Garbled sentence or missing word: "The conclusion is that the meaning of the URI has changed, making the correct interpretation prior uses dependent on which meaning is intended."

16. Nice job on the "protocol association tradeoff" section.

17. Nice job on the "Insurance against technical failure" section also. Personally, I do think there is room for additional technical tools in this area. For example, as I mentioned privately:
[[
Much of the consternation around persistence seems to be concern that a document might change at all even though the community expects it to remain totally unchanged. In this case, I would think that trusted timestamping
http://en.wikipedia.org/wiki/Trusted_timestamping
in conjunction with distributed archiving mechanisms could be used to obtain a sufficient level of confidence that users could at least detect a change.
]]
This is a community service that purl.org or a similar service might offer, but specific guidance on doing this in the context of URI minting still needs to be worked out.
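To make the detection part of this idea concrete, here is a minimal sketch, assuming the archive simply stores a SHA-256 digest of the document together with the time it was recorded. The function names and the record layout are hypothetical, chosen only for illustration. A real trusted timestamping service (for example one following RFC 3161) would additionally have a timestamping authority sign the digest and time, which this sketch omits.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes) -> str:
    """Return the hex SHA-256 digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def make_record(uri: str, content: bytes) -> dict:
    """Build a hypothetical archive record for a URI definition.

    Assumption: in real trusted timestamping, a timestamping authority
    would sign this digest and time; here we only store them.
    """
    return {
        "uri": uri,
        "sha256": fingerprint(content),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def has_changed(record: dict, current_content: bytes) -> bool:
    """True if the current bytes no longer match the recorded digest."""
    return fingerprint(current_content) != record["sha256"]


# Usage with a hypothetical URI and definition text.
record = make_record("http://example.org/def/term", b"original definition")
print(has_changed(record, b"original definition"))  # False
print(has_changed(record, b"revised definition"))   # True
```

Note that this only lets a user detect that the bytes differ from what was archived; deciding whether a change is acceptable still depends on the change policy the URI definition states.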

18. The section on "What the standards community needs to do" is excellent. You might add something like:

- Techniques and guidance for achieving persistence, perhaps involving trusted timestamping.

David Booth, Ph.D.
HP Software
+1 617 629 8881 office | dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.

Received on Tuesday, 29 April 2008 03:54:44 UTC