RE: COMURI comments from Manuel.CARRASCO-BENITEZ@ec.europa.eu on 2014-10-13 (public-dwbp-wg@w3.org from October 2014)

From: <Manuel.CARRASCO-BENITEZ@ec.europa.eu>
Date: Mon, 13 Oct 2014 16:35:43 +0000
To: <laufer@globo.com>
CC: <public-dwbp-wg@w3.org>
Message-ID: <39DB516E46C0E842A2CFFF1BBB7412F15F855833@S-DC-ESTF03-B.net1.cec.eu.int>
Laufer, 

Please find my comments below, and thank you for yours.

Regards
Tomas


* Following the standards
COMURI follows the existing standards and practices: it does not redefine anything. COMURI should be regarded as a subset of URI, as allowed by RFC3986; this is similar in spirit to the relation between SGML and XML.

In particular, COMURI always refers to resources, never to documents.


* Meaning of variants
The meaning of the variants "version", "format" and "language" for the resource "laufer" is the same as if one requested these variants using content negotiation: COMURI does not redefine anything; it just allows the direct identification of variants without recourse to content negotiation, which is always available anyhow. For example, an end user could type the following in a browser to obtain the Portuguese variant without installing any browser add-on to manipulate the header fields:

   http://example.com/laufer.pt  

Using dot extensions is current practice: it has been in Apache for over 15 years. Again, nothing new, just recommending current practices.
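
To make the equivalence concrete, here is a minimal sketch of how a server might map dot extensions onto the same variants that content negotiation would select. The function names and the suffix tables are hypothetical, chosen only for illustration; they are not taken from the COMURI document.

```python
from urllib.parse import urlsplit

# Hypothetical suffix tables; the suffixes COMURI recommends may differ.
LANGUAGES = {"en", "pt", "es", "fr"}
FORMATS = {"html", "pdf", "xml"}


def split_variants(uri):
    """Return (resource_path, variants) for a URI that uses dot extensions."""
    path = urlsplit(uri).path
    head, _, last = path.rpartition("/")
    name, *extensions = last.split(".")
    variants = {}
    for ext in extensions:
        if ext in LANGUAGES:
            variants["language"] = ext
        elif ext in FORMATS:
            variants["format"] = ext
        else:
            # Keep unrecognised extensions instead of silently dropping them.
            variants.setdefault("unknown", []).append(ext)
    return f"{head}/{name}", variants


def negotiation_headers(variants):
    """Header fields that would request the same language variant."""
    headers = {}
    if "language" in variants:
        headers["Accept-Language"] = variants["language"]
    return headers


resource, variants = split_variants("http://example.com/laufer.pt")
print(resource, variants, negotiation_headers(variants))
```

In this sketch, http://example.com/laufer.pt and a request for http://example.com/laufer carrying "Accept-Language: pt" end up selecting the same variant of the same resource.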


* URI dereference
Similar to the "Meaning of variants": COMURI does not redefine anything.

"URI resolution is the process of determining an access mechanism and the appropriate parameters necessary to dereference a URI; this resolution may require several iterations.  To use that access mechanism to perform an action on the URI's resource is to dereference the URI."
    https://tools.ietf.org/html/rfc3986#section-1.2.2



* Opaque URIs
URIs should be opaque, but they need not be pitch-black, even if machines can deal with pitch-blackness :-)

A mnemonic is just for "... assisting or intended to assist memory". Mnemonics are language- and culture-dependent. For a multilingual audience, mnemonics should be as language-neutral as possible. Numbers are arguably the most language-neutral characters: "3" should be acceptable to Arabic speakers, though some people might insist on "٣". Base 36 (a-z, 0-9) should be a good compromise for a wide range of people around the world. One can always use IRIs with a richer character repertoire; one just pays the price:

For example, http://example.com/北京，中国 :
  - Whole world: bad
  - China: good
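
As an illustration of the base 36 compromise mentioned above, the following sketch turns a numeric identifier into a short path segment using only 0-9 and a-z. The function names are hypothetical, and using a numeric identifier as the source of the segment is an assumption made for the example.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z: 36 symbols


def to_base36(number):
    """Encode a non-negative integer using the characters 0-9 and a-z."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(text):
    """Decode a base 36 string; int() accepts bases up to 36."""
    return int(text, 36)


print("http://example.com/" + to_base36(2014))  # http://example.com/1jy
```

The result stays within ASCII letters and digits, so no IRI machinery is needed to type or transmit it.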

URI guessing is along the same lines: a kind of dynamic mnemonics :-) 

I will change the http://example.com/mon example to a more language-neutral one, though the use of an English-based mnemonic was on purpose: the point is that basic English mnemonics should be considered acceptable. In http://example.com/mon there was no intention to "... defines or embodies the identity of what is referenced ...".


* Direct identification of metadata
The intention is to have a URI mechanism to obtain the resource metadata:

  http://example.com/foo             # resource
  http://example.com/foo?           # resource metadata - "?" is an empty query
  http://example.com/foo#           # resource metadata - "#" is an empty fragment - alternative

Currently, the above three URIs return the same thing: the resource. Either "?" or "#" could be used as a mechanism to obtain the resource metadata. The rationale for proposing "?" is that it looked more appropriate as per RFC3986:

"The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)."
  https://tools.ietf.org/html/rfc3986#section-3.4


"The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary  resource and additional identifying information."
  https://tools.ietf.org/html/rfc3986#section-3.5
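
The proposed convention can be sketched as a small classifier. This is only an illustration of the proposal, not existing behaviour of any server; the function name and the returned labels are hypothetical. Python's urlsplit reports an empty query string both for a URI with no query and for a URI ending in "?", so the sketch inspects the raw string instead.

```python
from urllib.parse import urlsplit


def classify(uri):
    """Classify a URI under the proposed empty-query / empty-fragment convention."""
    # The fragment starts at the first "#"; the query starts at the first "?" before it.
    before_fragment, hash_sign, fragment = uri.partition("#")
    _, question_mark, query = before_fragment.partition("?")
    if question_mark and query == "":
        return "metadata (empty query)"
    if hash_sign and fragment == "":
        return "metadata (empty fragment)"
    return "resource"


# A parsed URI alone cannot tell "no query" from "empty query".
assert urlsplit("http://example.com/foo").query == ""
assert urlsplit("http://example.com/foo?").query == ""

for uri in ("http://example.com/foo",
            "http://example.com/foo?",
            "http://example.com/foo#"):
    print(uri, "->", classify(uri))
```

One practical difference between the two alternatives: per RFC3986 section 3.5 the fragment is handled by the user agent and is not sent to the server in the request, whereas the query is part of what the server receives.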



* Eternal URIs
Good. But this is in the policy domain. We have to take into account that resources are archived.


* URL
Just restating the obvious:

"... Uniform Resource Locator (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism ..."
   https://tools.ietf.org/html/rfc3986#section-1.1.3


<endtomas />



From: Laufer [mailto:laufer@globo.com] 
Sent: Friday, October 10, 2014 5:20 PM
To: CARRASCO BENITEZ Manuel (DGT); DWBP WG
Subject: COMURI comments

Hi, Tomas,
I am sending my comments.
Thank you for your work. The COMURI document is very well written and I agree with short URIs. What I do not agree with is recommending a single scheme to do that.

The discussion about URIs began with the Web. In the very first document, Universal Resource Identifiers -- Axioms of Web Architecture, there is already an assertion about opacity: "A very important axiom of the Web is that in general the only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information."
I think the most important thing when defining a scheme to name URIs is to have a way of defining names that will last (Cool URIs don't change). Each admin could define a scheme that she thinks helps to manage the assignment of URI names. But I, personally, do not agree with guessing as the motivation.
In the COMURI document, the first thing that is said is
1.1 Rationale
The intention is to have compact URIs easy for the users (humans and machines); URI patterns should be intuitive to facilitate URI guessing.
I think that intuition and guessing are difficult to control and depend on culture. Different cultures are likely to guess differently.

Let's look at the example in the document:
2.2 New terms
Some are news terms and some are just rewriting of existing terms.
URI guessing
    From a pattern, it should be easy to guess other URIs. For example, if http://example.com/mon gives the Monday weather, http://example.com/tue should give the Tuesday weather.

Well, this is a guess for people who speak English. What about someone who wants the document in Portuguese for segunda-feira?
If we use the COMURI document's recommendation about suffixes for different languages, we should have "http://example.com/mon.pt".
How would a person who does not speak English guess the document for terça-feira? How would they guess "http://example.com/tue.pt"?
I am pointing this out to show how easy it is to define a guessing scheme that will not work.
I also think that the COMURI document talks about URIs that identify documents, not all kinds of resources. If, for example, the URI identifies the resource "laufer", what is the meaning of a version, format or language for this resource? In the DWBP WG we are dealing with many different types of metadata that will not be identified by a suffix in the URI. The idea of suffixes applies only to resources that are documents (a mix of URL with URI).
As Phil pointed out in the call, there are already proposals on dereferencing resources, such as Cool URIs for the Semantic Web. I think that proposing a way of obtaining metadata with an empty query should be a separate discussion and should take into account all the other approaches and issues.
I repeat that I agree that short URIs could be a good practice. But URIs are only identifiers, and I think it is safer to use search mechanisms, based on metadata about the resources, to find what you want.
Best Regards,
Laufer

-- 
.  .  .  .. .  . 
.        .   . ..
.     ..       .
Received on Monday, 13 October 2014 16:36:15 UTC