RE: COMURI

Dear Sumit,

Thanks for your very valid comments. See below.

Regards
Tomas

-----Original Message-----
From: Purohit, Sumit [mailto:Sumit.Purohit@pnnl.gov] 
Sent: Friday, September 26, 2014 10:37 AM
To: CARRASCO BENITEZ Manuel (DGT); christophe.gueret@dans.knaw.nl
Cc: public-dwbp-wg@w3.org
Subject: RE: COMURI

Hi Tomas,

Thanks for your efforts in preparing this draft. I must say it is an interesting read.
I have a few more comments:

 1. "URI guessing should work": A good URI design should inherently support "guessing", whether it is a compact URI or not, but when we ask people to use "common sense" to design a COMURI, achieving this goal seems challenging to me. Do you think this may lead to poor interoperability of concepts/resources? Should we also recommend the use of standard abbreviations, such as:

https://www.itap.purdue.edu/ea/data/standards/abbrev%20A_B.cfm
http://www.cas.org/content/cas-standard-abbreviations#listinga

#v
A very valid comment and it will be included. In our context it is harder because we have 24 official languages
  http://publications.europa.eu/code/en/en-370200.htm

Saying use "common sense" is an admission that one does not know how to explain it better :-)
#^

2. Default variant of a resource: in the absence of a language.format.version variant, should the owner/domain explicitly declare its default values in some machine-readable/hybrid version?

#v
The defaults, or the lack of them, are out of scope: this is up to the system administrator. For example, in the USA the default might be English; in other places one might not declare a default language, in which case the server must respond with the list of available languages (variant list) in a language-neutral fashion and let the user choose the language. Something like
  http://dragoman.org/metadata.html
#^

3. "Capability to identify big and small data – whole database and record": It is a really innovative goal and it will be interesting to see how we incorporate it into the URI. Usually, the default extensions (which this draft proposes to avoid) are a good indicator of size: I expect a .zip to be larger than a .html resource. At the same time, are you thinking of having this information as part of the URI itself or as its metadata? In either case, some of the options I see are a query parameter "?sizeorder=K|M|G|T|" or a reserved place in the URI. I would love to discuss this further.

#v
This needs refining, starting with the terminology, as it should be referred to as "granularity". It is important as our group is about data. I am running some tests to see how it can be incorporated into a "common sense" URI :-)
#^

4. "Language neutral URI": I am not sure this capability adds value to COMURI. Is it mentioned in the draft as a challenge/reference or as part of the recommendations? (Sorry if I failed to get the intent from the paragraph.) I do see it contradicting parts of our first goal: "human friendly", "simple", "with mnemonics".

#v
In certain contexts a language-neutral URI is essential. For example, in the URI Task Force of the European institutions, by default the first segment of the path is a numeric string (the most language-neutral possible) and mnemonics are tolerated.
#^

Thanks again for this draft.

--Sumit Purohit

________________________________________
From: Manuel.CARRASCO-BENITEZ@ec.europa.eu [Manuel.CARRASCO-BENITEZ@ec.europa.eu]
Sent: Thursday, September 25, 2014 8:32 AM
To: christophe.gueret@dans.knaw.nl
Cc: public-dwbp-wg@w3.org
Subject: RE: COMURI

Dear Christophe,

First, *thanks* very much for the detailed commenting; I am fully aware of the tediousness of this task: I am knackered after commenting on your comments :-)

My comments below are inserted between the marking "#v" and "#^".

Regards
Tomas

From: Christophe Guéret [mailto:christophe.gueret@dans.knaw.nl]
Sent: Thursday, September 25, 2014 2:59 PM
To: CARRASCO BENITEZ Manuel (DGT)
Cc: public-dwbp-wg@w3.org
Subject: Re: COMURI

Dear Tomas,
Interesting document!

Here are some remarks per section:

* 3.3 Official data server
The example says that it is best to avoid doing as data.gov or data.gov.uk did... This is somewhat surprising and could be better argued for. I don't think the length of the URI matters here as an argument, especially considering that the construct http://foo.example.com will probably still want to use the path "data" somewhere and may end up with a "http://foo.example.com/data" of the same size as "http://data.example.com/foo". The point about what goes into one server and what goes into another is also to be discussed, as the selection of a URI pattern can be totally dissociated from the actual storage/hosting of the data behind it. E.g. use "{register}.data.gov.nl" and then delegate each register to a specific machine, as suggested in http://www.pilod.nl/w/images/a/aa/D1-2013-09-19_Towards_a_NL_URI_Strategy.pdf

#v
- Official data server
It is recommended to use "data" as the second- or third-level domain for an official data server, as this is the most common practice. Indeed, I recommended using "data.europa.eu" for the European Institutions; it will be administered by the Publications Office, though it is not working yet. Obviously, this is not well explained in the document and it will be changed.

- Own domain vs. official data server
The jury is out on this, but there is a default tendency to put everything into the official data server. The point is that not all resources can go into the official data server, and "significant" resource collections should have their own domain. For example:

    http://eur-lex.europa.eu         - this is how it is                    - recommended
    http://data.europa.eu/eur-lex    - today it could have gone like this   - avoid

    http://eli.europa.eu             -
    http://data.europa.eu/eli        - this is how it is planned

eli = European Legislation Identifier
#^

* 4. URI patterns
Why is it needed to define such a list of naming patterns ? The list reads as "that, or that, or anything else you want" anyway ;-)

#v
I also have my doubts about this section. There is only one reference to a pattern in the document and, if no more references are made, it could be taken out.
#^

* 5. URI variants
Besides the naming of the variant using extensions the document could also mention using different paths. This is, for instance, what DBpedia and other sites using Pubby as a front-end do with the resource/page/data scheme. It would be good to let BP adopters follow this approach if they like it.
Also, nothing is said about content negotiation in this part. It could be recommended that this gets implemented in order to link all the variants to each other. Another good thing to recommend is to augment the HTTP responses with links in the header (again, as Pubby does)

#v
On purpose only the dot extension technique is put forward for the direct identification of variants; it does not exclude other techniques, such as segments in the path.

The rationale is to stay within the scope of URIs and avoid trespassing into other domains such as TCN or data preservation. TCN is mentioned, though: the dot extension should be considered a technique complementing other techniques such as TCN.
#^

* 6.2.1 http
I don't think this will fly: "For the URI metadata request, empty query can be used". Asking for a resource without any parameters is like asking for the resource directly. That is, "GET http://example.org/test" and "GET http://example.org/test?" are the same query (or maybe not? Please correct me there if needed). Considering this, it would not be possible to differentiate between asking for a description of a resource and asking for its metadata.

#v
COMURI is mostly a compilation of existing conventions: no new developments or conventions, *except* for the empty query, and this is acknowledged in the document
  http://dragoman.org/comuri.html#comuri-readiness

You are totally correct. Most servers will return the same thing (the resource) for:
  http://example.org/test
  http://example.org/test?

And this is the beauty of it: it breaks nothing. Servers supporting COMURI will return the resource metadata for the second form. It is a minimalist syntax to get the resource metadata.
#^
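As a rough illustration of the empty-query convention discussed above, here is a minimal sketch in Python of how a COMURI-aware handler might tell the two requests apart. The function name `is_metadata_request` is an assumption made for this example; it is not taken from the COMURI draft. The check is done on the raw request string, since the only difference between the two forms is the trailing "?".

```python
def is_metadata_request(uri: str) -> bool:
    """Return True when the URI carries an empty query ("...?").

    Hypothetical helper: the name and behaviour are assumptions made
    for illustration, not part of the COMURI draft.
    """
    # Drop any fragment first; it is not sent to the server.
    without_fragment = uri.split("#", 1)[0]
    # Work on the raw string: "no query" and "empty query" differ only
    # by the presence of the "?" delimiter.
    _, question_mark, query = without_fragment.partition("?")
    return question_mark == "?" and query == ""


for uri in (
    "http://example.org/test",
    "http://example.org/test?",
    "http://example.org/test?lang=es",
):
    kind = "metadata" if is_metadata_request(uri) else "resource"
    print(uri, "->", kind)
```

A server that does not implement the convention would simply skip this check and return the resource in all three cases, which is the "breaks nothing" property described above.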

* 6.3 Comuri authority
"Fourth level domains and beyond should be avoided as it makes URIs too long" hummm... what if I use "ship-dgt-foo.ec.europa.eu" instead of "ship.dgt-foo.ec.europa.eu" then ? It is a long but perfectly acceptable third level domain ;-) It's not really the number of sub domains that contributes most to the length of a URI, as they only require an extra ".". It's what's in the namespace name that matters most. What about recommending a maximum length for the FQDN ? and maybe ground this maximum length in some cognitive limit. E.g. people in general can not easily remember a string of symbols longer than 15 characters (just picking a random number here).

#v
The overall URI should be short, and URIs with a fourth-level domain are usually longer:
 ship.europa.eu              # easiest to remember
 ship-dgt-foo.ec.europa.eu   # good reason to avoid "-" :-)
 ship.dgt-foo.ec.europa.eu
#^

* 6.4 Comuri path
Should there be a sister document "COMIRI" that will let users use IRI ? This would be useful for those that will use RDF 1.1 and/or OWL2
Or considering that IRI are a generalisation of URI the present document could already be adjusted (and renamed ?) to enable their usage.

#v
Perhaps, but not me :-)

Not that I am against IRIs (look at the acknowledgment).
#^

* 6.4.2 Without dot extensions
"The path must not contain unnecessary dot extensions such as php." could be extended to say "The path must not contain unnecessary dot extensions such as php,jsp,asp or cgi" to cover a few more examples and let users know that it is specifically all extensions coming from server-side rendering engines that are not allowed. Otherwise, this recommendation goes directly against saying variants should be indicated using ".html", ".pdf" etc. Maybe a good idea to merge 6.4.2 and 6.4.3 to better explain the restriction.
This leaves an ambiguity by the way, what do I do if my resource is a PHP script displayed as a resource ?
Lastly, the example "http://example.com/foo.language.format" indicates that language always comes first and format second, which is a problem if one wants to specify only a format. Using example 10, looking at "http://example.com/palma.es" and "http://example.com/palma.xml" it is unclear why "xml" would not be a language like "es" is.

#v
- Programming extensions
php,jsp,asp or cgi will be added: it will be clearer.

- One extension
Good catch. I went back and forth on this one for a while, but the alternative is to impose the typing of a dummy language tag on users who do not have multilingual needs. I came to the conclusion that it is not a problem. For "foo.ext":

   "If only one dot extension, servers should be capable of making the difference between a language and a format. Servers must respond with the two extensions, as per the negotiation."
#^
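The single-extension case above ("palma.es" versus "palma.xml") can be sketched as a lookup against two known vocabularies. The following Python sketch is only an assumption about how a server could do it: the sets `LANGUAGES` and `FORMATS` and the function `classify_extensions` are hypothetical names with deliberately tiny contents, and the sketch does not enforce the language-before-format order.

```python
# Hypothetical, deliberately tiny vocabularies; a real server would use
# its own configured lists of language tags and format extensions.
LANGUAGES = {"de", "en", "es", "fr"}
FORMATS = {"html", "pdf", "txt", "xml"}


def classify_extensions(path: str) -> dict:
    """Sort the dot extensions of the last path segment into
    language and format (illustrative sketch only)."""
    last_segment = path.rsplit("/", 1)[-1]
    name, *extensions = last_segment.split(".")
    result = {"name": name, "language": None, "format": None}
    for ext in extensions:
        if ext in LANGUAGES and result["language"] is None:
            result["language"] = ext
        elif ext in FORMATS and result["format"] is None:
            result["format"] = ext
        else:
            # Unknown or repeated extension: refuse to guess.
            raise ValueError(f"unrecognised dot extension: {ext!r}")
    return result


print(classify_extensions("/palma.es"))
print(classify_extensions("/palma.xml"))
print(classify_extensions("/palma.es.xml"))
```

The approach only works as long as no string is both a language tag and a format extension in the configured lists; where they collide, the server would need an explicit rule.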

* 6.4.4 Dot character in archival
Typo "or two transform" -> "or to transform"
I don't get this... If we look at what the web archive is doing, they just stick the target archived resource name to the resolver. E.g. "http://web.archive.org/web/20010201000000*/http://www.google.fr" without renaming it. Why would it be best to change every dot into a dash ?

#v
Typo noted.

Explaining a situation, not offering a solution; both are probably fine:

http://example.com                                   # original site
http://example.org/example.com           # archival site with dot
http://example.org/example-com           # archival site with dash
#^
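The two archival forms listed above can be produced mechanically. This is a minimal Python sketch; the function `archival_uri` and its parameters are assumptions made for illustration, with "example.org" standing in for the archival site as in the example.

```python
def archival_uri(original_host: str,
                 archive_host: str = "example.org",
                 dots_to_dashes: bool = False) -> str:
    """Build the archival URI for a site, keeping the dots of the
    original host name or transforming them into dashes."""
    segment = original_host.replace(".", "-") if dots_to_dashes else original_host
    return f"http://{archive_host}/{segment}"


print(archival_uri("example.com"))                       # form with dot
print(archival_uri("example.com", dots_to_dashes=True))  # form with dash
```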

* 6.5 Comuri query
"This mechanism does not exist in servers and it has to be implemented" -> Does this mean we need an update of the spec of HTTP ?
How would this section apply to the "file" scheme ? Is it just not supported ?

#v
***I will never, ever suggest to update HTTP***  - hopefully, nobody uses this against me in the future :-)
COMURI is just a best practice and it cannot break anything. Indeed, one must follow the standards and practices. As far as I am aware, an empty query breaks nothing.

HTTP servers will have to recognise the empty query and return the metadata and not the resource, either by parameterisation or by code modification; for Apache, I suspect that it should be possible with mod_rewrite.
#^

* 7. URI metadata
"URI metadata is the metadata associated to the resource, such as the Dublin Core. " -> Dublin Core is a standard that can be used to describe the metadata, it is not the metadata of the URI.

#v
It will be rephrased.
#^

* 7.2 URI metadata structure
Typo "for appropriate for the" -> "appropriate for the"

#v
Noted.
#^

* 7.4 XHTML-ID
What is the motivation for proposing a new extension to XHTML instead of just using HTML5 and microdata ? Using "itemid" instead of "id" is not a big stretch.

#v
The rationale is having a "combined human-machine format", either XHTML-ID or another.

It might need rephrasing: it is not an extension to XHTML; just a way of using XHTML or a layer on top.
#^

* 8. Ultrapersistent URI
I love this section title ^_^

#v
Thanks, but I am not sure :-)
#^

* 8.2 Data archival
Could you add references for the classification of archival services ? That is "Online archival sites", "Offline archival" and "Pack"

#v
Noted.
#^

* 8.5.1 Online data
"If other techniques are used, URI should take priority, For example, if the appropriate header field request the German variant of the resource and the URI request the Spanish variant, the server should send the Spanish variant. " -> I think this is actually up to HTTP, or the implementers of it, to decide. COMURI is just using it and can not make any assumption on its behaviour.

#v
Here one is skating on thin ice: COMURI is about URI; data preservation, etc, is out of scope. This section is more a primer to put the whole thing in context.
#^
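The precedence rule quoted from the draft (the URI wins over the header field) can be sketched as follows. This Python sketch is an illustration under stated assumptions, not a statement of how HTTP servers behave: `AVAILABLE` is a hypothetical variant list, the function names are invented for the example, and the Accept-Language parsing is simplified (listed order is honoured, q-values are ignored).

```python
from typing import Optional

AVAILABLE = {"de", "en", "es"}  # hypothetical variant list


def language_from_uri(path: str) -> Optional[str]:
    """Return the language dot extension of the last path segment, if any."""
    last_segment = path.rsplit("/", 1)[-1]
    for ext in last_segment.split(".")[1:]:
        if ext in AVAILABLE:
            return ext
    return None


def language_from_header(accept_language: str) -> Optional[str]:
    """Return the first listed language that is available.

    Simplification: the listed order is used and q-values are ignored.
    """
    for item in accept_language.split(","):
        tag = item.split(";", 1)[0].strip().lower()
        primary = tag.split("-", 1)[0]
        if primary in AVAILABLE:
            return primary
    return None


def choose_language(path: str, accept_language: str = "") -> Optional[str]:
    """The URI takes priority; the header is only a fallback."""
    return language_from_uri(path) or language_from_header(accept_language)


print(choose_language("/foo.es", "de"))  # URI asks for Spanish, header for German
print(choose_language("/foo", "de-DE,de;q=0.9,en;q=0.8"))
print(choose_language("/foo"))
```

When neither source yields a language, the sketch returns None, which is the point where a server could answer with the variant list instead.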

* 8.6.1 Static data
In this section the example for "Static data" shows what was called "Offline data" in section 8.5.2 just a few lines before
As I already indicated earlier in some mails, I really don't think restricting the usage of numbers to indicate version is a workable solution. The precision of versions should follow a specific pattern which is not just saying "whatever number is found at the end of the resource identifier is the version". There is already a strict specification for languages and format, why not just extend it with version then ? For instance, "http://example.com/foo.language.version.format" or "http://example.com/foo.version.language.format" ? Not that I would find this a really good solution either but at least this would be consistent with the rest of the specification

#v
More examples needed.

Static data and offline data are different. Static data can be accessed (served) with "file" and "http". Offline data only with "file". It might need some re-working.
#^

* 9.1 Language neutral URI
"http://example.com/1234" -> version 1234 of the default resource for "example.com" ?

#v
No. The resource
 http://example.com/1234

Particularly in the path, one might need to be language neutral, and numbers are the best.
#^

* 9.2 Language identification in URIs
Indicating the language as part of the domain name does not seem to be consistent with the rest of the document.

#v
Correct. Just pointing out another technique.  From the abstract:

 "This best practice guide explores many challenges about URIs. If the proposals put forward are not valid for some circumstances, it should help to find other solutions."

In other words: here is the problem, I do not know the solution :-)
#^

Cheers,
Christophe

On 18 September 2014 15:48, Manuel.CARRASCO-BENITEZ@ec.europa.eu <Manuel.CARRASCO-BENITEZ@ec.europa.eu> wrote:
Dear WG members,

Please could you comment on
  Compact Uniform Resource Identifier (COMURI)
  http://dragoman.org/comuri
  mirror -  https://joinup.ec.europa.eu/site/med/dragoman/comuri

It is nearly completed and as per the calendar, the First Public Working Draft is planned by the 30 Sep
  http://www.w3.org/2013/meeting/dwbp/2014-08-22#URI_construction

The language in the final version will be corrected by a proof-reader.

Regards
Tomas




--
Researcher
+31(0)6 14576494
christophe.gueret@dans.knaw.nl

Data Archiving and Networked Services (DANS)
DANS promotes sustainable access to digital research data. See www.dans.knaw.nl for more information. DANS is an institute of KNAW and NWO.

Please note: as of 1 January we have a new address:
DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl

Let's build a World Wide Semantic Web!
http://worldwidesemanticweb.org/

e-Humanities Group (KNAW)

Received on Friday, 26 September 2014 12:15:38 UTC