Re: Action-79 A review of COMURI

Phil,

I had thought that short URLs were the bastard child of Twitter and the
need to cram as much (t)wit as possible into 140 characters or less...

Manuel Tomas, your document is well written and meticulously described.
While today the use of URIs is limited and the need for shortened forms of
them may be academic, it is good of you to think ahead, because the day will
come when we have tens of millions of URIs describing, with data, all the
trees and other immovable objects in the world.  But then again, one can
easily imagine a number of URIs that equals or exceeds the number of web
pages that URLs may describe.  How will shortened versions keep up with the
number of roots, branches, limbs, and leaves necessary to describe it all?

Finally, how does this relate to the Data on the Web BP charter and goals?
I don't fully see the connection.


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From:    Phil Archer <phila@w3.org>
To:      Public DWBP WG <public-dwbp-wg@w3.org>, Manuel Tomas Carrasco-Benitez <Manuel.CARRASCO-BENITEZ@ec.europa.eu>
Date:    10/03/2014 08:59 AM
Subject: Action-79 A review of COMURI

Tomas,

I have finally sat down to read your work properly and have the
following comments (which I'm making deliberately without reading anyone
else's, so I apologise for any overlap - I want to come to this 'fresh.')

Initial comments:

The title makes clear that we're talking about URIs as opposed to URLs.
Good. I wouldn't want to overlap with current W3C work on URLs that is
the subject of some disquiet (not least from me): http://w3ctag.github.io/url/

Others have already pointed out existing W3C work on CURIEs
(http://www.w3.org/TR/curie/).
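For anyone who hasn't met CURIEs: a CURIE is a compact 'prefix:reference'
form that expands to a full URI by concatenating the URI mapped to the
prefix with the reference. A minimal Python sketch, assuming a
hypothetical two-entry prefix map (the DCAT and Dublin Core Terms
namespaces are used only as familiar examples) and ignoring the spec's
safe-CURIE and default-prefix rules:

```python
# Hypothetical prefix map for illustration only; a real document would
# declare its own prefixes.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand 'prefix:reference' by appending the reference to the prefix's URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefixes:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + reference


print(expand_curie("dcat:Dataset"))  # http://www.w3.org/ns/dcat#Dataset
print(expand_curie("dct:title"))     # http://purl.org/dc/terms/title
```

The short form only means something alongside its prefix declarations,
which is the same trade-off any compact identifier scheme has to make.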

We need to be clear, I think, that we're not talking about Web sites and
pages and all that. We're talking about URIs as resolvable identifiers.

We have a lot of competition in this space, and even more cultural
objection to the idea that anything beginning with http:// is a
persistent identifier. So we need to be careful and, as Makx will warn
us (in his usual sage, ignore-at-your-peril manner), we shouldn't
allow ourselves to get drawn into a discussion here about what an
identifier identifies. I will try hard to desist!


The rationale says:

"The most common URI pattern should be similar to shortened URI
[SHORT-URI]; i.e., pattern-21. Indeed, the existence of URI shortening
services is a symptom that something is wrong and that native short URIs
are needed."

I don't agree, I'm afraid. URL shorteners were developed to allow people
to include short strings that map to potentially very complex URIs
when writing e-mails or printed articles. Short URIs are brittle: the
shorter the term, i.e. the fewer the path segments, the more likely it
is that the term will have multiple uses that might conflict in future.
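To make the shortener indirection concrete, here is a hypothetical
in-memory shortener in Python, assuming sequential integer IDs encoded
in base 36 (the digits and lower-case letters that COMURI's design goals
recommend). The short URI carries no meaning of its own; it resolves
only through the shortener's lookup table, and every code sits in one
flat namespace. The host short.example and the class name are invented
for illustration.

```python
import string

# The 36 "simplest possible characters": 0-9 then a-z
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer using lower-case base 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


class Shortener:
    """Hypothetical in-memory shortener: sequential ID -> base-36 code -> long URI."""

    def __init__(self, base: str = "http://short.example/"):
        self.base = base
        self._targets = {}  # code -> long URI; the only record of what a code means

    def shorten(self, long_uri: str) -> str:
        code = to_base36(len(self._targets))  # next sequential ID
        self._targets[code] = long_uri
        return self.base + code

    def resolve(self, short_uri: str) -> str:
        # Raises KeyError for unknown codes: lose the table and every short URI breaks.
        return self._targets[short_uri.removeprefix(self.base)]


s = Shortener()
short = s.shorten("https://joinup.ec.europa.eu/asset/dcat_application_profile/"
                  "asset_release/dcat-application-profile-data-portals-europe-final")
print(short)             # http://short.example/0
print(s.resolve(short))  # the original Joinup URI
```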

However, I do think URIs should be no longer than necessary and that in
general, short is good. An example of a URI that is *way* too long and
utterly unguessable is
https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final


So I imagine that I'm able to say what you, as an EC official, cannot:
for an example of how not to mint URIs, look no further than the Joinup
platform.

I think statements like "Unwarranted complexity must be avoided. For
example, only use longer URIs (third level domain, multisegment paths)
when it cannot be avoided; many web sites can get by without language
and format variants, so avoid this mechanism." could be phrased less
confrontationally. Yes, you can avoid format variants in URIs, but to do
so you need to make use of content negotiation (conneg). A URI like
http://www.w3.org/Icons/w3c_home returns either the GIF or the PNG
version of the image, depending on conneg.

But... w3.org is unusual. We can configure that kind of thing. Most
people have very little scope to alter the setup of their online
systems, so we need to show why conneg is good and therefore why it's
worth the effort.
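What conneg does on the server side can be sketched in a few lines. This
is a simplified, hypothetical chooser in Python, assuming one resource
available as PNG and GIF (as with the w3c_home icon) and a client-sent
Accept header. It honours q-values and type/* wildcards but ignores other
media-type parameters, so treat it as an illustration rather than a
complete implementation of HTTP's rules.

```python
from typing import Optional

# Hypothetical representations of one resource, in server preference order;
# the order breaks ties when the client rates both equally.
AVAILABLE = ["image/png", "image/gif"]


def parse_accept(header: str) -> dict:
    """Parse an Accept header into {media_range: q}.

    Simplified: parameters other than q are ignored and syntax is not validated.
    """
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_range] = q
    return prefs


def quality(media_type: str, prefs: dict) -> float:
    """Return q for media_type, preferring the most specific matching range."""
    main_type = media_type.split("/")[0]
    for media_range in (media_type, main_type + "/*", "*/*"):
        if media_range in prefs:
            return prefs[media_range]
    return 0.0


def negotiate(accept: Optional[str], available=AVAILABLE) -> Optional[str]:
    """Choose a representation; None means the server could answer 406."""
    if not accept:
        return available[0]  # no Accept header: any representation will do
    prefs = parse_accept(accept)
    best = max(available, key=lambda t: quality(t, prefs))
    return best if quality(best, prefs) > 0 else None


print(negotiate("image/png,image/gif;q=0.5"))  # image/png
print(negotiate("image/gif"))                  # image/gif
print(negotiate("text/html"))                  # None
```

A real server would also send "Vary: Accept" with the response, so that
caches don't hand the PNG to a client that asked for the GIF.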

Also, the Rationale section talks about Web sites - see above.

Design goals:

'Simplest possible characters such as number and lower case letters;
base 36 recommended'

Nope, sorry. We're a global organisation and the Web is for everyone.
There's nothing wrong with http://example.com/北京,中国

Which points to the fact that we should make explicit mention of IRIs
as well. I would say it's OK to state at the top that we use the more
common term URI and apply it, where relevant, to IRIs, noting that
things like case sensitivity have no meaning in many languages.
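On the IRI point, the usual way to carry an IRI where only a URI is
allowed is to UTF-8 encode the non-ASCII characters and percent-encode
the resulting bytes (RFC 3987, section 3.1). A simplified Python sketch,
with iri_to_uri as a hypothetical helper name; it leaves ASCII untouched
and does not convert internationalised host names:

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding the UTF-8 bytes of non-ASCII characters.

    Simplified: ASCII characters (including delimiters and existing %-escapes)
    are kept as-is, and the host is not converted to Punycode.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


iri = "http://example.com/北京,中国"
uri = iri_to_uri(iri)
print(uri)                  # http://example.com/%E5%8C%97%E4%BA%AC,%E4%B8%AD%E5%9B%BD
print(unquote(uri) == iri)  # True: decoding the escapes gives the IRI back

# Han characters have no upper or lower case, so case folding leaves them unchanged
print("北京".lower() == "北京".upper())  # True
```

The last line illustrates the case-sensitivity remark: for scripts
without case, "lower-case only" guidance simply does not apply.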

If this remains a separate document, we would need to think about CR
exit criteria, i.e. how do we prove that it has multiple independent
implementations? IMO guidance like this is probably best done in a
Note, so that conformance is not required.

I need to spend more time looking at the later sections but those are my
initial comments.

I don't mean to sound negative or disheartening. This is a complex issue
and I hope that the WG can indeed come up with useful, repeatable and
practical advice - it's going to take a while.

Time to get ready for the weekly call...

Phil.


--


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Tuesday, 7 October 2014 15:29:50 UTC