- From: <Manuel.CARRASCO-BENITEZ@ec.europa.eu>
- Date: Tue, 7 Oct 2014 14:15:14 +0000
- To: <phila@w3.org>
- CC: <public-dwbp-wg@w3.org>
Phil,

* Criticism

Phil, I am grateful that you took the time to go through the document: the tone is appropriate and you do *not* sound negative. To put it in perspective, this is a *very old* subject to me and I am very aware of the pitfalls. A bit of archaeology :-)

Internationalization and URLs - grandfather of IRI
http://lists.w3.org/Archives/Public/www-international/1996JulSep/0006.html

Internationalized Uniform Resource Identifiers (IURI) - father of IRI
http://tools.ietf.org/html/draft-masinter-url-i18n-03

* Other URI, URL activities

Comuri is different from CURIE or http://w3ctag.github.io/url. The comment about CURIE was due to the similarity of the name, which was changed from the previous one (CURI) to the present one (COMURI); the new name is sufficiently different to avoid confusion.

* Identifier

It is clear: "Comuri is a compact mnemonic URI" ... "A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource". This is a best practice document and it must follow the current standards (RFC, recommendations, etc) and in particular [RFC3986]. Opinions, warnings, and comments must also follow the current standards, otherwise they should be taken with a pinch of salt.

* Shortening service

It might not be properly explained: Compact URIs do not need shortening services and this is good.

* URI conflict due to shortness

I disagree: it is up to the URI minter to avoid conflict, whether the URIs are short or long.

* Joinup examples

I agree: a case of how not to mint URIs.

* Variants

Content negotiation and direct identification in the URI are just two complementary techniques to get variants.
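As a rough illustration of the two techniques named under Variants (not part of Comuri itself), the sketch below resolves a request either from an extension in the URI or from the Accept header. The VARIANTS table, the EXTENSIONS map, and both function names are hypothetical, and the Accept parsing is deliberately simplified.

```python
# Hypothetical variant table: resource name -> {media type: file name}.
VARIANTS = {
    "w3c_home": {"image/png": "w3c_home.png", "image/gif": "w3c_home.gif"},
}
# Hypothetical mapping from a URI extension to a media type.
EXTENSIONS = {"png": "image/png", "gif": "image/gif"}


def parse_accept(header):
    """Return the media ranges of an Accept header, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by order of appearance.
            ranked.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def select_variant(path, accept="*/*"):
    """Pick a file for `path` by direct identification, else by negotiation."""
    name = path.rsplit("/", 1)[-1]
    stem, dot, ext = name.rpartition(".")
    # Direct identification: the URI itself names the variant.
    if dot and stem in VARIANTS and ext in EXTENSIONS:
        return VARIANTS[stem].get(EXTENSIONS[ext])
    available = VARIANTS.get(name)
    if available is None:
        return None
    # Content negotiation: the Accept header chooses among the variants.
    for media_range in parse_accept(accept):
        if media_range in available:
            return available[media_range]
        if media_range.endswith("/*"):
            prefix = media_range[:-1]  # "image/" for "image/*"
            for media_type, filename in available.items():
                if media_range == "*/*" or media_type.startswith(prefix):
                    return filename
    return None
```

With this table, `select_variant("/Icons/w3c_home.gif")` picks the GIF from the URI alone, while `select_variant("/Icons/w3c_home", "image/gif;q=0.9, image/png;q=0.5")` reaches the same file through the Accept header. The two paths can coexist on one server.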
* Multilingualism

Addressing the:
 - Whole world: recommend the use of language neutral URIs, such as numbers, including base 36
 - One language: one could use language specific URIs

For example, http://example.com/北京,中国 :
 - Whole world: bad
 - China: good

Anecdote: on a trip to Armenia I noticed that wifi passwords were always numeric: no Latin or Armenian characters.

I agree that IRI should be mentioned, though stating that it should be used only when needed.

I can hardly forget Web multilingualism in my proposals :-) I organised the first ever event on Web multilingualism (Third conference, 1995); I contributed to IRI; I work for the Directorate-General for Translation of the European Union with 24 official languages and translation to a few more.

* Implementations

Comuri does not break any of the existing standards and it works with existing software as they are only conventions. Only the "direct identification of metadata" would require finer parameterisation (in some servers) or new development, where the principle of at least two independent implementations must be respected.

Regards
Tomas

-----Original Message-----
From: Phil Archer [mailto:phila@w3.org]
Sent: Friday, October 03, 2014 2:59 PM
To: Public DWBP WG; CARRASCO BENITEZ Manuel (DGT)
Subject: Action-79 A review of COMURI

Tomas,

I have finally sat down to read your work properly and make the following comments (which I'm doing deliberately without reading others so I apologise for overlap - I want to come to this 'fresh.')

Initial comments:

The title makes clear that we're talking about URIs cf. URLs. Good. I wouldn't want to overlap current W3C work on URLs that is the subject of some disquiet (not least from me). http://w3ctag.github.io/url/

Others have already pointed out existing W3C work on CURIEs (http://www.w3.org/TR/curie/)

We need to be clear, I think, that we're not talking about Web sites and pages and all that. We're talking about URIs as resolvable identifiers.
We have a lot of competition in this space - and even more cultural objection to the idea that anything beginning with http:// is a persistent identifier. So we need to be careful and, as Makx will warn us (in his usual sage - ignore at your peril - manner), we shouldn't allow ourselves to get drawn into a discussion here about what an identifier identifies. I will try hard to desist!

The rationale says:

"The most common URI pattern should be similar to shortened URI [SHORT-URI]; i.e., pattern-21. Indeed, the existence of URI shortening services is a symptom that something is wrong and that native short URIs are needed."

I don't agree, I'm afraid. URL shorteners were developed to allow people to include short strings that mapped to potentially very complex URIs when writing e-mail or printed articles. Short URIs are brittle as the shorter the term, i.e. the fewer the number of path segments, the more likely it is that the term will have multiple uses that might conflict in future.

However, I do think URIs should be no longer than necessary and that in general, short is good. An example of a URI that is *way* too long and utterly unguessable is
https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final

So I imagine that I'm able to say what you as an EC official cannot - for an example of how not to mint URIs look no further than the Joinup platform.

I think statements like

"Unwarranted complexity must be avoided. For example, only use longer URIs (third level domain, multisegment paths) when it cannot be avoided; many web sites can get by without language and format variants, so avoid this mechanism."

can be made less confrontationally. Yes, you can avoid format variants in URIs but to do it you need to make use of content negotiation. A URI like http://www.w3.org/Icons/w3c_home returns either the gif or the png version of the image depending on conneg. But... w3.org is unusual.
We can configure that kind of thing. Most people have very little option to alter the set up of their online system and so we need to show why conneg is good and therefore why it's worth the effort.

Also, the Rationale section talks about Web sites - see above.

Design goals:

'Simplest possible characters such as number and lower case letters; base 36 recommended'

Nope, sorry. We're a global organisation and the Web is for everyone. There's nothing wrong with

http://example.com/北京,中国

Which is a pointer to the fact that we should make explicit mention of IRIs as well. I would say it's OK to state at the top that we use the more common term URI and apply where relevant to IRIs, noting that things like case sensitivity have no meaning in many languages.

If this remains a separate document, we would need to think about CR Exit criteria - i.e. how do we prove that it has multiple independent implementations. IMO guidance like this is probably best done in a Note - so conformance is not required.

I need to spend more time looking at the later sections but those are my initial comments. I don't mean to sound negative or disheartening. This is a complex issue and I hope that the WG can indeed come up with useful, repeatable and practical advice - it's going to take a while.

Time to get ready for the weekly call...

Phil.

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/
http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 7 October 2014 14:15:55 UTC