
Re: a few URI/href issues captured with test cases

From: John Kemp <john.kemp@nokia.com>
Date: Thu, 21 May 2009 13:33:19 -0400
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-Id: <2029C43F-F0B4-4BAE-8058-5F601619463D@nokia.com>
To: ext Dan Connolly <connolly@w3.org>
Hello Dan,

On May 21, 2009, at 11:52 AM, ext Dan Connolly wrote:

[...]

>  http://www.w3.org/2001/tag/group/track/actions/265
>
> In particular...
>
>  http://www.w3.org/html/wg/href/elab.html
>  http://www.w3.org/html/wg/href/elab10.html
>

[...]

>
> The issues covered are
>
> Space in Path
> Colon in path
> Non-ASCII characters in path
> Non-ASCII characters in path and query/search
>
> Larry, I showed you an earlier draft and you weren't too
> excited. I still find this is the way my brain needs
> to capture issues.
>
> John, could you take a look and see if I'm making sense, at least?

It makes sense in that I think I understood your test cases, and  
(somewhat) their relationship to the issue at hand.

Summarizing my (basic) understanding:

* Links are good, and a basic feature of the Web
* However, links are used in different contexts (for example, a link  
in an HTML href may then be used to make an HTTP request, in particular  
when an HTML form is submitted), and the characters, character set and  
encoding used in one context may not be appropriate in another
* Some specifications defer to RFC 3986 for URI encoding rules. RFC 3986  
in turn defers to scheme specifications, in particular with regard to  
"reserved characters". None of the relevant specifications say anything  
about the use of IRIs in links (correct?)
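To illustrate the "reserved characters" point loosely, here is a minimal Python sketch using the standard library's urllib.parse. The segment value is hypothetical, and the choice of ":" and "/" as delimiters is an assumption standing in for whatever a given scheme specification says:

```python
from urllib.parse import quote

# RFC 3986, section 2.2: reserved = gen-delims / sub-delims
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

# Hypothetical path segment containing reserved characters.
segment = "a:b/c?d"

# Reserved characters used as data must be percent-encoded.
print(quote(segment, safe=""))    # a%3Ab%2Fc%3Fd

# Which reserved characters act as delimiters is left to the scheme or
# component; here we assume ":" and "/" are delimiters in a path.
print(quote(segment, safe="/:"))  # a:b/c%3Fd

# With nothing marked safe, every reserved character gets escaped.
assert all(quote(c, safe="") != c for c in RESERVED)
```

The generic encoder cannot know which reserved characters are delimiters; the caller (ultimately, the scheme) has to say so.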

Your examples appear to indicate:

i) That a space is not allowed in the path component of an HTTP  
request, and that a space in an HTML link should be escaped as %20, as  
specified by RFC 3986
ii) That a colon in the path creates a link which is not useful  
outside of the context of the document within which it appears (at  
least, I _think_ that's what you mean here?)
iii) That URIs allow only US-ASCII characters, per RFC 3986
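A minimal Python sketch of points i) to iii), using urllib.parse from the standard library. The base URI http://example.org/dir/ and the file names are hypothetical, not taken from the test pages:

```python
from urllib.parse import quote, urljoin

BASE = "http://example.org/dir/"  # hypothetical base URI

# i) A space is not allowed in a URI path; percent-encode it as %20.
print(quote("a b.html"))                # a%20b.html

# ii) A colon in the first segment of a relative reference makes it parse
# as a scheme ("foo:"), so it no longer resolves against the base.
print(urljoin(BASE, "foo:bar.html"))    # foo:bar.html
print(urljoin(BASE, "./foo:bar.html"))  # http://example.org/dir/foo:bar.html

# iii) URIs are US-ASCII only; a non-ASCII character is encoded as UTF-8
# and then percent-encoded (the IRI-to-URI mapping of RFC 3987).
print(quote("café.html"))               # caf%C3%A9.html
```

Prefixing "./" is the workaround RFC 3986 (section 4.2) suggests for a colon in the first segment of a relative path.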

I'm not totally sure how these relate directly to the issue we  
discussed on the 7th. Paraphrasing Tim's description (hopefully not too  
terribly), that issue is that a document encoded in one character set  
may contain a link whose characters are encoded in a different  
character set, in particular when that link is used in a form  
submission (other than as a result of following the URI specification  
rules instead of the IRI specification rules).
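A hedged Python sketch of that mismatch: the same character percent-encodes to different bytes depending on which character encoding is applied. It assumes, as browsers have commonly done, that the path is encoded as UTF-8 while a form-submitted query follows the submitting page's charset; the value "café", the field name q and the URL are hypothetical:

```python
from urllib.parse import quote, urlencode

value = "café"  # hypothetical form field value

# Path: IRI-to-URI mapping (RFC 3987), i.e. UTF-8 then percent-encode.
path = "/search/" + quote(value, encoding="utf-8")

# Query: assume the form sits in an ISO-8859-1 page and the browser
# encodes the submitted value in the page's charset.
query_latin1 = urlencode({"q": value}, encoding="iso-8859-1")

# The same form in a UTF-8 page.
query_utf8 = urlencode({"q": value}, encoding="utf-8")

print("http://example.org" + path + "?" + query_latin1)
# http://example.org/search/caf%C3%A9?q=caf%E9
print("http://example.org" + path + "?" + query_utf8)
# http://example.org/search/caf%C3%A9?q=caf%C3%A9
```

In the first URL the path and query carry "é" as different bytes, and nothing in the URL itself records which encoding was used.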

After reading what I've written, my general feedback seems to be that  
your examples are interesting and might well be relevant, but would  
probably benefit from being placed in some context. In this email I've  
attempted to provide (hopefully without oversimplifying too much) the  
context in which I feel your examples make sense. Does that make sense  
to you?

Cheers,

- johnk
Received on Thursday, 21 May 2009 17:34:21 GMT
