- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 21 May 2009 19:05:52 +0200
- To: Dan Connolly <connolly@w3.org>
- CC: www-tag@w3.org
Dan Connolly wrote:
> Larry, Henry, John,
>
> I made some progress on ACTION-265
>
> "Work with Larry, Henry to frame technical issues relating to the
> various overlapping specs. about URIs, IRIs and encoding on the wire"
> -- http://www.w3.org/2001/tag/group/track/actions/265
>
> In particular...
>
> http://www.w3.org/html/wg/href/elab.html
> http://www.w3.org/html/wg/href/elab10.html
>
> This is a successive elaboration of the issues with
> issues captured as test cases.
>
> It's what I was talking about when I wrote...
>
> (the best way to slow down is to make test cases. here's hoping I find
> time)
> -- http://www.w3.org/2001/tag/2009/05/07-minutes#item05
>
> The issues covered are
>
> Space in Path
> Colon in path
> Non-ASCII characters in path
> Non-ASCII characters in path and query/search
>
> Larry, I showed you an earlier draft and you weren't too
> excited. I still find this is the way my brain needs
> to capture issues.
>
> John, could you take a look and see if I'm making sense, at least?
>
> I gather Henry is out this week...
> ...

This has been under discussion for something like nine months. I think the issues, as documented by Ian, Henri and now by Dan, are well understood (and thanks for posting examples and test cases).

I think when we discussed this last October, Larry and several others (including myself...) pointed out that the additional complexity as compared to IRIs (RFC3987) can easily be layered *above* IRI, mapping HTML5 references to IRIs simply by stating:

1) non-IRI characters found in the query part are encoded using the document's character encoding, then percent-escaped (*)

2) all other non-IRI characters (such as space) are encoded using UTF-8, then percent-escaped

Or, if we use LEIRIs as foundation instead (<http://tools.ietf.org/html/draft-duerst-iri-bis-04#section-7>), we end up with a *single* rule:

1') non-IRI characters found in the query part are encoded using the document's character encoding, then percent-escaped (*)

Why does it need to be more complex than that?

BR, Julian

(*) Note that HTML5 considers links with non-URI characters in the query part valid only if the document's encoding is UTF-8 or UTF-16 (as far as I recall).
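
For illustration, rules 1) and 2) can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not text from any of the specs: the function name `escape_reference`, the `doc_encoding` parameter and the `SAFE` character set are hypothetical, and the sketch simplifies matters by escaping every character outside the RFC 3986 ASCII set, so it goes all the way to a URI rather than stopping at an IRI.

```python
from urllib.parse import quote

# ASCII characters left untouched: RFC 3986 reserved and unreserved
# punctuation, plus "%" so that existing escapes are not double-encoded.
# (Assumption of this sketch; letters and digits are never escaped by quote().)
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def escape_reference(ref: str, doc_encoding: str = "utf-8") -> str:
    """Percent-escape a reference, using doc_encoding for the query only."""
    # Split off the fragment first, then the query.
    rest, hash_mark, fragment = ref.partition("#")
    before_query, question_mark, query = rest.partition("?")

    # Rule 2: everything outside the query is encoded as UTF-8.
    out = quote(before_query, safe=SAFE, encoding="utf-8")

    # Rule 1: the query part is encoded using the document's encoding.
    # Characters that doc_encoding cannot represent raise UnicodeEncodeError;
    # the rules above do not say how that case should be handled.
    if question_mark:
        out += "?" + quote(query, safe=SAFE, encoding=doc_encoding)

    if hash_mark:
        out += "#" + quote(fragment, safe=SAFE, encoding="utf-8")
    return out


if __name__ == "__main__":
    print(escape_reference("a b/\u00e9?q=\u00e9", doc_encoding="iso-8859-1"))
    print(escape_reference("a b/\u00e9?q=\u00e9", doc_encoding="utf-8"))
```

With an ISO-8859-1 document the "é" in the path becomes `%C3%A9` while the one in the query becomes `%E9`; with a UTF-8 document both become `%C3%A9`. Under the LEIRI-based variant only the query branch would be needed, since characters such as space are already allowed there.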
Received on Thursday, 21 May 2009 17:06:39 UTC