- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Mon, 01 Jun 2009 16:14:04 +0200
- To: Larry Masinter <masinter@adobe.com>
- CC: "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Larry Masinter wrote:
> I've found it convenient to use "HRef" as a shorthand
> in the document.
>
> What I'm not sure of is whether I can get away with
> just *replacing* the IRI -> URI algorithm, or if
> I should leave both HRef -> URI and IRI -> URI.

I think the IRI -> URI algorithm should not change (except for the bit about normalization discussed previously).

What should be added is HRef -> IRI (which implies that in some cases, that mapping would need to map query parameters to plain ASCII).

LEIRIs then could become a special case of the thing described above.

> Right now, the HTML5/"Web Address" draft is written as
> "how to parse" and "how to resolve relative to absolute".
>
> I'm not sure if it's possible to recast it as
> HRef => URI, but it's certainly worth a try.

Repeating what I suggested on www-tag a few days ago (<http://lists.w3.org/Archives/Public/www-tag/2009May/0083.html>)...:

This has been under discussion for something like nine months. I think the issues, as documented by Ian, Henri and now by Dan, are well understood (and thanks for posting examples and test cases).

I think when we discussed this last October, Larry and several others (including myself...) pointed out that the additional complexity as compared to IRIs (RFC3987) can easily be layered *above* IRI, mapping HTML5 references to IRIs just by stating:

1) non-IRI characters found in the query part are encoded using the document's character encoding, then percent-escaped (*)

2) all other non-IRI characters (such as space) are encoded using UTF-8, then percent-escaped

Or, if we use LEIRIs as foundation instead (<http://tools.ietf.org/html/draft-duerst-iri-bis-04#section-7>), we end up with a *single* rule:

1') non-IRI characters found in the query part are encoded using the document's character set, then percent-escaped (*)

Why does it need to be *more* complex than that?

BR, Julian
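
For illustration only, rules 1) and 2) above could be sketched roughly as follows in Python. This is not taken from any draft: the function name `href_to_iri`, the `SAFE` set, and the `doc_encoding` parameter are hypothetical, and the notion of "non-IRI character" is simplified. In the query, anything outside the ASCII URI character set is escaped using the document's encoding (so the query ends up as plain ASCII); elsewhere, only disallowed ASCII characters such as space are escaped, and non-ASCII characters are left alone as IRI characters.

```python
from urllib.parse import quote

# ASCII characters passed through unchanged: RFC 3986 reserved and
# unreserved punctuation, plus '%' so existing escapes are preserved.
# (Assumption: a simplified stand-in for "allowed in an IRI".)
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def href_to_iri(href: str, doc_encoding: str = "utf-8") -> str:
    """Hypothetical HRef -> IRI mapping following rules 1) and 2)."""
    # Split off the fragment first, then the query.
    rest, hash_sep, fragment = href.partition("#")
    before_query, query_sep, query = rest.partition("?")

    def escape_outside_query(text: str) -> str:
        # Rule 2: disallowed ASCII characters (such as space) are
        # UTF-8 encoded and percent-escaped; non-ASCII is kept as-is.
        return "".join(
            ch if ord(ch) > 127 else quote(ch, safe=SAFE) for ch in text
        )

    # Rule 1: in the query, encode with the document's character
    # encoding, then percent-escape. Raises UnicodeEncodeError if a
    # character cannot be represented in that encoding.
    escaped_query = quote(query, safe=SAFE, encoding=doc_encoding)

    return (
        escape_outside_query(before_query)
        + query_sep
        + escaped_query
        + hash_sep
        + escape_outside_query(fragment)
    )
```

Under the LEIRI-based variant (rule 1' only), the `escape_outside_query` step would simply be dropped and everything outside the query left untouched.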
Received on Monday, 1 June 2009 14:14:47 UTC