- From: Larry Masinter <masinter@adobe.com>
- Date: Sun, 20 Dec 2009 00:21:38 -0800
- To: "public-iri@w3.org" <public-iri@w3.org>
One of the goals we wanted to make sure of in the charter was to (1) ensure that the IRI document going forward was suitable for use by the HTML specification as a normative reference (for certain), and (2) also try to minimize the difference between how browsers treat URLs and how all other Internet applications treat URLs.

I was hoping that the document already met goal (1), and that (2) was something to work on, but I think more work is needed to accomplish (1).

I'd like to ask for some help and discussion around what is in the HTML working group "Bug database" as a "bug" to change HTML to point to draft-duerst-iri-bis with some proposed text. However, there remain some problems. I'd appreciate it if other mailing list subscribers had ideas for how to fix the document to better accomplish (1) while retaining goal (2).

To make progress on (2), I think we'd want to take some of the things in section 7.2 "HREF preprocessing" and move them into the main body of what all normative URI processors should do, and not just the ones in browsers. Things like chopping off initial and final whitespace, handling a single "%", deleting or encoding otherwise illegal characters, etc.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207

--- Comment #4 from Ian 'Hixie' Hickson <ian@hixie.ch> 2009-12-16 02:45:15 ---

I looked at doing this, but the IRIbis draft isn't yet in a state where I can really do this.

There's no algorithm that defines how to resolve an arbitrary string against an absolute base URL, as far as I can tell; in particular, nothing seems to take into account the HRef-charset so as to encode characters differently in different parts of the string.

There's no definition of "valid URL" that I can refer to (that takes into account the "HRef-charset").

The parsing algorithm is destructive (e.g. the <path> of "http://example.com/%X" is, as far as I can tell, 5 characters long ("/%25X"), not three as required by Web compat ("/%X")).

There's no definition of "absolute URL" that I can use (mostly because the current parsing algorithms are destructive).

This is all assuming that the split should be as it is now; this may not be a good assumption. If we should move the interface a bit, that may change matters. For example, it seems to me we probably want the "HRef-charset" definition in HTML5, rather than in the IRI spec.

Please advise on how I should proceed.
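
For discussion, here is a rough Python sketch of the kind of preprocessing mentioned above (trimming surrounding whitespace, escaping a lone "%", percent-encoding otherwise illegal characters). It is not the algorithm from section 7.2 of the draft: the function name, the regular expressions, and the exact set of "illegal" characters are assumptions made for illustration, and it ignores the HRef-charset question entirely. It does show why such a step is destructive in the sense described in the comment: "/%X" comes out as "/%25X".

```python
import re

# Hypothetical helpers for illustration only; not taken from draft-duerst-iri-bis.
# A "%" that is not followed by two hex digits is treated as a lone percent sign.
_LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Assumed set of illegal characters: ASCII controls, space, DEL, and a few
# delimiters that RFC 3986 does not allow anywhere in a URI.
_ILLEGAL = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`]')


def preprocess_href(raw: str) -> str:
    """Apply a simplified, assumed version of HREF preprocessing."""
    # 1. Chop off initial and final whitespace.
    s = raw.strip(" \t\n\r\f")
    # 2. Escape any lone "%" as "%25" (this is the destructive step).
    s = _LONE_PERCENT.sub("%25", s)
    # 3. Percent-encode the remaining illegal ASCII characters.
    #    Non-ASCII characters are left alone, as an IRI may carry them.
    s = _ILLEGAL.sub(lambda m: "%{:02X}".format(ord(m.group())), s)
    return s


if __name__ == "__main__":
    print(preprocess_href("  http://example.com/%X \n"))
    print(preprocess_href("http://example.com/a b"))
```

A processor that must preserve the original three-character path "/%X" would have to keep the raw string alongside the preprocessed one, or defer step 2 until the URL is actually sent on the wire.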
Received on Sunday, 20 December 2009 08:22:10 UTC