Advice on making the IRI document suitable for reference by HTML (and other specs)

Two of the goals we wanted to make sure of in the charter were to

(1) ensure that the IRI document, going forward, was suitable for use
by the HTML specification as a normative reference (for certain), and
(2) try to minimize the difference between how browsers treat URLs
and how all other Internet applications treat URLs.

I had hoped that the document already met goal (1) and that only (2)
still needed work, but I think more work is needed to accomplish (1).

I'd like to ask for help with, and discussion of, an entry in the HTML
working group "Bug database": a "bug" asking that HTML point to
draft-duerst-iri-bis, with some proposed text. However, some problems
remain; comment #4 on that bug, quoted below, describes them.

I'd appreciate it if other mailing list subscribers had some ideas for
how to improve the document to accomplish (1) while retaining the goal
of (2). To make progress on (2), I think we'd want to take some of the
steps in section 7.2 "HREF preprocessing" and move them into the main
body, as normative requirements for all URI processors, not just the
ones in browsers. Examples include chopping off initial and final
whitespace, handling a single "%", and deleting or encoding otherwise
illegal characters.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207


--- Comment #4 from Ian 'Hixie' Hickson <ian@hixie.ch>  2009-12-16 02:45:15 ---
I looked at doing this, but the IRIbis draft isn't yet in a state where I can
really do this. There's no algorithm that defines how to resolve an arbitrary
string against an absolute base URL, as far as I can tell; in particular,
nothing seems to take into account the HRef-charset so as to encode characters
differently in different parts of the string. There's no definition of "valid
URL" that I can refer to (that takes into account the "HRef-charset"). The
parsing algorithm is destructive (e.g. the <path> of "http://example.com/%X"
is, as far as I can tell, 5 characters long ("/%25X"), not three as required by
Web compat ("/%X")). There's no definition of "absolute URL" that I can use
(mostly because the current parsing algorithms are destructive).
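
A small Python sketch of the distinction: `urllib.parse.urlsplit` splits a URL without rewriting its characters, so the path of "http://example.com/%X" stays three characters long. The `path_destructive` function is a hypothetical stand-in for a parser that escapes a lone "%" while parsing; it is not the algorithm in the IRIbis draft.

```python
import re
from urllib.parse import urlsplit

# A "%" not followed by two hex digits.
_LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def path_non_destructive(url: str) -> str:
    # urlsplit only splits the string; the characters of each part are untouched.
    return urlsplit(url).path


def path_destructive(url: str) -> str:
    # Hypothetical destructive parser: rewrites a lone "%" as "%25" during parsing.
    return _LONE_PERCENT.sub("%25", urlsplit(url).path)


url = "http://example.com/%X"
print(path_non_destructive(url), len(path_non_destructive(url)))  # /%X 3
print(path_destructive(url), len(path_destructive(url)))          # /%25X 5
```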

This is all assuming that the split should be as it is now; this may not be a
good assumption. If we should move the interface a bit, that may change
matters. For example, it seems to me we probably want the "HRef-charset"
definition in HTML5, rather than in the IRI spec.
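
One way to picture an HTML-side definition that uses "HRef-charset" is a resolve-then-encode step: resolve the string against the absolute base URL, then percent-encode each component with its own character encoding. The sketch assumes, purely for illustration, that the path and fragment are encoded as UTF-8 and only the query uses HRef-charset, and it leaves the host alone (no IDNA). `resolve_href` is a hypothetical name, and `urljoin` is Python's standard resolver, not an algorithm from either spec.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def resolve_href(base: str, href: str, href_charset: str) -> str:
    """Hypothetical: resolve href against an absolute base URL, then
    percent-encode the path and fragment as UTF-8 and the query in
    href_charset. The host (netloc) is passed through unchanged."""
    parts = urlsplit(urljoin(base, href))
    # "%" is treated as safe so existing escapes (and a lone "%") are not rewritten.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding=href_charset)
    fragment = quote(parts.fragment, safe="%", encoding="utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(resolve_href("http://example.com/a/b", "../caf\u00e9?q=caf\u00e9", "iso-8859-1"))
# http://example.com/caf%C3%A9?q=caf%E9
```

The same "é" ends up as two bytes in the path and one byte in the query, which is the per-part behaviour the comment says the draft does not yet define.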

Please advise on how I should proceed.

Received on Sunday, 20 December 2009 08:22:10 UTC