Re: uri handling of hosts is too restrictive

I wrote:

> Another (more general) way out is to introduce an explicit
> half-way-house between IRIs and URIs.

After some further thought, I would make a few tweaks to that idea.

First, percent-encoding would always be allowed in all components of all
IRIs; individual schemes would be unable to prohibit percent-encoding
anywhere.  Second, if an individual scheme restricts a component to
contain only a certain subset of Unicode characters (for example, the
ASCII subset), scheme-specific IRI consumers would be required to check
the component before using it, and fail gracefully if any characters are
found outside the subset.

(That would prevent IRIs from suffering some of the problems we are now
seeing with URIs.  In URIs, both percent-encoding and non-ASCII
characters were prohibited in the host component, but there was no
requirement telling URI consumers what to do if they found either of
those things there, so now we have different implementations behaving
differently when they encounter such things.)

The rule for converting an IRI to a URI would be:

1) If you recognize the scheme, then verify that no component contains
characters that it's not supposed to contain.  If the verification
succeeds, then apply whatever conversions are appropriate for each
component.

2) If the verification fails, or if you don't recognize the scheme,
then perform the generic conversion to percent-encoded UTF-8 as described
in the IRI draft, and prepend the prefix i- to the scheme.
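
To make the two steps concrete, here is a rough Python sketch of how I
imagine the rule working.  Everything named in it is an assumption made
for illustration: the RULES table, the made-up scheme x-ascii, the
choice of IDNA for an http authority (taken to be a bare host with no
userinfo or port), and the helper names.  It only handles absolute IRIs.

```python
import re

# Generic split into scheme, authority, path, query, fragment
# (adapted from the regular expression in RFC 3986, Appendix B).
SPLIT = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$')
NAMES = ('authority', 'path', 'query', 'fragment')


def pct_utf8(text):
    """Generic conversion: percent-encode each non-ASCII character as UTF-8."""
    return ''.join(
        ch if ord(ch) < 128
        else ''.join('%%%02X' % b for b in ch.encode('utf-8'))
        for ch in text)


def is_ascii(text):
    return all(ord(ch) < 128 for ch in text)


def anything(text):
    return True


# Hypothetical scheme knowledge: component -> (check, convert).
# Components not listed fall back to DEFAULT.
DEFAULT = (anything, pct_utf8)
RULES = {
    # Assumption: an http authority is a bare host, converted with IDNA.
    'http': {'authority': (anything,
                           lambda h: h.encode('idna').decode('ascii'))},
    # Made-up scheme whose path is restricted to the ASCII subset.
    'x-ascii': {'path': (is_ascii, lambda p: p)},
}


def join(scheme, authority, path, query, fragment):
    out = scheme + ':'
    if authority is not None:
        out += '//' + authority
    out += path
    if query is not None:
        out += '?' + query
    if fragment is not None:
        out += '#' + fragment
    return out


def iri_to_uri(iri):
    scheme, *rest = SPLIT.match(iri).groups()
    if scheme is None:
        raise ValueError('this sketch only handles absolute IRIs')
    rules = RULES.get(scheme.lower())
    if rules is not None:
        parts = dict(zip(NAMES, rest))
        # Step 1: verify every component that is present ...
        if all(rules.get(n, DEFAULT)[0](v)
               for n, v in parts.items() if v is not None):
            # ... then apply the per-component conversions.
            return join(scheme, *(
                None if v is None else rules.get(n, DEFAULT)[1](v)
                for n, v in parts.items()))
    # Step 2: unknown scheme or failed verification.
    return 'i-' + pct_utf8(iri)
```

With these assumed rules, x-ascii:cafe passes through unchanged, while
an x-ascii IRI with a non-ASCII path, or any IRI with an unrecognized
scheme, comes out percent-encoded with i- in front of the scheme.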

(The prefix i- is a better choice than my previous suggestion of i:
because it is less prone to interact strangely with relative references.
The prefix could be registered as an "alternate tree" as described in
RFC 2717.)

To resolve an i-* URI, you conceptually convert it back to an IRI, then
redo the IRI-to-URI conversion using the scheme-specific knowledge
that was lacking in the earlier IRI-to-URI conversion.  Of course an
implementation might use a more direct route.
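
The "convert it back to an IRI" half could look roughly like the Python
sketch below.  The function name i_uri_to_iri is hypothetical, and I am
assuming that only percent-encoded sequences forming valid non-ASCII
UTF-8 are decoded, so escapes of ASCII characters such as %2F are left
alone.  Note that this cannot tell apart sequences that were already
percent-encoded in the original IRI.

```python
import re

# A run of percent-escapes whose bytes all have the high bit set,
# i.e. candidates for percent-encoded non-ASCII UTF-8.
_PCT_RUN = re.compile(r'(?:%[89A-Fa-f][0-9A-Fa-f])+')


def _decode_run(match):
    raw = bytes.fromhex(match.group(0).replace('%', ''))
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8 as a whole: keep the entire run escaped.
        return match.group(0)


def i_uri_to_iri(uri):
    """Strip the i- prefix and undo the generic percent-encoding."""
    if not uri.lower().startswith('i-'):
        raise ValueError('not an i-* URI')
    return _PCT_RUN.sub(_decode_run, uri[2:])
```

The result would then be fed to a scheme-aware IRI-to-URI converter,
which is where the scheme-specific checks and conversions get applied.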

AMC
http://www.nicemice.net/amc/

Received on Wednesday, 18 February 2004 06:35:41 UTC