- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 21 May 2009 19:44:15 +0200
- To: Anne van Kesteren <annevk@opera.com>
- CC: Dan Connolly <connolly@w3.org>, www-tag@w3.org
Anne van Kesteren wrote:
> On Thu, 21 May 2009 19:05:52 +0200, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>> I think when we discussed this last October, Larry and several others
>> (including myself...) pointed out that the additional complexity as
>> compared to IRIs (RFC3987) can easily be layered *above* IRI, mapping
>> HTML5-references to IRIs just by stating:
>
> Just for the record, around the same time I pointed out that this could
> not work because of Step 1b in section 3.1 of RFC 3987. This may or may
> not be a bug in RFC 3987, but it is most definitely an issue.
I apologize that I keep forgetting this issue; for the record, it is this
one:
    b. If the IRI is in some digital representation (e.g., an
       octet stream) in some known non-Unicode character
       encoding, convert the IRI to a sequence of characters
       from the UCS normalized according to NFC.

    -- <http://tools.ietf.org/html/rfc3987#section-3.1>
...which is weird, because the normalization is only enforced on
non-Unicode encodings. Seems this needs to be discussed in the context
of IRIbis.
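
To illustrate the asymmetry, here is a minimal Python sketch, not a conforming implementation. The helper name to_ucs and the rule "any codec whose name starts with utf- counts as Unicode-based" are assumptions made for the example. It normalizes to NFC only when the input arrives in a non-Unicode encoding, so the same decomposed character sequence comes out differently depending on how it was transported.

```python
import codecs
import unicodedata


def to_ucs(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode an IRI from bytes to a character sequence.

    Non-Unicode encodings are normalized to NFC (the quoted step 1b);
    Unicode-based encodings are passed through untouched.
    """
    name = codecs.lookup(encoding).name  # canonical codec name, e.g. "utf-8"
    text = data.decode(encoding)
    if name.startswith("utf-"):
        # Assumption: "utf-*" codecs are the Unicode-based ones.
        return text
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 COMBINING ACUTE ACCENT, sent as UTF-8: stays decomposed.
from_utf8 = to_ucs("e\u0301".encode("utf-8"), "utf-8")

# The same two characters sent as windows-1258, which has a combining acute
# at byte 0xEC: composed into U+00E9 by the NFC step.
from_legacy = to_ucs(b"e\xec", "cp1258")
```

The two results are canonically equivalent but not code-point identical, which is the inconsistency noted above.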
>> 1) non-IRI characters found in the query part are encoded using the
>> document's character encoding, then percent-escaped (*)
>
> In addition, for this to work you'd have to define how to get the "query
> part" first.
The part between the first "?" and the first "#", as far as I can tell.
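
A minimal Python sketch of that definition combined with rule 1) quoted above. The function names split_query and escape_query are made up for the example, and it makes two assumptions: every non-ASCII character in the query is treated as a "non-IRI character", and a character the document encoding cannot represent simply raises an error, since the footnote (*) is not quoted here.

```python
def split_query(ref: str):
    """Split a reference into (before, query, fragment).

    The query is the part between the first "?" and the first "#";
    query or fragment is None when the delimiter is absent.
    """
    head, hash_sep, fragment = ref.partition("#")   # cut at the first "#"
    before, q_sep, query = head.partition("?")      # first "?" before that "#"
    return (before,
            query if q_sep else None,
            fragment if hash_sep else None)


def escape_query(ref: str, doc_encoding: str) -> str:
    """Percent-escape non-ASCII query characters via the document encoding."""
    before, query, fragment = split_query(ref)
    out = before
    if query is not None:
        parts = []
        for ch in query:
            if ord(ch) < 0x80:
                parts.append(ch)  # ASCII is left alone (assumption)
            else:
                # Raises UnicodeEncodeError if doc_encoding cannot encode ch.
                encoded = ch.encode(doc_encoding)
                parts.append("".join("%%%02X" % b for b in encoded))
        out += "?" + "".join(parts)
    if fragment is not None:
        out += "#" + fragment
    return out


latin1 = escape_query("http://example.org/a?q=\u00e9#frag", "iso-8859-1")
utf8 = escape_query("http://example.org/a?q=\u00e9#frag", "utf-8")
```

Splitting at "#" first means a "?" that appears only inside the fragment does not start a query.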
> ...
BR, Julian
Received on Thursday, 21 May 2009 17:45:01 UTC