Re: a few URI/href issues captured with test cases from Julian Reschke on 2009-05-21 (www-tag@w3.org from May 2009)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 21 May 2009 19:44:15 +0200
To: Anne van Kesteren <annevk@opera.com>
CC: Dan Connolly <connolly@w3.org>, www-tag@w3.org
Message-ID: <4A1592EF.1040602@gmx.de>

Anne van Kesteren wrote:
> On Thu, 21 May 2009 19:05:52 +0200, Julian Reschke 
> <julian.reschke@gmx.de> wrote:
>> I think when we discussed this last October, Larry and several others 
>> (including myself...) pointed out that the additional complexity as 
>> compared to IRIs (RFC3987) can easily be layered *above* IRI, mapping 
>> HTML5-references to IRIs just by stating:
> 
> Just for the record, around the same time I pointed out that this could 
> not work because of Step 1b in section 3.1 of RFC 3987. This may or may 
> not be a bug in RFC 3987, but it is most definitely an issue.

I apologize that I keep forgetting this issue; for the record, it is this 
one:

             b. If the IRI is in some digital representation (e.g., an
                octet stream) in some known non-Unicode character
                encoding, convert the IRI to a sequence of characters
                from the UCS normalized according to NFC.

-- <http://tools.ietf.org/html/rfc3987#section-3.1>

...which is weird, because the normalization is required only when 
converting from non-Unicode encodings, not for IRIs that are already in 
a Unicode encoding. It seems this needs to be discussed in the context 
of IRIbis.
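
To make the asymmetry concrete, here is a rough Python sketch. It models 
only Step 1b as quoted above, not the rest of section 3.1; the name 
to_ucs and the short list of "Unicode encodings" are assumptions made 
for illustration.

```python
import unicodedata

# Assumption: a simplified list of "Unicode encodings" for this sketch.
UNICODE_ENCODINGS = {"utf-8", "utf-16", "utf-32"}

def to_ucs(data: bytes, encoding: str) -> str:
    """Decode an IRI; apply NFC only for non-Unicode source encodings."""
    text = data.decode(encoding)
    if encoding.lower() in UNICODE_ENCODINGS:
        # Unicode input is passed through without normalization.
        return text
    # Step 1b: non-Unicode input is normalized according to NFC.
    return unicodedata.normalize("NFC", text)

# "e" followed by COMBINING ACUTE ACCENT, which windows-1258 can represent.
decomposed = "e\u0301"
via_utf8 = to_ucs(decomposed.encode("utf-8"), "utf-8")
via_cp1258 = to_ucs(decomposed.encode("cp1258"), "cp1258")
```

Under these assumptions the same two characters come out decomposed 
when the source was UTF-8 and precomposed (U+00E9) when the source was 
windows-1258, so the resulting IRI depends on the transport encoding.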

>> 1) non-IRI characters found in the query part are encoded using the 
>> document's character encoding, then percent-escaped (*)
> 
> In addition, for this to work you'd have to define how to get the "query 
> part" first.

The part between the first "?" and the first "#", as far as I can tell.
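
A rough Python sketch of that rule, combined with step 1) above. The 
function names are made up for illustration, and treating "non-IRI 
characters" as simply "non-ASCII characters" is a simplifying 
assumption, as is the lack of handling for characters the document 
encoding cannot represent.

```python
def split_query(ref: str):
    """Split a reference into (before, query, fragment).

    The query is the part between the first "?" and the first "#";
    query and fragment are None when the delimiter is absent.
    """
    # Cut at the first "#" first, so a "?" inside the fragment is ignored.
    before_frag, hash_, fragment = ref.partition("#")
    before, qmark, query = before_frag.partition("?")
    return (before,
            query if qmark else None,
            fragment if hash_ else None)

def escape_query(query: str, doc_encoding: str) -> str:
    """Percent-escape non-ASCII characters using the document's encoding."""
    out = []
    for ch in query:
        if ord(ch) < 0x80:
            # Assumption: ASCII characters are left untouched.
            out.append(ch)
        else:
            # Encode with the document's encoding, then escape each octet.
            octets = ch.encode(doc_encoding)
            out.append("".join("%{:02X}".format(b) for b in octets))
    return "".join(out)

before, query, fragment = split_query("search?q=caf\u00e9#top")
latin1 = escape_query(query, "iso-8859-1")   # "q=caf%E9"
utf8 = escape_query(query, "utf-8")          # "q=caf%C3%A9"
```

Splitting at "#" before looking for "?" means a reference such as 
"a#c?d" has no query part at all.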

 > ...

BR, Julian

Received on Thursday, 21 May 2009 17:45:01 UTC