Re: Shrinking HTML5 some more — Anne’s Weblog from Julian Reschke on 2009-03-30 (public-html@w3.org from March 2009)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 30 Mar 2009 12:05:07 +0200
To: Anne van Kesteren <annevk@opera.com>
CC: "public-html@w3.org" <public-html@w3.org>
Message-ID: <49D09953.8020906@gmx.de>

Anne van Kesteren wrote:
>> It sounds like this is an edge case, in that that encoding could 
>> potentially contain decomposed characters, which would be mapped to a 
>> sequence of decomposed Unicode characters when the mapping is done in 
>> the simplest way.
> 
> Yes, but because of that edge case the specification has this silly 
> requirement which affects all non-Unicode encodings. Decomposed 
> characters are easy to get using character escapes.

Indeed, I forgot about that.

In that case, I think it would be best for iri-bis to lift this 
requirement. It doesn't make sense that the same character sequence is 
handled differently depending on where it came from.
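
To illustrate the asymmetry, here is a minimal Python sketch. It assumes 
the requirement in question is the NFC normalization that RFC 3987 
applies when an IRI is converted from a non-Unicode encoding. The two 
function names and the example.org address are made up for illustration, 
and the percent-encoding step is simplified.

```python
import unicodedata
from urllib.parse import quote


def uri_from_legacy_encoding(text: str) -> str:
    """Hypothetical path for text that was converted from a non-Unicode
    encoding: normalize to NFC first, then percent-encode as UTF-8."""
    return quote(unicodedata.normalize("NFC", text), safe="/:")


def uri_from_unicode_source(text: str) -> str:
    """Hypothetical path for text that was already Unicode (for example,
    produced by character escapes): percent-encode without normalizing."""
    return quote(text, safe="/:")


# "e" followed by U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "é".
decomposed = "http://example.org/cafe\u0301"

print(uri_from_legacy_encoding(decomposed))  # http://example.org/caf%C3%A9
print(uri_from_unicode_source(decomposed))   # http://example.org/cafe%CC%81
```

The same character sequence ends up as two different URIs, depending 
only on which path it took.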

>>> IRIs work for that encoding was important but making them work for 
>>> HTML, CSS, etc. was not.)
>>
>> I'd say they work just fine; you just need to preprocess them.
> 
> The preprocessing you need to do involves converting the input to a URI 
> which seems highly suboptimal.

You could also preprocess to an IRI.

Anyway, the actual processing is the same, so what we're really 
discussing is simply how and where it's defined.
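
As a rough sketch of what I mean, in Python: both helpers below are 
hypothetical and much simplified. They assume the preprocessing only has 
to escape characters that are not allowed as-is (such as a space), and 
they ignore the finer rules about which non-ASCII characters an IRI 
permits. The only difference between the two is whether non-ASCII 
characters are escaped as well.

```python
from urllib.parse import quote

# ASCII punctuation left untouched: URI reserved and unreserved
# characters plus "%", so existing escapes are not escaped twice.
_SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def preprocess_to_uri(text: str) -> str:
    """Escape disallowed ASCII and all non-ASCII characters (as UTF-8)."""
    return quote(text, safe=_SAFE)


def preprocess_to_iri(text: str) -> str:
    """Escape disallowed ASCII only; leave non-ASCII characters alone."""
    return "".join(
        ch if ord(ch) > 0x7F else quote(ch, safe=_SAFE) for ch in text
    )


raw = "http://example.org/a b/\u00e9"
print(preprocess_to_iri(raw))  # http://example.org/a%20b/é
print(preprocess_to_uri(raw))  # http://example.org/a%20b/%C3%A9
```

Converting the IRI result to a URI afterwards gives the same string as 
going to a URI directly, which is why I say the processing is the same.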

>> And also, the work-in-progress revision of RFC 3987 already addresses 
>> this (at least partly), by introducing LEIRIs 
>> (<http://tools.ietf.org/html/draft-duerst-iri-bis-05#section-7>).
> 
> LEIRIs are not a solution.

Please elaborate.

BR, Julian

Received on Monday, 30 March 2009 10:05:49 UTC