Re: Shrinking HTML5 some more — Anne’s Weblog

Anne van Kesteren wrote:
>> It sounds like this is an edge case, in that that encoding could 
>> contain decomposed characters, which would be mapped to a sequence 
>> of decomposed Unicode characters when the mapping is done in the 
>> simplest way.
> 
> Yes, but because of that edge case the specification has this silly 
> requirement which affects all non-Unicode encodings. Decomposed 
> characters are easy to get using character escapes.

Indeed, I forgot about that.

In that case, I think it would be best for iri-bis to lift this 
requirement. It doesn't make sense that the same character sequence is 
handled differently depending on where it came from.
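
To make the inconsistency concrete, here is a minimal Python sketch. It 
is an illustration only, not text from any draft. It assumes the RFC 3987 
section 3.1 behaviour, where input converted from a non-Unicode encoding 
is NFC-normalized before percent-encoding while input that is already 
Unicode is not. The function name `to_uri_path` and its 
`from_legacy_encoding` flag are hypothetical.

```python
import unicodedata
from urllib.parse import quote


def to_uri_path(text: str, from_legacy_encoding: bool) -> str:
    """Percent-encode text as UTF-8 (hypothetical helper).

    Assumption: NFC is applied only when the text was converted
    from a non-Unicode encoding, as RFC 3987 section 3.1 describes.
    """
    if from_legacy_encoding:
        text = unicodedata.normalize("NFC", text)
    return quote(text, safe="/")


# "e" followed by COMBINING ACUTE ACCENT, e.g. written as e&#x301; in HTML.
decomposed = "e\u0301"

from_unicode = to_uri_path(decomposed, from_legacy_encoding=False)
from_legacy = to_uri_path(decomposed, from_legacy_encoding=True)

print(from_unicode)  # e%CC%81
print(from_legacy)   # %C3%A9
```

The same two code points end up as two different URIs, purely because of 
the encoding of the document they were found in.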

>>> IRIs work for that encoding was important but making them work for 
>>> HTML, CSS, etc. was not.)
>>
>> I'd say they work just fine; you just need to preprocess them.
> 
> The preprocessing you need to do involves converting the input to a URI 
> which seems highly suboptimal.

You could also preprocess to an IRI.

Anyway, the actual processing is the same, so what we're really 
discussing is simply how and where it's defined.
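
As a rough sketch of the mapping step under discussion, the following 
Python percent-encodes every non-ASCII character as UTF-8 and leaves 
ASCII untouched. The name `iri_to_uri` is hypothetical, the example URL 
is made up, and the sketch deliberately ignores special handling of the 
host component and any other preprocessing a specification might require.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding non-ASCII characters.

    Simplifying assumption: every component is treated alike, so
    internationalized host names get no special treatment here.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters pass through unchanged.
            out.append(ch)
        else:
            # Each UTF-8 byte of a non-ASCII character becomes %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


example = iri_to_uri("http://example.org/r\u00e9sum\u00e9?q=\u00fc")
print(example)  # http://example.org/r%C3%A9sum%C3%A9?q=%C3%BC
```

Whether this step is described as "convert to a URI" or as part of 
preprocessing to an IRI, the bytes that end up on the wire are the same.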

>> And also, the work-in-progress revision of RFC 3987 already addresses 
>> this (at least partly), by introducing LEIRIs 
>> (<http://tools.ietf.org/html/draft-duerst-iri-bis-05#section-7>).
> 
> LEIRIs are not a solution.

Please elaborate.

BR, Julian

Received on Monday, 30 March 2009 10:05:49 UTC