Re: Error handling in URIs from Felix Sasaki on 2008-06-26 (uri@w3.org from June 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 26 Jun 2008 13:33:14 +0900
To: Ian Hickson <ian@hixie.ch>
CC: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, uri@w3.org
Message-ID: <48631C0A.7020206@w3.org>
Ian Hickson wrote:
> On Thu, 26 Jun 2008, Frank Ellermann wrote:
>   
>>> browsers have already more or less converged on a behaviour.
>>>       
>> But that behaviour is wrong, because it cannot work reliably, outside of 
>> "if it is not UTF-8 then it must be iso-8859-1, redefined to be 
>> windows-1252 in HTML5" scenarios.
>>     
>
> Whether it's right or wrong is neither here nor there, frankly.
>
> It can work reliably insofar as all user agents can do the same thing, 
> which is what we're aiming for in the HTML5 effort.
>
>
>   
>>> Safari and Mozilla encode both as UTF-8 and %-escape both.
>>>       
>> Sounds like they got this right, didn't they?
>>     
>
> This was in the context of copied-and-pasted URLs, which is user 
> interface, for which interoperability isn't a big deal (at least not 
> compared to handling actual legacy content).
>
>
>   
>>> It's about how to handle legacy, unmaintained, historical documents. 
>>> If we break them, we (humanity) lose part of our legacy. That would be 
>>> unfortunate.
>>>       
>> It would be also a red herring for IRIs specified in RFC 3987 only 3.5 
>> years ago, not permitted in HTML 4 or XHTML 1 pages.
>>     
>
> There are pages that aren't UTF-8 encoded that contain links with 
> non-ASCII characters in query components. Whether those pages existed 
> before or after the IRI spec did isn't really relevant. What's important 
> is that those pages exist and browsers don't want to break them -- and 
> that means that if I want my spec to not be ignored, I have to take them 
> into account and support them.
>
>
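To make the query-component problem in the quoted text concrete, here is a minimal sketch, assuming Python's standard urllib.parse module. The helper name escape_query_value is hypothetical, and this is not the algorithm of HTML5 or of any browser; it only shows that the same character escapes to different bytes depending on the encoding chosen.

```python
from urllib.parse import quote, unquote


def escape_query_value(value: str, page_encoding: str) -> str:
    """Percent-escape a query value using the given character encoding.

    Hypothetical helper for illustration only; it does not reproduce
    any browser's actual behaviour.
    """
    return quote(value, safe="", encoding=page_encoding, errors="strict")


# The same character yields different escaped bytes per encoding.
utf8_form = escape_query_value("é", "utf-8")            # two escaped bytes
legacy_form = escape_query_value("é", "windows-1252")   # one escaped byte

# A receiver that assumes UTF-8 cannot recover the legacy form:
# the lone byte is not valid UTF-8 and becomes U+FFFD.
misread = unquote(legacy_form, encoding="utf-8", errors="replace")

print(utf8_form, legacy_form, repr(misread))
```

A server written for a windows-1252 page expects the one-byte form, which is why always escaping as UTF-8 can break links in such legacy pages.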
>   
>> If we are talking about method="get" forms and corresponding IRIs with 
>> an <iquery> 'human legacy' is an obscure argument - but I don't see 
>> what's wrong with what Safari and Mozilla do.
>>     
>
> Forms are a whole different problem. It's links that are of concern here.
>
>
>   
>>> Ok. HTML5 is an implementation specification.
>>>       
>> Better split the parts where it's a document type definition for 
>> authors, the audience is far too different.  If you tell authors what 
>> they can get away with they won't see the point of say "<s> is 
>> deprecated" vs. "interpret <s> as <del>".
>>     
>
> Yeah, that's on the cards for when the spec is more stable (we'll probably 
> generate two or three documents automatically for different audiences).
>
>
>   
>>  [IRL proposal]
>>     
>>> I think people would be more confused by the use of the term "IRL" 
>>> than "URL" (with the exception of people intimately familiar with the 
>>> URI spec). Maybe the term "address" would work?
>>>       
>> If you are sure that you don't need "address" for something else it is 
>> fine.  IE-fans would know what you are talking about.  And I finally got 
>> used to the idea that "address" means what I know as "location".
>>
>> In the direction of:  "An 'address' is the URI (STD 66) derived from a 
>> valid IRI (RFC 3987) or invalid constructs as specified below" (etc.)
>>     
>
> It was brought to my attention on IRC that "address" is probably as 
> overloaded as "URL" so this might not be a step forwards for the spec, 
> just a step sideways. I'll see what can be done though. It might be that 
> the spec just uses the term "URL" and ignores the URI spec's definition of 
> the term. 

There is an alternative to ignoring the URI spec's definition: describe 
your usage of "URL" and the usage as intended by the URI spec. See a 
similar problem and a solution for the usage of the terms "URI" and 
"IRI" mentioned at
http://lists.w3.org/Archives/Public/www-tag/2008Jun/0110.html

Felix

> Most people seem to understand the intent; as far as I know, 
> you're the only person whom this has confused.
>
>
>   
>>>> Broken URLs have caused real damage last year:
>>>> http://www.microsoft.com/technet/security/advisory/943521.mspx
>>>> http://www.heise-security.co.uk/news/97878
>>>>         
>>> Right, that's why defining error handling is critical, and why a spec 
>>> that doesn't define error handling is, frankly, irresponsible. By 
>>> defining error handling, we help guarantee that any input results in a 
>>> known, predictable, and most importantly _safe_ behaviour.
>>>       
>> IMHO you could leave this at "MUST NOT be interpreted as URI" or 
>> similar, but that might be a matter of taste.
>>     
>
> Well, we could say that, but then browser vendors would ignore us. I don't 
> want browser vendors to ignore us.
>
>
>   
>> Are you going to specify the exact error handling for, say, surrogates and 
>> overlong encodings in UTF-8?  I'd have ideas about this, but I don't 
>> see that it belongs in an HTML5 specification.
>>     
>
> These issues were brought to the attention of the Unicode consortium, who 
> are looking into addressing these error handling issues in their specs.
>
> I agree entirely that this kind of error handling stuff shouldn't be in 
> HTML5. The only times HTML5 defines error handling for things outside the 
> "HTML" language itself is when the relevant specs don't define their own 
> error handling, and the relevant groups refuse to do anything about it.
>
>
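The surrogate and overlong cases raised in the quoted text can be illustrated with a short sketch, assuming Python's built-in strict UTF-8 decoder. The function name decodes_as_strict_utf8 and the two constants are hypothetical, and rejecting the input is only one possible policy, not something any spec discussed here mandates.

```python
def decodes_as_strict_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 under Python's strict decoder."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Two-byte overlong form of "/" (U+002F), which should be a single byte.
OVERLONG_SLASH = b"\xc0\xaf"
# UTF-8-style byte sequence for the surrogate code point U+D800.
ENCODED_SURROGATE = b"\xed\xa0\x80"

for sample in (b"/", OVERLONG_SLASH, ENCODED_SURROGATE):
    print(sample, decodes_as_strict_utf8(sample))
```

Python's strict decoder rejects both malformed sequences. What a decoder should do next (drop the input, substitute a replacement character, or abort) is the error-handling question the thread says was passed to the Unicode Consortium.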
Received on Thursday, 26 June 2008 04:34:09 UTC