Re: Confusing use of "URI" to refer to IRIs, and IRI handling in the DOM from Henri Sivonen on 2008-06-29 (public-html@w3.org from June 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 29 Jun 2008 14:19:03 +0300
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Justin James <j_james@mindspring.com>, "'Smylers'" <Smylers@stripey.com>, "'HTML WG'" <public-html@w3.org>
Message-Id: <78949E98-5F71-40A2-A996-80C365A5D0CE@iki.fi>

On Jun 29, 2008, at 14:03, Julian Reschke wrote:

> Henri Sivonen wrote:
>>> It affects anybody who consumes HTML. The fact that HTML5-URLs are  
>>> something different means that you can't use out of the box URI/ 
>>> IRI libraries and reminding readers of this spec by *not* using  
>>> the term URL would be helpful.
>> That's missing the point. The point is that URI/IRI specs don't  
>> give full reality-based Web-compatible details, so if you use an  
>> out-of-the-box pure URI/IRI library, your software isn't compatible  
>> with existing Web content.
>
> It seems I didn't miss the point, because you just repeated what I  
> said :-).

Oops... Sorry.

>> Also, comprehensive libraries don't just implement the RFCs and be  
>> done. In Validator.nu, I use the most comprehensive IRI library for  
>> Java that I could find: the Jena IRI library. The Jena IRI library  
>> already acknowledges the existence of a multitude of URLish specs:  
>> It already supports conformance modes for six (6!: IRI, RDF, URI,  
>> XLink, XML Schema and XML System ID) specs! Unfortunately, none of  
>> those specs is fully Web-compatible. I'd like to see a seventh, Web- 
>> compatible mode implementing Web URLs in a future version.
>
> Thanks for that information.
>
> However I would assume that most programmers will first try the  
> libraries that ship with their programming environment; and these  
> are unlikely to be as flexible. Thus it's good to make it perfectly  
> clear to the readers that there is a difference.

Yes, making a note that there's a difference is worthwhile. However,  
we should seek to establish the reality-based spec as the well-known  
spec so that libraries shipping as part of programming environments  
can converge onto being applicable to Web content.
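
As a rough illustration of the gap between a strict reading of the RFCs and what Web content needs, here is a minimal sketch. It is not the HTML5 URL algorithm; the function names and the cleanup rules (strip surrounding whitespace, percent-encode stray characters as UTF-8) are assumptions made only for this example.

```python
import re
from urllib.parse import quote

# Characters RFC 3986 permits in a URI: unreserved, reserved, and "%".
_RFC3986_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def uses_only_rfc3986_chars(s):
    """Strict check: True only if every character is allowed by RFC 3986."""
    return _RFC3986_CHARS.fullmatch(s) is not None


def lenient_fixup(s):
    """Hypothetical lenient cleanup, loosely modelled on what browsers tolerate.

    Strips surrounding whitespace and percent-encodes (as UTF-8) any
    character outside the RFC 3986 set, leaving existing escapes alone.
    """
    s = s.strip(" \t\n\r\f")
    return quote(s, safe=":/?#[]@!$&'()*+,;=%")


raw = "  http://example.com/a b?q=\u00e9\n"
print(uses_only_rfc3986_chars(raw))  # the strict check rejects this input
print(lenient_fixup(raw))            # the lenient path still yields a usable address
```

A library that stops at the strict check has nothing to offer for input like this, even though such input is common in existing content.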

>> ...
>>> The URI/IRI specs aren't broken.
>> They don't define error handling in such a way that implementing  
>> software to spec results in software that works with existing  
>> content. In my opinion, that counts as broken.
>
> URIs/IRIs are used all over the place, not just in "web content" or  
> HTML. Do you seriously think that the same type of error handling is  
> applicable to all these cases?

I think the reason why URLs exist in the first place is addressing on  
the Web. Also, I think that's the way URLs are most often used.  
Therefore, I think that a specification for URLs should define  
their processing and dereferencing in a way that is suitable for Web  
addressing and compatible with the Web as it exists, and  
all other use cases should be subordinate to this primary purpose of  
compatibly dereferenceable Web addressing. If someone wishes to reuse  
URLs for purposes other than Web addressing, they can define that the  
context character encoding is always "utf-8". However, other than the  
trouble of having to know the encoding context, I don't see why the  
rest of HTML5 URL error handling couldn't be reused in any kind of  
context that reuses URLs for non-Web purposes.
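
To make the "encoding context" point concrete, here is a minimal sketch of how the same non-ASCII character in a query value percent-encodes differently depending on the context encoding. The helper name encode_query_value and the choice of windows-1252 as the example document encoding are assumptions for illustration; this is not the HTML5 algorithm itself.

```python
from urllib.parse import quote


def encode_query_value(value, context_encoding="utf-8"):
    """Percent-encode a query value using the given context encoding.

    Hypothetical helper: a non-Web consumer would simply always pass
    (or default to) "utf-8", while a Web consumer would pass the
    encoding of the document the URL appeared in.
    """
    return quote(value, safe="", encoding=context_encoding)


# U+00E4 (a with diaeresis) in a page served as windows-1252
print(encode_query_value("\u00e4", "windows-1252"))  # %E4
# The same character when the context encoding is fixed to utf-8
print(encode_query_value("\u00e4"))                   # %C3%A4
```

Fixing the context encoding to "utf-8" removes the one input that a non-Web consumer would otherwise have to know.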

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Sunday, 29 June 2008 11:19:45 UTC