
Re: Confusing use of "URI" to refer to IRIs, and IRI handling in the DOM

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 29 Jun 2008 13:03:20 +0200
Message-ID: <48676BF8.9050709@gmx.de>
To: Henri Sivonen <hsivonen@iki.fi>
CC: Justin James <j_james@mindspring.com>, 'Smylers' <Smylers@stripey.com>, 'HTML WG' <public-html@w3.org>

Henri Sivonen wrote:
>> It affects anybody who consumes HTML. The fact that HTML5-URLs are 
>> something different means that you can't use out of the box URI/IRI 
>> libraries and reminding readers of this spec by *not* using the term 
>> URL would be helpful.
> 
> That's missing the point. The point is that URI/IRI specs don't give 
> full reality-based Web-compatible details, so if you use an 
> out-of-the-box pure URI/IRI library, your software isn't compatible with 
> existing Web content.

It seems I didn't miss the point, because you just repeated what I said :-).

> Also, comprehensive libraries don't just implement the RFCs and stop there. 
> In Validator.nu, I use the most comprehensive IRI library for Java that 
> I could find: the Jena IRI library. The Jena IRI library already 
> acknowledges the existence of a multitude of URLish specs: It already 
> supports conformance modes for six (6!: IRI, RDF, URI, XLink, XML Schema 
> and XML System ID) specs! Unfortunately, none of those specs is fully 
> Web-compatible. I'd like to see a seventh, Web-compatible mode 
> implementing Web URLs in a future version.

Thanks for that information.

However I would assume that most programmers will first try the 
libraries that ship with their programming environment; and these are 
unlikely to be as flexible. Thus it's good to make it perfectly clear to 
the readers that there is a difference.
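
As an illustration of that difference (not part of the original exchange), here is a minimal Python sketch. It assumes a simplified check against the RFC 3986 character set rather than the full grammar, and the `lenient_cleanup` helper is hypothetical: it is not the HTML5 algorithm, only an example of the kind of repair a Web-compatible consumer might apply to input that a strict library would reject.

```python
import re
from urllib.parse import quote

# Characters RFC 3986 allows in a URI: unreserved, gen-delims, sub-delims,
# plus "%" for percent-escapes. Simplified: characters only, not the grammar.
_RFC3986_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def is_strict_uri_chars(s: str) -> bool:
    """Return True if every character in s is permitted by RFC 3986."""
    return _RFC3986_CHARS.fullmatch(s) is not None


def lenient_cleanup(s: str) -> str:
    """Hypothetical repair step: trim surrounding whitespace, then
    percent-encode every disallowed character as UTF-8 bytes."""
    return quote(s.strip(), safe="-._~:/?#[]@!$&'()*+,;=%", encoding="utf-8")


raw = " http://example.com/a b\n"
print(is_strict_uri_chars(raw))   # strict check on the raw attribute value
print(lenient_cleanup(raw))       # repaired form
```

A strict library stops at the first check; content found on the Web often only works if something like the second step runs first.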

> ...
>> The URI/IRI specs aren't broken.
> 
> They don't define error handling in such a way that implementing 
> software to spec results in software that works with existing content. 
> In my opinion, that counts as broken.

URIs/IRIs are used all over the place, not just in "web content" or 
HTML. Do you seriously think that the same type of error handling is 
applicable to all these cases?

> ...
>> You simply can't break all the other software by making incompatible 
>> changes to these specs.
> 
> The software is already broken from the point of view of its users if it 
> isn't compatible with existing Web content.

First of all, I disagree that software is broken because it fails to 
process invalid input the same way HTML5 requires.

That being said, I was referring to software that uses URIs and IRIs in 
completely different contexts.

>> Browsers do not treat URLs as specified, so the best thing is to write 
>> down what they do, and try to discourage the incompatible processing.
> 
> I think the best thing to do is:
>  1) Specify in detail what needs to be done in order to dereference 
> addresses in existing content.
>  2) Implement what needs to be done in multiple programming languages 
> and give away libraries under an extremely liberal license so that no 
> one has an excuse to avoid the libraries for licensing reasons.
>  3) Tell authors to encode their pages in UTF-8 (which not all of them 
> will do, citing excuses such as byte count inefficiencies that are 
> imagined, or measured but trivial once gzipped).

Yes.
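
To show why point 3 matters, here is a small Python sketch (an illustration, not something from the original thread). It assumes the consumer percent-encodes a non-ASCII character using the page's encoding, as browsers have commonly done for the query part; the same character then yields different bytes depending on that encoding.

```python
from urllib.parse import quote

# The same character, percent-encoded under two different page encodings.
utf8_form = quote("ä", encoding="utf-8")          # two bytes in UTF-8
latin1_form = quote("ä", encoding="iso-8859-1")   # one byte in ISO-8859-1

print(utf8_form, latin1_form)
```

With UTF-8 pages the result matches what the IRI-to-URI mapping produces, so the ambiguity disappears.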

BR, Julian
Received on Sunday, 29 June 2008 11:04:05 UTC
