- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 29 Jun 2008 13:49:50 +0300
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Justin James <j_james@mindspring.com>, "'Smylers'" <Smylers@stripey.com>, "'HTML WG'" <public-html@w3.org>
On Jun 29, 2008, at 12:03, Julian Reschke wrote:

> Justin James wrote:
>> I posit that this use case is irrelevantly small; it only seems to
>> apply to people attempting to write applications that implement a
>> particular spec, or maybe people writing an "URIBuilder" type library
>> component or something.
>
> It affects anybody who consumes HTML. The fact that HTML5-URLs are
> something different means that you can't use out of the box URI/IRI
> libraries and reminding readers of this spec by *not* using the term
> URL would be helpful.

That's missing the point. The point is that the URI/IRI specs don't give
full reality-based Web-compatible details, so if you use an
out-of-the-box pure URI/IRI library, your software isn't compatible with
existing Web content.

Also, comprehensive libraries don't just implement the RFCs and call it
done. In Validator.nu, I use the most comprehensive IRI library for Java
that I could find: the Jena IRI library. The Jena IRI library already
acknowledges the existence of a multitude of URLish specs: it already
supports conformance modes for six (6!) specs: IRI, RDF, URI, XLink,
XML Schema and XML System ID. Unfortunately, none of those specs is
fully Web-compatible. I'd like to see a seventh, Web-compatible mode
implementing Web URLs in a future version.

>> To "real world" people, this is Yet Another Spec That Shall Be
>> Ignored. By trying to find some way to have all of these slightly
>> different items play nicely with each other, we're dancing around the
>> elephant in the room (I know, Managerial Speak) which is that there
>> should only be one *RI/L spec. PERIOD.

Indeed, there should be one reality-based Web-compatible spec with full
error recovery details for Web addresses, a.k.a. URLs.

>> So let's stop this silly dance, get with the *RI/L group, and tell
>> them, "this is broken, please provide us with 1 unified spec that
>> makes sense."
>> But for us to keep trying to Band-Aid the broken *RI/L situation
>> within the HTML spec itself is pretty pointless. *RI/L is meta to
>> HTML, and not within our purview.

The *RI/L group seems to be unwilling to make their specs comprehensive
in a way that is compatible with existing content, and this stuff needs
to be specced *somewhere*, so for the time being it's in the HTML5 spec.
It would indeed be better to spin URLs off into a self-contained spec,
but we don't have enough *competent* and *willing* editors to go around.

What to call these addresses is a total bikeshed, but I think they
should be called URLs, because that's the name people use for the kind
of addresses that work in browsers (i.e. on the Web). Where
disambiguation is *actually* needed, the two kinds of URLs can be
referred to as Web URLs and IETF URLs.

> The URI/IRI specs aren't broken.

They don't define error handling in such a way that implementing
software to spec results in software that works with existing content.
In my opinion, that counts as broken.

> Lots of software implements URI/IRI processing, and browsers are only
> one part of it.

As far as Web content goes, non-browser software that is meant to
dereference Web addresses needs to do it in a browser-compatible way in
order to be compatible with the Web. As a developer of such software, I
want a reality-based description of what my software needs to do to be
compatible with Web content. So far, I have written software to the IRI
spec, and I'm rather unhappy to find that I've written software to a
polite fiction of what people wish the Web to be like. (I *wish* it
were UTF-8-only, too!)

> You simply can't break all the other software by making incompatible
> changes to these specs.

The software is already broken from the point of view of its users if it
isn't compatible with existing Web content.
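To illustrate the kind of browser-compatible error recovery being
discussed, here is a small sketch (not part of the original message, and
not an authoritative algorithm). It applies a few fixups that browsers
have commonly been observed to perform on author-supplied URLs but that
a strict RFC 3986 parser would not, such as stripping stray whitespace
and treating backslashes as slashes in http(s) addresses:

```python
from urllib.parse import urlsplit

def browser_style_cleanup(url):
    """Apply a few error-recovery steps browsers are known to perform.

    This is an illustrative assumption about common browser behaviour,
    not a complete or normative algorithm.
    """
    # Browsers strip ASCII tab and newline characters anywhere in the input.
    url = url.replace("\t", "").replace("\n", "").replace("\r", "")
    # Leading and trailing whitespace is trimmed.
    url = url.strip()
    # In http(s) URLs, backslashes are treated as if they were slashes.
    if url.lower().startswith(("http:", "https:")):
        url = url.replace("\\", "/")
    return url

# A strict RFC 3986 parser would choke on (or mis-split) this input;
# after the fixups it parses as authors evidently intended.
raw = "  http:\\\\example.com\\a\tb  "
cleaned = browser_style_cleanup(raw)
print(cleaned)                   # http://example.com/ab
print(urlsplit(cleaned).netloc)  # example.com
```

A standalone spec with full error recovery would pin down steps like
these precisely, instead of leaving each consumer to reverse-engineer
them from browser behaviour.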
> Browsers do not treat URLs as specified, so the best thing is to write
> down what they do, and try to discourage the incompatible processing.

I think the best thing to do is:

1) Specify in detail what needs to be done in order to dereference
   addresses in existing content.

2) Implement what needs to be done in multiple programming languages and
   give away libraries under an extremely liberal license, so that no
   one has an excuse to avoid the libraries for licensing reasons.

3) Tell authors to encode their pages in UTF-8 (which they won't all do,
   citing excuses such as byte-count inefficiencies that are imagined,
   or measured but trivial once gzipped).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 29 June 2008 10:50:33 UTC