Re: Error handling in URIs

On Wed, 25 Jun 2008, Frank Ellermann wrote:
> 
> Sure, but just because everybody does odd things in practice does not 
> necessarily mean that this needs to be noted in a standard.

One of the goals of the HTML5 effort is to define things to the level of 
detail required to implement a Web browser that will be completely 
interoperable with other Web browsers for all Web content, valid or not, 
without ever seeing another Web browser. This puts most "odd things" into 
scope.


> A standard is an abstraction.

Standards, for the purposes of the HTML5 effort, are comprehensive 
documentation intended to make it possible to implement user agents, and 
are thus very much not abstractions.

This isn't intended to disparage other beliefs or opinions as to what 
standards should be. I have no problem with standards that, e.g., leave 
error handling undefined -- they are just not really relevant to the HTML5 
work.


> For HTML 5 you will say that href= wants an RFC 3987 IRI, but you could 
> also say that spaces are no problem, a kind of LEIRI, for href=.  You 
> could also decide that URI is good enough, as it works everywhere, and 
> IRI-producers would know how to get an equivalent URI in the href, while 
> URI consumers might not know what a native IRI, let alone LEIRI, is.

You seem to be conflating the authoring requirements and the user agent 
requirements. The authoring requirements for HTML5 are just "it must be a 
valid URI or IRI". That however has little bearing on what the user agent 
conformance requirements are. The UA requirements have to handle all 
manner of things that _aren't_ valid URIs or IRIs, since in practice such 
invalid content is prevalent.
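
To illustrate the distinction, here is a minimal sketch in Python. It is not 
the algorithm in the HTML5 spec; the function names and the specific 
recovery steps (trimming whitespace, percent-encoding anything outside the 
RFC 3986 character set) are assumptions made purely for illustration. The 
authoring check is also simplified: it looks only at URI characters and 
ignores IRIs and percent-encoding structure.

```python
import re
from urllib.parse import quote

# Characters that may appear in an RFC 3986 URI: unreserved, reserved, and "%".
# Simplification: this does not verify that "%" is followed by two hex digits.
_URI_SAFE = "-._~:/?#[]@!$&'()*+,;=%"
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def is_valid_for_authors(href: str) -> bool:
    """Hypothetical authoring check: the value either conforms or it doesn't."""
    return _URI_CHARS.fullmatch(href) is not None


def resolve_for_user_agent(href: str) -> str:
    """Hypothetical UA handling: never reject, always produce a defined result.

    Assumed recovery steps: strip surrounding whitespace, then percent-encode
    every character that is not allowed in a URI (as UTF-8 bytes).
    """
    return quote(href.strip(), safe=_URI_SAFE)


if __name__ == "__main__":
    sloppy = "  http://example.com/my file.html "
    print(is_valid_for_authors(sloppy))    # the authoring check rejects it
    print(resolve_for_user_agent(sloppy))  # the UA still has to do something
```

The point is only that the two functions answer different questions: the 
first tells an author whether the document is conforming, the second tells 
an implementor what to do regardless of the answer to the first.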


> > The question is what should a browser do with that document.
> 
> Garbage in, garbage out.

Sure, but what garbage out? And where is that defined? Right now, the 
answer is in the HTML5 spec. The thread started because it was suggested 
to me that maybe the URI specs should be updated, but I understand that 
this is not desirable to the people working on the URI specs, which is 
fine, and just means a bit more work for me. :-) (Not a big deal, the work 
is in fact mostly done at this point.)


> But make sure that you don't end up with *redefining* what is and what 
> is not a valid xyz (URI, IRI, UTF-8, XML, PNG, etc.)

Well, you can now look and see if what the spec says is acceptable to you, 
as I finished the bulk of it earlier today:

   http://www.whatwg.org/specs/web-apps/current-work/#urls

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 25 June 2008 05:34:10 UTC