Re: Confusing use of "URI" to refer to IRIs, and IRI handling in the DOM

On Sun, 29 Jun 2008, Julian Reschke wrote:
> > > > >
> > > > > 3. The distinction between HTML5-URL and RFC3987-IRI *is* 
> > > > > important, because [...] HTML5-URLs can contain spaces
> > > >
> > > > No, they're not allowed to contain spaces.
> > >
> > > *Valid* URLs aren't, but the spec spends a considerable amount of 
> > > space dealing with invalid ones.
> > 
> > Well if you're willing to consider invalid ones, what about invalid 
> > URIs? They can contain spaces too. What's the distinction between an 
> > invalid URL and an invalid URI?
> 
> None? I guess I don't understand the question.

If valid HTML5 URLs and valid IRIs are equivalent, and invalid HTML5 URLs 
and invalid IRIs are indistinguishable, then what's the problem?


> > > > > - mapping of non-ASCII characters in query parts differs from 
> > > > > RFC3987-IRI.
> > > >
> > > > Only in non-conforming documents.
> > >
> > > (In which case documents with valid IRIs get non-conforming when 
> > > using the wrong document encoding...)
> > 
> > Right, otherwise documents with valid IRIs but non-UTF-8 encodings 
> > wouldn't be treated as per the IRI spec, which is bad (presumably) and 
> > shouldn't be encouraged, and should be brought to the author's 
> > attention.
> 
> Understood. The alternative (which I think should be seriously 
> considered) is to break those pages, and to always use UTF-8 for 
> encoding.

I can only spec that if browsers are willing to do it. So far, my 
understanding is that they are not. (There's no point writing a spec that 
isn't followed; the whole point of the spec is to define what should 
happen to get interoperability.)
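
To make the query-part difference concrete, here is a rough sketch in 
Python. It is not text from either spec. The helper name encode_query and 
the choice of windows-1252 as the legacy document encoding are 
assumptions made purely for illustration. It percent-encodes the same 
query string once as UTF-8 (the RFC 3987 mapping) and once using a 
non-UTF-8 document encoding (roughly what browsers do today for query 
parts):

```python
from urllib.parse import quote


def encode_query(query: str, document_encoding: str = "utf-8") -> str:
    """Percent-encode the non-ASCII characters of a query string.

    Hypothetical helper for illustration only: the bytes that get
    percent-encoded are produced by `document_encoding`, so the same
    characters yield different URLs under different encodings.
    """
    # Keep the usual query delimiters intact; everything non-ASCII is
    # converted to bytes in the chosen encoding and then %-escaped.
    return quote(query, safe="=&+", encoding=document_encoding)


# RFC 3987 style: always UTF-8.
iri_style = encode_query("q=café")

# Legacy style: bytes come from the (assumed) document encoding.
legacy_style = encode_query("q=café", "windows-1252")

print(iri_style)     # q=caf%C3%A9
print(legacy_style)  # q=caf%E9
```

The two results differ only because of the document encoding, which is 
why a document with a valid IRI can still end up sending something other 
than what the IRI spec would produce.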

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 29 June 2008 09:51:58 UTC