Re: Error handling in URIs from Frank Ellermann on 2008-06-25 (uri@w3.org from June 2008)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Wed, 25 Jun 2008 21:31:43 +0200
To: uri@w3.org
Message-ID: <g3u6bn$qi9$1@ger.gmane.org>
Ian Hickson wrote:

> could you quote the bits that are nonsensical?

With difficulty: the memo takes ages to load over a V.90
line, and then ages to run some scripts, until my browser
asks me whether I want to abort whatever it is doing:

| A URL is a valid URL if at least one of the following
| conditions holds: 
| * The URL is a valid URI reference [RFC3986]. 

Period, end of story, see STD 66.

| * The URL is a valid IRI reference and it has no query
|   component. [RFC3987]

Nope, that's an IRI, not a URL (those are matched in bullet 1).

| * The URL is a valid IRI reference and its query component
|   contains no unescaped non-ASCII characters. [RFC3987]

That's also an IRI, not a URL (those are matched in bullet 1).

There is also nothing special about query parts with
unescaped non-ASCII characters, at least not in RFC 3987.

| * The URL is a valid IRI reference and the character encoding
|   of the URL's Document is UTF-8. [RFC3987]

That's also an IRI, not a URL (those are matched in bullet 1).

There is nothing special about UTF-8 IRIs; UTF-8 only
speeds up the "transform to UTF-8" step in an IRI-to-URI
conversion.
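
To make the conversion concrete, here is a rough Python sketch of
the percent-encoding step. It is not the full RFC 3987 mapping:
it assumes the IRI is already a normalized Unicode string and it
ignores the optional ToASCII treatment of host names. The function
name iri_to_uri is made up for this illustration.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 octets.

    Simplified sketch: ASCII characters (including existing
    %-escapes and delimiters) are passed through untouched.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII is copied as-is.
            out.append(ch)
        else:
            # Non-ASCII: one %HH triplet per UTF-8 octet.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9?q=caf\u00e9"))
```

If the document is already UTF-8, the octets needed here are the
ones in the document, so no transcoding is required first.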

> Actually we're trying to not reinvent the Web, but to
> document it, so that browser vendors can write browsers
> that handle existing Web content in a fashion compatible
> with legacy UAs without reverse-engineering each other.

Section 2.3 claims to define the term URL.  This term is
already defined in STD 66.  If you want to define something
else, e.g., a BURL (broken URL) or PURL (pseudo-URL), please
pick a new term - but not BURL or PURL, they are already in
use for other purposes.

Maybe use "IRL"; the IRI spec doesn't use it.  Apparently
what you really want is a new variant of IRI, with special
rules for <iquery> parts in non-UTF-8 documents.
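
To illustrate why the document encoding would matter for such
<iquery> rules, here is a small Python sketch. It rests on the
assumption that a legacy UA escapes non-ASCII query characters
using the octets of the document's own encoding rather than UTF-8;
the helper name encode_query and the sample query are made up for
this illustration.

```python
from urllib.parse import quote

# ASCII characters that may appear literally in a query part;
# "%" is kept so that existing escapes are not escaped twice.
QUERY_SAFE = "=&+%;/?:@!$'()*,-._~"


def encode_query(query: str, document_encoding: str) -> str:
    """Escape non-ASCII query characters via the given encoding."""
    return quote(query, safe=QUERY_SAFE, encoding=document_encoding)


print(encode_query("q=caf\u00e9", "utf-8"))       # as in RFC 3987
print(encode_query("q=caf\u00e9", "iso-8859-1"))  # assumed legacy behaviour
```

The same visible query yields different octets on the wire
depending on the encoding, which RFC 3987 itself does not provide
for.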

> It's true that this is requiring defining things that are
> at odds with existing specifications, but that's mostly
> because those specifications aren't in fact in line with
> real usage.

"Real usage" is not only what numerous broken Web pages do,
or what a few browsers guess.  Broken URLs caused real
damage last year:

http://www.microsoft.com/technet/security/advisory/943521.mspx
http://www.heise-security.co.uk/news/97878

> I make no judgement as to whether that's a good thing or
> not, that doesn't much matter to me.

Of course you judge things.  E.g., you judge <i> and <b> as
worth keeping, and you judge <s> and <tt> as worth killing,
and from my POV that is wrong.  Allowing them all as short
forms, semantically equivalent to the corresponding longer
tags, would be nice for users forced to type tags in contexts
such as Wikis and comment forms.  <s> would be even better
than <del> for old browsers, and some tools don't support,
say, <samp>, but permit <tt>.

Just an example - I know that the semantic cabal fights
over every comma in what they consider "presentational".

 Frank
Received on Wednesday, 25 June 2008 19:30:49 UTC