Re: URL work in HTML 5 (semifork) from Jan Algermissen on 2012-10-16 (www-archive@w3.org from October 2012)

From: Jan Algermissen <jan.algermissen@nordsc.com>
Date: Tue, 16 Oct 2012 14:37:45 +0200
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Robin Berjon <robin@w3.org>, Ted Hardie <ted.ietf@gmail.com>, Larry Masinter <masinter@adobe.com>, "plh@w3.org" <plh@w3.org>, "Peter Saint-Andre (stpeter@stpeter.im)" <stpeter@stpeter.im>, "Pete Resnick (presnick@qualcomm.com)" <presnick@qualcomm.com>, "www-archive@w3.org" <www-archive@w3.org>, "Michael(tm) Smith" <mike@w3.org>
Message-Id: <BF9A48C9-748E-4C02-AA41-3964AA153ED3@nordsc.com>

On Oct 16, 2012, at 2:09 PM, Anne van Kesteren wrote:

> On Tue, Oct 16, 2012 at 1:44 PM, Jan Algermissen
> <jan.algermissen@nordsc.com> wrote:
>> On Oct 16, 2012, at 1:29 PM, Anne van Kesteren wrote:
>>> I'm not arguing URLs should be allowed to contain SP, just that they
>>> can (and do) in certain contexts and that we need to deal with that
>>> (either by terminating processing or converting it to %20 or ignoring
>>> it in case of domain names, if I remember correctly).
>> 
>> I am not understanding your perceived problem with two specs.
> 
> I think your context quoting went wrong.
> 
> 
>> In addition to that you can standardize 'recovery' algorithms for turning
>> broken URIs to valid ones. Maybe with different 'heuristics levels' before
>> giving up and reporting an error.
> 
> The algorithm is not for "fixing up". It's for processing URLs,
> including those that happen to be invalid. The end result is not
> always valid per STD 66.

And there lies the problem. Where is the benefit of producing invalid results
as opposed to fixing with best effort?

What can you do with a result that is an invalid URI? You cannot hand it to any
tool that implements the URI spec.

And anything you are ever going to do with a parsed-but-invalid URI is to treat
it as a valid one using a set of assumptions.

Why not simply apply these assumptions in the first place and have a valid URI
as a result?

Much cleaner because the concerns are clearly separated.

> 
> 
>> Any piece of software that wishes to be nice on 'URI providers' and process
>> broken URIs to some extent can apply that standardized algorithm in a fixup
>> phase before handing it on to the component that expects a valid URI.
> 
> I do not think it makes sense to have different URL parsers (one with
> a "be strict" bit works).

How you implement that is a detail. If e.g. an HTML browser intends to apply the
fixing algorithm it can surely do that as part of the URI parsing.

The important part is that the result is a valid URI.
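
To make the separation concrete, here is a rough Python sketch of a fix-up
phase followed by a strict consumer. It is only an illustration under stated
assumptions, not a proposed algorithm: the names fix_up and strict_consumer are
hypothetical, the repair rules (trim whitespace, escape stray "%", percent-encode
anything outside the STD 66 character set) are just one possible set of
assumptions, and the strict check only tests the allowed characters rather than
the full STD 66 grammar.

```python
import re
from urllib.parse import quote, urlsplit

# Simplified validity check (assumption): only unreserved and reserved
# characters plus well-formed percent-escapes. Not the full STD 66 grammar.
_URI_CHARS = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def fix_up(raw):
    """Hypothetical best-effort repair of a possibly broken URI string."""
    trimmed = raw.strip()
    # Escape any "%" that does not start a valid percent-escape.
    escaped = re.sub(r"%(?![0-9A-Fa-f]{2})", "%25", trimmed)
    # Percent-encode everything else that is not allowed (e.g. SP, non-ASCII),
    # leaving reserved delimiters and existing escapes untouched.
    return quote(escaped, safe=":/?#[]@!$&'()*+,;=%")


def strict_consumer(uri):
    """Hypothetical component that refuses anything but a valid URI."""
    if not _URI_CHARS.fullmatch(uri):
        raise ValueError("not a valid URI: %r" % uri)
    return urlsplit(uri)


if __name__ == "__main__":
    broken = " http://example.com/a b?q=100% "
    print(strict_consumer(fix_up(broken)))
```

Whether fix_up runs as a separate pass or inside the parser is the
implementation detail mentioned above. Either way, strict_consumer only ever
sees a string that passes its check.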

> Just like it does not make sense to have two
> different HTML parsers in your software stack.

I did not say that.

Jan

> 
> 
> -- 
> http://annevankesteren.nl/

Received on Tuesday, 16 October 2012 12:38:27 UTC