Re: [whatwg] New URL Standard from Anne van Kesteren on 2012-09-24 (public-whatwg-archive@w3.org from September 2012) from David Sheets on 2012-10-24 (uri@w3.org from October 2012)

From: David Sheets <kosmo.zb@gmail.com>
Date: Tue, 23 Oct 2012 20:49:52 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Christophe Lauret <clauret@weborganic.com>, Jan Algermissen <jan.algermissen@nordsc.com>, Ted Hardie <ted.ietf@gmail.com>, URI <uri@w3.org>, IETF Discussion <ietf@ietf.org>
Message-ID: <CAAWM5Tz3NdprjqwgyoVoV9qUuiwXb2gTQ49u4a4ePGfjyusDkw@mail.gmail.com>

On Tue, Oct 23, 2012 at 4:51 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 24 Oct 2012, Christophe Lauret wrote:
>>
>> As a Web developer who's had to write code multiple times to handle URIs
>> in very different contexts, I actually *like* the constraints in STD 66,
>> there are many instances where it is simpler to assume that the error
>> handling has been done prior and simply reject an invalid URI.
>
> I think we can agree that the error handling should be, at the option of
> the software developer, either to handle the input as defined by the
> spec's algorithms, or to abort and not handle the input at all.

Yes, input is handled according to the specs' algorithmS.

>> But why not do it as a separate spec?
>
> Having multiple specs means an implementor has to refer to multiple specs
> to implement one algorithm, which is not a way to get interoperability.
> Bugs creep in much faster when implementors have to switch between specs
> just in the implementation of one algorithm.

One algorithm? There seem to be several functions...

- URI reference parsing (parse : scheme -> string -> raw uri_ref)
- URI reference normalization (normalize : raw uri_ref -> normal uri_ref)
- absolute URI predicate (absp : normal uri_ref -> absolute uri_ref option)
- URI resolution (resolve : absolute uri_ref -> _ uri_ref -> absolute uri_ref)

Of course, some of these may be composed in any given implementation.
In the case of a/@href and img/@src, it appears that something like
(one_algorithm = (resolve base_uri) . normalize . parse (scheme
base_uri)) is in use.
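
To make the decomposition concrete, here is a rough Python sketch of the four functions and their composition. It is only an illustration under stated assumptions: urllib.parse stands in for a real parser and resolver, the NewType wrappers (RawRef, NormalRef, AbsoluteRef) are hypothetical stand-ins for the raw/normal/absolute type parameters, normalize performs case normalization only, and absp treats any reference carrying a scheme as absolute.

```python
from typing import NewType, Optional
from urllib.parse import SplitResult, urljoin, urlsplit, urlunsplit

# Hypothetical stand-ins for the "raw", "normal" and "absolute" uri_ref types.
RawRef = NewType("RawRef", SplitResult)
NormalRef = NewType("NormalRef", SplitResult)
AbsoluteRef = NewType("AbsoluteRef", SplitResult)


def parse(scheme: str, text: str) -> RawRef:
    # Scheme-specific parsing rules would hook in here; this sketch has none,
    # so `scheme` is accepted only to mirror the signature above.
    return RawRef(urlsplit(text))


def normalize(ref: RawRef) -> NormalRef:
    # Case normalization only: lowercase the scheme and the host, leaving
    # any userinfo before "@" and the rest of the reference untouched.
    userinfo, at, hostport = ref.netloc.rpartition("@")
    return NormalRef(ref._replace(scheme=ref.scheme.lower(),
                                  netloc=userinfo + at + hostport.lower()))


def absp(ref: NormalRef) -> Optional[AbsoluteRef]:
    # Simplification: any reference that carries a scheme counts as absolute.
    return AbsoluteRef(ref) if ref.scheme else None


def resolve(base: AbsoluteRef, ref: SplitResult) -> AbsoluteRef:
    # Delegate reference resolution to urljoin, then re-split the result.
    return AbsoluteRef(urlsplit(urljoin(urlunsplit(base), urlunsplit(ref))))


def one_algorithm(base: AbsoluteRef, text: str) -> AbsoluteRef:
    # (resolve base_uri) . normalize . parse (scheme base_uri)
    return resolve(base, normalize(parse(base.scheme, text)))
```

With a base obtained from absp(normalize(parse("http", "HTTP://Example.COM/a/b"))), calling one_algorithm(base, "../c?q#f") and passing the result to urlunsplit gives "http://example.com/c?q#f". Each stage stays callable on its own, so a system that needs only parse or only resolve does not have to buy the whole composition.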

A good way to get interop is to thoroughly define each function and
supply implementors with test cases for each processing stage
(one_algorithm's test cases define some tests for parse, normalize,
and resolve as well).
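
As a sketch of what per-stage test cases could look like, the Python below keeps one table of cases per stage and runs each stage against its own table. The assumptions are mine: urlsplit and urljoin from urllib.parse stand in for parse and resolve, the failures helper is hypothetical, and the resolve cases are a handful of the "normal examples" from RFC 3986 section 5.4.1.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://a/b/c/d;p?q"

# (input, expected) pairs for the parse stage: five components per reference
# (scheme, authority, path, query, fragment).
PARSE_CASES = [
    (BASE, ("http", "a", "/b/c/d;p", "q", "")),
    ("../g", ("", "", "../g", "", "")),
    ("//g", ("", "g", "", "", "")),
]

# (input, expected) pairs for the resolve stage, all relative to BASE.
RESOLVE_CASES = [
    ("g", "http://a/b/c/g"),
    ("../g", "http://a/b/g"),
    ("g?y", "http://a/b/c/g?y"),
    ("#s", "http://a/b/c/d;p?q#s"),
    ("//g", "http://g"),
]


def failures(stage, cases):
    """Return (input, expected, actual) for every case the stage gets wrong."""
    wrong = []
    for arg, want in cases:
        got = stage(arg)
        if got != want:
            wrong.append((arg, want, got))
    return wrong


# Each stage is checked in isolation, against its own table.
parse_failures = failures(lambda s: tuple(urlsplit(s)), PARSE_CASES)
resolve_failures = failures(lambda s: urljoin(BASE, s), RESOLVE_CASES)
```

A failure then names the stage that disagrees, instead of surfacing only as a wrong final URI at the end of the composition.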

Some systems use more than the simple function composition of web browsers...

>> Increasing the space of valid addresses, when the set of addressable
>> resources is not actually increasing only means more complex parsing rules.
>
> I'm not saying we should increase the space of valid addresses.

Anne's current draft increases the space of valid addresses. This
isn't obvious as Anne's draft lacks a grammar and URI component
alphabets. You support Anne's draft and its philosophy, therefore you
are saying the space of valid addresses should be expanded.

Here is an example of a grammar extension that STD 66 disallows but
WHATWGRL allows:
<http://www.rfc-editor.org/errata_search.php?rfc=3986&eid=3330>

> The de facto parsing rules are already complicated by de facto requirements for
> handling errors, so defining those doesn't increase complexity either
> (especially if such behaviour is left as optional, as discussed above.)

*parse* is separate from *normalize*, which is separate from checking
whether a reference is absolute (*absp*), which is separate from
*resolve*.

Why don't we have a discussion about the functions and types involved
in URI processing?

Why don't we discuss expanding allowable alphabets and production rules?

David

Received on Wednesday, 24 October 2012 03:52:28 UTC