Re: design team notes from Julian Reschke on 2011-04-22 (public-iri@w3.org from April 2011)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 22 Apr 2011 23:42:37 +0200
To: Maciej Stachowiak <mjs@apple.com>
CC: Peter Saint-Andre <stpeter@stpeter.im>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <4DB1F64D.4060803@gmx.de>
On 22.04.2011 20:14, Maciej Stachowiak wrote:
> ...
> If the regexp, or something similar to it, works out, then great. We can test it and see how close it comes to defining a reasonable intersection of browser behavior. Just from scanning it, though, I can see at least three structural reasons it is not sufficient for HTML5's purposes:
>
> (1) No normative standing, not even as a MAY-level requirement or a defined term, so it's not suitable to be referenced as-is.

It doesn't need to, as long as it's supposed to do the same thing as the 
ABNF.

> (2) Does not provide the hostname, port, or hostport outputs required by HTML5.

Yes.

> (3) Only defines error-tolerant parsing, not error-tolerant and error-preserving resolution against a base.

Once you have the components, you can apply the rules for resolution to 
those, independently of how you obtained the components.

If you believe that doing that needs a few more bits of normative spec, 
fine. What's interesting here is whether we can *use* this or need an 
entirely new spec.

> Beyond that, other issues can be found by testing.

Indeed.

>> HTML uses URIs/IRIs in separate places, and there are at least two different contexts in which they need to be parsed, one of which uses whitespace as delimiter between identifiers.
>>
>> So special treatment of whitespace will need to be context-dependent.
>
> For HTML purposes, not really. The contexts that take a whitespace-delimited list of URLs split on whitespace before invoking the parsing or resolution algorithms. It's ok if those algorithms also do their own whitespace handling, it will just be a no-op.

It it's ok for those parts then I assume it would be ok for other parts 
as well.

>>> 2) Syntax (or whatever other description that makes sense) of what's
>>> allowed/reasonable for backwards compatibility
>>>
>>> Peter: possible path is to put all the parsing/processing stuff into
>>> Adam's document, fast-track that document, and work on 3987bis in parallel
>>
>> If this just replicates information from RFC 3986/7, it's harmless, but also not critical at all.
>
> I'm sure it won't just replicate information from RFC 3986/7, since those do not currently contain the information required for Web platform purposes.

And it's that delta I'd like to know.

> For purposes of HTML5 and other client-side Web standards, and also from the perspective of most browser implementors, the problem to solve is:
>
>      Define URL processing for URLs found in Web content, with the following constraints:
>      - Compatible with existing Web content.
>      - Once implemented, interoperable among all browsers and other programs that want to do browser-compatible processing of Web content.
>      - Defines behavior in all cases, including errors.

i) would it be sufficient to define preprocessing and then delegate to 
existing specs?

ii) how do measure "interoperable" and "compatible"? For instance, if 
Webkit does A and IE does B, and B conforms to the specs, what does that 
mean?

iii) even if all browsers "interoperate", but other widely deployed 
other consumers do not (such as URI parsing libs), what does that mean?

> Compatibility with RFC3987 is a nice-to-have, but these requirements take strict priority.
>
> I recognize that not everyone may be interested in solving this problem. That is ok, but please do not try to stop those who do wish to solve it.

I'm trying to understand the nature of the problem first. I have some 
idea about some aspects (whitespace handling, I18N in query parms, \ vs 
/, ...). I'm not sure this is a complete list, though.


Best regards, Julian
Received on Friday, 22 April 2011 21:43:17 UTC