Re: design team notes from Maciej Stachowiak on 2011-04-22 (public-iri@w3.org from April 2011)

From: Maciej Stachowiak <mjs@apple.com>
Date: Fri, 22 Apr 2011 11:14:05 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Peter Saint-Andre <stpeter@stpeter.im>, "public-iri@w3.org" <public-iri@w3.org>
Message-id: <1D109679-DF83-4D5C-9D69-2698F1A7C0F9@apple.com>

On Apr 22, 2011, at 5:33 AM, Julian Reschke wrote:

> On 21.04.2011 17:35, Peter Saint-Andre wrote:
>> 
>> Adam: Ian Hickson thinks we need two things:
>> - parse document url and extract the host (for security purposes / same
>> origin policy)
>> - resolving relative URL (e.g. in script or form)
> 
> I keep hearing this.
> 
> This *is* defined in RFC 3986/3987.
> 
> The ABNFs cover only valid URIs/IRIs, but it's trivial to expand this by just relaxing the character repertoire constraints.
> 
> All that's needed is a simple parser that just acts on the well-defined delimiters. One way to implement such a parser is to just use the regular expression in
> 
>  http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.B
> 
> Does this need a separate document? I don't think so, but it won't hurt *as long* as that document doesn't conflict with what these specs say (in that they treat RFC3986/7-valid identifiers differently than before).

If the regexp, or something similar to it, works out, then great. We can test it and see how close it comes to defining a reasonable intersection of browser behavior. Just from scanning it, though, I can see at least three structural reasons it is not sufficient for HTML5's purposes:

(1) No normative standing, not even as a MAY-level requirement or a defined term, so it's not suitable to be referenced as-is.
(2) Does not provide the hostname, port, or hostport outputs required by HTML5.
(3) Only defines error-tolerant parsing, not error-tolerant and error-preserving resolution against a base.

Beyond that, other issues can be found by testing.
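
To make point (2) concrete, here is a minimal Python sketch that applies the regular expression from RFC 3986, Appendix B. Only the expression itself comes from the RFC; the function name split_uri and the sample URL are assumptions made up for this illustration. It is not a proposal for how HTML5 should parse URLs, just a way to see what the expression does and does not give you.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(ref):
    """Split a reference into the five generic components.

    Hypothetical helper for illustration. A component that is absent
    comes back as None. Every group in the expression is optional, so
    match() always succeeds.
    """
    m = URI_RE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# Sample input with a raw space in the path (not a valid URI).
parts = split_uri("http://user@example.com:8080/a b?q=1#frag")
print(parts)
```

The authority comes back as the single string user@example.com:8080, with no separate hostname or port, and the raw space in the path passes through unchanged. Splitting the authority further, and resolving against a base, would have to be specified somewhere else.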

> 
>> Martin:
>> what may also be of interest:
>> 1) Syntax of what to put into the author spec (my personal opinion would
>> be that this should be exactly an IRI)
> 
> Not sure what "author" spec means.
> 
> HTML uses URIs/IRIs in separate places, and there are at least two different contexts in which they need to be parsed, one of which uses whitespace as delimiter between identifiers.
> 
> So special treatment of whitespace will need to be context-dependent.

For HTML purposes, not really. The contexts that take a whitespace-delimited list of URLs split on whitespace before invoking the parsing or resolution algorithms. It's ok if those algorithms also do their own whitespace handling; in that case it will just be a no-op.
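
A rough Python sketch of why the second pass is a no-op, assuming the HTML5 set of space characters (space, tab, LF, FF, CR). The names split_on_spaces and parser_trim are hypothetical, and parser_trim only stands in for whatever whitespace handling a real parsing algorithm might do:

```python
import re

# HTML5 space characters: space, tab, line feed, form feed, carriage return.
HTML_SPACE = " \t\n\f\r"

def split_on_spaces(value):
    """Split an attribute value into tokens on runs of space characters."""
    return [t for t in re.split("[" + HTML_SPACE + "]+", value) if t]

def parser_trim(token):
    """Stand-in for a parser's own leading/trailing whitespace stripping."""
    return token.strip(HTML_SPACE)

tokens = split_on_spaces("  http://a.example/ping \t/log?x=1\n")

# Each token is already free of space characters, so trimming changes nothing.
unchanged = all(parser_trim(t) == t for t in tokens)
print(tokens, unchanged)
```

Because the list context has already consumed every space character, the per-URL algorithm does not need to know which context it was called from.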

> 
>> 2) Syntax (or whatever other description that makes sense) of what's
>> allowed/reasonable for backwards compatibility
>> 
>> Peter: possible path is to put all the parsing/processing stuff into
>> Adam's document, fast-track that document, and work on 3987bis in parallel
> 
> If this just replicates information from RFC 3986/7, it's harmless, but also not critical at all.

I'm sure it won't just replicate information from RFC 3986/7, since those do not currently contain the information required for Web platform purposes.

> 
>> Julian: We need to partition the work that needs to be done and figure
>> out who is going to do that work. I see three major issues:
>> - do we have a conflict between how browsers parse and what the specs say?
>> - need to clarify handling of non-ASCII characters in query strings
>> - hooks for HTML spec for referencing algorithm to partitioning URIs
>> into components and resolving a reference against a base
>> 
>> Martin: there *are* differences between different browsers w.r.t.
>> parsing and processing
> 
> Yes. Let's collect information about what the differences are, and help the vendors to resolve them; hopefully getting closer to being compliant with 3986/7 for valid identifiers.

[...]

> 
>> Action items...
>> - Adam to publish updated version of draft-barth-url early next week
> 
> Can we *please* first agree on the problem we want to solve?

For purposes of HTML5 and other client-side Web standards, and also from the perspective of most browser implementors, the problem to solve is:

    Define URL processing for URLs found in Web content, with the following constraints:
    - Compatible with existing Web content.
    - Once implemented, interoperable among all browsers and other programs that want to do browser-compatible processing of Web content.
    - Defines behavior in all cases, including errors.

Compatibility with RFC3987 is a nice-to-have, but these requirements take strict priority.

I recognize that not everyone may be interested in solving this problem. That is ok, but please do not try to stop those who do wish to solve it.

Regards,
Maciej
Received on Friday, 22 April 2011 18:14:51 UTC