Re: [whatwg] [url] Feedback from TPAC from Sam Ruby on 2014-11-01 (public-whatwg-archive@w3.org from November 2014)

From: Sam Ruby <rubys@intertwingly.net>
Date: Sat, 01 Nov 2014 07:38:36 -0400
To: Anne van Kesteren <annevk@annevk.nl>
Cc: WHATWG <whatwg@whatwg.org>
Message-ID: <5454C63C.50309@intertwingly.net>

On 11/1/14 5:29 AM, Anne van Kesteren wrote:
> On Sat, Nov 1, 2014 at 1:01 AM, Sam Ruby <rubys@intertwingly.net> wrote:
>> Meanwhile, the IETF is actively working on an update:
>>
>> https://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg-04
>>
>> They are meeting F2F in a little over a week.  URIs in general, and this
>> proposal in particular, will be discussed, and for that reason now would be a
>> good time to provide feedback.  I've only quickly scanned it, but it appears
>> sane to me in that it basically says that new schemes will not be viewed as
>> relative schemes.
>
> It doesn't say that. (We should perhaps try to find some way to make
> "{scheme}://" syntax work for schemes that are not problematic (e.g.
> javascript would be problematic). Convincing implementers that it's
> worth implementing might be trickier.)

How should it change?

>> 1) Change the URL Goals to only obsolete RFC 3987, not RFC 3986 too.
>
> See previous threads on the subject. The data models are incompatible,
> at least around "%", likely also around other code points. It also
> seems unacceptable to require two parsers for URLs.

Acknowledging that other parsers exist is quite a different statement 
than requiring two parsers.  I'm only suggesting the former.

As a concrete statement, a compliant implementation of HTML would 
require a URL parser, but not a URI parser.

Also as a concrete statement, such a user agent will interact, primarily 
via the network, with other software that will interpret the 
canonicalized URLs as if they were URIs.

That may not be as we would wish it to be.  But it would be a disservice 
to everyone to document how we would wish things to be rather than how 
they actually are (and, by all indications, are likely to remain for the 
foreseeable future).

>> 3) Explicitly state that canonical URLs (i.e., the output of the URL parse
>> step) not only round trip but also are valid URIs.  If there are any RFC
>> 3986 errata and/or willful violations necessary to make that a true
>> statement, so be it.
>
> It might be interesting to figure out the delta. But there are major
> differences between RFC 3986 and URL. Not obsoleting the former seems
> like a disservice to anyone looking to implement a parser or find
> information on URI/URL.

I do plan to work with others to figure out the delta.  As to the data 
models, at the present time -- and without having actually done the 
necessary analysis -- I am not aware of a single case where they would 
be different.  Undoubtedly we will be able to quickly find some, but 
even so, I would assert that the following statements will remain true 
for the domain of canonicalized URLs, by which I mean the set of 
possible outputs of the URL serializer:

1) the overlap is substantial, and I would dare say overwhelming.

2) RFC 3986 and URL compliant parsers would interpret the same bytes in 
such outputs as delimiters, schemes, paths, fragments, etc.

3) as to data models, the URL Standard is silent as to how such bytes 
are to be interpreted.  As to the meaning of '%', both the URL Standard 
and RFC 3986 recognize that encodings other than UTF-8 exist, and that 
such encodings will affect the interpretation of percent-encoded byte 
sequences.
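
To make points 2 and 3 a little more concrete, here is a minimal sketch 
in Python.  It assumes the component-splitting regular expression given 
in RFC 3986, Appendix B, and uses a made-up sample URL that is not taken 
from any spec or test suite.  It shows only the RFC 3986 side of the 
comparison; it does not demonstrate what a URL Standard parser would do 
with the same input.

```python
import re
from urllib.parse import unquote

# Component-splitting regular expression from RFC 3986, Appendix B.
RFC3986_SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(text):
    """Split text into the five RFC 3986 components (None when absent)."""
    # Every group in the pattern is optional, so the match always succeeds.
    m = RFC3986_SPLIT.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Hypothetical sample, assumed to already be in serialized (canonical) form.
sample = "http://example.com/a%20b/c?q=%C3%A9#frag"
parts = split_uri(sample)

# Point 2: the delimiters ':', '//', '?', and '#' determine the components.
for name, value in parts.items():
    print(f"{name:9} {value!r}")

# Point 3: the same percent-encoded bytes decode differently depending on
# which character encoding the consumer assumes.
as_utf8 = unquote(parts["query"], encoding="utf-8")
as_latin1 = unquote(parts["query"], encoding="latin-1")
print(repr(as_utf8), repr(as_latin1))
```

The splitting step never looks inside a percent-encoded sequence, which 
is why the choice of encoding can differ between consumers without 
changing where the component boundaries fall.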

- Sam Ruby
Received on Saturday, 1 November 2014 11:39:03 UTC