Re: resolving the URL mess

On Fri, Oct 3, 2014 at 2:02 AM, Austin William Wright <aaa@bzfx.net> wrote:
>
>
> On Thu, Oct 2, 2014 at 11:07 AM, Sam Ruby <rubys@intertwingly.net> wrote:
>>
>> On 10/02/2014 06:05 AM, David Sheets wrote:
>>>
>>>
>>>> Anne, Dave Thayer, Sam Ruby, John Klensin come to mind…
>>>
>>>
>>> I believe that when we have something to show, we should entice them to
>>> join us.
>>
>>
>> +1
>>
>> At the moment, a non-existent spec doesn't solve any problem that I
>> currently have.  That's not meant to discourage or encourage you to produce
>> a spec, but merely an agreement that the time that I would get interested is
>> when you have something to show.
>>
>> For what it is worth, examples of problems I do have:
>>
>> 1) Neither RFC 3986 nor RFC 3987 define the content you will find here:
>> https://url.spec.whatwg.org/#api
>
>
> I'm fully behind a standard IRI/URI parsing interface. Let's just release a
> spec that's dedicated to a WebIDL interface for representing and performing
> operations on URIs/IRIs, though. There's no reason it needs to be combined
> into RFC3986/7, any more than XML/HTML and the DOM API need to be combined.

A WebIDL interface for URI manipulation would certainly be desirable,
but I'm uncertain whether it belongs within the scope of this group.
Specifically, I worry that attempting to define a WebIDL interface
before having a specification covering the existing and aspirational
behavior of deployed systems may lead us astray. I think a WebIDL
interface need not be bundled with a specification of the functions on
URI strings and their interpretation, as you say.

Additionally, I think our specification should also keep in mind
consumers in languages that do not use WebIDL, and it seems unlikely
that a new WebIDL definition could get traction among WHATWG members,
who already have a WebIDL definition.
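
As a rough illustration of what a language-neutral "function on URI
strings" can look like, here is a minimal Python sketch of the
component split given by the regular expression in RFC 3986,
Appendix B. The names split_uri and Components are hypothetical and
chosen only for this example; the sketch says nothing about sloppy
parsing or about what deployed systems actually do.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. Every group is optional,
# so it matches any string; it splits but does not validate.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    """Hypothetical record type using the RFC 3986 component names."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(reference: str) -> Components:
    """Split a URI reference into its five RFC 3986 components.

    A component that is absent is None; the path is always present
    but may be the empty string.
    """
    m = URI_RE.match(reference)
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


if __name__ == "__main__":
    # Example string taken from RFC 3986, Appendix B.
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
```

Note that the result distinguishes an absent query or fragment (None)
from an empty one (""), a distinction that a string-level
specification would need to state explicitly.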

> Not to mention, the referenced document misuses several well-defined terms.
> I'm not sure what a "pathname" or "hash" is, it probably intends "path"
> and/or "hierpart", and "fragment".
>
>>
>>
>>
>> 2) The WHATWG URL Living Standard makes a large number of normative
>> statements, particularly concerning parsing, that do not reflect current
>> implementations.
>>
>
> Most libraries, including mine, implement RFC3986 and RFC3987 to the letter.
> And many programs _depend_ on this behavior.

Could you please add your library to
<https://github.com/urispec/urispec/wiki/Implementations-and-Use-Cases>?

> My proposal for this CG was to do a formal survey of implementations and
> determine compatibility, and incompatibility. By and large I suspect
> violations of RFC3986 are the exception rather than the rule; that those
> violations tend to be isolated occurrences; and that convergence of behavior
> is easy to implement, especially for well-formed URIs and URI References.
>
> The survey would be started by examining the actual code or logic of all the
> known parsers, and crafting a test suite that covers all of their branches.
> The survey would be value-free: For instance, one application might want to
> raise an error on an invalid character; another might want to split URIs by
> whitespace; another might find it desirable to encode to "+" or "%20" or "_"
> depending on the context; we can't really say, and it doesn't really matter
> for the purpose of conducting the survey.
>
> So here are my proposals for first deliverables:
>
> 1. URI/IRI API
> 2. Survey of implementations

I think we should definitely collect a list of implementations and, if
possible, the easiest means to harness their functionality into a test
suite. I think we should also collect a list of existing test suites
for URIs. I'm hesitant about diving directly into analysis of specific
implementations or crafting of a test suite of our own. In my opinion,
we should look for ways to describe sloppy parsing and then derive an
exhaustive test suite from that specification. This is not an easy
task, but it is definitely achievable with less than one year of labor.
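
To make "harnessing" a little more concrete, here is a minimal Python
sketch, offered only as an illustration and not as proposed tooling.
It assumes each implementation can be wrapped as a function from
(base, reference) to a resolved string. The names Resolver, survey and
disagreements are hypothetical; urllib.parse.urljoin from the Python
standard library stands in for one implementation under test. In the
value-free spirit described above, it records what each implementation
returns without deciding which answer is right.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import urljoin

# Hypothetical common shape for a wrapped implementation:
# (base URI, URI reference) -> resolved string.
Resolver = Callable[[str, str], str]
Case = Tuple[str, str]


def survey(impls: Dict[str, Resolver],
           cases: List[Case]) -> Dict[Case, Dict[str, str]]:
    """Run every case through every implementation and record the output.

    Exceptions are recorded as data rather than treated as failures,
    since raising on some input is itself a behavior worth surveying.
    """
    results: Dict[Case, Dict[str, str]] = {}
    for case in cases:
        base, reference = case
        row: Dict[str, str] = {}
        for name, resolve in impls.items():
            try:
                row[name] = resolve(base, reference)
            except Exception as exc:
                row[name] = "error: " + type(exc).__name__
        results[case] = row
    return results


def disagreements(results: Dict[Case, Dict[str, str]]) -> List[Case]:
    """Return the cases on which the implementations did not all agree."""
    return [case for case, row in results.items()
            if len(set(row.values())) > 1]


if __name__ == "__main__":
    # Base URI and references taken from the examples in RFC 3986, section 5.4.
    impls = {"python-urljoin": urljoin}
    cases = [("http://a/b/c/d;p?q", "../g"), ("http://a/b/c/d;p?q", "g?y")]
    for case, row in survey(impls, cases).items():
        print(case, row)
```

With more than one entry in impls, disagreements(...) gives the list
of inputs worth a closer look.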

I would encourage you and everyone else following along at home to
contribute at <https://github.com/urispec/urispec/wiki>. I am aware
that using GitHub may be unpalatable to some but I would like to
assure those who bristle at its use that there will soon exist export
and automation tools for our activities there that would be difficult
or impossible to implement on top of the W3C CG infrastructure.

Thanks for your proposals, Austin!

Best regards,

David Sheets

Received on Friday, 3 October 2014 10:40:06 UTC