Re: PSA: publishing new WD of URL spec from David Sheets on 2014-09-11 (www-tag@w3.org from September 2014)

From: David Sheets <kosmo.zb@gmail.com>
Date: Thu, 11 Sep 2014 19:11:58 +0100
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Marcos Caceres <marcos@marcosc.com>, Robin Berjon <robin@w3.org>, Arthur Barstow <art.barstow@gmail.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <CAAWM5TxBBjEftgMdt9YDvs4W95D=T1BG+WNj40X6tkdWDPTRcQ@mail.gmail.com>

On Thu, Sep 11, 2014 at 6:25 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2014-09-11 18:19, Marcos Caceres wrote:
>>
>>
>>
>>
>> On September 11, 2014 at 11:58:58 AM, Julian Reschke
>> (julian.reschke@greenbytes.de) wrote:
>>>>
>>>> In which case the WHATWG version wouldn't be "canonical" anymore
>>>
>>> anyway.
>>
>>
>> "The proof is in the pudding", as they say. I read a recent blog post that
>> indicated that the IETF failed wrt maintaining the URL specs [1]. I'm
>> optimistic that the WHATWG can handle the task, as browsers are by far the
>> largest and most dependent consumers of URLs of all types. In this sense,
>> the WHATWG URL spec is the most up to date. The bits missing in [1], like
>> registration, can easily be handled in the WHATWG wiki (as is already done
>> for other things).
>>
>> [1] http://masinter.blogspot.ca/2014/09/the-url-mess.html
>
>
> "largest and most dependent" maybe, but that doesn't mean that nobody else
> cares. This is an area where it's not sufficient to reverse-engineer what
> browsers do and document that.

As a non-browser implementor, I do not find the WHATWG URL spec very
helpful. I am also concerned that it is detrimental to the larger
ecosystem of communications and software.

In particular, the WHATWG tactic of describing imperative, stateful
parsing routines in English prose is very difficult to use
effectively for anyone except implementors who are duplicating
exactly that functionality in an imperative style. In many cases,
library and system implementors wish to offer extra, compatible
functionality that conforms to pre- and post-conditions and obeys
certain compositional equalities. This specification gives no
high-level information about the properties of URIs.

At one point, I had a brief conversation with the editor of the WHATWG
URL specification where I proposed the following:

1. Describe the functions being specified: e.g. parsing, normalizing,
resolving, serializing.
2. Specify the pre- and post-conditions of those functions: via formal
grammars or properties of components.
3. Define the specification in a way that is amenable to both human
and machine reasoning.
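
As a rough illustration of the three points above, here is a minimal
sketch in Python. It is not the WHATWG algorithm and not a complete
implementation of anything: the names Components, parse, serialize,
normalize and check_properties are hypothetical, the regular
expression is the one given in RFC 3986 Appendix B, recomposition
follows RFC 3986 section 5.3, and "normalize" only lowercases the
scheme. The point is that the properties are stated separately from
the routines and can be checked mechanically.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. DOTALL lets the
# fragment group swallow newlines, so every input string matches.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


class Components(NamedTuple):
    """Hypothetical component record; None means 'undefined'."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse(s: str) -> Components:
    """Split a reference into its five generic components."""
    m = _URI_RE.match(s)
    assert m is not None  # the pattern accepts any string
    return Components(m.group(2), m.group(4), m.group(5),
                      m.group(7), m.group(9))


def serialize(c: Components) -> str:
    """Recompose components as in RFC 3986, section 5.3."""
    out = ""
    if c.scheme is not None:
        out += c.scheme + ":"
    if c.authority is not None:
        out += "//" + c.authority
    out += c.path
    if c.query is not None:
        out += "?" + c.query
    if c.fragment is not None:
        out += "#" + c.fragment
    return out


def normalize(c: Components) -> Components:
    """Simplified normalization (assumption): lowercase the scheme only."""
    if c.scheme is None:
        return c
    return c._replace(scheme=c.scheme.lower())


def check_properties(samples: Iterable[str]) -> None:
    """Check the stated equalities on each sample input."""
    for s in samples:
        c = parse(s)
        # Round trip: serializing a parsed reference gives it back.
        assert serialize(c) == s
        n = normalize(c)
        # Normalization is idempotent.
        assert normalize(n) == n
        # Normalized components survive a serialize/parse round trip.
        assert parse(serialize(n)) == n


if __name__ == "__main__":
    check_properties(["", "a:b", "//h?", "x#", "HTTP://example.com/a?b=1#f"])
```

Note that the converse equality, parse(serialize(c)) == c, does not
hold for arbitrary component records (a path containing "?" is one
counterexample), which is exactly the kind of pre-condition a
specification would need to state.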

I was told, essentially, that formalism was dead and these concerns
didn't matter. If there is space for it, I would be very interested in
contributing a substantial amount of time and effort (months of labor)
to producing a higher-level, more formal specification that is
compatible with WHATWG URL.

I believe that computers are tools which allow us to mechanize a great
number of low-level details and demonstrate that high-level properties
hold. I believe a high-level specification could *automatically*
*generate* the English prose in the present specification and offer
general theorems about URI behavior. Such a specification could also
be used to generate test cases, act as a test oracle, and explain
low-level parsing behavior.

I am presently applying a similar treatment to the entirety of the
POSIX 2008 file system API.

I believe that the accuracy and usability of the URI specification
are crucially important to the long-term health of the Web. Neither
the existing IETF specification nor the WHATWG URL specification
achieves both accuracy and usability. I would like to see a venue for
work to take place on the actual formal specification of URI that
humanity deserves. Could W3C be that venue?

Thanks,

David Sheets

Received on Thursday, 11 September 2014 18:12:31 UTC