Re: HTML and URI References compatibility concerns from David Sheets on 2014-09-12 (semantic-web@w3.org from September 2014)

From: David Sheets <sheets@alum.mit.edu>
Date: Fri, 12 Sep 2014 22:45:05 +0100
To: "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
Cc: Austin William Wright <aaa@bzfx.net>, Damian Steer <pldms@mac.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <CAAWM5Tyu3JwdhiLCRxBqgwgxsbfiV8UZW3kkdUizfFWa_6TYHQ@mail.gmail.com>
On Fri, Sep 12, 2014 at 7:32 AM, martin.hepp@ebusiness-unibw.org
<martin.hepp@ebusiness-unibw.org> wrote:
> Dear David:
>
> FYI: There was a comprehensive discussion on URI comparison in the Semantic Web in the mailing list archive, starting with
>
>     http://lists.w3.org/Archives/Public/public-lod/2011Jan/0134.html
>
> It would be good to consider these aspects when evolving any URI/IRI-related specs.

Thanks, Martin. That was very interesting reading. I will begin a
collection of resources regarding functions over URIs and place it in
the soon-to-be-announced specification source repository.

Best regards,

David

> Best wishes
> Martin
>
> -------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail:  martin.hepp@unibw.de
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>          http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/
>
>
> On 12 Sep 2014, at 02:05, David Sheets <sheets@alum.mit.edu> wrote:
>
>> On Fri, Sep 12, 2014 at 12:27 AM, Austin William Wright <aaa@bzfx.net> wrote:
>>> Since I maintain URI and IRI libraries, and numerous programs that use URIs
>>> for stating relationships (JSON Schema, RDF Interfaces, Turtle parser, and
>>> more), I'm interested in getting involved, pending some questions about the
>>> purpose of the proposed Community Group. Certainly there's been a lot of
>>> drama, since I sent this message, on public-webapps, www-tag, and
>>> public-w3process about the fork of the "URL" document. Will a Community
>>> Group be able to positively impact the issue?
>>
>> I believe that a Community Group which communicates regularly and
>> openly about its progress on a formal specification will be able to
>> positively affect the present issues. In particular, I think that a
>> Community Group offers a place to work on a well-engineered
>> specification using modern tools without requiring immediate buy-in
>> from existing groups. Once our methods have been demonstrated, I
>> expect work to move to other, more traditional specification venues.
>>
>>> Will we be able to shed light on the Semantic Web uses of the URI, IRI, and
>>> URI Reference? (The current documents seem to think that only Web browsers
>>> consume URIs.)
>>
>> The Semantic Web consumers of URI references (which, to my view,
>> encompasses URL, URI, and IRI) are an important constituency of any
>> URI specification document. However, I do not currently see a place
>> for Semantic Web or Linked Data specific content in such a document.
>> That is not to say that SemWeb concerns shouldn't be considered --
>> just that SemWeb uses of URI references should be clearly possible but
>> not called out specifically.
>>
>>> Most importantly, I don't think it's necessary -- or even normatively
>>> possible -- to re-define how to parse URIs in HTML or any other spec. This
>>> is normatively done _only_ by RFC 3986 or a published successor that
>>> obsoletes it.
>>
>> I intend to incubate a successor with this Community Group. It is my
>> sincere hope that we will, before the end of 2015, have begun the IETF
>> RFC process for a new URI reference standard.
>>
>>> I would like to see a "URI/IRI API" that correctly uses the RFC3986/3987
>>> terminology. Would publishing an ECMAScript API be in scope?
>>
>> Yes, publishing an ECMAScript API would eventually be in scope as such
>> an API would expose functions which the specification describes. I am
>> personally the maintainer of the ocaml-uri library
>> <https://github.com/mirage/ocaml-uri> and I would very much like to
>> see a test suite and test oracle for use against ECMAScript and other
>> languages' libraries.
>>
>> Initially, the definition of the ECMAScript API could be sketched but
>> defining it more elaborately should probably wait until the functions
>> being specified are more clear. At that time, it may be the case that
>> the ECMAScript API we propose actually exposes only composites of the
>> specified functions (e.g.: compose parse normalize resolve).
>>
>>> And as mentioned earlier, I'm interested in research into current
>>> implementation bugs of user agents and non-Web applications that consume
>>> IRIs, and if there's a way to fix them that's not (net) harmful. This is
>>> also one of the intended purposes, correct? For instance, there could
>>> possibly be a document describing how to fix invalid URI References, if that
>>> is acceptable (i.e. no "URI Strict Mode").
>>
>> It's not clear to me if you are referring to fixing implementations or
>> fixing URIs. In general, there doesn't seem to be a valid way to fix
>> URIs that may have been used in a SW context as the only general
>> equality is byte-for-byte. With that said, I am very much interested
>> in specifying functions that consume potentially invalid URIs and
>> normalize them to be valid. If one understands the risks, such a
>> function could be used to "fix" invalid URIs.
>>
>> There are a number of different normalizations:
>>
>> 1. valid -> normal
>> 2. invalid -> valid
>> 3. invalid -> normal
>>
>> Ideally, 3 is 2 followed by 1, and 1 should leave already-normal URIs
>> unchanged (every normal URI is a fixpoint of 1). These functions would
>> be most useful on the publication side and could be used to great
>> effect in careful consumers.
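
A minimal Python sketch of how the three normalizations above might relate.
The names repair, normalize and repair_and_normalize are hypothetical and do
not come from any specification or library. The particular rules chosen
(percent-encoding disallowed characters, lowercasing scheme and host,
uppercasing percent-escapes) are only a small illustrative subset, and
repair does not check where reserved characters are placed.

```python
import re
from urllib.parse import quote

# Generic component regex from RFC 3986, Appendix B (anchored here).
_URI_RE = re.compile(
    r"(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)
_RESERVED = ":/?#[]@!$&'()*+,;="


def repair(ref: str) -> str:
    """2. invalid -> valid (assumed rule set): percent-encode what a URI may not contain."""
    # A '%' that does not start a two-digit hex escape becomes a literal '%25'.
    ref = re.sub(r"%(?![0-9A-Fa-f]{2})", "%25", ref)
    # Leave unreserved and reserved characters and existing escapes untouched.
    return quote(ref, safe=_RESERVED + "%")


def normalize(ref: str) -> str:
    """1. valid -> normal (assumed rule set): case-normalize scheme, host and escapes."""
    m = _URI_RE.fullmatch(ref)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    out = ""
    if scheme is not None:
        out += scheme.lower() + ":"
    if authority is not None:
        # Only the host (and port) is case-insensitive, not the userinfo.
        userinfo, at, hostport = authority.rpartition("@")
        out += "//" + userinfo + at + hostport.lower()
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    # Hex digits in percent-escapes are written in upper case.
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda e: e.group(0).upper(), out)


def repair_and_normalize(ref: str) -> str:
    """3. invalid -> normal, defined as 2 followed by 1."""
    return normalize(repair(ref))
```

Under these assumptions, normalize(normalize(x)) == normalize(x) for any
input, which is the fixpoint property asked of 1. Note that either function
changes the bytes of the reference, so a consumer that compares URIs
byte-for-byte will treat the result as a different identifier.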
>>
>>> Generally, is the goal to work out all the current interoperability
>>> issues between Working Groups? Wouldn't e.g. appsawg at the IETF, or
>>> another WG that deals with the URI, also be suited for this purpose?
>>
>> A goal is to work out the issues of interoperability between the
>> Working Groups and the Real World. In addition, another goal is to
>> produce a single specification document that describes as fully as
>> possible the structure and interpretation of URI references, URLs,
>> IRIs, URNs, etc. This single source can then be used to generate a
>> text document, an executable test oracle, theorems about URIs, and
>> potentially an exhaustive test suite.
>>
>> The venues you mention would be the ideal place for this work if the
>> use of formal methods, specifically specification using the Lem
>> <http://www.cl.cam.ac.uk/~pes20/lem/> tool, would be accepted. I do
>> not have high hopes that these venues are yet ready for such a
>> proposal. Therefore, I am starting a Community Group in which to
>> incubate this human-readable and machine-executable specification. I
>> believe we should have demonstrable proof that our methods work well
>> and provide value before we approach traditional standardization
>> bodies.
>>
>> I hope that you'll join me in supporting a single, readable source of
>> URI specification which is guaranteed to stay in sync with an
>> executable model and is robust enough to be used to enumerate its own
>> test suite. I will begin with IPv4 and IPv6 address parsing including
>> interface identifiers. I am the primary author of
>> <https://github.com/mirage/ocaml-ipaddr> which does precisely this but
>> does not yet handle interface identifiers. I believe this subcomponent
>> of the specification can easily be written in fewer than 20 hours.
>>
>> Perhaps one of the hardest parts of this specification process will be
>> writing the proofs to demonstrate that high-level properties (e.g.
>> grammars) are satisfied by low-level specifications. Another difficult
>> point will be error recovery and handling. This issue in particular
>> will likely require nearly every syntactic component to allow error
>> variants that describe the issues with parsing but allow processing
>> to continue. Higher level functions can then specify precisely which,
>> if any, errors are allowed.
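
As a rough illustration of the error-variant idea, here is a small Python
sketch for dotted-quad IPv4 text. All names (Octet, OctetError, parse_ipv4,
accept) and the error kinds are hypothetical, and reading a zero-prefixed
octet as decimal is an assumption of the sketch; some implementations read
such octets as octal.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Octet:
    value: int


@dataclass(frozen=True)
class OctetError:
    kind: str                 # "empty", "not-decimal", "out-of-range", "leading-zero"
    text: str                 # the offending input, kept for diagnostics
    recovered: Optional[int]  # best-effort value, if one exists


Part = Union[Octet, OctetError]


def parse_octet(text: str) -> Part:
    """Parse one component; never raises, always returns a value or an error variant."""
    if text == "":
        return OctetError("empty", text, None)
    if not (text.isascii() and text.isdigit()):
        return OctetError("not-decimal", text, None)
    if int(text) > 255:
        return OctetError("out-of-range", text, None)
    if len(text) > 1 and text[0] == "0":
        # Assumption: recover by reading the digits as decimal.
        return OctetError("leading-zero", text, int(text))
    return Octet(int(text))


def parse_ipv4(text: str) -> List[Part]:
    """Low level: parsing continues past errors so every problem is reported."""
    return [parse_octet(p) for p in text.split(".")]


def accept(parts: List[Part], allowed: FrozenSet[str]) -> Optional[Tuple[int, ...]]:
    """High level: the caller states precisely which error kinds are tolerated."""
    if len(parts) != 4:
        return None
    values = []
    for part in parts:
        if isinstance(part, Octet):
            values.append(part.value)
        elif part.kind in allowed and part.recovered is not None:
            values.append(part.recovered)
        else:
            return None
    return tuple(values)
```

A strict consumer would call accept(parts, frozenset()), while a lenient one
might pass frozenset({"leading-zero"}); both share the same low-level parse.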
>>
>> I understand this is a large amount of work but I believe, together,
>> we can put in place a system of specification that will capture the
>> behavior of URI objects and serve us powerfully for decades to come.
>>
>> Thanks for your interest,
>>
>> David
>>
>>> Thanks,
>>>
>>> Austin.
>>>
>>> On Thu, Sep 11, 2014 at 12:58 PM, David Sheets <sheets@alum.mit.edu> wrote:
>>>>
>>>> On Mon, Aug 18, 2014 at 3:22 PM, Damian Steer <pldms@mac.com> wrote:
>>>>> On 18/08/14 12:54, Austin William Wright wrote:
>>>>>> As the maintainer of a library that converts and parses URIs and IRIs,
>>>>>> as well as many Semantic Web-related libraries that use it, I was
>>>>>> reading through the HTML draft, and it appears that the core ingredient
>>>>>> of RDF and Semantic Web--the URI [1] and IRI [2]--is not, in current
>>>>>> draft, normatively referenced from its key hypertext technology, HTML
>>>>>> [3].
>>>>>
>>>>> For the lazy, what is being referenced is:
>>>>>
>>>>> <http://url.spec.whatwg.org/>
>>>>>
>>>>> Hmm.
>>>>
>>>> I have just proposed a community group to do this properly. Please
>>>> consider supporting it and beginning the discussion of formal
>>>> specification of URI:
>>>> <http://www.w3.org/community/groups/proposed/#urispec>.
>>>>
>>>> Thanks,
>>>>
>>>> David Sheets
>>>>
>>>>> Damian
>>>>>
>>>>
>>>
>>
>
Received on Friday, 12 September 2014 21:45:34 UTC