Re: HTML and URI References compatibility concerns from martin.hepp@ebusiness-unibw.org on 2014-09-12 (semantic-web@w3.org from September 2014)

From: <martin.hepp@ebusiness-unibw.org>
Date: Fri, 12 Sep 2014 08:32:25 +0200
To: David Sheets <sheets@alum.mit.edu>
Cc: Austin William Wright <aaa@bzfx.net>, Damian Steer <pldms@mac.com>, Semantic Web <semantic-web@w3.org>
Message-Id: <EDDCB048-3842-4C69-BA42-7F3E00B54EBC@ebusiness-unibw.org>
Dear David:

FYI: There was a comprehensive discussion on URI comparison in the Semantic Web in the mailing list archive, starting with

    http://lists.w3.org/Archives/Public/public-lod/2011Jan/0134.html

It would be good to consider these aspects when evolving any URI/IRI-related specs.

Best wishes
Martin

-------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  martin.hepp@unibw.de
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp


On 12 Sep 2014, at 02:05, David Sheets <sheets@alum.mit.edu> wrote:

> On Fri, Sep 12, 2014 at 12:27 AM, Austin William Wright <aaa@bzfx.net> wrote:
>> Since I maintain URI and IRI libraries, and numerous programs that use URIs
>> for stating relationships (JSON Schema, RDF Interfaces, Turtle parser, and
>> more), I'm interested in getting involved, pending some questions about the
>> purpose of the proposed Community Group. Certainly there's been a lot of
>> drama, since I sent this message, on public-webapps, www-tag, and
>> public-w3process about the fork of the "URL" document. Will a Community
>> Group be able to positively impact the issue?
> 
> I believe that a Community Group which communicates regularly and
> openly about its progress on a formal specification will be able to
> positively affect the present issues. In particular, I think that a
> Community Group offers a place to work on a well-engineered
> specification using modern tools without requiring immediate buy-in
> from existing groups. Once our methods have been demonstrated, I
> expect work to move to other, more traditional specification venues.
> 
>> Will we be able to shed light on the Semantic Web uses of the URI, IRI, and
>> URI Reference? (The current documents seem to think that only Web browsers
>> consume URIs.)
> 
> The Semantic Web consumers of URI references (which, to my view,
> encompasses URL, URI, and IRI) are an important constituency of any
> URI specification document. However, I do not currently see a place
> for Semantic Web or Linked Data specific content in such a document.
> That is not to say that SemWeb concerns shouldn't be considered --
> just that SemWeb uses of URI references should be clearly possible but
> not called out specifically.
> 
>> Most importantly, I don't think it's necessary -- or even normatively
>> possible -- to re-define how to parse URIs in HTML or any other spec. This
>> is normatively done _only_ by RFC 3986 or a published successor that
>> obsoletes it.
> 
> I intend to incubate a successor with this Community Group. It is my
> sincere hope that we will, before the end of 2015, have begun the IETF
> RFC process for a new URI reference standard.
> 
>> I would like to see a "URI/IRI API" that correctly uses the RFC3986/3987
>> terminology. Would publishing an ECMAScript API be in scope?
> 
> Yes, publishing an ECMAScript API would eventually be in scope as such
> an API would expose functions which the specification describes. I am
> personally the maintainer of the ocaml-uri library
> <https://github.com/mirage/ocaml-uri> and I would very much like to
> see a test suite and test oracle for use against ECMAScript and other
> languages' libraries.
> 
> Initially, the definition of the ECMAScript API could be sketched but
> defining it more elaborately should probably wait until the functions
> being specified are more clear. At that time, it may be the case that
> the ECMAScript API we propose actually exposes only composites of the
> specified functions (e.g.: compose parse normalize resolve).
> 
>> And as mentioned earlier, I'm interested in research into current
>> implementation bugs of user agents and non-Web applications that consume
>> IRIs, and if there's a way to fix them that's not (net) harmful. This is
>> also one of the intended purposes, correct? For instance, there could
>> possibly be a document describing how to fix invalid URI References, if that
>> is acceptable (i.e. no "URI Strict Mode").
> 
> It's not clear to me if you are referring to fixing implementations or
> fixing URIs. In general, there doesn't seem to be a valid way to fix
> URIs that may have been used in a SW context as the only general
> equality is byte-for-byte. With that said, I am very much interested
> in specifying functions that consume potentially invalid URIs and
> normalize them to be valid. If one understands the risks, such a
> function could be used to "fix" invalid URIs.
> 
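The risk described above can be illustrated with a minimal Python sketch. The URIs, the toy triple set, and the `objects` helper are hypothetical and only stand in for a consumer that compares URIs as raw strings:

```python
# Hypothetical data: two spellings that RFC 3986 normalization treats as
# equivalent, because %7E is the percent-encoding of the unreserved "~".
published = "http://example.org/%7Euser"
rewritten = "http://example.org/~user"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# A toy graph of (subject, predicate, object) triples keyed on raw strings.
graph = {(published, FOAF_NAME, "Alice")}


def objects(graph, subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# The published spelling finds the triple; the "fixed" spelling does not,
# because the comparison is on the exact string and nothing else.
found_with_published = objects(graph, published, FOAF_NAME)
found_with_rewritten = objects(graph, rewritten, FOAF_NAME)
```

A consumer that rewrites the URI before lookup silently loses the statement, which is why such rewriting is only safe when every party agrees on it.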
> There are a number of different normalizations:
> 
> 1. valid -> normal
> 2. invalid -> valid
> 3. invalid -> normal
> 
> Ideally, 3 is 2 compose 1. 1 should be a fixpoint over normal. These
> functions would be most useful at the publication side and could be
> used to great effect in careful consumers.
> 
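The three normalizations listed above might be sketched in Python as follows. This is an illustration under stated assumptions, not the proposed specification: `make_valid`, `normalize`, and `fix` are hypothetical names, "valid" is approximated as "contains only characters RFC 3986 allows and well-formed percent-escapes", and "normal" covers only case and percent-encoding normalization.

```python
import re
import string
from urllib.parse import quote

# Assumption: the full RFC 3986 grammar is NOT checked, only the character set.
_ALLOWED = "-._~:/?#[]@!$&'()*+,;=%"
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
# Splits into optional scheme, optional authority, and everything after it.
_SPLIT = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?(.*)$", re.S)


def make_valid(ref: str) -> str:
    """Normalization 2 (invalid -> valid): escape what a URI cannot contain."""
    # A "%" that does not start a two-digit hex escape becomes "%25".
    ref = re.sub(r"%(?![0-9A-Fa-f]{2})", "%25", ref)
    # Everything outside the allowed set is percent-encoded as UTF-8.
    return quote(ref, safe=_ALLOWED)


def _canonical_escape(match):
    char = chr(int(match.group(1), 16))
    # Escapes of unreserved characters are decoded; the rest get upper-case hex.
    return char if char in _UNRESERVED else match.group(0).upper()


def normalize(uri: str) -> str:
    """Normalization 1 (valid -> normal): case and percent-encoding only."""
    uri = _ESCAPE.sub(_canonical_escape, uri)
    scheme, authority, rest = _SPLIT.match(uri).groups()
    out = []
    if scheme is not None:
        out.append(scheme.lower() + ":")
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        # Lower-case the host, then restore upper-case hex in its escapes.
        host = _ESCAPE.sub(lambda m: m.group(0).upper(), hostport.lower())
        out.append("//" + userinfo + at + host)
    out.append(rest)
    return "".join(out)


def fix(ref: str) -> str:
    """Normalization 3 (invalid -> normal): apply 2, then 1."""
    return normalize(make_valid(ref))
```

With these definitions, `normalize(fix(s)) == fix(s)` holds for the sample inputs tried here, which is the fixpoint property on normal forms. A byte-for-byte consumer would still treat `s` and `fix(s)` as different identifiers.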
>> Generally, is the goal to work out all the current issues of interoperability
>> between Working Groups? Wouldn't e.g. appsawg at the IETF, or another WG
>> that deals with the URI, also be suited for this purpose?
> 
> A goal is to work out the issues of interoperability between the
> Working Groups and the Real World. In addition, another goal is to
> produce a single specification document that describes as fully as
> possible the structure and interpretation of URI references, URLs,
> IRIs, URNs, etc. This single source can then be used to generate a
> text document, an executable test oracle, theorems about URIs, and
> potentially an exhaustive test suite.
> 
> The venues you mention would be the ideal place for this work if the
> use of formal methods, specifically specification using the Lem
> <http://www.cl.cam.ac.uk/~pes20/lem/> tool, would be accepted. I do
> not have high hopes that these venues are yet ready for such a
> proposal. Therefore, I am starting a Community Group in which to
> incubate this human-readable and machine-executable specification. I
> believe we should have demonstrable proof that our methods work well
> and provide value before we approach traditional standardization
> bodies.
> 
> I hope that you'll join me in supporting a single, readable source of
> URI specification which is guaranteed to stay in sync with an
> executable model and is robust enough to be used to enumerate its own
> test suite. I will begin with IPv4 and IPv6 address parsing including
> interface identifiers. I am the primary author of
> <https://github.com/mirage/ocaml-ipaddr> which does precisely this but
> does not yet handle interface identifiers. I believe this subcomponent
> of the specification can easily be written in fewer than 20 hours.
> 
> Perhaps one of the hardest parts of this specification process will be
> writing the proofs to demonstrate that high-level properties (e.g.
> grammars) are satisfied by low-level specifications. Another difficult
> point will be error recovery and handling. This issue in particular
> will likely require nearly every syntactic component to allow error
> variants which describe the issues with parsing but allow processing
> to continue. Higher level functions can then specify precisely which,
> if any, errors are allowed.
> 
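The idea of error variants that let parsing continue could look roughly like the Python sketch below, using dotted-decimal IPv4 parsing as the example. The `Issue` enum, the `ParsedIPv4` record, and the `require` helper are hypothetical names chosen for illustration; they are not taken from ocaml-ipaddr or from any specification.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import FrozenSet, List, Optional


class Issue(Enum):
    """Recoverable problems a low-level parser can report (illustrative set)."""
    WRONG_OCTET_COUNT = auto()
    NOT_A_NUMBER = auto()
    LEADING_ZERO = auto()
    OUT_OF_RANGE = auto()


@dataclass
class ParsedIPv4:
    octets: List[Optional[int]] = field(default_factory=list)
    issues: List[Issue] = field(default_factory=list)


def parse_ipv4(text: str) -> ParsedIPv4:
    """Parse dotted-decimal text, recording issues instead of stopping."""
    result = ParsedIPv4()
    parts = text.split(".")
    if len(parts) != 4:
        result.issues.append(Issue.WRONG_OCTET_COUNT)
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            result.issues.append(Issue.NOT_A_NUMBER)
            result.octets.append(None)  # keep the position, value unknown
            continue
        if len(part) > 1 and part.startswith("0"):
            result.issues.append(Issue.LEADING_ZERO)
        value = int(part)
        if value > 255:
            result.issues.append(Issue.OUT_OF_RANGE)
        result.octets.append(value)
    return result


def require(parsed: ParsedIPv4,
            allowed: FrozenSet[Issue] = frozenset()) -> List[Optional[int]]:
    """Higher-level policy: accept the result only if every issue is allowed."""
    rejected = [issue for issue in parsed.issues if issue not in allowed]
    if rejected:
        names = ", ".join(issue.name for issue in rejected)
        raise ValueError("disallowed parse issues: " + names)
    return parsed.octets
```

A strict caller uses `require(parse_ipv4(text))`, while a lenient one passes, for example, `frozenset({Issue.LEADING_ZERO})`. The low-level parser stays the same in both cases; only the policy differs.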
> I understand this is a large amount of work but I believe, together,
> we can put in place a system of specification that will capture the
> behavior of URI objects and serve us powerfully for decades to come.
> 
> Thanks for your interest,
> 
> David
> 
>> Thanks,
>> 
>> Austin.
>> 
>> On Thu, Sep 11, 2014 at 12:58 PM, David Sheets <sheets@alum.mit.edu> wrote:
>>> 
>>> On Mon, Aug 18, 2014 at 3:22 PM, Damian Steer <pldms@mac.com> wrote:
>>>> On 18/08/14 12:54, Austin William Wright wrote:
>>>>> As the maintainer of a library that converts and parses URIs and IRIs,
>>>>> as well as many Semantic Web-related libraries that use it, I was
>>>>> reading through the HTML draft, and it appears that the core ingredient
>>>>> of RDF and Semantic Web--the URI [1] and IRI [2]--is not, in current
>>>>> draft, normatively referenced from its key hypertext technology, HTML
>>>>> [3].
>>>> 
>>>> For the lazy, what is being referenced is:
>>>> 
>>>> <http://url.spec.whatwg.org/>
>>>> 
>>>> Hmm.
>>> 
>>> I have just proposed a community group to do this properly. Please
>>> consider supporting it and beginning the discussion of formal
>>> specification of URI:
>>> <http://www.w3.org/community/groups/proposed/#urispec>.
>>> 
>>> Thanks,
>>> 
>>> David Sheets
>>> 
>>>> Damian
>>>> 
>>> 
>> 
>
Received on Friday, 12 September 2014 06:32:54 UTC