- From: David Sheets <sheets@alum.mit.edu>
- Date: Fri, 12 Sep 2014 01:05:17 +0100
- To: Austin William Wright <aaa@bzfx.net>
- Cc: Damian Steer <pldms@mac.com>, Semantic Web <semantic-web@w3.org>
On Fri, Sep 12, 2014 at 12:27 AM, Austin William Wright <aaa@bzfx.net> wrote:

> Since I maintain URI and IRI libraries, and numerous programs that use
> URIs for stating relationships (JSON Schema, RDF Interfaces, Turtle
> parser, and more), I'm interested in getting involved, pending some
> questions about the purpose of the proposed Community Group. Certainly
> there's been a lot of drama, since I sent this message, on
> public-webapps, www-tag, and public-w3process about the fork of the
> "URL" document. Will a Community Group be able to positively impact
> the issue?

I believe that a Community Group which communicates regularly and openly about its progress on a formal specification will be able to positively affect the present issues. In particular, I think that a Community Group offers a place to work on a well-engineered specification using modern tools, without requiring immediate buy-in from existing groups. Once our methods have been demonstrated, I expect the work to move to other, more traditional specification venues.

> Will we be able to shed light on the Semantic Web uses of the URI,
> IRI, and URI Reference? (The current documents seem to think that only
> Web browsers consume URIs.)

The Semantic Web consumers of URI references (a term which, in my view, encompasses URL, URI, and IRI) are an important constituency of any URI specification document. However, I do not currently see a place for Semantic Web or Linked Data specific content in such a document. That is not to say that SemWeb concerns shouldn't be considered -- just that SemWeb uses of URI references should be clearly possible without being called out specifically.

> Most importantly, I don't think it's necessary -- or even normatively
> possible -- to re-define how to parse URIs in HTML or any other spec.
> This is normatively done _only_ by RFC 3986 or a published successor
> that obsoletes it.

I intend to incubate a successor with this Community Group. It is my sincere hope that, before the end of 2015, we will have begun the IETF RFC process for a new URI reference standard.

> I would like to see a "URI/IRI API" that correctly uses the RFC
> 3986/3987 terminology. Would publishing an ECMAScript API be in scope?

Yes, publishing an ECMAScript API would eventually be in scope, as such an API would expose the functions which the specification describes. I am the maintainer of the ocaml-uri library <https://github.com/mirage/ocaml-uri>, and I would very much like to see a test suite and test oracle for use against ECMAScript and other languages' libraries. Initially, the ECMAScript API could be sketched, but defining it more elaborately should probably wait until the functions being specified are clearer. At that point, it may turn out that the ECMAScript API we propose exposes only composites of the specified functions (e.g. a composite of parse, normalize, and resolve).

> And as mentioned earlier, I'm interested in research into current
> implementation bugs of user agents and non-Web applications that
> consume IRIs, and if there's a way to fix them that's not (net)
> harmful. This is also one of the intended purposes, correct? For
> instance, there could possibly be a document describing how to fix
> invalid URI References, if that is acceptable (i.e. no "URI Strict
> Mode").

It's not clear to me whether you are referring to fixing implementations or fixing URIs. In general, there doesn't seem to be a valid way to fix URIs that may have been used in a Semantic Web context, since the only general equality there is byte-for-byte.

With that said, I am very much interested in specifying functions that consume potentially invalid URIs and normalize them to be valid. If one understands the risks, such a function could be used to "fix" invalid URIs. There are a number of different normalizations:

1. valid -> normal
2. invalid -> valid
3. invalid -> normal

Ideally, 3 is the composite of 2 followed by 1, and 1 has normal forms as its fixpoints: normalizing an already-normal reference changes nothing. These functions would be most useful on the publication side, and could be used to great effect in careful consumers.
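To make this algebra concrete, here is a minimal OCaml sketch (OCaml being the language of ocaml-uri). The phantom types and function names are purely illustrative; none of them exist in ocaml-uri or in any draft:

```ocaml
(* Illustrative sketch only: phantom types record what is known about a
   reference. None of these names exist in ocaml-uri or any draft. *)
type invalid
type valid
type normal

type 'a uri = Uri of string

(* 2. invalid -> valid: e.g. percent-encode disallowed octets. *)
let to_valid (Uri s : invalid uri) : valid uri = Uri s (* body elided *)

(* 1. valid -> normal: e.g. lowercase the scheme, uppercase
   percent-encoding hex digits, remove dot-segments. *)
let to_normal (Uri s : valid uri) : normal uri = Uri s (* body elided *)

(* 3. invalid -> normal: ideally exactly the composite of 2 then 1. *)
let fix (u : invalid uri) : normal uri = to_normal (to_valid u)

(* The fixpoint law for 1, as a testable property: normalizing an
   already-normal reference returns it unchanged. *)
let normal_is_fixpoint (Uri s : normal uri) : bool =
  let Uri s' = to_normal (Uri s) in
  String.equal s s'
```

With types like these, a repaired reference cannot be passed off as normal without actually being normalized, and the fixpoint law is exactly the kind of property an executable test oracle can check.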
> Generally, the goal is to work all the current issues of
> interoperability between Working Groups out? Wouldn't e.g. appsawg at
> the IETF, or another WG that deals with the URI, also be suited for
> this purpose?

One goal is to work out the issues of interoperability between the Working Groups and the Real World. Another is to produce a single specification document that describes as fully as possible the structure and interpretation of URI references, URLs, IRIs, URNs, etc. This single source can then be used to generate a text document, an executable test oracle, theorems about URIs, and potentially an exhaustive test suite.

The venues you mention would be the ideal place for this work if the use of formal methods -- specifically, specification using the Lem tool <http://www.cl.cam.ac.uk/~pes20/lem/> -- were accepted. I do not have high hopes that these venues are yet ready for such a proposal. Therefore, I am starting a Community Group in which to incubate this human-readable and machine-executable specification. I believe we should have demonstrable proof that our methods work well and provide value before we approach traditional standardization bodies.

I hope that you'll join me in supporting a single, readable source of URI specification that is guaranteed to stay in sync with an executable model and is robust enough to enumerate its own test suite.

I will begin with IPv4 and IPv6 address parsing, including interface identifiers. I am the primary author of <https://github.com/mirage/ocaml-ipaddr>, which does precisely this but does not yet handle interface identifiers. I believe this subcomponent of the specification can easily be written in fewer than 20 hours.

Perhaps one of the hardest parts of this specification process will be writing the proofs that high-level properties (e.g. grammars) are satisfied by the low-level specifications. Another difficult point will be error recovery and handling. This issue in particular will likely require nearly every syntactic component to allow error variants that describe the problems encountered during parsing while letting processing continue. Higher-level functions can then specify precisely which errors, if any, are allowed.
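As a rough illustration -- the names below are hypothetical and belong to no existing library or draft -- a syntactic component might parse to a value together with a list of recoverable deviations:

```ocaml
(* Hypothetical sketch of per-component error variants; none of these
   names exist in any library or draft specification. *)
type host_error =
  | Bare_ipv6                 (* IPv6 literal missing its brackets *)
  | Invalid_pct_escape        (* "%" not followed by two hex digits *)
  | Disallowed_char of char   (* octet outside the host grammar *)

(* Parsing never aborts: deviations are reported alongside a value. *)
type 'a parsed = { value : 'a; errors : host_error list }

let parse_host (s : string) : string parsed =
  { value = s; errors = [] } (* real recovery logic elided *)

(* A higher-level function states exactly which errors it tolerates;
   a strict profile demands an empty error list. *)
let strict_host (s : string) : string option =
  match parse_host s with
  | { value; errors = [] } -> Some value
  | _ -> None
```

Strictness then becomes a policy of each caller rather than a property of the grammar itself.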
I understand this is a large amount of work, but I believe that, together, we can put in place a system of specification that will capture the behavior of URI objects and serve us powerfully for decades to come.

Thanks for your interest,

David

> Thanks,
>
> Austin.
>
> On Thu, Sep 11, 2014 at 12:58 PM, David Sheets <sheets@alum.mit.edu> wrote:
>>
>> On Mon, Aug 18, 2014 at 3:22 PM, Damian Steer <pldms@mac.com> wrote:
>> > On 18/08/14 12:54, Austin William Wright wrote:
>> >> As the maintainer of a library that converts and parses URIs and
>> >> IRIs, as well as many Semantic Web-related libraries that use it, I
>> >> was reading through the HTML draft, and it appears that the core
>> >> ingredient of RDF and the Semantic Web -- the URI [1] and IRI [2] --
>> >> is not, in the current draft, normatively referenced from its key
>> >> hypertext technology, HTML [3].
>> >
>> > For the lazy, what is being referenced is:
>> >
>> > <http://url.spec.whatwg.org/>
>> >
>> > Hmm.
>>
>> I have just proposed a community group to do this properly. Please
>> consider supporting it and beginning the discussion of formal
>> specification of URI:
>> <http://www.w3.org/community/groups/proposed/#urispec>.
>>
>> Thanks,
>>
>> David Sheets
>>
>> > Damian
Received on Friday, 12 September 2014 00:05:46 UTC