Re: Notes on the discussions at TPAC on the RDF Canonicalization issue

Hi Ivan, all,

On 28-10-2018 3:00, Ivan Herman wrote:
> (Aidan, it became very much an ad-hoc discussion, so we could not get 
> you in. But you should be part of the discussions, using email. See the 
> notes below.)
> 
> Manu and I discussed how to move forward. Here are the main points; 
> Manu, tell me if there is anything to add.
> 
> 1. A W3C REC in this area is highly unusual, because the core of the 
> spec would be a non-trivial mathematical algorithm. While there is some 
> engineering to do around it (how to store signatures in a graph/dataset, 
> the precise set of crypto methods to use, e.g., for hashing, etc.), that 
> part is easy by comparison. In agreement also with Ralph Swick (Aidan: 
> COO of W3C), we need some sort of solid review of the mathematics 
> involved before moving on.

Right, though let's not overestimate the algorithm either. In my 
opinion, what's defined in something like the OWL 2 Direct Semantics 
docs is far more technical than what we're putting forward. I personally 
don't see this as breaking any new ground for the W3C in terms of 
mathematical detail (though of course I get what is meant).
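
For anyone on the list who hasn't read the papers, the heart of what 
would need reviewing is roughly the following kind of fixpoint 
computation. This is just a minimal Python sketch of the iterative 
blank-node hashing (essentially colour refinement) over a toy triple 
representation; the names are mine, and the tie-breaking needed for 
mutually indistinguishable blank nodes (the genuinely tricky part of 
the papers) is omitted:

    import hashlib

    def h(*parts):
        # Any fixed hash function works; SHA-256 for illustration.
        return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()

    def colour_blank_nodes(triples, bnodes):
        # Iteratively refine a hash ("colour") per blank node from its
        # neighbourhood until the induced partition of blank nodes stops
        # changing. Ground terms (IRIs, literals) hash to themselves.
        def parts(colouring):
            groups = {}
            for b, c in colouring.items():
                groups.setdefault(c, set()).add(b)
            return {frozenset(g) for g in groups.values()}

        colour = {b: h('init') for b in bnodes}
        while True:
            term = lambda t: colour.get(t, h(t))
            nxt = {}
            for b in bnodes:
                # Combine the hashes of all triples in which b occurs,
                # marking whether it occurs as subject or object.
                hs = sorted(h(term(s), term(p), term(o),
                              's' if s == b else 'o')
                            for (s, p, o) in triples if b in (s, o))
                nxt[b] = h(colour[b], *hs)
            if parts(nxt) == parts(colour):  # fixpoint: stable partition
                return nxt
            colour = nxt

The point being: the mathematics that needs review is compact; the 
subtlety is concentrated in how to proceed when this refinement cannot 
distinguish the remaining blank nodes.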

> 2. We have two inputs: one is the two papers of Aidan [1][2], and the 
> other is the draft published by Manu & David [3]. (Any others?)

In Section 7 of the journal paper:

http://aidanhogan.com/docs/rdf-canonicalisation.pdf

... I gave a summary of all the related approaches I'm aware of in the 
research literature. Some of the folks mentioned there may be interested 
in joining; in particular, for example:

[33] Tobias Kuhn and Michel Dumontier. 2014. "Trusty URIs: Verifiable, 
Immutable, and Permanent Digital Artifacts for Linked Data". In ESWC, 
395–410.

This is not quite in line with what we've discussed, but perhaps they 
would be interested in getting involved. That section might also be 
worth looking over for other pointers to potentially interested people.

> 3. Aidan's algorithm has undergone rigorous peer review for the WWW 
> and the journal versions; although the algorithm has to be extended 
> from RDF graphs to RDF datasets, that step seems to be obvious (and 
> has already been outlined by Ivan) [4].

I think perhaps the most rigorous review was Ivan's reimplementation of 
the methods, but I guess a few rounds of peer review don't hurt either. 
:) The reviews were detailed and helped improve the papers.
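
On the extension to datasets: my reading of the "obvious" step (and, I 
believe, of Ivan's outline [4]) is essentially to carry the graph label 
through the hashing as a fourth term, blank graph labels included. A 
self-contained sketch, with illustrative names of my own:

    import hashlib

    def h(*parts):
        return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()

    def dataset_hash(quads, colour):
        # Hash each (s, p, o, g) quad, replacing blank nodes (including
        # blank graph labels) by their computed colour; the hash of the
        # sorted quad hashes then gives a canonical hash of the dataset.
        term = lambda t: colour.get(t, h(t))
        return h(*sorted(h(term(s), term(p), term(o), term(g))
                         for (s, p, o, g) in quads))

Of course, whether the blank-node colouring itself should also see 
quads rather than triples is exactly the kind of detail such a review 
should pin down.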

> 4. There is a need to get a similar writeup and peer review for the 
> algorithm of Manu & David. Digital Bazaar will start the process of (a) 
> writing down the algorithm in mathematical terms, much as in Aidan's 
> paper, and (b) getting the results essentially peer-reviewed. We will 
> have to find reviewers for step (b). (Whether that result would be 
> published in a journal/conference is orthogonal to the review. 
> Actually, a 'direct' peer review is probably faster…)

I'm not sure whether I should be involved in this? Perhaps it would be 
better for Manu and David to push it forward.

I do promise, though, to take the time to understand in more detail the 
algorithm proposed by Manu and David. My understanding is that this is 
the URDNA2015 algorithm described in this document?

http://json-ld.github.io/normalization/spec/index.html

If the decision is to go with this algorithm, I will try to support that 
in whatever way I can. My guess is also that there is probably not much 
difference between the two proposals.
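
For what it's worth, URDNA2015 is already callable from existing 
libraries; for instance, with the PyLD library something like the 
following should produce canonical N-Quads (I'm going from PyLD's 
documentation here, and the toy document is mine):

    from pyld import jsonld

    doc = {
        "@context": {"ex": "http://example.org/"},
        "@id": "_:b0",
        "ex:knows": {"@id": "_:b1"}
    }

    # Canonicalise with URDNA2015; the output is N-Quads with
    # deterministic blank node labels (_:c14n0, _:c14n1, ...).
    canonical = jsonld.normalize(
        doc, {'algorithm': 'URDNA2015', 'format': 'application/n-quads'})
    print(canonical)

That should make it straightforward to experiment with the algorithm 
while reading the spec.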

Regarding finding reviewers, I think the best reviewers for a more 
mathematical-style proposal would actually be authors of papers from the 
traditional graph isomorphism world, not necessarily the RDF world. 
Ideally, all we are doing is adding a very thin RDF skin to what they've 
worked on for decades.

> 5. As a tentative goal, it would be great to give an overview of what we 
> may try (together with the first results of the peer review for [3]) at 
> the W3C Workshop on Web Standardization for Graph Data[5]. (That may be 
> a bit too tight, though.)

Perhaps.

> 6. Once [4] is done, we will have to start discussions on how to "merge" 
> (algorithmically) the two inputs to end up with a unified, standard one, 
> although part of that work can also be done in a Working Group.

Okay!

> 7. The real goal is to advance the process so that a draft WG charter 
> could be prepared, and presented, at TPAC 2019, i.e., September 2019. 
> The scope of such a WG still has to be discussed: does it include 
> signatures of Linked Data in general, or should it focus on 
> canonicalization only? To be seen.

Sounds feasible. I guess in any case, the first step is RDF datasets.

> Manu, did I forget anything? Aidan, does this sound reasonable to you?

Sure!

Not sure if it's at all relevant for the WG, but I thought I'd mention 
that we published a paper at ISWC on canonicalising SPARQL queries as 
well:

Jaime Salas and Aidan Hogan. "Canonicalisation of Monotone SPARQL 
Queries". In the Proceedings of the 17th International Semantic Web 
Conference (ISWC), Monterey, USA, October 8–12, 2018.
http://aidanhogan.com/qcan/extended.pdf

It doesn't (and cannot) cover all of SPARQL, only BGPs with unions and 
projection, though both bag and set semantics are covered. My guess is 
that it's not relevant right now, but I thought I'd mention it just in 
case.

Cheers,
Aidan


> Ivan
> 
> 
> [1] http://www.www2015.it/documents/proceedings/proceedings/p430.pdf
> [2] http://aidanhogan.com/docs/rdf-canonicalisation.pdf
> [3] http://json-ld.github.io/normalization/spec/index.html
> [4] https://github.com/iherman/canonical_rdf
> [5] https://www.w3.org/Data/events/data-ws-2019/
> 
> 
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: https://orcid.org/0000-0003-0782-2704
> 
