- From: David Booth <david@dbooth.org>
- Date: Thu, 07 Aug 2014 23:21:13 -0400
- To: public-rdf-shapes@w3.org
Hi Jeremy,

On 08/07/2014 10:13 AM, Jeremy J Carroll wrote:
> It is easy to forget that in general RDF canonicalization is
> Graph-Isomorphism complete, and hence too difficult for production use
> at scale. [1]

For unrestricted RDF, yes, but for "Well Behaved RDF" canonicalization is very feasible:
http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf
You showed this in [1] long before the term "Well Behaved RDF" was coined.

>
> On the other hand, within any particular application domain, which is
> the scope of the users of the proposed working group, normalizing an RDF
> graph tends to be fairly straightforward.

It is, but it's also a tedious complete waste of time when a general purpose canonicalization tool could be defined and used for *all* applications that are willing to use Well Behaved RDF.

The paper by Hogan, Arenas, Mallea and Polleres on "Everything You Always Wanted to Know About Blank Nodes"
http://www.websemanticsjournal.org/index.php/ps/article/download/365/387
shows that the vast majority of RDF does not need the problematic uses of blank nodes that cause difficulty in canonicalization. Most uses of blank nodes are benign, like the implicit blank nodes generated by Turtle list "( ... )" syntax and square bracket "[ ... ]" syntax.

>
> Mindful of this I suggest:
>
> Section 1
> Replace;
> [[
> In addressing these issues, the WG will consider whether it is
> necessary, practical or desireable to normalize a graph prior to
> validation. That is, whether an algorithm can and should be defined that
> creates a canonical form of a given graph.
> ]]
> With
> [[
> In addressing these issues, the WG will consider whether it is
> necessary, practical or desireable to normalize a graph as part of
> validation.
> That is, whether an algorithm can and should be defined that
> creates a representation of a given graph, or an equivalent graph, that
> is canonical for the purpose of processing with respect to a specific
> machine-readable interface definition.
> ]]
>
> Rationale: the answer to the current question "should such an algorithm
> be defined" is simply "no, it should not"
> I weaken the question to indicate that the algorithm is part of
> validation, not prior, and that the canonicalization is not independent
> of the application but application dependent.
>
> Section 3:
> Replace:
> [[
> The WG *MAY* produce a Recommendation for *graph normalization*.
> ]]
> With
> [[
> 3. OPTIONAL - A graph normalization method, suitable for the use cases
> determined by the group. This should not be a general purpose RDF
> canonicalization algorithm, see [1].
> ]]
> Rationale: consistent styling with other deliverables; restricting scope
> to avoid the impossible.
>
> [1]
> http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf

The fact that a canonicalization algorithm may fail to finish on some problematic input does *not* mean that it is useless to define or implement such an algorithm. It just means that the users of that algorithm must be aware of its limitations. And those limitations are very modest: all one has to do is avoid explicit blank nodes. Implicit blank nodes are fine.

However, the reason I'm pushing for canonicalization has little to do with RDF "shape" validation. It is more about the broader problem of the need to compare two RDF graphs for "equality" for purposes such as regression testing, which is essential to almost any significant software project. I think it's crazy that we (the RDF community) are promoting the information representation that we think is so great, but it has this blatant gaping flaw: two RDF graphs cannot be easily compared for "equality"!
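
To make the comparison problem concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from [1] and not the definition in the Well Behaved RDF paper. It assumes triples are plain tuples of strings, that blank nodes carry a "_:" prefix, and (a simplification assumed here) that every blank node is the object of at most one triple, which is the tree shape that Turtle "[ ... ]" and "( ... )" syntax produces. The names is_blank and canonical_form are hypothetical.

```python
from collections import Counter, defaultdict


def is_blank(term):
    # Assumption: blank nodes are written with the usual "_:" label prefix.
    return term.startswith("_:")


def canonical_form(triples):
    """Return a label-independent form of a graph with tree-shaped blank nodes.

    Each blank node is replaced by the sorted description of its outgoing
    triples, so the labels themselves never appear in the result.
    Raises ValueError on input outside the assumed restriction.
    """
    outgoing = defaultdict(list)
    incoming = Counter()
    for s, p, o in set(triples):
        outgoing[s].append((p, o))
        if is_blank(o):
            incoming[o] += 1

    shared = sorted(b for b, n in incoming.items() if n > 1)
    if shared:
        raise ValueError(f"blank node referenced more than once: {shared}")

    seen = set()

    def canon(term):
        if not is_blank(term):
            return ("term", term)
        seen.add(term)
        pairs = sorted((p, canon(o)) for p, o in outgoing.get(term, []))
        return ("bnode", tuple(pairs))

    # Start from named subjects and from blank nodes nothing points at.
    roots = [s for s in outgoing if not is_blank(s) or incoming[s] == 0]
    result = tuple(sorted((canon(s), p, canon(o))
                          for s in roots for p, o in outgoing[s]))

    # Blank nodes never reached from a root can only sit on a cycle.
    unreachable = sorted(s for s in outgoing if is_blank(s) and s not in seen)
    if unreachable:
        raise ValueError(f"blank node cycle involving: {unreachable}")
    return result


# Same data, different blank node labels (as two parsers might produce).
g1 = {(":alice", ":address", "_:b0"),
      ("_:b0", ":city", '"Boston"'),
      ("_:b0", ":zip", '"02139"')}
g2 = {(":alice", ":address", "_:x"),
      ("_:x", ":zip", '"02139"'),
      ("_:x", ":city", '"Boston"')}

print(canonical_form(g1) == canonical_form(g2))  # True
```

A regression test could then compare the canonical forms of the expected and actual graphs directly. Input that breaks the assumed restriction is rejected rather than silently mishandled, which mirrors the point above that users only need to know the limitation.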

So I saw canonicalization mentioned in the charter and opportunistically thought "Hey, maybe we can *finally* get RDF canonicalization standardized!" I don't view RDF canonicalization as essential for shape validation, but I *do* view it as essential to the future of RDF.

David
Received on Friday, 8 August 2014 03:21:42 UTC