Re: Thoughts on the LDS WG chartering discussion from David Booth on 2021-06-10 (semantic-web@w3.org from June 2021)

From: David Booth <david@dbooth.org>
Date: Thu, 10 Jun 2021 19:13:19 -0400
To: semantic-web@w3.org
Message-ID: <19f69a78-d6ab-9bc7-3db4-195b9b1b39db@dbooth.org>

On 6/10/21 11:08 AM, Ivan Herman wrote:
>> On 10 Jun 2021, at 16:13, David Booth <david@dbooth.org> wrote:
>> I still feel like I am somehow missing a fundamental assumption that 
>> others are making and I have not yet been able to identify.
> 
> I wonder whether the misunderstanding is not the following: how do you 
> calculate the canonical N-Quads? What will be the bnode labels?

Certainly they need to be canonicalized using an algorithm like 
URDNA2015 or Aidan's algorithm.  Otherwise it would not be canonical 
N-Quads!

> What the canonicalization algorithm does is to calculate the canonical 
> bnode labels. I guess you could describe the algorithm as working on a 
> quad representation of the RDF dataset, essentially transforming the 
> quads by relabeling the bnode labels to a canonical version. But that is 
> mathematically equivalent to making the same calculation on the abstract 
> RDF data model. In this respect, the n-quads and the abstract model is 
> essentially equivalent…

Okay, I didn't realize you were viewing the abstract canonicalization 
and the canonical N-Quads serialization as essentially equivalent, since 
the Explainer document makes a point of distinguishing them: 
"Canonicalization, as used in the context of this document and the 
proposed charter, is indeed defined on an abstract data model (i.e., on 
RDF Dataset [rdf11-concepts]), regardless of a specific serialization."  
In short, it sounds like we agree that only an N-Quads 
canonicalization is *necessary*, but you view the N-Quads 
canonicalization as essentially equivalent to an abstract RDF Dataset 
canonicalization.

Also, I should perhaps point out (though this is a bit pedantic) that 
canonicalization really only applies to serialization anyway, because 
blank node labels do not exist in abstract RDF Datasets.  This fact was 
the source of some of my puzzlement when I read the proposed charter, 
because the charter and the Explainer talk about canonicalizing the 
abstract RDF Dataset.  But I eventually managed to convince myself that 
the charter and the Explainer were just being slightly sloppy in 
terminology.  In reality, the proposed abstract canonicalization 
algorithm does not produce a canonicalized RDF Dataset; rather it 
produces a pair: an isomorphic RDF Dataset and a bijection from the 
blank nodes in that dataset to a set of blank node *labels*.  
Fortunately, the "RDF Dataset Canonicalization" document is a bit more 
precise about this:
https://json-ld.github.io/rdf-dataset-canonicalization/spec/index.html
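
To make the "pair" point concrete, here is a minimal sketch in Python.  
It is NOT URDNA2015 or Aidan's algorithm: it is a brute-force stand-in 
that tries every assignment of labels and keeps the one whose sorted 
N-Quads text is smallest, so it is only usable on toy inputs.  The 
function name canonicalize, the "_:c14nN" label scheme as used here, and 
the example.org IRIs are all assumptions made for illustration.  Input 
labels such as "_:x" merely stand in for the blank nodes themselves.

```python
from itertools import permutations

def canonicalize(quads):
    """Brute-force sketch (not URDNA2015).

    quads: iterable of tuples of N-Quads terms; terms starting with "_:"
    are treated as blank node labels.
    Returns (canonical N-Quads text, mapping input label -> canonical label).
    """
    quads = list(quads)
    bnodes = sorted({t for q in quads for t in q if t.startswith("_:")})
    best = None
    # Try every bijection from blank nodes to the labels _:c14n0.._:c14n(n-1).
    for perm in permutations(range(len(bnodes))):
        mapping = {b: f"_:c14n{i}" for b, i in zip(bnodes, perm)}
        lines = sorted({" ".join(mapping.get(t, t) for t in q) + " ." for q in quads})
        # Keep the relabeling whose sorted serialization is lexicographically least.
        if best is None or lines < best[0]:
            best = (lines, mapping)
    lines, mapping = best
    return "\n".join(lines) + "\n", mapping

# Two serializations of the same abstract dataset, differing only in
# blank node labels and statement order.
doc_a = [("_:x", "<http://example.org/knows>", "_:y"),
         ("_:y", "<http://example.org/name>", '"Bob"')]
doc_b = [("_:b1", "<http://example.org/name>", '"Bob"'),
         ("_:b0", "<http://example.org/knows>", "_:b1")]

canon_a, map_a = canonicalize(doc_a)
canon_b, map_b = canonicalize(doc_b)
print(canon_a == canon_b)  # the canonical N-Quads text is identical
print(map_a)               # the bijection from blank nodes to labels
```

The abstract dataset is the same before and after; what the function 
adds is the label assignment (the second element of the pair), and the 
canonical N-Quads text is just the dataset written out with those 
labels.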

The other thing that I still do not grasp about the proposed charter 
is this: Why is it restricted to RDF source documents?  Clearly the 
canonicalization algorithm is about RDF, so that much I understand.  
But for the digital signature vocabulary, why wouldn't it also be 
useful to be able to sign, say, a PDF document?  Why should the RDF 
signing vocabulary be limited to talking about RDF documents?  Or am 
I misunderstanding the intent here?  Perhaps a simple, complete 
example would help.  Again, I feel like I am missing some of the 
assumed context.

Thanks,
David Booth
Received on Thursday, 10 June 2021 23:14:15 UTC