- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Tue, 8 Jun 2021 12:15:31 +0200
- To: Peter Patel-Schneider <pfpschneider@gmail.com>
- Cc: Semantic Web <semantic-web@w3.org>
Peter,

I have tried to implement your canonicalization algorithm as SPARQL:

# SELECT (GROUP_CONCAT(?quadStr ; separator='\n') AS ?nQuads)
SELECT (SHA1(GROUP_CONCAT(?quadStr ; separator=' \n')) AS ?hash)
WHERE
{
  { SELECT DISTINCT ?g ?s ?p ?o
    WHERE
    {
      { ?s ?p ?o }
      UNION
      { GRAPH ?g { ?s ?p ?o } }
    }
  }
  BIND(concat("<", str(?s), ">") AS ?sStr)
  BIND(concat("<", str(?p), ">") AS ?pStr)
  BIND(if(isURI(?o),
          concat("<", str(?o), ">"),
          concat("\"", str(?o), "\"",
                 if(( lang(?o) != "" ),
                    concat("@", str(lang(?o))),
                    concat("^^<", str(datatype(?o)), ">")))) AS ?oStr)
  BIND(concat(?sStr, " ", ?pStr, " ", ?oStr, " ",
              if(bound(?g), concat("<", str(?g), ">", " "), ""), ".") AS ?quadStr)
}
ORDER BY ?g ?s ?p ?o datatype(?o) lcase(lang(?o))

Blank nodes are ignored.

It works to the extent that:
* I got ?hash of an N-Quads test file
* round-tripped the test file as ?nQuads (using SELECT that is commented out)
* I got ?hash of the round-tripped N-Quads (this part requires pulling
  them out of SPARQL results syntax such as XML or CSV)
* both ?hash values matched using Jena

One thing this query fails on is serializing literal values with
newlines, which are not allowed in N-Triples/N-Quads. Can anyone
suggest how str(?o) should be replaced to fix that?

Martynas
atomgraph.com

On Mon, Jun 7, 2021 at 9:46 PM Peter Patel-Schneider <pfpschneider@gmail.com> wrote:
>
> Here's my version of "Signing and Verifying RDF Datasets for Dummies".
>
>
> If you want to sign and verify documents (sequences of Unicode code
> points), encode the document in utf-8 and sign and verify a hash of the
> octet sequence. Transmit the octet sequence along with the signed
> hash.
>
> If you want to sign and verify RDF datasets, serialize the dataset in
> N-Quads and sign and verify that document. When a receiver
> deserializes the document the result will be isomorphic to the dataset
> that the sender had. Don't use a syntax that allows relative IRIs
> (e.g., Turtle) as relative IRIs may turn into different absolute IRIs
> when the document is deserialized. Don't use a syntax that allows
> remote resources to affect deserialization (e.g., JSON-LD) as these
> remote resources can be modified by an attacker. Don't use a syntax
> where parts of the document that don't serialize parts of the dataset
> look as if they might be important (e.g., RDFa) as receivers might come
> to depend on these non-coding parts. Don't use a syntax where it is
> not obvious which parts of the document serialize parts of the dataset
> (e.g., JSON-LD) as receivers might be confused as to just what dataset
> is being transmitted. Don't use a syntax where the mapping from the
> serialization to the dataset is poorly defined in practice (e.g.,
> JSON-LD).
>
> If you want to sign and verify RDF datasets and you want isomorphic RDF
> datasets to have the same signature, you first need to define a
> canonical serialization for RDF datasets so that isomorphic RDF
> datasets have the same canonical form. To sign and verify, create the
> canonical serialization for the RDF dataset and sign and verify that.
> Use N-Quads for this canonical form for the reasons above. Don't
> transmit any encoding other than the N-Quads canonical form for the
> reasons above, and more. If you don't want to depend on a complex
> algorithm to produce the canonical form then forbid blank nodes.
>
> This pretty much boils down to just using and only transmitting the
> simplest and most transparent document format possible because anything
> else just adds extra problems and that N-Quads is the simplest and most
> transparent document format for RDF datasets.
>
>
> My takeaway from this is that any W3C WG that is trying to standardize
> something that involves signing and verifying RDF datasets should only
> use N-Quads to transmit these datasets.
>
>
> peter
>
>
>
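
On the newline question raised above: the N-Triples/N-Quads grammar does not allow a raw backslash, double quote, line feed, or carriage return inside a quoted literal, so each has to be written as an escape sequence. The following is a minimal sketch of that escaping together with the sort-and-hash step for a dataset without blank nodes. It is written in Python rather than SPARQL so that it can be run directly. The names `Literal`, `escape_literal`, `format_term`, and `nquads_hash` are hypothetical, SHA-256 is an assumption (the query above uses SHA1), and sorting lines by code point is one possible canonical order, not necessarily the one Peter has in mind.

```python
import hashlib
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical container for an RDF literal; IRIs are plain str."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def escape_literal(value: str) -> str:
    # Backslash must be handled first so later escapes are not doubled.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def format_term(term) -> str:
    # Literals become "..." with an optional @lang or ^^<datatype> suffix;
    # anything else is assumed to be an absolute IRI string.
    if isinstance(term, Literal):
        text = '"' + escape_literal(term.lexical) + '"'
        if term.lang:
            return text + "@" + term.lang
        if term.datatype:
            return text + "^^<" + term.datatype + ">"
        return text
    return "<" + term + ">"


def nquads_hash(quads) -> str:
    # quads: iterable of (s, p, o, g) tuples, g is None for the default graph.
    # Assumes no blank nodes, as in the query above.
    lines = set()  # a set drops duplicate quads
    for s, p, o, g in quads:
        parts = [format_term(s), format_term(p), format_term(o)]
        if g is not None:
            parts.append(format_term(g))
        lines.add(" ".join(parts) + " .")
    document = "".join(line + "\n" for line in sorted(lines))
    return hashlib.sha256(document.encode("utf-8")).hexdigest()
```

Because the lines are de-duplicated and sorted before hashing, the result does not depend on the order in which the quads are supplied. In SPARQL the same four substitutions could presumably be chained with the REPLACE function in place of the bare str(?o), taking care with the regex and string escaping involved.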
Received on Tuesday, 8 June 2021 10:17:07 UTC