- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Tue, 8 Jun 2021 12:15:31 +0200
- To: Peter Patel-Schneider <pfpschneider@gmail.com>
- Cc: Semantic Web <semantic-web@w3.org>
Peter,
I have tried to implement your canonicalization algorithm as SPARQL:
# SELECT (GROUP_CONCAT(?quadStr ; separator='\n') AS ?nQuads)
SELECT (SHA1(GROUP_CONCAT(?quadStr ; separator=' \n')) AS ?hash)
WHERE
  { { SELECT DISTINCT  ?g ?s ?p ?o
      WHERE
        {   { ?s  ?p  ?o }
          UNION
            { GRAPH ?g
                { ?s  ?p  ?o }
            }
        }
    }
    BIND(concat("<", str(?s), ">") AS ?sStr)
    BIND(concat("<", str(?p), ">") AS ?pStr)
    BIND(if(isURI(?o),
            concat("<", str(?o), ">"),
            concat("\"", str(?o), "\"",
                   if(( lang(?o) != "" ),
                      concat("@", str(lang(?o))),
                      concat("^^<", str(datatype(?o)), ">")))) AS ?oStr)
    BIND(concat(?sStr, " ", ?pStr, " ", ?oStr, " ",
                if(bound(?g), concat("<", str(?g), ">", " "), ""),
                ".") AS ?quadStr)
  }
ORDER BY ?g ?s ?p ?o datatype(?o) lcase(lang(?o))
Blank nodes are ignored.
It works to the extent that:
* I computed ?hash for an N-Quads test file
* I round-tripped the test file as ?nQuads (using the SELECT that is
commented out)
* I computed ?hash for the round-tripped N-Quads (this part requires
pulling them out of a SPARQL results syntax such as XML or CSV)
* both ?hash values matched, using Jena
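The round-trip check above can be sketched outside SPARQL as follows.
This is only an illustration in Python, not what Jena does: the
function name quads_hash is hypothetical, and it orders the serialized
lines by code point, which is a simplification of the query's ORDER BY
on the individual terms.

```python
import hashlib


def quads_hash(lines, separator="\n"):
    """Return the SHA-1 hex digest of a set of serialized quads.

    Assumption: `lines` are already-serialized N-Quads statements
    without blank nodes. Duplicates are dropped and the lines are
    sorted by code point before being joined with `separator`.
    """
    canonical = separator.join(sorted(set(lines)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


original = [
    '<http://example.org/s> <http://example.org/p> "x" .',
    '<http://example.org/s> <http://example.org/p> "y" .',
]
# The same statements in a different order, with one duplicate.
round_tripped = list(reversed(original)) + original[:1]

print(quads_hash(original) == quads_hash(round_tripped))
```

Because the lines are de-duplicated and sorted before hashing, the
digest does not depend on the order in which the statements arrive.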
One thing this query fails on is serializing literal values that
contain newlines: N-Triples/N-Quads does not allow raw newlines inside
a quoted literal, so they have to be escaped (e.g. as \n). Can anyone
suggest how str(?o) should be replaced to fix that?
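For comparison, the escaping that a replacement for str(?o) would have
to perform can be sketched outside SPARQL. The Python below is only an
illustration under my own assumptions: the function name escape_literal
is hypothetical, and it covers just the four characters that N-Triples
does not allow unescaped inside a quoted literal (backslash, double
quote, line feed, carriage return).

```python
def escape_literal(lexical: str) -> str:
    """Escape a literal's lexical form for use between double quotes."""
    # Backslash is handled first, so the backslashes introduced by the
    # later replacements are not escaped a second time.
    return (
        lexical.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


# A value with a line break and embedded quotes becomes a single line.
print(escape_literal('line one\nline "two"'))
```

In the query, the same sequence of replacements could presumably be
written as nested REPLACE calls around str(?o), although the backslash
escaping in SPARQL string literals and regular expressions needs care.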
Martynas
atomgraph.com
On Mon, Jun 7, 2021 at 9:46 PM Peter Patel-Schneider
<pfpschneider@gmail.com> wrote:
>
> Here's my version of "Signing and Verifying RDF Datasets for Dummies".
>
>
> If you want to sign and verify documents (sequences of Unicode code
> points), encode the document in utf-8 and sign and verify a hash of the
> octet sequence. Transmit the octet sequence along with the signed
> hash.
>
> If you want to sign and verify RDF datasets, serialize the dataset in
> N-Quads and sign and verify that document. When a receiver
> deserializes the document the result will be isomorphic to the dataset
> that the sender had. Don't use a syntax that allows relative IRIs
> (e.g., Turtle) as relative IRIs may turn into different absolute IRIs
> when the document is deserialized. Don't use a syntax that allows
> remote resources to affect deserialization (e.g., JSON-LD) as these
> remote resources can be modified by an attacker. Don't use a syntax
> where parts of the document that don't serialize parts of the dataset
> look as if they might be important (e.g., RDFa) as receivers might come
> to depend on these non-coding parts. Don't use a syntax where it is
> not obvious which parts of the document serialize parts of the dataset
> (e.g., JSON-LD) as receivers might be confused as to just what dataset
> is being transmitted. Don't use a syntax where the mapping from the
> serialization to the dataset is poorly defined in practice (e.g.,
> JSON-LD).
>
> If you want to sign and verify RDF datasets and you want isomorphic RDF
> datasets to have the same signature, you first need to define a
> canonical serialization for RDF datasets so that isomorphic RDF
> datasets have the same canonical form. To sign and verify, create the
> canonical serialization for the RDF dataset and sign and verify that.
> Use N-Quads for this canonical form for the reasons above. Don't
> transmit any encoding other than the N-Quads canonical form for the
> reasons above, and more. If you don't want to depend on a complex
> algorithm to produce the canonical form then forbid blank nodes.
>
> This pretty much boils down to just using and only transmitting the
> simplest and most transparent document format possible because anything
> else just adds extra problems, and N-Quads is the simplest and most
> transparent document format for RDF datasets.
>
>
> My takeaway from this is that any W3C WG that is trying to standardize
> something that involves signing and verifying RDF datasets should only
> use N-Quads to transmit these datasets.
>
>
> peter
>
>
>
Received on Tuesday, 8 June 2021 10:17:07 UTC