Re: Thoughts on the LDS WG chartering discussion

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 10 Jun 2021 11:43:55 +0100
Message-ID: <CAFfrAFqY6ssawhSMRO0Z2Gii-X14epSG8CWHFbsSkQUmBUMMGg@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, semantic-web@w3.org
On Thu, 10 Jun 2021 at 11:31, Ivan Herman <ivan@w3.org> wrote:

> Graphs/Datasets are more often than not stored in datastores, triple
> stores, knowledge graphs, you name it.

We’re seeing RDF (schema.org, OGP etc.) in 1/3 or so of the pages you’ll find
in Google Search. There is a massive amount of RDF in active daily use that
is generated the same way ordinary web pages are generated (often from SQL
backends), and which can be used without necessarily storing in a
graph-specific manner. Although at some point “datastores” covers
anything non-magically storing data in any way.

“More often than not” isn’t worth arguing over since you could fill up a
few SPARQL databases with Linked Open Numbers, but it would be a mistake to
assume that most RDF is obviously served or consumed in something like
triple stores.


> The serialization format used to feed the triple store is irrelevant, and
> clients of such triple stores

(Not “datastores”?)

> may request the data in a different serialization format that suits their
> needs. If the consistency of such graphs (ie, set of triples or quads in
> the triple store) has to be checked via, say, a hash, then the approach you
> are describing does not work, due to the problem of bnode labels:
> triplestores are free to relabel the bnodes of incoming graphs and
> produce new labels when they export them.
> Also: isomorphic graphs do not have the same hash value, because graphs
> may be only isomorphic via a suitable relabeling of bnodes.
> Ivan
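
The bnode-label point quoted above can be illustrated with a minimal sketch. It assumes graphs are exchanged as N-Triples text and hashed with SHA-256 over the sorted lines; the function name nt_hash and the example.org IRIs are hypothetical, chosen only for illustration. Sorting removes the dependence on line order, but not the dependence on bnode labels:

```python
import hashlib


def nt_hash(doc: str) -> str:
    """SHA-256 over the sorted, non-empty lines of an N-Triples document.

    This is a naive hash, not an RDF canonicalization: it ignores line
    order but is still sensitive to blank node labels.
    """
    lines = sorted(line.strip() for line in doc.splitlines() if line.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Two serializations of isomorphic graphs: same structure, different bnode labels.
original = """
_:a <http://example.org/knows> _:b .
_:b <http://example.org/name> "Bob" .
"""
relabeled = """
_:x1 <http://example.org/knows> _:x2 .
_:x2 <http://example.org/name> "Bob" .
"""
# Same triples and labels as `original`, only the line order differs.
reordered = """
_:b <http://example.org/name> "Bob" .
_:a <http://example.org/knows> _:b .
"""

same_after_reorder = nt_hash(original) == nt_hash(reordered)
same_after_relabel = nt_hash(original) == nt_hash(relabeled)
```

Under these assumptions the reordered document hashes the same as the original, while the relabeled one does not, even though all three describe isomorphic graphs.
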
> On 10 Jun 2021, at 12:05, Peter F. Patel-Schneider <pfpschneider@gmail.com>
> wrote:
> On 6/10/21 3:40 AM, Ivan Herman wrote:
> [...]
> But. If I "just" start by, say, a Turtle representation of a Graph, I can
> of course convert that into canonical n-quads and hash the n-quads. But if
> the same Turtle representation is converted by RDFLib (or any other tool)
> into, God forbid, RDF/XML, the BNode identifiers will be different, ie, the
> conversion of the RDF/XML to n-quads will be different and, consequently,
> the hash will be different. *Unless the RDF canonicalization assigns the
> canonical identifiers to the BNodes in the process.*
> I really don't understand this point.  If I start with a Turtle document,
> just send the Turtle.  Well, except for the problem that deserializing
> Turtle documents doesn't always produce isomorphic graphs.  But the
> solution to this is easy, just use a format that always produces isomorphic
> graphs.  Send that.  No canonicalization necessary as each deserialization
> will produce an isomorphic graph.  And the hash is done on the document
> itself so standard methods for verifiable transmission of documents can be
> used without modification.
> If the starting point is a document in some other format, have the sender
> convert it to the appropriate format using the environment that the sender
> considers appropriate and send the resulting document.  If the starting
> point is an actual RDF graph, serialize the graph in the appropriate format
> and send the resulting document.  In each case, because deserialization in
> the document format produces isomorphic graphs, the recipient will end up
> with a graph isomorphic to the graph that the sender wanted to send.
> Which document format to use?  As far as I can tell, N-Triples (N-Quads)
> is the only document format where deserialization produces isomorphic RDF
> graphs (datasets).  Well, except for case normalization of language tags.
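
The document-level approach described in the quoted text above (hash the serialized document itself, not the abstract graph) might look like the following sketch. SHA-256 and the helper names document_digest and verify_document are assumptions made for illustration; the thread does not prescribe a hash function or an API:

```python
import hashlib
import hmac


def document_digest(doc_bytes: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes of the transmitted document."""
    return hashlib.sha256(doc_bytes).hexdigest()


def verify_document(doc_bytes: bytes, expected_digest: str) -> bool:
    """Check received bytes against the sender's digest before parsing them."""
    return hmac.compare_digest(document_digest(doc_bytes), expected_digest)


# Sender side: serialize to N-Triples once, then hash the bytes that are sent.
sent = b'_:a <http://example.org/name> "Bob" .\n'
digest = document_digest(sent)

# Recipient side: verify the bytes as received, then deserialize.
received_ok = verify_document(sent, digest)
received_tampered = verify_document(sent.replace(b"Bob", b"Eve"), digest)
```

The check only covers the bytes on the wire. Once the recipient re-serializes the graph, possibly with new bnode labels, the digest no longer applies to the new document.
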
> So I am not really sure I actually understand your problem: you cannot
> avoid a canonical relabeling of the BNodes in the general case. That is
> what the abstract RDF canonicalization does: define canonical BNode labels
> in a serialization independent manner. In my view, that is absolutely
> necessary in general.
> But there is no need for this if all you are trying to do is verifiable
> transmission of isomorphic RDF graphs.
> Ivan
> [...]
> peter
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +33 6 52 46 00 43
> ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Thursday, 10 June 2021 10:45:06 UTC
