Re: JSON-LD bnode canonical naming algorithm

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sat, 18 Jun 2011 01:42:46 -0400
Message-ID: <4DFC3AD6.8010005@digitalbazaar.com>
To: "public-linked-json@w3.org" <public-linked-json@w3.org>

On 06/10/11 01:35, glenn mcdonald wrote:
>> It sounds like your argument is: because we don't have feature X
>> that we should also not have feature Y.
>
> No, I'm saying that either you can have a "pure data" serialization, 
> or you can have a system for expressing logic, but it doesn't make 
> any sense to me to have data and existential quantification without 
> universal quantification and implication, and thus I don't buy 
> existential quantification as an argument for bnodes.

I'm having a very hard time understanding what you mean. Could you
please elaborate? Specifically, why are you treating this as an "if you
have X, then you MUST have Y" issue? It is provable that, in the case of
bnodes, having data and existential quantification does not mean that
you also need universal quantification and implication. I know you are
asserting that /you/ don't think it makes sense to have one without the
other, but I'm not convinced, because I can't see the reasoning. This
also assumes that we're using the same definitions for "existential
quantification", "universal quantification" and "implication".

I can't help but think that we're devolving into the realm of
theoretical purity here. Let's re-ground the discussion.

If we are to remove support for bnodes, do you expect every
implementation of PaySwarm to generate a unique identifier for the
millions, if not billions, of transfers, digital signatures and payee
descriptions that will be generated per day? Do you expect us to create
a financial system that is not capable of telling whether or not two
people in the same graph, which do not have a unique identifier, are in
fact the same person? These are real problems that we're grappling with.
If we get rid of bnodes, we're still going to have to solve these problems.

At the end of the day, you are going to have to propose a solution that
allows us to perform markup like the following:


What's the alternative that you are proposing?

> But then you say:
>> Expressing logical assertions is not the only use for blank nodes.
>>  There are more practical uses that we're concerned with - where a
>>  dereferenceable IRI doesn't make sense, but graph equality is a 
>> requirement. For instance, not everything needs a dereferenceable
>> IRI in a transient, digitally-signed contract for the immediate
>> exchange of virtual goods for cash. Creating IRIs here is a
>> nuisance with considerable maintenance overhead.
> And here I claim that uniquely identifying each node is what it
> means to have a graph, and that doing so (which ought to be handled 
> automatically by your database) is far /less/ of a construction and 
> maintenance nuisance, both conceptually and practically, than 
> supporting unidentified nodes.

You're confusing me - you seem to be stating that a database is supposed
to help you handle unidentified nodes by automatically naming them? If
so, what's the naming algorithm that the database uses? What happens
when you have to canonicalize? Wouldn't the naming algorithm need to be
specified to get canonical graphs generated? I don't understand the
point you're making.
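
To make the canonicalization question concrete, here is a minimal Python
sketch. It is an illustration only, not the JSON-LD bnode naming
algorithm under discussion. It assumes a graph is a list of
(subject, property, object) string triples and that labels starting with
"_:" are bnodes. The helper names and the brute-force strategy are
hypothetical; trying every labeling is exponential and only workable for
tiny graphs. Two stores that name the same nodes differently produce
different serializations until a shared, specified naming rule is applied.

```python
from itertools import permutations

def is_bnode(term):
    # Assumption: blank node labels are strings starting with "_:".
    return term.startswith("_:")

def relabel(triples, mapping):
    # Rename terms according to mapping, then sort so triple order is irrelevant.
    return sorted(tuple(mapping.get(t, t) for t in triple) for triple in triples)

def canonicalize(triples):
    # Brute force: try every assignment of the labels _:c0, _:c1, ... to the
    # bnodes and keep the lexicographically smallest sorted triple list.
    bnodes = sorted({t for s, _, o in triples for t in (s, o) if is_bnode(t)})
    labels = ["_:c%d" % i for i in range(len(bnodes))]
    return min(relabel(triples, dict(zip(bnodes, perm)))
               for perm in permutations(labels))

# The same graph as named by two hypothetical databases.
store_a = [("_:a", "name", "John Smith"),
           ("_:a", "knows", "_:b"),
           ("_:b", "name", "John Smith")]
store_b = [("_:x", "name", "John Smith"),
           ("_:y", "knows", "_:x"),
           ("_:y", "name", "John Smith")]

print(relabel(store_a, {}) == relabel(store_b, {}))    # False: names differ
print(canonicalize(store_a) == canonicalize(store_b))  # True: same graph
```

The point of the sketch is only that whichever rule picks the names has
to be written down and shared; otherwise two parties cannot compare or
sign the same graph.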

> These are tradeoffs. You've got this mailing list because you didn't
>  want to be encumbered by all the RDF baggage that had gotten loaded
> onto the RDF/JSON effort. 

I hope by "you" you mean - the group of people that are interested in
this problem space. /I/ didn't personally receive anything. :) I just
want to make it clear that this is a group effort and all points of view
are welcome here - creating this mailing list was a group decision. The
reason this group was created was because the RDF Working Group felt
that they did not have the expertise to work on the type of Linked JSON
specification proposed by the JSON-LD proposal. We needed a group of
folks that were interested in expressing graph information, preferably
in RDF of some form, via JSON.

Some of the people in this group don't see all of RDF as baggage. Some
of the people in this group do see much of RDF as baggage. We have
varied opinions on where to draw the line between too little and too
much RDF. We're just trying to find the right balance - together. :)

> From my perspective, you're /still/
> lugging around way more baggage than you need. If you think you can't
> travel without it, that's your prerogative, just as it was every RDF 
> purist's prerogative not to follow you this far. But somebody else 
> will be willing to ditch more, and then they will race you, and then 
> you will lose. Why not try to win?

Glenn, I know you probably didn't mean this to come across in the way
that I am reading the above paragraph, but phrases like "Why not try to
win" make my skin crawl. The term "win" presumes that there is an end to
what we do - there is not. There are just attempts and lessons learned
from those attempts:


I'd like to see all of us focus on collaborating on solving real issues
and learning from one another - when the focus is put on vague terms
like "winning", I don't think we get the results that we want.

The art of creating specifications is a continuous learning process -
there isn't an end. You try something, see if it works, refine it,
release it and start the cycle all over again. If we end up with
something that is useful to a large group of people - great! If somebody
ditches a few bits and is more successful, accomplishing the original
goal, then great! The goal here is to make the Web better, not "win".

At the core of all of this is a set of use cases that must be met. We're
trying to find the correct balance. We're pulling along as much RDF
baggage as is necessary to solve our real-world problems, which for us
means PaySwarm. Having bnodes really helps us do that. Removing bnodes
makes it impossible to address some of the PaySwarm use cases.

Here are some of the real-world issues that we're dealing with:

1. How can you represent RDFa data using JSON-LD?
2. How do you efficiently represent N-to-1 graph relationships when the
   "1" does not have a unique identifier /and/ you care about
   bandwidth? 50 people know a single person, but all you have is the
   name of that person and their place of business. You could repeat
   the person's information 50 times, but that's wasteful.
3. How do you represent relationships between semantic objects like
   Microformats, which do not typically have URLs or any other types of
   identifiers for things? Microformat data is typically best
   represented as bnodes.
4. How do you efficiently represent N-to-1 graph relationships when the
   "1" does not have a unique identifier /and/ you need to differentiate
   between multiple values that /could/ be the "1"? That is, there are
   four "John Smiths" in a social graph with no unique URL identifiers.
   If there are no bnodes, how do you differentiate between those four
   "John Smiths"?
5. If everything must be named, how can you have a decentralized
   payment system that can name every digital signature object created
   by that decentralized system without creating name clashes? Yes,
   you could use a Distributed Hash Table (DHT) technique, but then
   you're just shifting the problem somewhere else.

Those are just five of the problems that are solved by using bnodes.
Anyone who wants to get rid of bnodes must find a way to solve at
least those five problems.
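
Problems 2 and 4 can be sketched in a few lines of Python. This is a
rough illustration under stated assumptions, not JSON-LD syntax: the
"id" key, the "_:b0" style labels, and the sample names are all
hypothetical, and the graph is modelled as plain dictionaries.

```python
import json

# The "1" in the N-to-1 relationship: no URL, just a name and a workplace.
person = {"name": "John Smith", "workplace": "Example Corp"}

# Without graph-local labels, the description is repeated for all 50 people.
inlined = [{"name": "Person %d" % i, "knows": dict(person)} for i in range(50)]

# With a graph-local label, the description appears once and is referenced.
labeled = [dict(person, id="_:b0")]
labeled += [{"name": "Person %d" % i, "knows": {"id": "_:b0"}} for i in range(50)]

inlined_size = len(json.dumps(inlined))
labeled_size = len(json.dumps(labeled))
print(labeled_size < inlined_size)  # True

# Four "John Smiths" with identical properties and no URLs.
smiths = [{"id": "_:b%d" % i, "name": "John Smith"} for i in range(1, 5)]
by_value = {json.dumps({k: v for k, v in s.items() if k != "id"}, sort_keys=True)
            for s in smiths}
by_label = {s["id"] for s in smiths}
print(len(by_value), len(by_label))  # 1 4
```

Compared by value alone, the four nodes collapse into one; the labels
are the only thing keeping them apart. The inlined form has the same
problem in reverse: nothing in it says whether the 50 people know the
same John Smith or 50 different ones.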

-- manu

Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarm Developer Tools and Demo Released
Received on Saturday, 18 June 2011 05:43:12 UTC
