Re: JSON-LD bnode canonical naming algorithm from glenn mcdonald on 2011-06-19 (public-linked-json@w3.org from June 2011)

From: glenn mcdonald <glenn@furia.com>
Date: Sun, 19 Jun 2011 01:42:11 +0000
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-ID: <BANLkTimiLVb3RmmuazKAa3cq9ciwKUWrGQ@mail.gmail.com>
>
> Let's re-ground the discussion.
>

> If we are to remove support for bnodes, do you expect every
> implementation of PaySwarm to generate a unique identifier for the
> millions, if not billions, of transfers, digital signatures and payee
> descriptions that will be generated per day?


I'm fine with re-grounding, but this is now a totally different question.
What you generate ids for in your system is entirely your problem. It would
seem pretty strange to me if transfers didn't have ids, but if they don't in
your world, fine. I can more easily see why you might not want to give
descriptions identifiers, but if you don't, then I don't see why you'd want
to make blank nodes for them, either. But I do know that generating ids is
not a new problem in computer science, and I'm pretty sure that a
PaySwarm-specific algorithm for generating them would be a lot easier to
write and implement than a generic blank-node-normalization scheme, and I'm
*really* sure that forcing "every implementation of PaySwarm" to generate
unique identifiers in some PaySwarm-suitable way is a lot less net
development work than forcing every JSON-LD implementation to support a
generic normalization algorithm. (Especially the particular algorithm
proposed at the beginning of this thread, which I took as an entertaining
thought-experiment nobody would seriously consider implementing in practice
for anything but a toy, static dataset.)

But you seem to be stipulating a) PaySwarm is going to use JSON-LD, and b)
JSON-LD is responsible for supporting the constructs you would like to use
in PaySwarm. This amounts to begging the question. I could do the same
begging, myself, by stipulating that Needle is going to use this
as-yet-unresolved JSON-LD and demanding that Needle's particular
inclinations be catered to in the spec, too. And ditto everybody else who
happens to wander into this "community".

But who are "we", and where are we trying to get? I don't think "we" know. I
know about this mailing list from a combination of blog posts and emails
from you. I know some of your stated motivations arising from your
frustration with the way the RDF/JSON working-group turned out. I think the
"JSON" part of "JSON-LD" is pretty clear, but the "LD" part is manifestly
not. What is its audience? What are its goals? How does it relate to RDF?

And while "It's the subset of RDF that PaySwarm liked" is *an* answer,
surely "we" would want to come up with one that's less arbitrarily specific.
And in the spirit of trying to get us to simple, coherent, shared answers to
these questions, I'm trying to propose some that aren't based on one
particular system or one particular set of precedents.

Specifically, I'm proposing that we agree to this starting point: "JSON-LD"
is a set of conventions for using JSON to represent directed, labeled
graphs.

If we could agree to that, or some other statement of equivalent precision,
then we'd have some basis for discussing how blank nodes relate to the goal.

And it's possible that we could come to a consensus on a proposal that you
would end up not using, or not using exactly, in PaySwarm, or that I would
end up not using, or not using exactly, in Needle. In particular, if you
have lots of pieces of data that you don't intend to identify uniquely, and
thus can't link to, maybe you actually aren't doing "Linked Data".


> At the end of the day, you are going to have to propose a solution that
> allows us to perform markup like the following:
>
> http://payswarm.com/vocabs/payswarm#Contract


I don't have to do any such thing. I don't even have to click on your link
to say, with great confidence, that if you want this to be a community
standards effort, you don't get to stipulate that it has to support every
arbitrary requirement and existing implementation of your own. I could just
as easily make the above statement and then link to some esoteric TopicMaps
example loaded with n-ary relationships, or *Leaves of Grass* for that
matter.

> What's the alternative that you are proposing?


http://lists.w3.org/Archives/Public/public-linked-json/2011May/0010.html

But until "we" agree on a goal, we'll have no easier a time evaluating my
proposal, either.

> Glenn, I know you probably didn't mean this to come across in the way
> that I am reading the above paragraph, but phrases like "Why not try to
> win" make my skin crawl.


Yes, I saw a post or tweet from you about your dislike of this kind of
wording a couple days after writing my note. For "win", substitute "get
widespread adoption". So to repeat my sentiment in different terms, I
believe that if "JSON-LD" passes up this opportunity to *really* simplify
the model, somebody else who is willing to discard more RDF baggage will
produce something simpler that will have a better chance for widespread
adoption. You may well find, to your dismay, that the features you want for
PaySwarm are at odds with the adoption spectrum you want from the rest of
the world.

> Having bnodes really helps us do that. Removing bnodes makes
> it impossible to address some of the PaySwarm use cases.
>

I believe the first sentence. I'm skeptical about the "impossible" part of
the second.

> 1. How can you represent RDFa data using JSON-LD?
>

I'm not convinced this should be a requirement.

> 2. How do you efficiently represent N-to-1 graph relationships when the
>   "1" does not have a unique identifier /and/ you care about
>   bandwidth? 50 people know a single person, but all you have is the
>   name of that person and their place of business. You could repeat
>   the person's information 50 times, but that's wasteful.
>

Make a node, give it an id.
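
A minimal sketch of what that could look like, assuming a hypothetical "id"
key and a locally minted UUID-based identifier (neither is specified in this
thread). The shared person is written once and the fifty others point at it
by id:

```python
import json
import uuid

# Hypothetical shared node: all we know is a name and a place of business.
person_id = "urn:uuid:" + str(uuid.uuid4())
shared = {"id": person_id, "name": "Jane Doe", "worksAt": "Example Corp"}

# Fifty people reference the shared node by id instead of embedding a copy.
knowers = [
    {"id": f"urn:example:person:{i}", "knows": {"id": person_id}}
    for i in range(50)
]

doc = [shared] + knowers
encoded = json.dumps(doc)
```

The person's details appear once in the serialized document, so the
bandwidth concern is handled without a blank node.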

> 3. How do you represent relationships between semantic objects like
>   Microformats, which do not typically have URLs or any other types of
>   identifiers for things.


I'm not convinced this should be a requirement.

> 4. How do you efficiently represent N-to-1 graph relationships when the
>   "1" does not have a unique identifier /and/ you need to differentiate
>   between multiple values that /could/ be the "1"? That is, there are
>   four "John Smiths" in a social graph with no unique URL identifiers.
>

Four nodes, four ids. It's nodes and ids all the way down. Later you might
decide to merge some or all of these four nodes, but that's fine.
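
As a rough illustration, assuming a hypothetical "id" key, UUID-based ids,
and edges held as (source, label, target) tuples (all assumptions, not
anything defined in this thread), four distinct nodes and a later merge
might look like this:

```python
import uuid


def new_node(name):
    """Create a node with a freshly minted id (hypothetical scheme)."""
    return {"id": "urn:uuid:" + str(uuid.uuid4()), "name": name}


def merge(nodes, edges, keep, drop):
    """Fold node `drop` into node `keep`, rewriting edges that touch it."""
    merged_nodes = {k: v for k, v in nodes.items() if k != drop}
    merged_edges = [
        (keep if s == drop else s, label, keep if t == drop else t)
        for s, label, t in edges
    ]
    return merged_nodes, merged_edges


# Four "John Smith" nodes stay distinct because each has its own id.
smiths = [new_node("John Smith") for _ in range(4)]
nodes = {n["id"]: n for n in smiths}
edges = [
    ("urn:example:alice", "knows", smiths[0]["id"]),
    ("urn:example:bob", "knows", smiths[1]["id"]),
]

# Later we learn that the first two are the same person and merge them.
nodes2, edges2 = merge(nodes, edges, keep=smiths[0]["id"], drop=smiths[1]["id"])
```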

> 5. If everything must be named, how can you have a decentralized
>   Payment system that can name every digital signature object created
>   by that decentralized system without creating name clashes? Yes,
>   you could use a Distributed Hash Table (DHT) technique, but then
>   you're just shifting the problem somewhere else.
>

As I said above, I contend that this problem is *better* shifted inside
PaySwarm. Not only should this cost not be imposed on everybody because
*you* need it, but your specific solution can be *much* simpler than a
generalized one.
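
One way such a system-specific scheme could be sketched, purely as an
assumption (the function name, the URL layout, and the use of random UUIDs
are illustrative and not taken from PaySwarm): each authority mints ids
under its own base URL, so no coordination between authorities is needed.

```python
import uuid


def mint_id(authority_base: str, kind: str) -> str:
    """Mint an id scoped to one authority's namespace (hypothetical layout).

    The authority's base URL keeps different authorities apart; the random
    UUID makes clashes within one authority practically negligible.
    """
    return f"{authority_base.rstrip('/')}/{kind}/{uuid.uuid4()}"


# Example: an authority names a new digital signature object.
signature_id = mint_id("https://authority.example/", "signature")
```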

> If one wants to get rid of bnodes, they must find a way to solve at
> least those 5 problems.


I agree that those are cases. I do not agree that bnodes are necessary to
"solve" all of them, and I do not accept that all of them have to be
addressed by JSON-LD just because they exist.

But this last is just another way of saying what I said above: If we want to
make progress, and if we want there to be a "we", then we have to start by
agreeing on our goals.

glenn
Received on Sunday, 19 June 2011 01:43:08 UTC