Re: Branding? from Dave Longley on 2011-07-27 (public-linked-json@w3.org from July 2011)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Wed, 27 Jul 2011 01:07:21 -0400
To: public-linked-json@w3.org
Message-ID: <4E2F9D09.6090903@digitalbazaar.com>

In order to contribute to this branding discussion and JSON-SD/unlabeled 
nodes debate I'll start with what the whole point of JSON-LD is from my 
point of view -- and then offer my position on the various arguments.

My view of JSON-LD is that it was a technology that was created to 
represent graphs in JSON in the same way that other technologies like 
RDF express graphs. More than this, it was also intended to make it 
possible to contextualize JSON data (including existing JSON) such that 
it could be made sense of as a subgraph of the greater graph of data on 
the web. This is where Linked Data comes in.

Perhaps the most prevalent graphs on the web are so-called "Linked Data" 
graphs. If you're going to model your data as a graph and your data is 
going to exist on the web, inter-operate, and reuse existing data on the 
web, then your graph should be a "linked" graph. More casually, it 
should play nice with the other data on the web. So that's the kind of 
data that I presume JSON-LD was invented to mark up.

Now, it has been argued that not all JSON data can be contextualized 
such that it fits a strict Linked Data definition. This primarily refers 
to "blank nodes" or "unlabeled nodes". Hence the current debate for 
changing JSON-LD in some way to something that either goes by another 
name and includes a larger subset of graphs, or by somehow dropping 
support for those graphs that are not supposedly strict Linked Data. My 
position is that both of these approaches are flawed.

First, I understand that some have argued that supporting unlabeled 
nodes (or nodes with "blank node identifiers") would result in 
supporting the markup of graphs that are not considered Linked Data. The 
main argument being that the name "JSON-LD" (JSON Linked Data) would 
betray the conceptual purity of "Linked Data". The reasoning behind 
this, as I understand it, is as follows:

In order for a graph to be Linked Data, it has been argued that its 
subjects (nodes that are not literal values) and edges must be 
resolvable to representations of their referents. This simply means that 
there must be a Resolver that can resolve a subject or edge to a 
representation of the graph that refers to it. The contrapositive of 
this is that if no such Resolver exists, then a graph is not Linked Data.

The argument that a graph containing a blank node is not Linked Data has 
not been very explicit in my view. To try and make it more explicit, 
here is what it appears to me to be: An HTTP Resolver can resolve HTTP 
subject URIs, but if a subject uses a blank node identifier, then an 
HTTP Resolver can't resolve it. Therefore, a graph with a blank node is 
not Linked Data.

It is possible that I've misunderstood the argument, but that's what 
I've been hearing. Of course, when written out, the conclusion doesn't 
follow from the definition offered for Linked Data. In fact, a much 
simpler Resolver than an HTTP Resolver could be devised for blank node 
identifiers: All it must do is return the local graph that it is a part 
of -- where you found the blank node in the first place.
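
To make that simpler Resolver concrete, here is a minimal sketch in 
Python. Everything in it is an assumption made for illustration: the 
class name LocalGraphResolver, the representation of a graph as a list 
of (subject, property, object) string triples, the "_:" prefix for blank 
node identifiers, and the example.org URIs. It is not taken from any 
JSON-LD implementation.

```python
from typing import List, Tuple

# Assumption: a graph is a list of (subject, property, object) strings.
Triple = Tuple[str, str, str]


def is_blank(node_id: str) -> bool:
    # Assumption: blank node identifiers are written with a "_:" prefix.
    return node_id.startswith("_:")


class LocalGraphResolver:
    """Hypothetical Resolver for blank node identifiers.

    It resolves an identifier to the local graph in which it was found.
    """

    def __init__(self, graph: List[Triple]) -> None:
        self.graph = graph

    def resolve(self, node_id: str) -> List[Triple]:
        if not is_blank(node_id):
            # Other identifiers are left to some other Resolver (e.g. HTTP).
            raise ValueError("not a blank node identifier: " + node_id)
        if not any(node_id in (s, o) for s, _, o in self.graph):
            raise KeyError(node_id)
        # The representation is simply the graph that refers to the node.
        return self.graph


graph = [
    ("http://example.org/alice", "http://example.org/vocab#address", "_:b0"),
    ("_:b0", "http://example.org/vocab#city", "Paris"),
]
resolver = LocalGraphResolver(graph)
representation = resolver.resolve("_:b0")
```

Here resolving "_:b0" hands back the same graph object that mentions it, 
which is all the definition above asks of a Resolver.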

Side note: It is worth mentioning that a blank node identifier can be a 
URI, even if the scheme is not HTTP. Furthermore, if an algorithm can be 
devised to automatically and canonically label unlabeled subjects, then 
that algorithm need only be added to the aforementioned Resolver in 
order for it to comply with the given Linked Data definition. In several 
current JSON-LD implementations, such an algorithm has been implemented 
and it is part of JSON-LD "normalization".
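
As a rough illustration of what canonical labeling means, the sketch 
below assigns labels based only on each unlabeled node's immediate 
triples. It is deliberately naive and is NOT the algorithm used by 
JSON-LD normalization: it gives up (raises an error) whenever two 
unlabeled nodes look identical from their direct neighbours, a case a 
real algorithm has to handle. The function names, the triple 
representation, and the "_:c0", "_:c1", ... label scheme are all 
assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: a graph is a list of (subject, property, object) strings.
Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumption: blank node identifiers are written with a "_:" prefix.
    return term.startswith("_:")


def canonical_labels(graph: List[Triple]) -> Dict[str, str]:
    """Map each blank node identifier to a label that does not depend on
    the identifiers originally chosen (naive; see the caveats above)."""
    blanks = {t for s, _, o in graph for t in (s, o) if is_blank(t)}

    def signature(b: str) -> tuple:
        # Describe the node by its direct triples, masking other blank
        # node identifiers so that their names cannot influence the result.
        sig = []
        for s, p, o in graph:
            if s == b:
                sig.append(("out", p, "_:" if is_blank(o) else o))
            if o == b:
                sig.append(("in", p, "_:" if is_blank(s) else s))
        return tuple(sorted(sig))

    by_signature = {}
    for b in blanks:
        sig = signature(b)
        if sig in by_signature:
            raise ValueError("blank nodes are indistinguishable to this sketch")
        by_signature[sig] = b

    ordered = sorted(by_signature)
    return {by_signature[sig]: f"_:c{i}" for i, sig in enumerate(ordered)}


def relabel(graph: List[Triple]) -> List[Triple]:
    # Apply the canonical labels and sort so equal graphs compare equal.
    labels = canonical_labels(graph)
    return sorted((labels.get(s, s), p, labels.get(o, o)) for s, p, o in graph)
```

With this, two copies of a graph that differ only in the blank node 
identifiers their authors happened to pick come out the same after 
relabel(), provided the sketch's restriction holds.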

At this point it could be counter-argued that the above definition of 
Linked Data simply needs some work in order to exclude unlabeled nodes 
properly. (In fact, perhaps the definition is so lacking that one could 
rationalize that all data is Linked Data). But let's step back for a 
second. Why should we try to refine the definition so as to exclude 
unlabeled nodes? Why not attempt to make it include them instead to 
ensure their proper use?

Now on to my second point. When we're talking about graphs, I think 
there is an important concept that shouldn't be discarded, certainly not 
for the sake of conceptual purity. That is that there are pieces of 
information that only have meaning in relation to the entities of which 
they are a part (credit to Niklas). Another way of saying this is that 
there are nodes in a graph that do not stand alone. Any attempt to talk 
about them without the referring graph is meaningless. That doesn't mean 
that they aren't referred to by that graph (obviously) or that you can't 
create a simple Resolver that returns a representation of that graph. I 
think that any markup that can represent Linked Data should be able to 
represent both subjects that are only referred to by one graph and have 
only local-graph meaning, and subjects that can be referred to by more 
than one graph.

In all useful cases where a subject that has only local-graph meaning 
exists (an unlabeled node), the rest of the graph will be populated with 
the other type of subjects (labeled nodes). I see no reason why a 
JSON-LD interpreter can't look at a blank node identifier and say: "This 
identifier means that this subject is only referred to by the current 
graph and therefore resolves back to it. There are no external graphs 
that reference it and there is therefore no other external information 
about this subject."

It is easy to see the strength of subjects that become more meaningful 
the more they are linked to. But we shouldn't overlook the value of 
subjects that do not become more meaningful that way. Nor should we 
write a specification that would cause the inclusion of this type of 
information to constitute an invalid JSON-LD document. And, if we're 
really just debating conceptual purity, then I say a link count of 1 is 
still "Linked Data". Sure, without *any* more links in the graph such 
data is useless, but that unlabeled node is going to be, somehow, 
connected to a labeled node. People who use JSON-LD aren't going to 
create useless, wholly unlabeled data. We don't need the specification 
to stop them from doing that.

Tying this back to the end of my first point, if the branding debate is 
really about conceptual purity, but we know that unlabeled nodes are 
going to be useful to people who want to create Linked Data graphs in 
JSON, then why not make it clear how they fit in with Linked Data? It 
seems to me like this is a better approach than trying to figure out how 
to define Linked Data so that they have no place and so we need to 
rename our technology. Furthermore, this approach speaks to what seems 
to be another somewhat latent argument here against unlabeled nodes:

There seems to be some concern that if people are able to use blank 
nodes, then they will abuse them. In other words, they will use blank 
nodes when a piece of information can stand on its own. This will result 
in a failure to link data in an appropriate way.

There is no evidence offered for this argument. A similar argument might 
be that people will invent invalid URIs that do not resolve to 
representations of their referents in their data, especially if they 
can't use blank nodes. This would also result in a failure to link data 
in an appropriate way.

Myriad arguments could be constructed this way -- so I reject this 
particular reasoning. But if we are worried about the abuse of unlabeled 
nodes, why don't we include how they fit in with Linked Data so it's 
clear what their use is? If you're using an unlabeled node it ought to 
be only because having a URI as a label would be forced or doesn't make 
any sense: you wouldn't refer to the subject outside of the graph. That 
may be covered by the recent adoption of the "SHOULD" text when talking 
about labeling nodes, but perhaps it could be clearer.

I expect users of JSON-LD to encounter situations where they think they 
should be using unlabeled nodes. They shouldn't get the impression that 
they must abandon JSON-LD altogether if this happens -- or that 
there's no solution to their use case in the specification. I also don't 
think that a JSON-LD processor that is generating triples or normalized 
JSON-LD should fail someone who is contextualizing all of their JSON 
data and some of it simply needs to use unlabeled nodes.
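
As a sketch of what not failing could look like, the Python below walks 
a nested JSON object and generates triples, minting a local blank node 
identifier for any object that carries no label instead of rejecting 
it. The "@id" key, the to_triples name, the use of full URIs as 
property keys (no context processing), and the "_:b0", "_:b1", ... 
scheme are assumptions for illustration; the keyword in the draft under 
discussion may differ, and this is not how any particular processor is 
implemented.

```python
from itertools import count
from typing import Any, Dict, List, Tuple

# Assumption: a triple is (subject, property, object), all plain strings.
Triple = Tuple[str, str, str]


def to_triples(doc: Dict[str, Any]) -> List[Triple]:
    triples: List[Triple] = []
    fresh = count()

    def visit(node: Dict[str, Any]) -> str:
        # An object without "@id" is an unlabeled node: give it a
        # document-local identifier rather than treating it as an error.
        subject = node.get("@id") or f"_:b{next(fresh)}"
        for prop, value in node.items():
            if prop == "@id":
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, dict):
                    # Nested object: emit its triples, then link to it.
                    triples.append((subject, prop, visit(v)))
                else:
                    triples.append((subject, prop, str(v)))
        return subject

    visit(doc)
    return triples


doc = {
    "@id": "http://example.org/alice",
    "http://example.org/vocab#name": "Alice",
    "http://example.org/vocab#address": {
        "http://example.org/vocab#city": "Paris",
    },
}
triples = to_triples(doc)
```

The address object has no label of its own, yet it still ends up 
connected to the labeled subject through the address property.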

All of this being said, if we still feel the need to adopt a new name I 
can live with that. I just want to see that we don't cut support for 
unlabeled nodes and would prefer that their use not be discouraged, but 
rather put in its appropriate place.

On 07/25/2011 11:08 PM, Manu Sporny wrote:
> JSON-SD doesn't really roll off of the tongue... neither did JSON-LD 
> or RDFa. HTML is only used because it's been around forever... but 
> it's a pretty crappy brand name. Any thoughts on what this technology 
> should be called as we ready it for public consumption?
>
> I was thinking: Structure
>
> "Structure allows you to express Linked Data in JSON"
>
> Yes, I realize that isn't entirely accurate, but tag-lines rarely are 
> accurate. Thoughts on branding the technology so that it's easy to 
> drop into a conversation without scaring Web developers away or making 
> people feel as if the conversation is going to take a scary turn 
> toward geek-speak?
>
> -- manu
>


-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Wednesday, 27 July 2011 05:07:47 UTC