- From: Dave Longley <dlongley@digitalbazaar.com>
- Date: Wed, 27 Jul 2011 01:07:21 -0400
- To: public-linked-json@w3.org
In order to contribute to this branding discussion and the JSON-SD/unlabeled nodes debate, I'll start with what the whole point of JSON-LD is from my point of view -- and then offer my position on the various arguments.

My view of JSON-LD is that it was a technology that was created to represent graphs in JSON in the same way that other technologies like RDF express graphs. More than this, it was also intended to make it possible to contextualize JSON data (including existing JSON) such that it could be made sense of as a subgraph of the greater graph of data on the web.

This is where Linked Data comes in. Perhaps the most prolific graphs on the web are so-called "Linked Data" graphs. If you're going to model your data as a graph and your data is going to exist on the web, inter-operate, and reuse existing data on the web, then your graph should be a "linked" graph. More casually, it should play nice with the other data on the web. So that's the kind of data that I presume JSON-LD was invented to mark up.

Now, it has been argued that not all JSON data can be contextualized such that it fits a strict Linked Data definition. This primarily refers to "blank nodes" or "unlabeled nodes". Hence the current debate about changing JSON-LD in some way: either to something that goes by another name and includes a larger subset of graphs, or by somehow dropping support for those graphs that are supposedly not strict Linked Data.

My position is that both of these approaches are flawed.

First, I understand that some have argued that supporting unlabeled nodes (or nodes with "blank node identifiers") would result in supporting the markup of graphs that are not considered Linked Data. The main argument is that the name "JSON-LD" (JSON Linked Data) would then betray the conceptual purity of "Linked Data".
The reasoning behind this, as I understand it, is as follows:

In order for a graph to be Linked Data, it has been argued that its subjects (nodes that are not literal values) and edges must be resolvable to representations of their referents. This simply means that there must be a Resolver that can resolve a subject or edge to a representation of the graph that refers to it. The contrapositive of this is that if no such Resolver exists, then a graph is not Linked Data.

The argument that a graph containing a blank node is not Linked Data has not been very explicit in my view. To try and make it more explicit, here is what it appears to me to be:

An HTTP Resolver can resolve HTTP subject URIs, but if a subject uses a blank node identifier, then an HTTP Resolver can't resolve it. Therefore, a graph with a blank node is not Linked Data.

It is possible that I've misunderstood the argument, but that's what I've been hearing. Of course, when written out, the conclusion doesn't follow from the definition offered for Linked Data: the definition only requires that some Resolver exist, not that it be an HTTP Resolver. In fact, a much simpler Resolver than an HTTP Resolver could be devised for blank node identifiers: all it must do is return the local graph that the blank node is a part of -- where you found the blank node in the first place.

Side note: It is worth mentioning that a blank node identifier can be a URI, even if the scheme is not HTTP.

Furthermore, if an algorithm can be devised to automatically and canonically label unlabeled subjects, then that algorithm need only be added to the aforementioned Resolver in order for it to comply with the given Linked Data definition. In several current JSON-LD implementations, such an algorithm has been implemented and it is part of JSON-LD "normalization".

At this point it could be counter-argued that the above definition of Linked Data simply needs some work in order to exclude unlabeled nodes properly. (In fact, perhaps the definition is so lacking that one could rationalize that all data is Linked Data.)
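As an illustration of the Resolver idea above, here is a minimal Python sketch. It is not taken from any JSON-LD implementation. It assumes that a graph is a plain dictionary keyed by subject identifier, that blank node identifiers use the "_:" prefix, and that HTTP resolution is supplied by the caller. The names make_resolver, fake_fetch, and the example.org identifiers are hypothetical.

```python
from typing import Any, Callable, Dict

# Assumption: a graph maps subject identifiers to property dictionaries.
Graph = Dict[str, Dict[str, Any]]


def is_blank_node(identifier: str) -> bool:
    # Assumption: blank node identifiers are written with the "_:" prefix.
    return identifier.startswith("_:")


def make_resolver(
    local_graph: Graph, fetch_remote: Callable[[str], Graph]
) -> Callable[[str], Graph]:
    """Build a resolver that handles both labeled and blank node subjects."""

    def resolve(identifier: str) -> Graph:
        if is_blank_node(identifier):
            # A blank node only has meaning in the graph where it was found,
            # so "resolving" it returns that same local graph.
            if identifier not in local_graph:
                raise KeyError(f"unknown blank node: {identifier}")
            return local_graph
        # Anything else is handed to the caller-supplied remote resolver.
        return fetch_remote(identifier)

    return resolve


def fake_fetch(uri: str) -> Graph:
    # Hypothetical stand-in for an HTTP Resolver; performs no network access.
    return {uri: {}}


local_graph: Graph = {
    "http://example.org/people/alice": {
        "http://example.org/vocab#address": {"@id": "_:b0"},
    },
    "_:b0": {"http://example.org/vocab#city": "Example City"},
}

resolve = make_resolver(local_graph, fake_fetch)
assert resolve("_:b0") is local_graph
```

The sketch does not attempt canonical labeling of unlabeled subjects; that step belongs to normalization and would be added in front of the blank node branch.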
But let's step back for a second. Why should we try to refine the definition so as to exclude unlabeled nodes? Why not attempt to make it include them instead to ensure their proper use?

Now on to my second point.

When we're talking about graphs, I think there is an important concept that shouldn't be discarded, certainly not for the sake of conceptual purity: there are pieces of information that only have meaning in relation to the entities of which they are a part (credit to Niklas). Another way of saying this is that there are nodes in a graph that do not stand alone. Any attempt to talk about them without the referring graph is meaningless. That doesn't mean that they aren't referred to by that graph (obviously) or that you can't create a simple Resolver that returns a representation of that graph.

I think that any markup that can represent Linked Data should be able to represent both subjects that are only referred to by one graph and have only local-graph meaning, and subjects that can be referred to by more than one graph. In all useful cases where a subject that has only local-graph meaning exists (an unlabeled node), the rest of the graph will be populated with the other type of subjects (labeled nodes).

I see no reason why a JSON-LD interpreter can't look at a blank node identifier and say: "This identifier means that this subject is only referred to by the current graph and therefore resolves back to it. There are no external graphs that reference it and there is therefore no other external information about this subject."

It is easy to see the strength of subjects that become more meaningful the more they are linked to. But we shouldn't overlook the value of subjects that do not become more meaningful that way. Nor should we write a specification that would cause the inclusion of this type of information to constitute an invalid JSON-LD document.
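The claim that a useful unlabeled node always sits in a graph alongside labeled nodes can be checked mechanically. The following Python sketch is one hedged way to do so, under stated assumptions: a graph is a dictionary keyed by subject identifier, references to other subjects are written as {"@id": ...} values, blank node identifiers use the "_:" prefix, and edges are treated as undirected when deciding whether a blank node is "connected" to a labeled one. The function names and example.org identifiers are hypothetical.

```python
from typing import Any, Dict, Iterator, Set

# Assumption: a graph maps subject identifiers to property dictionaries.
Graph = Dict[str, Dict[str, Any]]


def is_blank_node(identifier: str) -> bool:
    # Assumption: blank node identifiers are written with the "_:" prefix.
    return identifier.startswith("_:")


def referenced_ids(value: Any) -> Iterator[str]:
    """Yield the subject identifiers that a property value points at."""
    if isinstance(value, dict) and "@id" in value:
        yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)


def anchored_blank_nodes(graph: Graph) -> Set[str]:
    """Return blank nodes connected, directly or indirectly, to a labeled node."""
    # Build an undirected adjacency map from the graph's edges.
    neighbors: Dict[str, Set[str]] = {}
    for subject, properties in graph.items():
        neighbors.setdefault(subject, set())
        for value in properties.values():
            for target in referenced_ids(value):
                neighbors[subject].add(target)
                neighbors.setdefault(target, set()).add(subject)

    # Walk outward from every labeled node.
    frontier = [node for node in neighbors if not is_blank_node(node)]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return {node for node in seen if is_blank_node(node)}


def unanchored_blank_nodes(graph: Graph) -> Set[str]:
    """Return blank node subjects with no path to any labeled node."""
    blank_subjects = {s for s in graph if is_blank_node(s)}
    return blank_subjects - anchored_blank_nodes(graph)


graph: Graph = {
    "http://example.org/people/alice": {
        "http://example.org/vocab#address": {"@id": "_:b0"},
    },
    "_:b0": {"http://example.org/vocab#city": "Example City"},
    "_:b1": {"http://example.org/vocab#note": "not linked to anything"},
}

assert anchored_blank_nodes(graph) == {"_:b0"}
assert unanchored_blank_nodes(graph) == {"_:b1"}
```

A processor could use a check like this to warn about wholly disconnected unlabeled data without treating the document as invalid.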
And, if we're really just debating conceptual purity, then I say a link count of 1 is still "Linked Data". Sure, without *any* more links in the graph such data is useless, but that unlabeled node is going to be, somehow, connected to a labeled node. People who use JSON-LD aren't going to create useless, wholly unlabeled data. We don't need the specification to stop them from doing that.

Tying this back to the end of my first point, if the branding debate is really about conceptual purity, but we know that unlabeled nodes are going to be useful to people who want to create Linked Data graphs in JSON, then why not make it clear how they fit in with Linked Data? It seems to me like this is a better approach than trying to figure out how to define Linked Data so that they have no place and so we need to rename our technology.

Furthermore, this approach speaks to what seems to be another somewhat latent argument here against unlabeled nodes:

There seems to be some concern that if people are able to use blank nodes, then they will abuse them. In other words, they will use blank nodes when a piece of information can stand on its own. This will result in a failure to link data in an appropriate way.

There is no evidence offered for this argument. A similar argument might be that people will invent invalid URIs that do not resolve to representations of their referents in their data, especially if they can't use blank nodes. This would also result in a failure to link data in an appropriate way. Myriad arguments could be constructed this way -- so I reject this particular reasoning.

But if we are worried about the abuse of unlabeled nodes, why don't we include how they fit in with Linked Data so it's clear what their use is? If you're using an unlabeled node it ought to be only because having a URI as a label would be forced or doesn't make any sense: you wouldn't refer to the subject outside of the graph.
That may be covered by the recent adoption of the "SHOULD" text when talking about labeling nodes, but perhaps it could be clearer. I expect users of JSON-LD to encounter situations where they think they should be using unlabeled nodes. They shouldn't get the impression that they must abandon JSON-LD altogether if this happens -- or that there's no solution to their use case in the specification. I also don't think that a JSON-LD processor that is generating triples or normalized JSON-LD should fail someone who is contextualizing all of their JSON data when some of it simply needs to use unlabeled nodes.

All of this being said, if we still feel the need to adopt a new name I can live with that. I just want to see that we don't cut support for unlabeled nodes and would prefer that their use not be discouraged, but rather put in its appropriate place.

On 07/25/2011 11:08 PM, Manu Sporny wrote:
> JSON-SD doesn't really roll off of the tongue... neither did JSON-LD
> or RDFa. HTML is only used because it's been around forever... but
> it's a pretty crappy brand name. Any thoughts on what this technology
> should be called as we ready it for public consumption?
>
> I was thinking: Structure
>
> "Structure allows you to express Linked Data in JSON"
>
> Yes, I realize that isn't entirely accurate, but tag-lines rarely are
> accurate. Thoughts on branding the technology so that it's easy to
> drop into a conversation without scaring Web developers away or making
> people feel as if the conversation is going to take a scary turn
> toward geek-speak?
>
> -- manu
>

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Wednesday, 27 July 2011 05:07:47 UTC