Re: Branding? from Kingsley Idehen on 2011-07-27 (public-linked-json@w3.org from July 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 27 Jul 2011 13:44:19 -0400
To: public-linked-json@w3.org
Message-ID: <4E304E73.3070404@openlinksw.com>
> In order to contribute to this branding discussion and 
> JSON-SD/unlabeled nodes debate I'll start with what the whole point of 
> JSON-LD is from my point of view -- and then offer my position on the 
> various arguments.
>
> My view of JSON-LD is that it was a technology that was created to 
> represent graphs in JSON in the same way that other technologies like 
> RDF express graphs. 

Yes, but RDF is but an option for creating Linked Data. It isn't the 
only syntax for achieving the goal. The fact that the syntax is based on 
a graph model doesn't mean it owns graph creation. The fact that it 
allows expression of semantics doesn't give it ownership of semantic 
expression. Sadly, the aforementioned fallacies are being pushed as facts 
in very careless ways.

Linked Data is a specific kind of structure with specific 
characteristics. If those characteristics aren't met it's simply unwise 
to tag the end product as being Linked Data since the net effect is yet 
another broken narrative via yet another syntax. A new JSON syntax won't 
fix conceptual flaws (confining semantically rich expression to a 
specific syntax) or broken narratives.

> More than this, it was also intended to make it possible to 
> contextualize JSON data (including existing JSON) such that it could 
> be made sense of as a subgraph of the greater graph of data on the 
> web. This is where Linked Data comes in.

Yes, but Linked Data is a form of directed graph based Structured Data 
(SD). Thus, if you want broad coverage don't start with LD, use 
something else. Hence, the SD suggestion. JSON-SD or JSON-CD (as project 
names at worst) gives you natural flow (from the generic to the 
specific) that ultimately provides foundation for coherent narratives 
and ultimately audience comprehension.

>
> Perhaps the most prolific graphs on the web are so-called "Linked 
> Data" graphs. If you're going to model your data as a graph and your 
> data is going to exist on the web, inter-operate, and reuse existing 
> data on the web, then your graph should be a "linked" graph. More 
> casually, it should play nice with the other data on the web. So 
> that's the kind of data that I presume JSON-LD was invented to markup.

JSON-LD comes across as Linked Data construction and serialization using 
JSON.

>
> Now, it has been argued that not all JSON data can be contextualized 
> such that it fits a strict Linked Data definition. This primarily 
> refers to "blank nodes" or "unlabeled nodes". 

Plus the more important issue of IRIs that resolve to Representations of 
their Referents.

> Hence the current debate for changing JSON-LD in some way to something 
> that either goes by another name and includes a larger subset of 
> graphs, or by somehow dropping support for those graphs that are not 
> supposedly strict Linked Data. My position is that both of these 
> approaches is flawed.

The suggestion is a Name or Moniker that implies a larger definition 
space for constructing graphs using JSON. The goal is not to add to the 
expensive conflation bandwagon that continues to add complexity to a 
pretty simple concept, once all the confusion is set aside.
>
> First, I understand that some have argued that supporting unlabeled 
> nodes (or nodes with "blank node identifiers") would result in 
> supporting the markup of graphs that are not considered Linked Data. 
> The main argument being that the name "JSON-LD" (JSON Linked Data) 
> would betray the conceptual purity of "Linked Data". The reasoning 
> behind this, as I understand it is as follows:
>
> In order for a graph to be Linked Data, it has been argued that its 
> subjects (nodes that are not literal values) and edges must be 
> resolvable to representations of their referents. This simply means 
> that there must be a Resolver that can resolve a subject or edge to a 
> representation of the graph that refers to it. The contrapositive for 
> this is that if no such Resolver exists, then a graph is not Linked Data.

If Names do not Resolve to Representations of their Referents, it isn't 
Linked Data. It isn't Linked Data today re. Web context and wasn't 
Linked Data in the past when working on your local computer using any 
system level programming language that offered you de-reference 
(indirection) and address-of operations via language specific operators. 
Programmers have been creating and exploiting Linked Data structures 
since the advent of computing.

>
> The argument that a graph containing a blank node is not Linked Data 
> has not been very explicit in my view.

It's hybrid Linked Data via skolemization, bottom line. The real issue 
boils down to the "deceptively simple" doctrine where you provide the 
simplest entry point into a nuanced realm of subjective complexity.

Do we want skolemization at the front door? I don't think so.

Blank nodes (as Richard explains nicely) ultimately degrade Linked Data 
meshes.
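
For readers unfamiliar with the term, here is a minimal sketch of what 
skolemization amounts to: each blank node label is swapped for a freshly 
minted IRI. The triple layout, the use of plain strings starting with 
"_:" as blank node labels, the `skolemize` helper, and the example.org 
base are all assumptions made for illustration, not part of any JSON-LD 
draft.

```python
import uuid


def skolemize(triples, base="http://example.org/.well-known/genid/"):
    """Replace blank node labels with minted IRIs.

    Assumption: every term is a plain string, blank nodes are the strings
    that start with "_:", and no literal value starts with "_:".
    """
    minted = {}  # blank node label -> minted IRI

    def name(term):
        if term.startswith("_:"):
            if term not in minted:
                minted[term] = base + uuid.uuid4().hex
            return minted[term]
        return term

    renamed = [(name(s), p, name(o)) for (s, p, o) in triples]
    return renamed, minted


# Hypothetical graph: Alice knows someone who has no global Name.
triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "_:b0"),
    ("_:b0", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

skolemized, minted = skolemize(triples)
for triple in skolemized:
    print(triple)
```

Nothing here makes the minted names resolve; that still requires 
something answering at an address under the chosen base.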

> To try and make it more explicit, here is what it appears to me to be: 
> An HTTP Resolver can resolve HTTP subject URIs, but if a subject uses 
> a blank node identifier, then an HTTP Resolver can't resolve it. 
> Therefore, a graph with a blank node is not Linked Data.

You have a local Name with local Resolution (via skolemization), whereas 
the Web aspect of Linked Data is about a Global Data Space i.e., a Web 
of Linked Data or Web of Data.

>
> It is possible that I've misunderstood the argument, but that's what 
> I've been hearing. Of course, when written out, the conclusion doesn't 
> follow from the definition offered for Linked Data. In fact, a much 
> simpler Resolver than an HTTP Resolver could be devised for blank node 
> identifiers: All it must do is return the local graph that it is a 
> part of -- where you found the blank node in the first place.
>
> Side note: It is worth mentioning that a blank node identifier can be 
> a URI, even if the scheme is not HTTP. Furthermore, if an algorithm 
> can be devised to automatically and canonically label unlabeled 
> subjects, then that algorithm need only be added to the aforementioned 
> Resolver in order for it to comply with the given Linked Data 
> definition. In several current JSON-LD implementations, such an 
> algorithm has been implemented and it is part of JSON-LD "normalization".
>
> At this point it could be counter-argued that the above definition of 
> Linked Data simply needs some work in order to exclude unlabeled nodes 
> properly. (In fact, perhaps the definition is so lacking that one 
> could rationalize that all data is Linked Data).

You can't rationalize that all Data is Linked Data. You can represent 
data in a myriad of ways, but that doesn't imply that:

1. Every Datum has a Name
2. Names resolve to Representations of the Datum (the Referent of the Name)
3. That Representation is in directed graph form
4. That directed graphs are EAV/SPO 3-tuples (triples).
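
As a rough illustration of the four points, the sketch below uses a 
Python dictionary as a stand-in for a data space. The `DATA_SPACE` 
contents, the `resolve` helper, and the example.org names are 
hypothetical; a real deployment would dereference the Name over HTTP 
rather than look it up in memory.

```python
# Hypothetical data space: each Name maps to attribute/value pairs
# describing its Referent.
DATA_SPACE = {
    "http://example.org/alice": [
        ("http://xmlns.com/foaf/0.1/name", "Alice"),
        ("http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ],
    "http://example.org/bob": [
        ("http://xmlns.com/foaf/0.1/name", "Bob"),
    ],
}


def resolve(name, space=DATA_SPACE):
    """Return a Representation of the Referent of `name` as EAV/SPO
    3-tuples, or None when the Name does not resolve in this space."""
    if name not in space:
        return None
    return [(name, attribute, value) for attribute, value in space[name]]


print(resolve("http://example.org/alice"))
print(resolve("_:b0"))  # a blank node label has no entry in this space
```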

> But let's step back for a second. Why should we try to refine the 
> definition so as to exclude unlabeled nodes? Why not attempt to make 
> it include them instead to ensure their proper use?

If you feel skolemization is worth the hassle for the target audience. 
Trouble is others don't share this view. At the same time others accept 
that anonymous nodes are part of graph-based data representation.
>
> Now on to my second point. When we're talking about graphs, I think 
> there is an important concept that shouldn't be discarded, certainly 
> not for the sake of conceptual purity. That is that there are pieces 
> of information that only have meaning in relation to the entities of 
> which they are a part (credit to Niklas). Another way of saying this 
> is that there are nodes in a graph that do not stand alone. Any 
> attempt to talk about them without the referring graph is meaningless. 

Yes, context is important. But also understand context is inherently 
subjective and fluid. Knowing that something exists and is endowed with 
certain characteristics that coalesce around a Name is very powerful. 
The surface/space through which you access the basic claim also provides 
basic context. You aren't going to make claims in thin air, there is 
always a place into which you post the claims, and this place has an 
address provided by its host space.

> That doesn't mean that they aren't referred to by that graph 
> (obviously) or that you can't create a simple Resolver that returns a 
> representation of that graph. I think that any markup that can 
> represent Linked Data should be able to both represent subjects that 
> are only referred to by one graph and have only local-graph meaning, 
> and subjects that can be referred to by more than one graph.
>
> In all useful cases where a subject that has only local-graph meaning 
> exists (an unlabeled node), the rest of the graph will be populated 
> with the other type of subjects (labeled nodes). I see no reason why a 
> JSON-LD interpreter can't look at a blank node identifier and say: 
> "This identifier means that this subject is only referred to by the 
> current graph and therefore resolves back to it. There are no external 
> graphs that reference it and there is therefore no other external 
> information about this subject."
>
> It is easy to see the strength of subjects that become more meaningful 
> the more they are linked to. But we shouldn't overlook the value of 
> subjects that do not become more meaningful that way. Nor should we 
> write a specification that would cause the inclusion of this type of 
> information to constitute an invalid JSON-LD document. And, if we're 
> really just debating conceptual purity, then I say a link count of 1 
> is still "Linked Data". Sure, without *any* more links in the graph 
> such data is useless, but that unlabeled node is going to be, somehow, 
> connected to a labeled node. People who use JSON-LD aren't going to 
> create useless, wholly unlabeled data. We don't need the specification 
> to stop them from doing that.
>
> Tying this back to the end of my first point, if the branding debate 
> is really about conceptual purity, but we know that unlabeled nodes 
> are going to be useful to people who want to create Linked Data graphs 
> in JSON, then why not make it clear how they fit in with Linked Data?

Trouble is making it clear can lead to confusion since blank nodes and 
skolemization != good items for the front door of a lightweight 
mechanism for creating graphs (or specifically Linked Data graphs) in JSON.

The issues really have more to do with the following aimed at Web 
Developers, I believe:

1. What is JSON-LD?
2. Why is it important?
3. How do I use it?
> It seems to me like this is a better approach than trying to figure 
> out how to define Linked Data so that they have no place and so we 
> need to rename our technology. 
You mean spec :-)

> Furthermore, this approach speaks to what seems to be another somewhat 
> latent argument here against unlabeled nodes:
>
> There seems to be some concern that if people are able to use blank 
> nodes, then they will abuse them.

No, they'll be confused if skolemization algorithms hit them at the 
front door.

> In other words, they will use blank nodes when a piece of information 
> can stand on its own. This will result in a failure to link data in an 
> appropriate way.
>
> There is no evidence offered for this argument. A similar argument 
> might be that people will invent invalid URIs that do not resolve to 
> representations of their referents in their data. Especially if they 
> can't use blank nodes. This would also result in a failure to link 
> data in an appropriate way.

URIs have to Resolve otherwise it isn't Linked Data. Note, Linked Data 
is just a *kind* of directed graph, it too has no monopoly over data 
representation using directed graphs.

>
> Myriad arguments could be constructed this way -- so I reject this 
> particular reasoning. But if we are worried about the abuse of 
> unlabeled nodes, why don't we include how they fit in with Linked Data 
> so it's clear what their use is?

But the cost is high re. desired objective of this effort.

> If you're using an unlabeled node it ought to be only because having a 
> URI as a label would be forced or doesn't make any sense: you wouldn't 
> refer to the subject outside of the graph. 

What is the graph to you? Where are its boundaries?

Linked Data is about a WWW of Linked Data. The Web's Global Data Space 
dimension. Every Name (irrespective of URI scheme) has to resolve 
to a Representation of its Referent that is accessible from an Address.

> That may be covered by the recent adoption of the "SHOULD" text when 
> talking about labeling nodes, but perhaps it could be clearer.
>
> I expect users of JSON-LD to encounter situations where they think 
> they should be using unlabeled nodes. They shouldn't get the 
> impression that they must abandon JSON-LD all together if this happens 
> -- or that there's no solution to their use case in the specification. 
> I also don't think that a JSON-LD processor that is generating triples 
> or normalized JSON-LD should fail someone who is contextualizing all 
> of their JSON data and some of it simply needs to use unlabeled nodes.
>
> All of this being said, if we still feel the need to adopt a new name 
> I can live with that. I just want to see that we don't cut support for 
> unlabeled nodes and would prefer that their use not be discouraged, 
> but rather put in its appropriate place.

Yes, re. putting skolemization in its appropriate place i.e., not the 
front door of a lightweight spec for construction of graph based data 
representation using JSON :-)


Kingsley

>
> On 07/25/2011 11:08 PM, Manu Sporny wrote:
>> JSON-SD doesn't really roll off of the tongue... neither did JSON-LD 
>> or RDFa. HTML is only used because it's been around forever... but 
>> it's a pretty crappy brand name. Any thoughts on what this technology 
>> should be called as we ready it for public consumption?
>>
>> I was thinking: Structure
>>
>> "Structure allows you to express Linked Data in JSON"
>>
>> Yes, I realize that isn't entirely accurate, but tag-lines rarely are 
>> accurate. Thoughts on branding the technology so that it's easy to 
>> drop into a conversation without scaring Web developers away or 
>> making people feel as if the conversation is going to take a scary 
>> turn toward geek-speak?
>>
>> -- manu
>>
>
>


-- 

Regards,

Kingsley Idehen	
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 27 July 2011 17:44:56 UTC