JSON-LD terminology from Richard Cyganiak on 2012-08-27 (public-rdf-wg@w3.org from August 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 27 Aug 2012 10:05:47 +0100
To: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <642FB5A4-6C00-4202-8A59-705FE4CA3257@cyganiak.de>

I've posted a comment on a JSON-LD issue about the choice of terminology in JSON-LD. I think this discussion belongs on the WG mailing list rather than buried in the issue tracker, so I'm posting the message here too. The issue is that JSON-LD defines its own data model consisting of “subjects”, “objects” and “properties”, as can be seen here:
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

The original comment is here:
https://github.com/json-ld/json-ld.org/issues/47#issuecomment-8050318

Best,
Richard

I think that the decision to define a new data model for JSON-LD is unfortunate. It leads to a rather complicated multi-levelled terminological mess. Let me illustrate.

We have an “object” in a JSON document. This describes a “subject” in the JSON-LD graph. That's a “node” in the corresponding RDF graph. Which, in turn, refers to a “resource” in the domain of interest. (RDF graphs have “subjects” and “objects” too, but they are something different.)

An object in a JSON document may have name-value pairs. The “name” denotes a “property” in the JSON-LD graph. In the corresponding RDF graph, this “property” becomes a “triple”. This triple encodes a “statement” about the domain of interest. (And the domain of interest also has “properties”, but they are something different.)
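
To make the layering in the two paragraphs above concrete, here is a minimal Python sketch. It is an illustration only, not the JSON-LD processing algorithm: the document, the example.org IRIs and the helper to_triples are all hypothetical, and the mapping deliberately ignores @context, arrays, unlabelled nodes and typed or language-tagged literals. It shows one JSON “object” with two name-value pairs (syntax level) turning into two “triples” that share one “node” (RDF data model level).

```python
import json

# Syntax level: a JSON document containing one JSON object.
# The example.org IRIs are made up for illustration.
doc_text = """
{
  "@id": "http://example.org/people/alice",
  "http://xmlns.com/foaf/0.1/name": "Alice",
  "http://xmlns.com/foaf/0.1/knows": {"@id": "http://example.org/people/bob"}
}
"""


def to_triples(obj):
    """Flatten one JSON object into (subject, predicate, object) tuples.

    Hypothetical helper, simplified on purpose: it assumes every object
    carries an "@id" and every name is already a full IRI.
    """
    subject = obj["@id"]  # the JSON object describes this node
    triples = []
    for name, value in obj.items():
        if name == "@id":
            continue
        if isinstance(value, dict):
            # A nested object with "@id" refers to another node by IRI.
            triples.append((subject, name, value["@id"]))
        else:
            # Any other value is kept as a quoted literal.
            triples.append((subject, name, json.dumps(value)))
    return triples


# Data model level: each name-value pair has become one triple.
triples = to_triples(json.loads(doc_text))
for triple in triples:
    print(triple)
```

Note that the code needs only two vocabularies to do its job: JSON's (object, name, value) and RDF's (node, triple, literal). The domain level, the actual person called Alice, never appears in the code at all.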

Explaining any RDF syntax is already a difficult endeavour because there are three levels of terminology to deal with:

* the syntax level (e.g., “elements” and “attributes” and “tags” in RDF/XML, or “objects” and “arrays” and “name/value pairs” in JSON-LD)

* the data model level (RDF graph terminology, e.g., “triple” and “node” and “literal”)

* the domain level (the things being described by the graph, e.g., “resources”, “properties”, “classes”, “literal values”)

Adding a fourth level with new terminology to this picture doesn't help anyone; it only guarantees that the terminology used around JSON-LD will forever be a confused mess.

The argument put forward for inventing a new data model is: “The existing data model is too complicated and most programmers wouldn't understand it.” I'm not convinced. The JSON-LD data model isn't significantly simpler than the RDF data model. It is also less precise (e.g., handling of unlabelled subjects, objects and properties; cardinalities of property-object pairs; what exactly is internationalized text; what's the identity function for subjects and objects; can non-IRI objects be subjects too; ...). And it has a proper WTF moment when it distinguishes subjects and objects only to explain that objects can be subjects too. The whole affair doesn't seem any simpler than the existing RDF data model to me.

Of course JSON-LD, being a JSON syntax, needs to use vanilla JSON terminology such as “array”, “object”, “name-value pair”. IMHO there are three options for the additional terminology that JSON-LD needs to introduce:

1. explain JSON-LD as essentially an extension/modification of the JSON data model (which consists of objects and arrays and values)
2. explain JSON-LD in terms of the RDF data model (predicates, nodes, IRIs, literals)
3. explain JSON-LD in terms of the statements that a JSON-LD document makes about the domain of discourse, using RDF's domain-level terminology (resources/entities, relationships, values, properties, classes)

Of those, the first choice seems best to me because it's easiest to motivate for non-SemWeb people. “JSON-LD provides a bunch of conventions for making the JSON model richer and more webby.” The current choice, explaining JSON-LD in terms of a new from-scratch data model, is the worst possible choice.

Received on Monday, 27 August 2012 09:06:17 UTC