Re: JSON-LD terminology from Richard Cyganiak on 2012-08-27 (public-rdf-wg@w3.org from August 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 27 Aug 2012 23:42:52 +0100
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <A485B5A5-0B4A-4B8D-B947-3484ACBE3EF8@cyganiak.de>

Hi Gregg,

Thanks for the response. Comments inline.

On 27 Aug 2012, at 18:25, Gregg Kellogg wrote:
>> 1. Conceptualise JSON-LD as an extension of JSON (that is, it allows identification and linking of JSON objects, and extends the JSON data model with some richer constructs. Vanilla JSON is a tree of JSON objects and JSON arrays. JSON-LD is a graph of JSON objects.)
> 
> I believe this is the general take we've taken on JSON-LD so far. The problems (IMO) have come when we need to cross over and talk about Linked Data issues.

I'm not sure I understand the distinction you draw between JSON-LD issues and Linked Data issues and where you need to cross over from one to the other.

>> 2. Pick terms from RDF Concepts (that is, speak of “nodes” instead of “subjects and objects”, “literals” instead of “objects that are labelled with text”, “predicates” instead of “properties”)
> 
> Yes, I agree. Talking about subjects and objects becomes pretty confusing. In the issue, I've suggested changing _subject definition_ and _subject reference_ to _node definition_ and _node reference_, where a node is defined using a JSON object.

I think it would help if you stated somewhere very clearly: “In JSON-LD, every JSON object falls into one of X categories: node definition, node reference, …”

If you stop using the term “object” for the “nodes” of a JSON-LD graph, then I'd think that terms like “node object” and “reference object” instead of “node definition” and “node reference” might be better, because they're kinds of JSON objects.

Note: “subject definition” and “subject reference” are currently defined in a section that starts: “JSON [RFC4627] defines several terms which are used throughout this document: …” These terms are certainly *not* defined in RFC 4627!

> Getting into the difference between nodes, subjects, resources, properties and predicates is best left to RDF Concepts.

Sounds reasonable.

>> 3. Describe JSON-LD in terms of the claims it makes about the universe: Subject and objects are “resources”/“entities”/“things”. JSON-LD expresses “relationships” between them; “properties” are types of relationships. Strings, numbers and so on are “values”.
>> 
>> Given the goals of JSON-LD, the first seems most reasonable as it explains the benefits of using JSON-LD over vanilla JSON in terms that the target audience should already be familiar with and relate to. Personally, I think that the second is cleanest, but I suppose it would go a bit against the JSON-LD design goal of pretending that it has nothing to do with RDF.
>> 
>>> The doc does a reasonable job of saying "JSON Object" when it means the concept from JSON - maybe there are some places it does not get the naming quite right (editorial).
>> 
>> Yes, the doc attempts to distinguish "JSON Object" and "object", but the target audience will be familiar with the term "object" in the vanilla JSON sense, so why go against the grain by re-defining "object" to mean something else? What's wrong with "node"?
> 
> Agreed, but I think we'll probably stick with _JSON object_ to keep it distinct from the concept of _object_ in the triple-positional sense.

+1

>>> EricP - You are noted in "Issue 2" as suggesting "that the definitions of subject and object, while being practical, are at odds with [RDF-CONCEPTS] use in their roles within a triple."  Care to say more?
>> 
>> I guess EricP's concern is that in RDF Concepts, subjects and objects are positions in a triple. In the JSON-LD data model, subjects and objects are what RDF Concepts calls nodes. RDF Concepts doesn't distinguish between “subject nodes” and “object nodes”; there's just nodes. RDF Concepts does distinguish between IRIs, blank nodes, and literals, unlike JSON-LD.
> 
> JSON-LD does distinguish between IRIs, blank nodes and values (literals).

I'm referring to the definition of the JSON-LD data model in the ED:
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

In RDF Concepts, there are three kinds of nodes: IRIs, blank nodes, and literals.

In the JSON-LD model, IRIs are not kinds of nodes (kinds of subjects/objects), but labels on a node.

Blank nodes are not mentioned; it is merely mentioned that nodes can be unlabelled. (To be fair, that's probably close enough.)

Literals are not mentioned.

“Data values” are mentioned as a possible label on a node, besides IRIs and text. So it certainly doesn't match the RDF Concepts notion of literals or literal values.

Note that in RDF Concepts, nodes *are* IRIs or literals; they are not *labelled* with an IRI or literal. But I guess I can live with a phrasing that says that nodes are labelled with IRIs or labelled with a literal or unlabelled, if it's done consistently.

>>> Personally, I have always found that the data model of RDF is not too complicated.  The main issue is the total amount of technology a web developer has to juggle rather than any specific technology.
>>> 
>>> When faced with the task of learning RDF, the Turtle-as-records clicks.  URIs and prefix names can be an early confusion - JSON-LD does not change for the better or worse.
>> 
>> Uh, no. JSON-LD *does* change it for the worse. In Turtle, we have absolute URIs, relative URIs, and prefixed names. In JSON-LD, we have these three, plus term expansion. So there are four different ways of writing a URI.
> 
> RDFa also has terms, and terms are really necessary for people to work with JSON-LD as JSON.

I was disputing the claim that IRI handling in JSON-LD is as simple as in Turtle, that is all.
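
For illustration, the four forms can be sketched in Python roughly as follows. This is only a sketch, not the expansion algorithm from the JSON-LD drafts: the function name `expand_iri`, the sample context, the base IRI, and the order in which the checks are applied are all assumptions made up for the example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical context: "name" is a term, "foaf" is a prefix.
CONTEXT = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
BASE = "http://example.org/people/"  # assumed document base


def expand_iri(value, context=CONTEXT, base=BASE):
    """Return (form, absolute IRI) for one of the four ways of writing an IRI."""
    # 1. Term: the whole string is a key in the context.
    if value in context:
        return "term", context[value]
    # 2. Prefixed name: "prefix:suffix" where the prefix is in the context.
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return "prefixed name", context[prefix] + suffix
    # 3. Absolute IRI: it carries its own scheme.
    if urlsplit(value).scheme:
        return "absolute IRI", value
    # 4. Relative IRI: resolved against the (assumed) document base.
    return "relative IRI", urljoin(base, value)
```

A Turtle parser needs only branches 2 to 4; branch 1 is the extra case that JSON-LD adds.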

>>> The other confusion is "what is a graph?" A graph is a set of nodes and a set of edges.  It can be drawn as a picture.  An RDF graph does not include an explicit set of nodes - it's just the edge set.  A node label can be an edge label.   This is again the same in JSON-LD and RDF Concepts.
>> 
>> Unlike RDF, JSON-LD allows literals as subjects and predicates, allows multiple nodes with the same IRI, doesn't allow string literals that happen to contain an IRI, and allows multiple identical edges (i.e. is a multiset of triples).
> 
> Actually, JSON-LD does not allow literals as subjects.

Well…:

[[
• An object may be labeled with an IRI or a label that is not an IRI such as plain text, internationalized text, or a strictly-typed data value.
• A node may be a subject and an object at the same time.
]]
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

There is nothing in the definitions that rules out literals as subjects. As an implementer, given the definitions above, I'd certainly assume that they need to be supported.

> A subject is always identified by the @id key of a JSON object (or an unnamed BNode if it does not exist). The value of an @id key is always interpreted as an IRI or BNode. I think there was some mis-communication about this earlier.

How is an implementer supposed to know that between all the mechanisms of the language, there is no way to generate a literal in subject position, if the data model allows it?
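
To make the point concrete: a data model that wants to exclude literal subjects has to state the constraint itself. Below is a minimal Python sketch of the positional constraints as RDF Concepts states them (subject is an IRI or blank node, predicate is an IRI, object is any of the three). The class and function names are invented for illustration and come from neither spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: Optional[str] = None


def is_valid_rdf_triple(subject, predicate, obj):
    """Check the positional constraints that RDF Concepts places on a triple."""
    return (
        isinstance(subject, (IRI, BlankNode))
        and isinstance(predicate, IRI)
        and isinstance(obj, (IRI, BlankNode, Literal))
    )
```

With a check like this in the data model definition, an implementer would not have to infer the restriction from the syntax.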

> Other RDF representations, also allow the same IRI to be used in multiple contexts as a subject. In fact, when JSON-LD is turned into RDF, these are merged together, much as they would be in RDFa or Turtle.

Well, but RDF is really specific about the fact that the graph node *is* an IRI, and hence if parsing produces the same node in multiple places, then it actually *is* the *same* node.

In JSON-LD, as far as I can tell, we end up with *multiple* nodes that are labelled with the same IRI, right?

Or, to ask it a different way: If you were to visualize the “linked data graph” represented in a JSON-LD document that has multiple subject definitions with the same @id, would there be one or multiple nodes in the graph?
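
The two possible readings can be sketched in Python as follows. The sample document and both function names are hypothetical; neither function is taken from the JSON-LD drafts, and the second is only a rough stand-in for what a merging or flattening step might do.

```python
from collections import defaultdict

# Hypothetical document: two subject definitions share the same @id.
DOC = [
    {"@id": "http://example.org/richard", "name": "Richard"},
    {"@id": "http://example.org/richard", "homepage": "http://example.org/home"},
]


def nodes_per_object(doc):
    """Reading 1: every JSON object is its own node, labelled with its @id."""
    return [dict(obj) for obj in doc]


def nodes_per_iri(doc):
    """Reading 2: the IRI *is* the node, so objects with equal @id merge."""
    merged = defaultdict(dict)
    for obj in doc:
        node = merged[obj["@id"]]
        for key, value in obj.items():
            if key != "@id":
                # Collect values per property; one node may gain several.
                node.setdefault(key, []).append(value)
    return dict(merged)
```

Under the first reading the graph has two nodes; under the second, which matches RDF, it has one.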

> Part of the framing algorithm (not in a REC-track document just yet) does include a _flattening_ step, which reconciles multiple JSON objects containing the same @id into a single JSON object. We should probably move _flattening_, or something like it, into the API document.
> 
>> JSON-LD also doesn't specify whether IRIs are absolute or relative, doesn't say what exactly an internationalized string is, doesn't say what range of datatypes are supported, doesn't say whether graphs can be merged or how, and doesn't say when two graphs are identical; so in all of these regards its data model might also differ from RDF.
> 
> I don't follow you on this. JSON-LD's use of IRIs is the same as Turtle or RDFa, for all practical purposes.

JSON-LD defines IRIs as:

[[
• A subject should be labeled with an IRI (an Internationalized Resource Identifier as described in [RFC3987]).
]]
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

RFC 3987 defines both absolute and relative forms.

Turtle and RDFa both defer to RDF Concepts, which explicitly states that IRIs are always absolute.

> Relative IRIs, not used as a property or type, are resolved relative to the document base.

Well, I wouldn't know that from reading the JSON-LD data model definition.

> As with JSON, strings come from UTF-8, with language identified using the @language key, either as part of an expanded value, or defined within a context. This is similar to the @lang definition within RDFa or RDF/XML. If there's some normative text you believe should be added to clarify this, I would be fine with that.

Again, here's what the data model definition says about internationalized text:

[[
An object may be labeled with an IRI or a label that is not an IRI such as plain text, internationalized text, or a strictly-typed data value.
]]
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

Is there guidance somewhere that tells implementers what information they need to store to represent an internationalized string? For example, does one need to store the string's character encoding? If not, how would an implementer know?
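
One possible answer, borrowed from RDF Concepts rather than from the JSON-LD draft, is that a language-tagged string is just a sequence of Unicode characters plus a language tag, and that the byte encoding is a concern of the serialization only. A minimal Python sketch under that assumption follows; `LangString` and `from_bytes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangString:
    text: str       # sequence of Unicode characters, no byte encoding attached
    language: str   # language tag such as "en" or "de-AT"


def from_bytes(raw: bytes, encoding: str, language: str) -> LangString:
    """Decode at the parsing boundary; the encoding is not kept afterwards."""
    return LangString(raw.decode(encoding), language)
```

Under this assumption, the same text read from a UTF-8 source and from a Latin-1 source yields equal values, so the encoding never needs to be stored. Whether JSON-LD intends this is exactly what the data model definition leaves open.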

> WRT graphs, JSON-LD provides a syntax for specifying them, pretty similar to the same way they can be specified in TriG. Notions of graph equivalence fall to the data model, not the serialization format, don't they?

Sure, but the JSON-LD data model doesn't say anything about graph equivalence.

As far as I understand, the normative data model of JSON-LD is what you call “Linked Data”, and define here:
http://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#linking-data

> Given that there is a normative transformation from JSON-LD to RDF, graph equivalence and other semantics leverage RDF.

I don't see how an optional conversion algorithm that converts JSON-LD into a different data model, presented in a different specification, can be taken as answering questions about the JSON-LD data model.

>> JSON-LD also contradicts itself regarding the possibility of unlabelled edges (not possible according to the definition, but possible according to a NOTE a bit later).
> 
> Yes, we could add something that prohibits a key in a JSON-LD object from having the form of a BNode; I don't think this would really lose anything.

(I'm not saying that all edges should be labelled. I'm just pointing out the inconsistency.)

> The grammar in A.1 should say:
> 
> [[[
> Keys are IRIs, compact IRIs, terms defined within the active context which MUST evaluate to absolute IRIs, or one of the following keywords
> ]]]
> 
> or words to that effect.
> 
> I would support removing that note, but I think the original came as a result of lobbying by Kingsley
> 
> [[[
> A property SHOULD be labeled with an IRI.
> ]]]
> 
> (from requirements: [1])
> 
> This was to be inclusive of notions of Linked Data that aren't RDF, but I think it's probably appropriate now to close the loop on this and settle that a property MUST be labeled with an (absolute) IRI.

(FWIW, I'm not opposed to allowing simple keywords as properties, and resolving them against some sort of base IRI in order to form absolute IRIs during RDF conversion. There are certainly benefits to such a design in terms of lowering barriers to entry, cf. microdata.)

>> So I don't think it's the same data model. It's not just a difference in choice of words here and there.
>> 
>> (JSON-LD also contains a narrow definition of "Linked Data" that contradicts five years of existing W3C specifications, but that's another rant for another day.)
>> 
>>> RDF is not an RDF format!
> 
> B.1, where this is asserted, should be stricken, or moved someplace else. We need to be clear that the concepts outlined in JSON-LD Syntax are based on, and fully consistent with RDF Concepts.

FWIW, my main concern is that the data model section (3.1) should be aligned to RDF Concepts as far as possible, and should use the same terminology. This doesn't mean that the definitions all need to be the same. If it makes sense for the JSON-LD use cases to deviate from RDF definitions (e.g., allow unlabelled edges, or restrict the available literal datatypes, or whatever), then JSON-LD should just use the same *term* with a *different definition*. It is then easy to systematically list the differences in an Appendix.

My other concern is that the definitions should be *really precise*, as they are in the JSON spec and (hopefully) in RDF Concepts. I mean, from reading *just the data model definition*, I should be able to implement a JSON-LD graph store that stores instances of this data model, right? I shouldn't need to go through the syntax spec with a magnifying glass to discover all the details of what's allowed, included and excluded in the data model.

(Finally, but that's again another rant for another day: Once the data model is really clear, I think that JSON-LD should really be introduced by listing all the ways *in which its data model differs from the JSON data model*. So, all the things it adds above vanilla JSON. Arbitrary graphs instead of trees. Hyperlinks within and across documents. I18n and arbitrary data types. Expansion of keys into globally unique IRIs. I think this is really important. If you cannot explain to a JSON developer why they should care about these additions over plain JSON, then you get the RDF/XML problem where people try to produce the stuff with XML goggles on without understanding at all what's going on.)

>> Right, and neither are B.4 Microformats nor B.5 Microdata.
> 
> As a syntax, Microformats can be turned into RDF, although not normatively.

That is true, and I've implemented a collection of microformat-to-RDF parsers for Sindice. This doesn't change the fact that microformats are not RDF formats.

> For this audience, both Microformats and Microdata are appropriate when discussing how they relate to JSON-LD.

Sure, but the heading of Appendix B is still wrong!

> Microdata is just as much of an RDF format as RDFa (like it or not): [2]

Well, there is a difference between a W3C REC and a W3C IG Note. That's beside the point here though.

> However, as with JSON-LD, microdata does not necessarily need to be transformed to RDF to be useful.

But again, microdata is *not* an RDF format and cannot be sensibly listed under a heading “Relationship to other RDF formats”.

Best,
Richard



> 
> Gregg
> 
> [1] http://json-ld.org/requirements/latest/
> [2] http://www.w3.org/TR/microdata-rdf/
> 
>> Best,
>> Richard
> 
>
Received on Monday, 27 August 2012 22:43:19 UTC