Re: JSON-LD requirements

On Jul 3, 2011, at 5:16 PM, glenn mcdonald wrote:

Glenn, I'd like to ask that you be specific in stating your objections
to the points in section 3.1.

OK, I'll go through them and then I'll back up and try to explain where I'm coming from, so it's easier (hopefully) to understand where I end up.

1. Linked Data is used to express relationships between entities expressed as subject-predicate-object, or entity-attribute-value.
2. A subject is a non-terminal node in a directed graph.

I think by this pair you mean that subject, predicate and object are all singular, as in RDF triples. I believe that for a huge number of data applications, triples are too low-level an abstraction, an assembly language where we need a Java or a Ruby. Instead of breaking the logical nodes and arcs of a graph into triples, I want to see them expressed and transmitted as nodes, each with their arcs. Especially in JSON, where the mapping of nodes to JSON objects is both obvious and excellent, I don't want to see any "requirement" that data be decomposed into disconnected triples.

I was trying to be incremental: in this section (3.1) I was trying to describe what I take to be the principles of Linked Data, without specific reference to a JSON representation. Note that 3.2(13) specifically mentions associating multiple values with an attribute in JSON.
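To make the contrast concrete, here is a sketch of the two shapes under discussion, using hypothetical identifiers (ex:alice and so on) and plain attribute names rather than any particular vocabulary:

```python
# Triple form: three disconnected subject-predicate-object statements.
triples = [
    ("ex:alice", "name", "Alice"),
    ("ex:alice", "knows", "ex:bob"),
    ("ex:alice", "knows", "ex:carol"),
]

# Node form: one JSON object carrying the node's id and all of its arcs,
# with the multi-valued arc expressed directly as a list.
node = {
    "@id": "ex:alice",
    "name": "Alice",
    "knows": ["ex:bob", "ex:carol"],
}

def to_triples(node):
    """Flatten a node object back into (subject, predicate, object) triples."""
    subject = node["@id"]
    out = []
    for key, value in node.items():
        if key == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        out.extend((subject, key, v) for v in values)
    return out
```

The two forms carry the same information, which is arguably the point: the node form can always be flattened into triples when a consumer wants them, without forcing producers to decompose their data up front.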

3. A subject may be given a unique identifier represented using a URI.

I say "A node must have a unique identifier, which may be (or may be transformable to) an IRI".

Yes, in JSON.

4. A subject without a URI is scoped to the document in which it is expressed.
5. A subject without a URI is called a Blank Node (or BNode)
6. A BNode may be given an identifier for intra-document referencing.

No, no, no. No blank nodes. All nodes must have IDs. They don't have to have anything else, so they may be "blank" in various human senses, but they are not anonymous to the machine.

Maybe we just need to invent terms that don't come loaded. My point was that nodes may be externally addressable, using URIs, or only internally addressable, possibly using identifiers to reference them from within the document. If it's linked data and is externally addressable, it MUST have a URI. If it is only internally addressable, then it effectively has the same semantics as a BNode.
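A sketch of that distinction, with hypothetical ids: an externally addressable node carries a full URI, while a node that is only internally addressable carries a document-scoped identifier (here written "_:addr1"):

```python
doc = {
    "@id": "http://example.org/people/alice",   # externally addressable
    "name": "Alice",
    "address": {
        "@id": "_:addr1",                       # document-scoped only
        "city": "Boston",
    },
}

def is_externally_addressable(node_id):
    """True iff the identifier is an absolute URI rather than document-scoped."""
    return not node_id.startswith("_:") and "://" in node_id
```

Both nodes have identifiers, so neither is anonymous to the machine; the only difference is whether the identifier is meaningful outside the document.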

7. A predicate describes an edge of the directed graph relating two entities.

I think we've spent decades suffering because relational databases make one-to-many relationships painful, and ordered multiples even more painful. RDF made those same mistakes again. Time to stop. Lists are integral to human reasoning, and should be integral to our data representation. An arc connects a node to an ordered list of targets.

Perhaps it should say "an entity with one or more values." My concern about expressing ordering is that it is inherently difficult to do in many storage systems, and assuming that multi-valued relationships MUST be ordered places an unnecessary burden on systems where ordering is not important.
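One way to split the difference is to leave multi-valued attributes unordered by default and mark ordering explicitly only where it matters. A sketch, using a hypothetical "@list" wrapper as the ordering marker:

```python
# A plain multi-valued attribute makes no ordering promise...
unordered = {"@id": "ex:band", "member": ["ex:bass", "ex:drums", "ex:guitar"]}

# ...while an explicit (hypothetical) "@list" wrapper asserts order.
ordered = {"@id": "ex:set", "track": {"@list": ["ex:opener", "ex:encore"]}}

def values_of(node, attr):
    """Return an attribute's values, unwrapping an explicit ordered list."""
    v = node[attr]
    if isinstance(v, dict) and "@list" in v:
        return v["@list"]
    return v if isinstance(v, list) else [v]
```

Under this scheme, storage systems that cannot preserve order cheaply only pay for it when a producer has actually asserted it.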

8. A predicate should be represented as a URI.

It should be possible to associate a predicate with an IRI, but I don't believe IRIs are very good as actual predicates, any more than they are as column-headings in a table, and even if you want to associate a predicate with an IRI, this is a model-level assertion that should be done once per type-arc, not once per instance assertion. And, in fact, I believe many, many data applications can function perfectly well without their arcs being formally meta-modeled at all.

Yes, in JSON. In LD, it SHOULD/MUST be a URI.

9. An object is a node in a directed graph that may be a non-terminal or a terminal node.
10. An object which is a terminal node is called a literal.
11. A literal may include a datatype or have a language.

I think this node-or-literal thing is one of the big copouts in RDF (albeit a perfectly understandable one given its resource-description origins). Every logical piece of data in a dataset should be a typed node. The mapping of these typed nodes to human-readable symbols is a different level or dimension. It should be fundamentally impossible to say that the president of the US is the string "Barack Obama". Multiple-language support also belongs in this symbolic level, not the logical level, otherwise you get inane things like not even being able to count how many objects you have for a given subject and predicate because you have no way to tell how many of them are translations of each other. And datatypes belong in a data-model, not restated by every instance node (or, worse still, by every appearance of an "object literal").

Agreed. In the syntactic model (JSON) this can rely on structural elements of the language. In the logical (Linked Data) model, I don't see how these can be separated.
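The counting objection above can be made concrete; a sketch with hypothetical data:

```python
# Language-tagged literals at the logical level: two literal objects
# for one predicate...
labels = [
    {"@value": "United States", "@language": "en"},
    {"@value": "États-Unis", "@language": "fr"},
]
# ...but they name the same thing, so counting objects conflates
# translations with distinct values.

# Symbolic-layer alternative: one logical node, with human-readable
# renderings kept in a separate per-language map.
country = {
    "@id": "ex:us",
    "label": {"en": "United States", "fr": "États-Unis"},
}

literal_object_count = len(labels)   # 2: counts translations as objects
logical_value_count = 1              # one node, however many renderings
```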

----------

So, backing up:

I come at this all as a data problem first, and a web problem only second and almost incidentally. I think the problem is that relational databases (as they have evolved in practice) have imposed several unhelpful machine-level realities on their human users, and that the idea of a data-graph is to push the machine concerns back down under the human abstraction. Internalize the tables and foreign keys and joins and you get types and arcs. Internalize the one-to-many/many-to-many tables and you get the need for target lists. Insist on every piece of data being a node and you get addressability and rotation and data that is not hamstrung by shortcuts.

And all this is true for data because it's data, no matter how it's exposed or accessed. Even inside an application with no HTTP access, there's still data, and a graph is still the best general-purpose representation for human purposes that I know. That is the fundamental data thing I believe. And I feel like RDF wants to believe in this idea, too, but got a little mired in history and quirks and edge-cases, and so has ended up not yet doing it justice. But we can fix this. The good news part of the bad news that RDF has had miserably slow uptake is that we still have time to iterate some more, and make some new things that are even simpler and even better.

Publishing data on the web is then, in my mind, an extra optional (but very excellent) layer of stuff on top of the basic data-representation. As far as I'm concerned, this is mostly a matter of mapping dataset-internal identifiers (I like integers for these) to IRIs, which for many purposes is nothing more than assigning (even dynamically) a single base IRI to the entire dataset.
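That mapping is almost trivially small; a sketch, with a hypothetical base IRI:

```python
# Publishing = mapping dataset-internal integer ids to IRIs under a
# single base IRI (base and ids are hypothetical).
BASE = "http://example.org/dataset/"

def to_iri(internal_id):
    """Map a dataset-internal integer identifier to an IRI."""
    return f"{BASE}{internal_id}"

def from_iri(iri):
    """Recover the internal id from an IRI under the base."""
    assert iri.startswith(BASE)
    return int(iri[len(BASE):])
```

The base could even be assigned dynamically at publication time, leaving the dataset's internal representation untouched.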

And then, at the very end, when we talk about shipping blobs of this well-represented data around, I think JSON is very excellent and the most obvious and compelling choice. And so here I am, because "Linked Data in JSON" sounds so much like a thing I want to see exist, for the sake of humans and computers and their interactions in and around data, that I don't want it to turn out to be a disappointing label for some other encumbered, half-unsatisfactory result of inertia and premature entrenchment.

I should also probably admit that I'm professionally skeptical of "requirements" in design, outside of specifically commissioned work-for-hire. I believe in goals, and I believe in understanding potential use-cases. But when you're inventing something new, I think it's a mistake to believe that you know ahead of time exactly what (or whom) it is or isn't going to be good for.

----------

Does that put my input in any clearer a context?

glenn

Thanks, I think this does help. I created the requirements, as I felt that it wasn't clear that we had a common understanding of the context in which we are collectively trying to create a solution. Think of them as a set of statements we can discuss and refine to try to reach some consensus opinion, not as an externally imposed set of requirements that we need to satisfy.

Gregg

Received on Monday, 4 July 2011 01:09:17 UTC