Re: JSON-LD requirements from glenn mcdonald on 2011-07-04 (public-linked-json@w3.org from July 2011)

From: glenn mcdonald <glenn@furia.com>
Date: Mon, 4 Jul 2011 00:16:02 +0000
To: Linked JSON <public-linked-json@w3.org>
Message-ID: <CAHNbrUtTiYTU=5qWo7cLMxU5jbGuL6vTGzttVCFgwBhB=_oKCQ@mail.gmail.com>
>
> Glenn-I'd like to ask that you be specific in stating your objections
> to the points in section 3.1.
>

OK, I'll go through them and then I'll back up and try to explain where I'm
coming from, so it's easier (hopefully) to understand where I end up.

1. Linked Data is used to express relationships between entities expressed
as subject-predicate-object, or entity-attribute-value.
2. A subject is a non-terminal node in a directed graph.

I think by this pair you mean that subject, predicate and object are all
singular, as in RDF triples. I believe that for a huge number of data
applications, triples are too low-level an abstraction, an assembly language
where we need a Java or a Ruby. Instead of breaking the logical nodes and
arcs of a graph into triples, I want to see them expressed and transmitted *as
nodes*, each with their arcs. Especially in JSON, where the mapping of nodes
to JSON objects is both obvious and excellent, I don't want to see any
"requirement" that data be decomposed into disconnected triples.

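To make the node-versus-triple contrast concrete, here is a minimal Python
sketch. The identifiers, the arc names ("name", "knows") and the helper
triples_to_nodes are all hypothetical, invented for illustration; this is
not syntax from any JSON-LD draft.

```python
import json
from collections import defaultdict

# Hypothetical data: the same small graph, decomposed into flat triples.
triples = [
    (1, "name", "Ada"),
    (1, "knows", 2),
    (1, "knows", 3),
    (2, "name", "Grace"),
]

def triples_to_nodes(triples):
    """Group (subject, predicate, object) triples into one object per node."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)
    # Each node carries its own id and all of its arcs together.
    return [{"id": subject, **arcs} for subject, arcs in grouped.items()]

nodes = triples_to_nodes(triples)
print(json.dumps(nodes, indent=2))
```

Each resulting JSON object is one node with its arcs attached, rather than
a set of disconnected statements that a consumer has to reassemble.
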
3. A subject may be given a unique identifier represented using a URI.

I say "A node must have a unique identifier, which may be (or may be
transformable to) an IRI".

4. A subject without a URI is scoped to the document in which it is
expressed.
5. A subject without a URI is called a Blank Node (or BNode)
6. A BNode may be given an identifier for intra-document referencing.

No, no, no. No blank nodes. All nodes must have IDs. They don't have to have
anything *else*, so they may be "blank" in various human senses, but they
are not anonymous to the machine.

7. A predicate describes an edge of the directed graph relating two
entities.

I think we've spent decades suffering because relational databases make
one-to-many relationships painful, and ordered multiples even more painful.
RDF made those same mistakes again. Time to stop. Lists are integral to
human reasoning, and should be integral to our data representation. An arc
connects a node to an ordered list of targets.
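
As a rough illustration of "an arc connects a node to an ordered list of
targets", the Python sketch below compares a node whose arc holds a list
with the same data flattened into a set of triples. The album node, the
"tracks" arc and both helper functions are assumptions made up for this
example.

```python
# Hypothetical node: the "tracks" arc points at an ordered list of node ids.
album = {"id": 10, "tracks": [13, 11, 12]}

def targets(node, arc):
    """Return the ordered targets of an arc; a missing arc is an empty list."""
    return list(node.get(arc, []))

def as_triple_set(node):
    """Flatten a node into a set of triples, which discards target order."""
    return {
        (node["id"], arc, target)
        for arc, arc_targets in node.items()
        if arc != "id"
        for target in arc_targets
    }

print(targets(album, "tracks"))
print(as_triple_set(album))
```

The list keeps the track order as part of the data itself, while the
triple set would need an extra position value per triple to recover it.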

8. A predicate should be represented as a URI.

It should be possible to associate a predicate with an IRI, but I don't
believe IRIs are very good *as* actual predicates, any more than they are as
column-headings in a table. Even if you want to associate a predicate
with an IRI, this is a model-level assertion that should be done once per
type-arc, not once per instance assertion. And, in fact, I believe many,
many data applications can function perfectly well without their arcs being
formally meta-modeled at all.
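
One way to picture "once per type-arc, not once per instance" is a small
model table kept apart from the instance data. This is only a sketch: the
MODEL table, the example.org IRIs, the "type" key and the arc_iri helper
are hypothetical names chosen for illustration.

```python
# Hypothetical model: each (type, arc) pair is associated with an IRI once.
MODEL = {
    ("Person", "name"): "http://example.org/vocab/name",
    ("Person", "knows"): "http://example.org/vocab/knows",
}

# Instance data uses short arc names and never repeats the IRIs.
person = {
    "id": 1,
    "type": "Person",
    "name": ["Ada"],
    "knows": [2],
    "nickname": ["AL"],  # an arc that is not meta-modeled at all
}

def arc_iri(node, arc):
    """Look up the IRI for an arc via the node's type; None if unmodeled."""
    return MODEL.get((node["type"], arc))

print(arc_iri(person, "knows"))
print(arc_iri(person, "nickname"))
```

The "nickname" arc still works as plain data even though the model says
nothing about it.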

9. An object is a node in a directed graph that may be a non-terminal or a
terminal node.
10. An object which is a terminal node is called a literal.
11. A literal may include a datatype or have a language.

I think this node-or-literal thing is one of the big cop-outs in RDF (albeit
a perfectly understandable one given its resource-description origins).
Every logical piece of data in a dataset should be a typed node. The mapping
of these typed nodes to human-readable symbols is a different level or
dimension. It should be fundamentally impossible to say that the president
of the US is the string "Barack Obama". Multiple-language support also
belongs in this symbolic level, not the logical level, otherwise you get
inane things like not even being able to count how many objects you have for
a given subject and predicate because you have no way to tell how many of
them are translations of each other. And datatypes belong in a data-model,
not restated by every instance node (or, worse still, by every appearance of
an "object literal").
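
A minimal Python sketch of keeping the logical level separate from the
symbolic level, under assumed names: the nodes and labels dictionaries,
the "president" arc and the two helper functions are all hypothetical.

```python
# Logical level: every piece of data is a typed node, referenced by id.
nodes = {
    1: {"type": "Country", "president": [2]},
    2: {"type": "Person"},
}

# Symbolic level: human-readable labels per node, keyed by language.
labels = {
    1: {"en": "United States"},
    2: {"en": "Barack Obama", "ja": "バラク・オバマ"},
}

def count_targets(node_id, arc):
    """Count the logical targets of an arc, independent of any labels."""
    return len(nodes[node_id].get(arc, []))

def label(node_id, lang="en"):
    """Return the label for a node in one language, or None if absent."""
    return labels.get(node_id, {}).get(lang)

print(count_targets(1, "president"))
print(label(2), label(2, "ja"))
```

The president is a Person node rather than a string, and adding a second
language adds a label, not a second target to be counted.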

----------

So, backing up:

I come at this all as a data problem first, and a web problem only second
and almost incidentally. I think the problem is that relational databases
(as they have evolved in practice) have imposed several unhelpful
machine-level realities on their human users, and that the idea of a
data-graph is to push the machine concerns back down under the human
abstraction. Internalize the tables and foreign keys and joins and you get
types and arcs. Internalize the one-to-many/many-to-many tables and you get
the need for target lists. Insist on every piece of data being a node and
you get addressability and rotation and data that is not hamstrung by
shortcuts.

And all this is true for data because it's data, no matter how it's exposed
or accessed. Even inside an application with no HTTP access, there's still
data, and a graph is still the best general-purpose representation for human
purposes that I know. That is the fundamental data thing I believe. And I
feel like RDF wants to believe in this idea, too, but got a little mired in
history and quirks and edge-cases, and so has ended up not yet doing it
justice. But we can fix this. The good-news part of the bad news (that RDF
has had miserably slow uptake) is that we still have time to iterate some
more, and make some new things that are even simpler and even better.

Publishing data on the web is then, in my mind, an extra optional (but very
excellent) layer of stuff on top of the basic data-representation. As far as
I'm concerned, this is mostly a matter of mapping dataset-internal
identifiers (I like integers for these) to IRIs, which for many purposes is
nothing more than assigning (even dynamically) a single base IRI to the
entire dataset.
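
As a sketch of that mapping, assuming integer internal identifiers and a
single base IRI (the example.org base and both function names here are
hypothetical):

```python
# Hypothetical base IRI assigned to the whole dataset.
BASE_IRI = "http://example.org/dataset/"

def to_iri(node_id, base=BASE_IRI):
    """Map a dataset-internal integer id to an IRI under the base."""
    return f"{base}{node_id}"

def from_iri(iri, base=BASE_IRI):
    """Map an IRI back to the internal integer id; reject foreign IRIs."""
    if not iri.startswith(base):
        raise ValueError(f"{iri!r} is not under base {base!r}")
    return int(iri[len(base):])

print(to_iri(42))
print(from_iri("http://example.org/dataset/42"))
```

Because the base is applied at the edge, it can be assigned or changed
without touching the identifiers stored inside the dataset.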

And then, at the very end, when we talk about shipping blobs of this
well-represented data around, I think JSON is very excellent and the most
obvious and compelling choice. And so here I am, because "Linked Data in
JSON" sounds so much like a thing I want to see exist, for the sake of
humans and computers and their interactions in and around data, that I don't
want it to turn out to be a disappointing label for some other encumbered,
half-unsatisfactory result of inertia and premature entrenchment.

I should also probably admit that I'm professionally skeptical of
"requirements" in design, outside of specifically commissioned
work-for-hire. I believe in goals, and I believe in understanding potential
use-cases. But when you're inventing something new, I think it's a mistake
to believe that you know ahead of time exactly what (or whom) it is or isn't
going to be good for.

----------

Does that put my input in any clearer a context?

glenn
Received on Monday, 4 July 2011 00:16:49 UTC