JSON-LD/RDF feedback

... from a Concerned Citizen.  For what it's worth, I'm quite familiar with
RDF but have not been following the various relevant WGs for some time and
only just got around to reading the JSON-LD draft, mainly because I
happened to notice the recent discussion about whether RDF should or should
not be mentioned etc., so I'll regale you with my impressions in hopes they
might be useful.

[P.S.  It turns out I have a specific idea for satisfying both pro- and
anti-RDF camps, see below.]

First impression:  where's the RDF?  I was expecting to see something in
the non-normative sections explaining or demonstrating how JSON-LD maps to
RDF or vice-versa.  Instead all I find is what amounts to a couple of
footnotes.  Which would have left me perplexed - what is this beast? - had
I not seen the discussion about RDF phobia etc.

Example 1:

{
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "image": "http://manu.sporny.org/images/manu.png"
}

"It's obvious to humans that the data is about a person whose name is 'Manu
Sporny'..."

This is plainly a false claim.  I see a set of three ordered pairs, and I
see no reason whatsoever to think that such a set is "about" anything.  If
I'm told that it is about something and am asked to guess what, there's a
pretty good chance that "a person named 'Manu Sporny'" is the last thing
that would come to mind.  It seems much more likely that I (in my
"Everyman" hat) would say it's about the homepage or the image of said
Manu.  On the other hand, knowing about RDF as I do, I see why the claim
was made.  Which strongly suggests that RDF is after all central to JSON-LD.

And this is the crux of the matter: it's all about aboutness.  More on this
below.

I also expected to see some kind of translation from JSON-LD expressions to
triples and found it annoying that this was not the case, since it left me
continually wondering if I was misunderstanding.  After all, if it's
supposed to "work" for RDF, but it pointedly excludes talk of RDF, well,
maybe it's supposed to be something else - what?  In other words, omission
of RDF-talk is not just an expression of accommodation to the RDF-phobes,
it's an expression of (mild) hostility to RDF-philes.  At least that's how
I take it.

Another thing that jumped out at me: @type.  Is that rdf:type?  Sure seems
like it ought to be but I can't really tell without spending time and
energy analyzing. Seems to me the spec ought to save me the trouble by
explicitly describing how the JSON-LD stuff relates to the RDF stuff.
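
To make concrete the kind of explanation I was hoping to find, here is a minimal Python sketch of a naive reading of Example 1 as triples.  It is not the spec's algorithm.  The CONTEXT mapping (FOAF IRIs), the blank-node label "_:b0", the helper name to_triples, and the added @type value are all my own assumptions for illustration, and the sketch simply assumes that @type corresponds to rdf:type, which is the very thing I would like the spec to say outright.

```python
# Naive sketch: read a flat JSON-LD-style object as (subject, predicate, object)
# triples. Assumes string values only and a hand-written context.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical context (not taken from the draft): short keys -> IRIs.
CONTEXT = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "image": "http://xmlns.com/foaf/0.1/img",
}


def to_triples(node, context, blank_label="_:b0"):
    """Return a list of triples for one flat node object."""
    # With no @id, the thing described is anonymous: use a blank node label.
    subject = node.get("@id", blank_label)
    triples = []
    for key, value in node.items():
        if key == "@id":
            continue
        # Assumption: @type plays the role of rdf:type.
        predicate = RDF_TYPE if key == "@type" else context[key]
        triples.append((subject, predicate, value))
    return triples


example_1 = {
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "image": "http://manu.sporny.org/images/manu.png",
}

# Same object with a hypothetical @type added.
typed_example = dict(example_1)
typed_example["@type"] = "http://xmlns.com/foaf/0.1/Person"

for triple in to_triples(typed_example, CONTEXT):
    print(triple)
```

Note that the subject of every triple is something the JSON object itself never mentions; the "aboutness" has to be supplied by the reading.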

There are a number of typos, grammatical errors etc. that I'll list in a
separate message.

More generally, in light of the LD v. RDF struggle: I get the distinct
impression that in trying to satisfy the RDF-phobes, the WG has thrown the
RDF-philes under the bus.

Even more generally regarding LD, RDF, etc.: in my view there is some deep
confusion in the land about "Linked Data".  I notice in a number of places
(in the discussions on the list and minutes of teleconfs) that people make
claims to the effect that linked data - er, Linked Data - is just HTTP IRIs
that are dereferenceable.  An indirect example from one of the messages:

"IMHO, RDF != Linked Data. Nothing in RDF requires IRIs to be
dereferenceable ..."

The clear implication being that dereferenceability is what demarcates LD.

But then we also have claims like the following (from
http://json-ld.org/minutes/2011-07-04/#topic-3):  "Linked Data is used to
represent a directed graph, and within the context of Linked Data, the
graph can be represented as connections between different nodes, nodes are
subjects and objects, links are properties. Nodes may have identifiers that
are URIs allowing them to be externally addressed."

Note: no mention of dereferenceability as a criterion of demarcation.

I think one problem is a clash or at least lack of clarity regarding the
relation of formalisms to pragmatics.  You don't need the web to describe
graph structures.  You do need the web to have dereferencing.

A related problem is that lots of people seem to take "Linked Data" to
refer to a kind of data - the dereferenceable kind - that can be "defined"
as such.  The four items in TBL's original design note on LD are then taken
as definitional of what LD is.  This is a mistake.  First of all, TBL's note
is explicit: those four items are "expectations of behavior", or as I would
put it, descriptions of normative practices.  Second, and more critically,
dereferenceability CANNOT be used to define a kind of data in isolation.
 It is not a property of data, it's a variety of data use.  It's probably
better construed as a system property (although that's not entirely right).
 If it were a property of data, then LD would cease to be LD as soon as the
server is taken offline, or the client loses network connectivity.  You
would also be able to tell if a datum is LD by looking at it (rather than
using it).  Treating dereferenceability as definitional of LD confuses
matters of fact with norms of practice.  It also tends to lead to
quasi-metaphysical debates involving claims of the form "but LD is/is not
xyz", or "but RDF is/is not LD" (or vice versa).  But it's not about
metaphysics, it's about pragmatics: what you do with the data, how you
treat it.
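
A small Python sketch of the distinction, under stated assumptions: the function names and the two stand-in "fetchers" are hypothetical, and no real network access is performed.  The point is only that one check inspects the datum while the other depends on the state of the world at the moment of use.

```python
from typing import Callable

# A fetcher takes an IRI and reports whether a representation came back.
Fetcher = Callable[[str], bool]


def looks_like_http_iri(iri: str) -> bool:
    """A property of the datum itself: decidable by looking at it."""
    return iri.startswith(("http://", "https://"))


def dereferences(iri: str, fetch: Fetcher) -> bool:
    """A property of use: the answer depends on what `fetch` finds."""
    return looks_like_http_iri(iri) and fetch(iri)


# Hypothetical stand-ins for the network at two different moments.
def server_online(iri: str) -> bool:
    return True


def server_offline(iri: str) -> bool:
    return False


iri = "http://manu.sporny.org/"
print(looks_like_http_iri(iri))              # same answer whenever asked
print(dereferences(iri, server_online))      # depends on the moment of use
print(dereferences(iri, server_offline))     # same datum, different answer
```

The datum is identical in both calls to dereferences; only the circumstances of use differ.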

Just to be clear:  if you write a Java program that violates the syntactic
rules of the language, you have not written a bad Java program, you have written
something that is not a Java program.  But if you publish (or claim to
publish) LD without providing for dereferencing of the IRIs (for example),
you have published bad LD, not something other than LD.  Or perhaps it
would be more accurate to say you have made an unwarranted claim.  That a
program is not Java is provable - it won't compile - so the truth of the
claim is decidable and categorical - yes or no.  That some LD is bad isn't
really provable in that sense, since the web changes - the claim can be
contested but not decided by proof.  Plus lots of data will mix
dereferenceable and non-dereferenceable IRIs, and HTTP and other schemes.

From this perspective, the first paragraph of the intro should be
rewritten.  First, Linked Data is not a technique, it is a set of normative
practices.  "Technique" implies (in my opinion) procedure, algorithm, or
law-like rules that necessarily lead to correctness, which is not what LD
practices are (you can't guarantee dereferenceability, for example.)
 Second, mentions of Linked Data "properties" should be removed, or
replaced by mention of practices, norms or the like.

Now you might just say "so what?"  Is there any real harm in treating LD as
a definite kind of data rather than norms for using data?  Maybe not, in
the grand scheme of things, but in addition to the advantage of clarity
there's another reason to adopt something like the vocab I've suggested for
talking about LD (and RDF).  Which I can sum up in two principles:

    The Web is about aboutness.
    Aboutness on the web is purely pragmatic - a matter of norms governing
how we use/treat things, not what they intrinsically (objectively,
naturally, etc.) are.

The third of the four "properties" listed in the intro (which draws on
TBL's note) is "the name IRIs <http://json-ld.org/spec/latest/json-ld/#dfn-iri>,
when dereferenced, provide more information about the thing".  My
impression is that most people take "dereferenced" to be the key term in
that clause.  But that's wrong; the key term is "about".  And I suspect
that a lack of clarity about what "about" is about is the source of much of
the confusion that has always accompanied semantic web talk in its many
forms.  There are at least three varieties of aboutness involved.  (Ok, I
know this is starting to sound very arcane and philosophical but bear with
me - in the end it is very simple, clear, and easily explainable by
example.)


   - Denotational aboutness.  We use IRIs to name (refer to, denote)
   things.  This is a purely pragmatic matter; IRIs do not in and of
   themselves name anything.  Only insofar as we treat them as names do they
   function as names.  (Note that the English meaning of "about" may cause
   confusion here - we don't normally say that e.g. "The name 'Napoleon' is
   about Napoleon".  So here "aboutness" just means directed to something.)
   - Implicit claim aboutness.  Given <a
href="http://.../Napoleon.html">Napoleon</a>,
   the practical norm is that the HTML document named by the URI should be
   about Napoleon, at least in general; implicitly, this syntax expresses a
   claim that the HTML page is about Napoleon.  The critical point here is
   that this is implicit; the formal requirement is only that the browser
   should arrange for the URI to be dereferenced when "Napoleon" is clicked.
    Nothing in the syntax is defined as a claim.  That the content should be
   about Napoleon is a matter of social convention (norms).
   - Explicit claim aboutness.  We want to be able to say something more
   than simply "this webpage is about Napoleon"; for example, we want to be
   able to express the claim that Napoleon's wife was Josephine.  There is no
   way to do this implicitly.  You could design an XML language that includes
   a "Napoleon" tag with a "wife" attribute, but we want generality. RDF
   provides one solution to this problem - it explicitly (more or less)
   stipulates that a triple is to be taken as a claim about its first term
   referent.

(I just made this up so the language can no doubt be significantly improved
but I think it gets the point across.)

(Incidentally, this approach suggests a way of presenting RDF that may be
an improvement on the S-P-O vocabulary.  E.g. in RDF a claim is expressed
as a topic plus a comment about the topic.  The comment consists of a
qualifier and a complement.  Yielding Topic-Qualifier-Complement treated as
Topic-Comment, instead of S-P-O.  etc.)

Now we're in a position to see the problem with LD "definitions".  They
don't say what kind of aboutness is involved where dereferencing occurs.
 If it were only a matter of dereferencing IRIs to yield data about
something then the HTML web is by definition a Linked Data web.  But it
seems to me that the criterion of demarcation should be whether or not we
can make explicit, qualified claims.  (By "qualified" I mean that the
middle term of a triple serves to qualify the relation between the topic
and complement, e.g. in "Franklin invented bifocals", "invented" tells us
what kind of relation obtains between Franklin and bifocals.)

Both RDF and JSON-LD are species of the genus of making explicit claims
about things.  It isn't clear to me if LD is too.

Ok, so the potential payoff with respect to JSON-LD is that this vocabulary
of claims and aboutness would allow us to explicitly address the core of
what RDF is about without talking explicitly about RDF.  So for Example 1
from the spec, one could introduce the concept of expressing a claim about
something, show the JSON-LD expression, and explicate it in terms of topic
(the person, Manu Sporny) and comment (his homepage is at http://...).  This
could be done using any number of regimented quasi-formal schemes,
including pseudo-English.  (Note by the way that in many languages it is
the norm to talk in just this way: instead of "Manu Sporny's homepage is
http://..." one says something like "Manu Sporny, his homepage is http://...")
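
As a rough illustration of such a regimented scheme, here is a minimal Python sketch of the Topic-Qualifier-Complement reading applied to Example 1.  The names Claim, render, and claims_about are hypothetical, as is the pseudo-English template; the topic has to be passed in by hand because nothing in the JSON object states it.

```python
from typing import List, NamedTuple


class Claim(NamedTuple):
    topic: str       # what the claim is about
    qualifier: str   # the kind of relation that obtains
    complement: str  # what the topic is related to


def render(claim: Claim) -> str:
    """Render a claim in topic-comment pseudo-English."""
    return f"{claim.topic}: {claim.qualifier} is {claim.complement}"


def claims_about(topic: str, node: dict) -> List[Claim]:
    """Read each key/value pair of a flat object as a comment on the topic."""
    return [Claim(topic, key, value) for key, value in node.items()]


example_1 = {
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "image": "http://manu.sporny.org/images/manu.png",
}

# The topic is supplied from outside the data.
for claim in claims_about("the person", example_1):
    print(render(claim))
```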

Having said all that, I can live with the spec as it is; the WG need not
spend time formulating any kind of official response to this. I just wanted
to provide some feedback  (and I confess I think the stuff about
pragmatics, aboutness, and claims is kind of an interesting approach so I
wonder if anybody else does too.)  JSON-LD will sink or swim on its
technical merits; either way, relatively few people will read the spec
(anybody read the SQL spec lately?).  If it takes off, we'll see lots of
blog posts and some books explaining it.  So the non-normative sections
just need to be "good enough".

Thanks for all the hard work,

Gregg Reynolds

Received on Thursday, 13 June 2013 15:17:17 UTC