Re: JSON-LD/RDF feedback from Kingsley Idehen on 2013-06-13 (public-rdf-comments@w3.org from June 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 13 Jun 2013 13:01:56 -0400
To: public-rdf-comments@w3.org
Message-ID: <51B9FB04.9020400@openlinksw.com>
On 6/13/13 11:16 AM, Gregg Reynolds wrote:
> ... from a Concerned Citizen.  For what it's worth, I'm quite familiar 
> with RDF but have not been following the various relevant WGs for some 
> time and only just got around to reading the JSON-LD draft, mainly 
> because I happened to notice the recent discussion about whether RDF 
> should or should not be mentioned etc., so I'll regale you with my 
> impressions in hopes they might be useful.
>
> [P.S.  It turns out I have a specific idea for satisfying both pro- 
> and anti-RDF camps, see below.]
>
> First impression:  where's the RDF?  I was expecting to see something 
> in the non-normative sections explaining or demonstrating how JSON-LD 
> maps to RDF or vice-versa.  Instead all I find is what amounts to a 
> couple of footnotes.  Which would have left me perplexed - what is 
> this beast? - had I not seen the discussion about RDF phobia etc.
>
> Example 1:
> {
>    "name": "Manu Sporny",
>    "homepage": "http://manu.sporny.org/",
>    "image": "http://manu.sporny.org/images/manu.png"
> }
> "It's obvious to humans that the data is about a person whose name is 
> 'Manu Sporny'..."
>
> This is plainly a false claim.  I see a set of three ordered pairs, 
> and I see no reason whatsoever to think that such a set is "about" 
> anything.  If I'm told that it is about something and am asked to 
> guess what, there's a pretty good chance that "a person named 'Manu 
> Sporny'" is the last thing that would come to mind.  It seems much 
> more likely that I (in my "Everyman" hat) would say it's about the 
> homepage or the image of said Manu.  On the other hand, knowing about 
> RDF as I do, I see why the claim was made.  Which strongly suggests 
> that RDF is after all central to JSON-LD.
>
> And this is the crux of the matter: it's all about aboutness.  More on 
> this below.
>
> I also expected to see some kind of translation from JSON-LD 
> expressions to triples and found it annoying that this was not the 
> case, since it left me continually wondering if I was 
> misunderstanding.  After all, if it's supposed to "work" for RDF, but 
> it pointedly excludes talk of RDF, well, maybe it's supposed to be 
> something else - what?  In other words, omission of RDF-talk is not 
> just an expression of accommodation to the RDF-phobes, it's an 
> expression of (mild) hostility to RDF-philes.  At least that's how I 
> take it.
>
> Another thing that jumped out at me: @type.  Is that rdf:type?  Sure 
> seems like it ought to be but I can't really tell without spending 
> time and energy analyzing. Seems to me the spec ought to save me the 
> trouble by explicitly describing how the JSON-LD stuff relates to the 
> RDF stuff.
>
> There are a number of typos, grammatical errors etc. that I'll list in 
> a separate message.
>
> More generally, in light of the LD v. RDF struggle: I get the distinct 
> impression that in trying to satisfy the RDF-phobes, the WG has thrown 
> the RDF-philes under the bus.
>
> Even more generally regarding LD, RDF, etc.: in my view there is some 
> deep confusion in the land about "Linked Data".  I notice in a number 
> of places (in the discussions on the list and minutes of teleconfs) 
> that people make claims to the effect that linked data - er, Linked 
> Data - is just HTTP IRIs that are dereferencable.  An indirect example 
> from one of the messages:
>
> "IMHO, RDF != Linked Data. Nothing in RDF requires IRIs to be 
> dereferenceable ..."
>
> The clear implication being that dereferenceability is what demarcates LD.
>
> But then we also have claims like the following (from 
> http://json-ld.org/minutes/2011-07-04/#topic-3):  "Linked Data is used 
> to represent a directed graph, and within the context of Linked Data, 
> the graph can be represented as connections between different nodes, 
> nodes are subjects and objects, links are properties. Nodes may have 
> identifiers that are URIs allowing them to be externally addressed."
>
> Note: no mention of dereferenceability as a criterion of demarcation.
>
> I think one problem is a clash or at least lack of clarity regarding 
> the relation of formalisms to pragmatics.  You don't need the web to 
> describe graph structures.  You do need the web to have dereferencing.
>
> A related problem is that lots of people seem to take "Linked Data" to 
> refer to a kind of data - the dereferenceable kind - that can be 
> "defined" as such.  The four items in TBL's original design note on LD 
> are then taken as definitional of what LD is.  This is a mistake.  First 
> of all, TBL's note is explicit: those four items are "expectations of 
> behavior", or as I would put it, descriptions of normative practices. 
>  Second, and more critically, dereferenceability CANNOT be used to 
> define a kind of data in isolation.  It is not a property of data, 
> it's a variety of data use.  It's probably better construed as a 
> system property (although that's not entirely right).  If it were a 
> property of data, then LD would cease to be LD as soon as the server 
> is taken offline, or the client loses network connectivity.  You would 
> also be able to tell if a datum is LD by looking at it (rather than 
> using it).  Treating dereferenceability as definitional of LD confuses 
> matters of fact with norms of practice.  It also tends to lead to 
> quasi-metaphysical debates involving claims of the form "but LD is/is 
> not xyz", or "but RDF is/is not LD" (or vice versa).  But it's not 
> about metaphysics, it's about pragmatics: what you do with the data, 
> how you treat it.
>
> Just to be clear:  if you write a Java program that violates the 
> syntactic rules of the language, you have not written a bad Java 
> program, you have written something that is not a Java program.  But if you 
> publish (or claim to publish) LD without providing for dereferencing 
> of the IRIs (for example), you have published bad LD, not something 
> other than LD.  Or perhaps it would be more accurate to say you have 
> made an unwarranted claim.  That a program is not Java is provable - 
> it won't compile - so the truth of the claim is decidable and 
> categorical - yes or no.  That some LD is bad isn't really provable in 
> that sense, since the web changes - the claim can be contested but not 
> decided by proof.  Plus lots of data will mix dereferenceable and 
> non-dereferenceable IRIs, and HTTP and other schemes.
>
> From this perspective, the first paragraph of the intro should be 
> rewritten.  First, Linked Data is not a technique, it is a set of 
> normative practices.  "Technique" implies (in my opinion) procedure, 
> algorithm, or law-like rules that necessarily lead to correctness, 
> which is not what LD practices are (you can't guarantee 
> dereferenceability, for example.)  Second, mentions of Linked Data 
> "properties" should be removed, or replaced by mention of practices, 
> norms or the like.
>
> Now you might just say "so what?"  Is there any real harm in treating 
> LD as a definite kind of data rather than norms for using data?  Maybe 
> not, in the grand scheme of things, but in addition to the advantage 
> of clarity there's another reason to adopt something like the vocab 
> I've suggested for talking about LD (and RDF).  Which I can sum up in 
> two principles:
>
>     The Web is about aboutness.
>     Aboutness on the web is purely pragmatic - a matter of norms 
> governing how we use/treat things, not what they intrinsically 
> (objectively, naturally, etc.) are.
>
> The third of the four "properties" listed in the intro (which draws on 
> TBL's note) is "the name IRIs 
> <http://json-ld.org/spec/latest/json-ld/#dfn-iri>, when dereferenced, 
> provide more information about the thing".  My impression is that most 
> people take "dereferenced" to be the key term in that clause.  But 
> that's wrong; the key term is "about".  And I suspect that a lack of 
> clarity about what "about" is about is the source of much of the 
> confusion that has always accompanied semantic web talk in its many 
> forms.  There are at least three varieties of aboutness involved. 
>  (Ok, I know this is starting to sound very arcane and philosophical 
> but bear with me - in the end it is very simple, clear, and easily 
> explainable by example.)
>
>   * Denotational aboutness.  We use IRIs to name (refer to, denote)
>     things.  This is a purely pragmatic matter; IRIs do not in and of
>     themselves name anything.  Only insofar as we treat them as names
>     do they function as names.  (Note that the English meaning of
>     "about" may cause confusion here - we don't normally say that e.g.
>     "The name 'Napoleon' is about Napoleon".  So here "aboutness" just
>     means directed to something.)
>   * Implicit claim aboutness.  Given <a
>     href="http://.../Napoleon.html">Napoleon</a>, the practical norm
>     is that the HTML document named by the URI should be about
>     Napoleon, at least in general; implicitly, this syntax expresses a
>     claim that the HTML page is about Napoleon.  The critical point
>     here is that this is implicit; the formal requirement is only that
>     the browser should arrange for the URI to be dereferenced when
>     "Napoleon" is clicked.  Nothing in the syntax is defined as a
>     claim.  That the content should be about Napoleon is a matter of
>     social convention (norms).
>   * Explicit claim aboutness.  We want to be able to say something
>     more than simply "this webpage is about Napoleon"; for example, we
>     want to be able to express the claim that Napoleon's wife was
>     Josephine.  There is no way to do this implicitly.  You could
>     design an XML language that includes a "Napoleon" tag with a
>     "wife" attribute, but we want generality. RDF provides one
>     solution to this problem - it explicitly (more or less) stipulates
>     that a triple is to be taken as a claim about its first term referent.
>
> (I just made this up so the language can no doubt be significantly 
> improved but I think it gets the point across.)
>
> (Incidentally, this approach suggests a way of presenting RDF that may 
> be an improvement on the S-P-O vocabulary.  E.g. in RDF a claim is 
> expressed as a topic plus a comment about the topic.  The comment 
> consists of a qualifier and a complement.  Yielding 
> Topic-Qualifier-Complement treated as Topic-Comment, instead of S-P-O. 
>  etc.)
>
> Now we're in a position to see the problem with LD "definitions". 
>  They don't say what kind of aboutness is involved where dereferencing 
> occurs.  If it were only a matter of dereferencing IRIs to yield data 
> about something then the HTML web is by definition a Linked Data web. 
>  But it seems to me that the criterion of demarcation should be 
> whether or not we can make explicit, qualified claims.  (By 
> "qualified" I mean that the middle term of a triple serves to qualify 
> the relation between the topic and complement, e.g. in "Franklin 
> invented bifocals", "invented" tells us what kind of relation obtains 
> between Franklin and bifocals.)
>
> Both RDF and JSON-LD are species of the genus of making explicit 
> claims about things.  It isn't clear to me if LD is too.
>
> Ok, so the potential payoff with respect to JSON-LD is that this 
> vocabulary of claims and aboutness would allow us to explicitly 
> address the core of what RDF is about without talking explicitly about 
> RDF.  So for Example 1 from the spec, one could introduce the concept 
> of expressing a claim about something, show the JSON-LD expression, 
> and explicate it in terms of topic (the person, Manu Sporny) and 
> comment (his homepage is at http://...).  This could be done using any 
> number of regimented quasi-formal schemes, including pseudo-English. 
>  (Note by the way that in many languages it is the norm to talk in 
> just this way: instead of "Manu Sporny's homepage is http://..." one 
> says something like "Manu Sporny, his homepage is http:...")
>
> Having said all that, I can live with the spec as it is; the WG need 
> not spend time formulating any kind of official response to this. I 
> just wanted to provide some feedback  (and I confess I think the stuff 
> about pragmatics, aboutness, and claims is kind of an interesting 
> approach so I wonder if anybody else does too.)  JSON-LD will sink or 
> swim on its technical merits; either way, relatively few people will 
> read the spec (anybody read the SQL spec lately?).  If it takes off, 
> we'll see lots of blog posts and some books explaining it.  So the 
> non-normative sections just need to be "good enough".
>
> Thanks for all the hard work,
>
> Gregg Reynolds
>
>
Gregg,

I would like to qualify a few things about statements such as 
"RDF != Linked Data" in association with de-referencing.

Web-like Structured Data Representation:

TimBL's meme outlined a principled approach to structured data 
representation that results in a Data Web, or Web-like structured data, 
courtesy of HTTP URIs. This approach lets Web-like structured data 
scale to the expanse of the scale-free Web.

In his original meme [1] he indicated that this principled approach 
enables one to look up what an HTTP URI denotes, while also indicating 
to publishers that useful information should be accessible from the 
look-up location.

In the revised meme [2] he added "using standards (RDF, SPARQL)". This 
introduced the problem of using words that mean different things to 
different audiences, which I break down as follows (un-pejoratively):

1. RDFers -- RDF and Linked Data (this thing with appreciative momentum) 
are now inextricably linked, we'll show them now!

2. RDF-Refluxers -- Linked Data is just a re-branding of RDF, they think 
we are stupid!

3. Independents -- WTF! (pardon my French, but I want to signal as 
effectively as possible in this response).

In reality, bearing in mind my proximity to TimBL re. these matters of 
Linked Data, I genuinely believe he meant (but of course I do not speak 
for him):

Use standards such as RDF and SPARQL as an effective (productive) way to 
provide really useful information when HTTP URIs (as outlined in this 
note) are looked up. In reality, when you get round to implementing 
Linked Data (as a publisher) all you have to do is redirect user agent 
URI look-up requests to a SPARQL protocol URL, which leaves the heavy 
lifting to a SPARQL processor (which may or may not be a full-blown DBMS 
engine).
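
As a minimal sketch of that redirect, assuming a hypothetical endpoint 
at http://example.org/sparql, a DESCRIBE query as the look-up query, and 
a 303 status for the redirect (none of which TimBL's note prescribes), 
the publisher-side logic could look something like this in Python. The 
only part taken from the SPARQL protocol is that the query text travels 
in the "query" parameter of the URL:

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used purely for illustration.
SPARQL_ENDPOINT = "http://example.org/sparql"

# Characters that may not appear inside <...> in a SPARQL query.
_FORBIDDEN = set('<>"{}|^`\\')


def describe_redirect_url(entity_uri, endpoint=SPARQL_ENDPOINT):
    """Build a SPARQL protocol URL that describes entity_uri.

    The choice of DESCRIBE is an assumption; a publisher could equally
    use a CONSTRUCT query tailored to its own data.
    """
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in entity_uri):
        raise ValueError(f"not usable inside <...>: {entity_uri!r}")
    query = f"DESCRIBE <{entity_uri}>"
    # urlencode percent-encodes the query text for use in a URL.
    return f"{endpoint}?{urlencode({'query': query})}"


def redirect_for_lookup(entity_uri):
    """Return (status, headers) redirecting a URI look-up to the endpoint.

    303 (See Other) is assumed here; the server framework that would
    actually send this response is left out.
    """
    return 303, {"Location": describe_redirect_url(entity_uri)}
```

For example, redirect_for_lookup("http://example.org/id/napoleon") 
yields a 303 whose Location header points at the endpoint with the 
encoded DESCRIBE query, and the SPARQL processor does the rest.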

As for this whole JSON-LD and RDF affair, there is one subtle detail 
that makes matters more challenging. JSON-LD is seeking to be published 
as a deliverable from the RDF group. In taking the aforementioned route, 
many of the RDF related push-backs become much more understandable.

Anyway, here are some links for additional context re. my comments re. 
Linked Data as Web-like structured data:

1. http://www.w3.org/DesignIssues/diagrams/history/proposal-fig1.gif -- 
illustration from original Web design document (this is clearly 
depicting a Data Web woven together via the typed relations illustrated 
as connections/connectors)

2. http://www.w3.org/2005/Talks/1110-iswc-tbl/#(4) -- that's Linked Data 101

3. http://www.w3.org/2005/Talks/1110-iswc-tbl/#(7) -- URIs + HTTP (this 
is what makes Data Web-like)

4. http://dig.csail.mit.edu/2007/Talks/0511-tab-tbl/#(10) -- Linked Data 
(again, clearly defined and distinct from RDF).

Inserting Logic into Linked Data (i.e., taking it from our "mind's eye" 
to a Data Web accessible to humans and machines) is where RDF kicks in, 
with aplomb.

Unfortunately, due to issues associated with OWL misconceptions 
compounded by RDF/XML dominating OWL examples, many RDFers are reluctant 
to utter the words "inference" or "reasoning" since they assume those 
are the issues that scare folks. A current example of that is easily 
discernible from some of the longer threads on the W3C's LDP list.

Pat Hayes gave a very nice presentation on Blogic [1] that puts this 
issue of Logic and Data Webs in scope.

Links:

1. http://slidesha.re/18CtxGK -- Blogic
2. http://videolectures.net/iswc09_hayes_blogic/ -- What's in a Link?

-- 

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Thursday, 13 June 2013 17:02:21 UTC