- From: Gregg Reynolds <dev@mobileink.com>
- Date: Thu, 13 Jun 2013 10:16:44 -0500
- To: public-rdf-comments <public-rdf-comments@w3.org>
- Message-ID: <CAO40Mik59M7p50gTP=SA+miL6y7iO5ZCXbV9UM1=3a49_aap5w@mail.gmail.com>
... from a Concerned Citizen. For what it's worth, I'm quite familiar with RDF but have not been following the various relevant WGs for some time, and only just got around to reading the JSON-LD draft, mainly because I happened to notice the recent discussion about whether RDF should or should not be mentioned, etc. So I'll regale you with my impressions in the hope that they might be useful. [P.S. It turns out I have a specific idea for satisfying both the pro- and anti-RDF camps; see below.]

First impression: where's the RDF? I was expecting to see something in the non-normative sections explaining or demonstrating how JSON-LD maps to RDF or vice versa. Instead, all I find is what amounts to a couple of footnotes. That would have left me perplexed (what is this beast?) had I not seen the discussion about RDF phobia etc.

Example 1:

    {
      "name": "Manu Sporny",
      "homepage": "http://manu.sporny.org/",
      "image": "http://manu.sporny.org/images/manu.png"
    }

"It's obvious to humans that the data is about a person whose name is 'Manu Sporny'..."

This is plainly a false claim. I see a set of three ordered pairs, and I see no reason whatsoever to think that such a set is "about" anything. If I'm told that it is about something and asked to guess what, there's a pretty good chance that "a person named 'Manu Sporny'" is the last thing that would come to mind. It seems much more likely that I (in my "Everyman" hat) would say it's about the homepage or the image of said Manu. On the other hand, knowing RDF as I do, I see why the claim was made, which strongly suggests that RDF is, after all, central to JSON-LD. And this is the crux of the matter: it's all about aboutness. More on this below.

I also expected to see some kind of translation from JSON-LD expressions to triples, and found it annoying that there was none, since it left me continually wondering whether I was misunderstanding.
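To make the missing translation concrete, here is a minimal Python sketch of how Example 1 could be read as triples once a context supplies IRIs for its keys. It is a toy under stated assumptions, not the normative JSON-LD to RDF conversion: it handles only one flat object, and the `CONTEXT` mapping to FOAF property IRIs is an illustrative choice that does not appear in this message. It also reads `@type` as `rdf:type`, which is how the JSON-LD to RDF conversion treats that keyword.

```python
# A toy sketch of reading Example 1 as RDF triples. It is NOT the normative
# JSON-LD-to-RDF algorithm: it handles one flat JSON object only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed, illustrative context: key -> (property IRI, whether the JSON
# string value should be treated as an IRI rather than a literal).
CONTEXT = {
    "name": ("http://xmlns.com/foaf/0.1/name", False),
    "homepage": ("http://xmlns.com/foaf/0.1/homepage", True),
    "image": ("http://xmlns.com/foaf/0.1/img", True),
}


def iri(value):
    """Write an IRI in N-Triples style."""
    return f"<{value}>"


def literal(value):
    """Write a plain string literal (no escaping; fine for this example)."""
    return f'"{value}"'


def to_triples(node, context, blank_label="_:b0"):
    """Read one flat JSON object as (subject, predicate, object) triples."""
    # The claims are about the node's @id, or about a blank node when the
    # object never says what it is about (as in Example 1).
    subject = iri(node["@id"]) if "@id" in node else blank_label
    triples = []
    for key, value in node.items():
        if key == "@id":
            continue
        if key == "@type":
            # @type is read as rdf:type; full type IRIs are assumed here.
            types = value if isinstance(value, list) else [value]
            for t in types:
                triples.append((subject, iri(RDF_TYPE), iri(t)))
        elif key in context:
            prop, is_iri = context[key]
            obj = iri(value) if is_iri else literal(value)
            triples.append((subject, iri(prop), obj))
        # Keys the context does not map are silently ignored.
    return triples


example_1 = {
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "image": "http://manu.sporny.org/images/manu.png",
}

for s, p, o in to_triples(example_1, CONTEXT):
    print(s, p, o, ".")
```

Run as-is, it prints three N-Triples-style lines whose subject is the blank node `_:b0`. Calling `to_triples(example_1, {})` returns an empty list: without a context the three pairs claim nothing about anything, which is exactly the gap between "three ordered pairs" and "data about a person".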
After all, if it's supposed to "work" for RDF but pointedly excludes talk of RDF, well, maybe it's supposed to be something else. But what? In other words, the omission of RDF talk is not just an accommodation to the RDF-phobes; it's an expression of (mild) hostility to the RDF-philes. At least that's how I take it.

Another thing that jumped out at me: @type. Is that rdf:type? It sure seems like it ought to be, but I can't really tell without spending time and energy analyzing. It seems to me the spec ought to save me the trouble by explicitly describing how the JSON-LD stuff relates to the RDF stuff.

There are a number of typos, grammatical errors, etc. that I'll list in a separate message.

More generally, in light of the LD v. RDF struggle: I get the distinct impression that in trying to satisfy the RDF-phobes, the WG has thrown the RDF-philes under the bus.

Even more generally, regarding LD, RDF, etc.: in my view there is some deep confusion in the land about "Linked Data". I notice in a number of places (in the discussions on the list and in the minutes of teleconfs) that people make claims to the effect that linked data, er, Linked Data, is just HTTP IRIs that are dereferenceable. An indirect example from one of the messages:

"IMHO, RDF != Linked Data. Nothing in RDF requires IRIs to be dereferenceable ..."

The clear implication is that dereferenceability is what demarcates LD. But then we also have claims like the following (from http://json-ld.org/minutes/2011-07-04/#topic-3):

"Linked Data is used to represent a directed graph, and within the context of Linked Data, the graph can be represented as connections between different nodes, nodes are subjects and objects, links are properties. Nodes may have identifiers that are URIs allowing them to be externally addressed."

Note: no mention of dereferenceability as a criterion of demarcation.

I think one problem is a clash, or at least a lack of clarity, regarding the relation of formalisms to pragmatics.
You don't need the web to describe graph structures; you do need the web to have dereferencing.

A related problem is that lots of people seem to take "Linked Data" to refer to a kind of data (the dereferenceable kind) that can be "defined" as such. The four items in TBL's original design note on LD are then taken as definitional of what LD is. This is a mistake. First of all, TBL's note is explicit: those four items are "expectations of behavior", or, as I would put it, descriptions of normative practices. Second, and more critically, dereferenceability CANNOT be used to define a kind of data in isolation. It is not a property of data; it's a variety of data use. It's probably better construed as a system property (although that's not entirely right). If it were a property of data, then LD would cease to be LD as soon as the server was taken offline or the client lost network connectivity. You would also be able to tell whether a datum is LD by looking at it (rather than by using it).

Treating dereferenceability as definitional of LD confuses matters of fact with norms of practice. It also tends to lead to quasi-metaphysical debates involving claims of the form "but LD is/is not xyz", or "but RDF is/is not LD" (or vice versa). But it's not about metaphysics, it's about pragmatics: what you do with the data, how you treat it.

Just to be clear: if you write a Java program that violates the syntactic rules of the language, you have not written a bad Java program; you have written something that is not a Java program. But if you publish (or claim to publish) LD without providing for dereferencing of the IRIs (for example), you have published bad LD, not something other than LD. Or perhaps it would be more accurate to say you have made an unwarranted claim. That a program is not Java is provable (it won't compile), so the truth of the claim is decidable and categorical: yes or no.
That some LD is bad isn't really provable in that sense, since the web changes; the claim can be contested but not decided by proof. Plus, lots of data will mix dereferenceable and non-dereferenceable IRIs, and HTTP and other schemes.

From this perspective, the first paragraph of the intro should be rewritten. First, Linked Data is not a technique; it is a set of normative practices. "Technique" implies (in my opinion) a procedure, algorithm, or law-like rules that necessarily lead to correctness, which is not what LD practices are (you can't guarantee dereferenceability, for example). Second, mentions of Linked Data "properties" should be removed, or replaced by mention of practices, norms, or the like.

Now you might just say "so what?" Is there any real harm in treating LD as a definite kind of data rather than as norms for using data? Maybe not, in the grand scheme of things, but in addition to the advantage of clarity there's another reason to adopt something like the vocabulary I've suggested for talking about LD (and RDF), which I can sum up in two principles:

- The Web is about aboutness.
- Aboutness on the web is purely pragmatic: a matter of norms governing how we use/treat things, not of what they intrinsically (objectively, naturally, etc.) are.

The third of the four "properties" listed in the intro (which draws on TBL's note) is "the name IRIs, when dereferenced, provide more information about the thing". My impression is that most people take "dereferenced" to be the key term in that clause. But that's wrong; the key term is "about". And I suspect that a lack of clarity about what "about" is about is the source of much of the confusion that has always accompanied semantic web talk in its many forms.

There are at least three varieties of aboutness involved. (OK, I know this is starting to sound very arcane and philosophical, but bear with me; in the end it is very simple, clear, and easily explainable by example.)
- Denotational aboutness. We use IRIs to name (refer to, denote) things. This is a purely pragmatic matter; IRIs do not in and of themselves name anything. Only insofar as we treat them as names do they function as names. (Note that the English meaning of "about" may cause confusion here; we don't normally say that, e.g., "The name 'Napoleon' is about Napoleon". So here "aboutness" just means being directed to something.)

- Implicit claim aboutness. Given <a href="http://.../Napoleon.html">Napoleon</a>, the practical norm is that the HTML document named by the URI should be about Napoleon, at least in general; implicitly, this syntax expresses a claim that the HTML page is about Napoleon. The critical point here is that this is implicit; the formal requirement is only that the browser should arrange for the URI to be dereferenced when "Napoleon" is clicked. Nothing in the syntax is defined as a claim. That the content should be about Napoleon is a matter of social convention (norms).

- Explicit claim aboutness. We want to be able to say something more than simply "this webpage is about Napoleon"; for example, we want to be able to express the claim that Napoleon's wife was Josephine. There is no way to do this implicitly. You could design an XML language that includes a "Napoleon" tag with a "wife" attribute, but we want generality. RDF provides one solution to this problem: it explicitly (more or less) stipulates that a triple is to be taken as a claim about the referent of its first term.

(I just made this up, so the language can no doubt be significantly improved, but I think it gets the point across.)

(Incidentally, this approach suggests a way of presenting RDF that may be an improvement on the S-P-O vocabulary. For example, a claim in RDF would be presented as a topic plus a comment about the topic, where the comment consists of a qualifier and a complement. That yields Topic-Qualifier-Complement, treated as Topic-Comment, instead of S-P-O, etc.)
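The Topic-Comment reading of a triple can be sketched in a few lines of Python. This is a hypothetical illustration: the names `Claim`, `Comment` and `render`, and the "Topic, qualifier: complement" output format, are invented here to mirror the vocabulary above and are not part of RDF or JSON-LD. The sketch keeps the same three terms as S-P-O and only regroups them as a topic plus a comment (qualifier and complement).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """What is said about a topic: a qualifier and a complement."""
    qualifier: str   # which relation holds, e.g. "wife"
    complement: str  # what the topic is related to, e.g. "Josephine"


@dataclass(frozen=True)
class Claim:
    """An explicit claim: a topic plus a comment about that topic."""
    topic: str
    comment: Comment

    @classmethod
    def from_triple(cls, subject, predicate, obj):
        # The same three terms as S-P-O, regrouped as Topic + Comment.
        return cls(subject, Comment(predicate, obj))


def render(claims):
    """Render claims as topic-first pseudo-English, one line per topic."""
    by_topic = defaultdict(list)  # keeps topics in first-seen order
    for claim in claims:
        by_topic[claim.topic].append(claim.comment)
    return [
        f"{topic}, " + "; ".join(f"{c.qualifier}: {c.complement}" for c in comments)
        for topic, comments in by_topic.items()
    ]


claims = [
    Claim.from_triple("Napoleon", "wife", "Josephine"),
    Claim.from_triple("Franklin", "invented", "bifocals"),
    Claim.from_triple("Manu Sporny", "homepage", "http://manu.sporny.org/"),
    Claim.from_triple("Manu Sporny", "image", "http://manu.sporny.org/images/manu.png"),
]
for line in render(claims):
    print(line)
```

The grouping by topic is purely presentational: the claims are exactly the same triples, but the output reads as "Napoleon, wife: Josephine" rather than as three separate slots.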
Now we're in a position to see the problem with LD "definitions". They don't say what kind of aboutness is involved where dereferencing occurs. If it were only a matter of dereferencing IRIs to yield data about something, then the HTML web would by definition be a Linked Data web. But it seems to me that the criterion of demarcation should be whether or not we can make explicit, qualified claims. (By "qualified" I mean that the middle term of a triple serves to qualify the relation between the topic and the complement; e.g. in "Franklin invented bifocals", "invented" tells us what kind of relation obtains between Franklin and bifocals.) Both RDF and JSON-LD are species of the genus of making explicit claims about things. It isn't clear to me whether LD is too.

OK, so the potential payoff with respect to JSON-LD is that this vocabulary of claims and aboutness would allow us to address the core of what RDF is about without talking explicitly about RDF. So for Example 1 from the spec, one could introduce the concept of expressing a claim about something, show the JSON-LD expression, and explicate it in terms of topic (the person, Manu Sporny) and comment (his homepage is at http://...). This could be done using any number of regimented quasi-formal schemes, including pseudo-English. (Note, by the way, that in many languages it is the norm to talk in just this way: instead of "Manu Sporny's homepage is http://..." one says something like "Manu Sporny, his homepage is http://...")

Having said all that, I can live with the spec as it is; the WG need not spend time formulating any kind of official response to this. I just wanted to provide some feedback (and I confess I think the stuff about pragmatics, aboutness, and claims is kind of an interesting approach, so I wonder if anybody else does too). JSON-LD will sink or swim on its technical merits; either way, relatively few people will read the spec (has anybody read the SQL spec lately?).
If it takes off, we'll see lots of blog posts and some books explaining it. So the non-normative sections just need to be "good enough".

Thanks for all the hard work,
Gregg Reynolds
Received on Thursday, 13 June 2013 15:17:17 UTC