- From: Gregg Reynolds <dev@mobileink.com>
- Date: Wed, 5 Jan 2011 06:59:11 -0600
- To: public-rdf-dawg-comments@w3.org
- Message-ID: <AANLkTikmnVFEng6MpxyvJkQtQ63QyB3U7MzZ7R5MVqHk@mail.gmail.com>
Hello,

Here is some feedback on SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs <http://www.w3.org/TR/2010/WD-sparql11-http-rdf-update-20100126/>. I was on a WG once so I know it's a difficult and largely thankless task; I appreciate your work, even though I don't always agree with the results. Also, please don't take it personally if I use plain language; I presume that you value honest feedback over delicacy of expression.

I basically agree entirely with Ian Davis' comments <http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2010Dec/0005.html>, but I would go a good bit further. In summary:

- In my view this document is unnecessary and possibly harmful.
- "RDF knowledge". Please don't do this.

"This specification applies the HTTP protocol semantics in managing and modifying RDF graphs."

HTTP, RDF, and SPARQL are already (relatively) well-defined. I don't see what this document adds; or rather, I don't see what problem it addresses that is not already addressed elsewhere. Indeed I think it likely to increase confusion. More on this in another post.

Regarding "RDF knowledge": in my view introducing this phrase as a technical term is a very bad idea. "Knowledge" is not a technical term, I trust that much is obvious (show me one widely accepted and clear definition if you disagree), and I think it's also obvious that trying to impose a constrained technical sense on a common natural language term is a bad idea. It doesn't matter how well you "explain" your intended meaning; the fact that you are trying to appropriate an ordinary term for a specialized purpose will just annoy your readers, who will come to the text with well-established (read: stubborn) if informal notions about what the word "knowledge" means and how it should be used. Referring them to obscure papers in a minor engineering field will really annoy them, take my word for it.
;)

The W3C is not an AI lab or a philosophy department; any standards it publishes should stick to minimal, pragmatic definitions using well-established and widely-accepted concepts and terminology. They should definitely not require research into the arcana of AI.

The problem is deeper than just finding the right terminology. I've read some of the correspondence on the mailing list and I understand the motivation behind the term. But I think the fact that "RDF knowledge" is the best you've come up with is a pretty good indication that the idea you want to convey is not well thought out. I gather the idea is that you want to draw some kind of distinction between graph and IRI - "the difference between the graph and what the graph IRI identifies <http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0011.html>". I suggest you rethink this - again, the plain obscurity of this language is a strong indicator that something is wrong.

In my view the source of the problem is probably the notion that RDF somehow represents or encodes or [pick your verb]s "knowledge", which in turn is probably traceable to the notion that IRIs "identify resources". Both ideas are fundamentally wrong in my view; RDF is about graphs, not knowledge, and the IRI may or may not identify something (just because TBL et al. used "identifier" in the name does not make it an identifier *of* something). "Resource" is just a way of saying "we can't think of a better term".

I expect this argument will be viewed with considerable skepticism in the WG and the RDF community in general. I've posted a few blog notes <http://blog.mobileink.com/> that might help clarify what I'm trying to get at if you're interested.

A simple example might help. Consider the following simple analog to the kind of language we're discussing. Whatever else it may be, RDF is a formal language, just like any other formal language, including (loosely) mathematical notation. Now suppose I declare "let x be the integer 3".
Will anyone think it important to draw a distinction between "the integer and what x identifies"? Or take this sentence from the draft (the same one that Ian Davis found troublesome):

"In this way, an HTTP request can route operations towards a named graph in an RDF dataset via its URI(s). However, in using URIs in this way, we are not directly identifying the RDF graphs but rather the networked RDF knowledge they represent."

Compare: "in using x to refer to the integer 3 in this way, I am not directly identifying the integer but rather the mathematical knowledge it represents." You'd be laughed out of town.

In other words, the distinction between the formalism (i.e. mathematical object) and the "knowledge it represents" is, frankly, not only pointless but harmful insofar as it implies some kind of special meaning that is just not there. There is nothing special about RDF. RDF is about graphs, not knowledge; and a graph is a mathematical object, just like an integer or an algebra. Introducing an additional level of semantics - the meaning of the mathematical object - just adds confusion and complexity, without any benefits that I can see. An IRI that is used to name an RDF graph refers to that graph, just like the symbol "3" refers to the integer so designated (and not the "knowledge" that is represented by the integer that is serialized by the symbol etc. etc.).

To be honest I think the problem goes back to the use of model theory to provide semantics. Just because one can construct a model-theoretic semantics for a language does not mean one should. In fact virtually all formal languages - including in particular the notations of working mathematicians - get along just fine without formal semantics. Gödel's theorems did not bring mathematics to a halt. The only programming language I can think of with a formal semantics is Standard ML, and my guess is that nobody bothers to read the definition. It just isn't necessary and it's hard to read.
The Z Specification language provides a formal semantics for ordinary ZF notation, and it's very well done, but it hasn't exactly taken the world by storm. So the fact that we can provide a model-theoretic semantics for graph theory is not very useful, at least not for this kind of document. I have several introductory texts on graph theory on my bookshelf; not one of them contains even a hint of formal semantics, but they're all perfectly understandable. It's enough to know (or be convinced) that one *could* construct a formal semantics. The point being that introducing the machinery of model theory in documents about how the language works is not helpful for most readers.

Having poured out part of my complaint about "RDF knowledge" (I could go on) I hasten to add that your text does touch on the critical issue, albeit only in passing. That is the issue of open-world semantics. This is an area where I think a minor terminological innovation may make sense.

I propose the term "open graph" as a means of capturing the nature of RDF graphs. The motivation comes from ordinary mathematics; e.g. the open interval (0..1) (alternative notation: ]0,1[ ) defined as 0 < x < 1, and similar uses of the terms "open" and "closed" in topology etc. The key idea here is that you cannot write down a finite extensional representation of such mathematical objects; you can only write down approximations. Or think of irrationals defined in terms of bounded open sets; you can only approximate the square root of 2.

This suggests an innovative way of looking at open-world RDF graphs. Given a suitable definition of "open graph" on the analogy of open interval or open ball, every RDF graph is open; that is, infinite. The graph you retrieve from a web server is only a finite approximation. (I omit the graph v. representation distinction on grounds that it is obvious.)
One could go further, and declare that every concrete RDF graph is an approximation of the One True Graph, in which everything is connected to everything. Or one could say that the graph represented in your (finite) computer, now, is a closed (finite) RDF graph that is an approximation of (or "embedded in") an open RDF graph.

This approach provides a simple, concise set of concepts and terminology with clear mathematical underpinnings; with a little work it could be used to provide an RDF semantics that is clear to everybody and shorn of the complexities of model theory and the hand-wavery of "RDF knowledge".

Sincerely,

Gregg Reynolds
Received on Wednesday, 5 January 2011 14:16:26 UTC