- From: Dan Brickley <danbri@danbri.org>
- Date: Sat, 24 Mar 2012 10:28:47 +0000
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Jonathan A Rees <rees@mumble.net>, Jeni Tennison <jeni@jenitennison.com>, public-lod community <public-lod@w3.org>, Leigh Dodds <leigh@ldodds.com>, Dave Reynolds <dave@epimorphics.com>, Ian Davis <me@iandavis.com>
On 23 March 2012 14:33, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote:
>
>> I am a bit dismayed that nobody seems to be picking up on the point
>> I've been hammering on (TimBL and others have also pointed it out),
>> that, as shown by the Flickr and Jamendo examples, the real issue is
>> not an IR/NIR type distinction, but rather a distinction in the
>> *manner* in which a URI gets its meaning, via instantiation (of some
>> generic IR) on the one hand, vs. description (of *any* resource,
>> perhaps even an IR) on the other. The whole
>> information-resource-as-type issue is a total red herring, perhaps the
>> most destructive mistake made by the httpRange-14 resolution.
>
> +1000. There is no need for anyone to even talk about "information
> resources". The important point about http-range-14, which unfortunately
> it itself does not make clear, is that the 200-level code is a signal
> that the URI *denotes* whatever it *accesses* via the HTTP internet
> architecture. We don't need to get into the metaphysics of HTTP in order
> to see that a book (say) can't be accessed by HTTP, so if you want to
> denote it (the book) with an IRI and stay in conformance with this rule,
> then you have to use something other than a 200-level response.

Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived ('ebooks' will soon just be 'books', just as 'email' became 'mail'), and slipping into general opinion here that's not particularly directed at Pat: I assume you're emphasising the physical notion of book. Perhaps 'person' is even more obviously physical (though heavily tattooed people have some commonalities with books).

The Web architecture I first learned was explained to me (HTTP-NG WG era) in terms familiar from the "Object Oriented" style of thinking about computing (a minor religion at the time, too). The idea is that the Web interface is a kind of encapsulation. External parties don't get direct access to the insides; it's always mediated by HTTP GET and other requests. Just as in Java you can expose an object's data internals directly, or hide them behind getters and setters, so it is with Web content. A Web site might encapsulate a coffee machine, teapot or toaster; a CSV file, SGML repository, Perl script or whatever. That pattern allowed the Web to get very big, very fast; you could wrap it around anything.

In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on this view described, in which the hidden innards of a Web object are constrained to be 'data':

"When we think of the Web today, the idea of a 'resource' comes to mind. In general, a resource is an Object that has some methods (e.g. in HTTP, Get Head and Post) that can be invoked on it. Objects may be stateful in that they have some sort of opaque 'native data' that influences their behavior. The nature of this native data is unknown to the outside, unless the object explicitly makes it known somehow."

(Note: this is from the failed HTTP-NG initiative, not the HTTP/webarch we currently enjoy.)

So on this thinking, "Dan's homepage" is an item of Web content that is encapsulated inside the standard Web interface. It has HTTP-based getters and (potentially) setters, so you can ask for the default bytestream rendering of it, or content-negotiate via a different getter and get a PDF, or a version in another language. But on this OO style of thinking about Web content, you *never get the thing itself*. Only (possibly lossy, possibly on-the-fly generated) serializations of it.
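To make that encapsulation picture concrete, here's a toy sketch in Java (since I mentioned it). Every name here (WebResource, Representation, HomePage) is invented for illustration; this isn't from any spec or library, it just mimics the pattern described above.

    import java.nio.charset.StandardCharsets;

    /** What actually travels over the wire: typed bytes, not the thing itself. */
    final class Representation {
        final String mediaType;
        final byte[] body;
        Representation(String mediaType, byte[] body) {
            this.mediaType = mediaType;
            this.body = body;
        }
    }

    /** The encapsulation boundary: callers can only ask for renderings. */
    interface WebResource {
        Representation get(String acceptedMediaType);   // roughly: HTTP GET plus conneg
    }

    /** One possible hidden implementation; could equally be SQL-backed, a wiki, monkeys... */
    final class HomePage implements WebResource {
        private final String secretInternalState = "whatever is behind the curtain";

        public Representation get(String acceptedMediaType) {
            // Only a rendering ever leaves the object; the internals stay private.
            String html = "<html><body>Dan's homepage</body></html>";
            return new Representation("text/html",
                    html.getBytes(StandardCharsets.UTF_8));
        }
    }

Swap the HomePage internals for a wordpress- or database-backed implementation and nothing changes for callers; that's the encapsulation doing its job.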
The notion of 'serialization' (also familiar to many coders) doesn't get used much in discussing http-range-14, yet it seems to be very close to our concerns here. Perhaps all the different public serializations of my homepage are so rich that they constitute full (potentially round-trippable) serializations of the secret internal state. Or perhaps they're all lossy, because enough of the internals never actually goes out over the wire.

The Web design (as I understand/understood it) means that you'll never 100% know what's "on the inside". My homepage might be generated by 1000 typing monkeys, by pulling zeros and ones from a filesystem, or composed from a bunch of SQL database lookups. It might be generated by different methods in 2010 than in 2012, or from minute to minute. All of this is my private webmasterly business: as far as the rest of the world is concerned, it's all the same thing ... my homepage. I can move the internals from filesystem-based to wordpress to mediawiki, and from provider to provider. I can choose to serve US IP addresses from a mediawiki in Boston, and Japanese IP addresses from a customised MoinMoin wiki in Tokyo. Why? That's my business! But it's still my homepage. And you - the outside world - don't get to know how it's made.

On that thinking, it might sometimes be useful to have clues as to whether enough of the secret internals of some Web page could be fully reconstituted from the content that was shared. Whether the HTML page I sent (e.g. by virtue of hidden RDFa data islands) also contained enough info that the PDF and the French negotiable version of the page could be reconstituted. In other words, how complete a serialization it was of the 'full thing'. But in the general case we have to assume it's lossy, because my homepage could acquire, for example, an SVG format or a Bosnian-language negotiable version of itself at any point in time. Maybe my homepage includes sections of content from database queries ... but they're not very important (and not included in versions sent to iPads). Who knows or cares whether that's an essential vs disposable part of it?

A much stronger thing to know about some bunch of bytes is whether it was ever an authorised, official, "not messed with" rendering/serialization of a piece of Web content. SSL and PGP and so on can help with this. If "Dan's homepage" is a Web page, it's very reasonable to ask "this bunch of mimetyped, datestamped bytes I have here in my hand ... is that a legitimate, official rendering/representation/serialization of it?". But don't mistake the bytes-in-hand for the thing itself. They'll almost always be a shadow of the original, a frozen snapshot sent in response to some specific request at some point in time.

The OO style of thinking about Web GETs helps us remember this picture. You might be 'accessing' my homepage (essentially exchanging messages with it, or with something that proxies it into the Web). But since you're not getting the thing itself, even in the case of a homepage ... why should an ebook, or a Web representation of a physical book, or of a net-connected toaster, or a Web cam ... why should those be modeled in a radically different way? "The thing itself" is a kind of convenient fiction. We all know you can't send a person across the wires. But even with the most canonically information-oriented Web content, a simple Web homepage, in webarch terms you're not sending the homepage across the wire either. Just a message that (likely lossily) sends some representation of it.
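Here's a toy sketch of that lossiness, again in Java with invented names: the hidden state is richer than anything any one getter ever sends, so no receiver can round-trip the thing itself from the bytes in hand.

    import java.util.Map;

    final class HomePageState {
        // Hidden internals: several language versions plus some database-derived bits.
        final Map<String, String> bodyByLanguage =
                Map.of("en", "Hello from Dan", "fr", "Bonjour de Dan");
        final String sectionFromDatabaseQueries = "stats that never appear in the iPad version";

        /** One possible getter: serialize just the negotiated language as HTML. */
        String serializeAsHtml(String language) {
            // Only part of the state goes out; the other language version and the
            // database-derived section never cross the wire, so no receiver can
            // reconstitute the whole from this snapshot.
            return "<html><body>" + bodyByLanguage.get(language) + "</body></html>";
        }
    }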
So often we slip into saying "well, a Web page can be sent across the wire, whereas a person cannot be, obviously". And after some discussion this retreats to "you can send a pretty full representation of a Web page across the wire, whereas with a person obviously you can send just a mere description". I don't believe we'll ever come up with a clear distinction between 'description' and 'representation' such that we can say "look, Dan's homepage, you get a proper representation of it across the wire ... whereas a physical book, you're merely getting a description".

When you do an HTTP GET on the Postmodernism Generator, http://www.elsewhere.org/pomo/ , the response you get ... is that a representation, a description, or what? From the HTTP-as-OO perspective it's a message, a rendering, a response. It's computer generated, different every time. Why should http://www.elsewhere.org/pomo/ be OK with a 200 response, and a 'book' not be? If we get a 200, where "200-level code is a signal that the URI *denotes* whatever it *accesses*" ... then the URI 'http://www.elsewhere.org/pomo/' denotes what exactly, on today's http-range-14 reading? The generator script/service? Surely not the varying messages that are sent out; they're never the same twice. Maybe the hidden innards of the site that nobody sees? Would it be so different if a real postmodernist were wired up to the Web server and wrote a custom response each time? The outside world needn't know how an HTTP response is generated.
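For flavour, a toy never-the-same-twice responder, using the JDK's built-in com.sun.net.httpserver; the 'essay' content and names are invented, and this is only a sketch of the pattern, not the real generator.

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.Random;

    public class NeverTheSameTwice {
        public static void main(String[] args) throws Exception {
            Random random = new Random();
            HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
            server.createContext("/pomo", exchange -> {
                // Compose a fresh 'essay' for this request only; it is discarded after sending.
                String essay = "Discourse #" + random.nextInt(1_000_000)
                        + ": the reader is complicit in the text.";
                byte[] body = essay.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);   // a plain 200 every time
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();   // GET http://localhost:8000/pomo twice; you'll never see the same bytes
        }
    }

Every GET gets a 200, yet no two responses are the same, and nothing about the response tells you how it was made.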
I think we worry too much about corner cases and don't even fairly describe pretty basic cases: homepages that can change in format/language, Web pages that are generated by script and are never the same twice. Never mind people with ebooks tattooed on them or Web servers with people inside. I don't see a healthy description of even the basic machinery of the Web, let alone the fancy stuff.

Dan

Received on Saturday, 24 March 2012 10:29:16 UTC