Re: Change Proposal for HttpRange-14 from Pat Hayes on 2012-03-24 (public-lod@w3.org from March 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 24 Mar 2012 15:40:31 -0500
To: Dan Brickley <danbri@danbri.org>
Cc: Jonathan A Rees <rees@mumble.net>, Jeni Tennison <jeni@jenitennison.com>, public-lod community <public-lod@w3.org>, Leigh Dodds <leigh@ldodds.com>, Dave Reynolds <dave@epimorphics.com>, Ian Davis <me@iandavis.com>
Message-Id: <D4654800-BDCB-4C09-BACB-D97B90B4A03B@ihmc.us>
On Mar 24, 2012, at 5:28 AM, Dan Brickley wrote:

> On 23 March 2012 14:33, Pat Hayes <phayes@ihmc.us> wrote:
>> 
>> On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote:
>> 
>>> I am a bit dismayed that nobody seems to be picking up on the point
>>> I've been hammering on (TimBL and others have also pointed it out),
>>> that, as shown by the Flickr and Jamendo examples, the real issue is
>>> not an IR/NIR type distinction, but rather a distinction in the
>>> *manner* in which a URI gets its meaning, via instantiation (of some
>>> generic IR) on the one hand, vs. description (of *any* resource,
>>> perhaps even an IR) on the other. The whole
>>> information-resource-as-type issue is a total red herring, perhaps the
>>> most destructive mistake made by the httpRange-14 resolution.
>> 
>> +1000. There is no need for anyone to even talk about "information resources". The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response.
> 
> Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived
> ('ebooks' will soon just be 'books', just as 'email' became 'mail'),
> and slipping into general opinion here that's not particularly
> directed at Pat.
> 
> I assume you're emphasising the physical notion of book.

Yes, I work with people who collect, repair and cherish antique books, so my notion of "book" is highly physical. Books have *odor*. But we could obviously have used a different example, like my favorite pair of a galaxy and a sodium atom. 

> Perhaps
> 'person' is even more obviously physical (though heavily tattooed
> people have some commonalities with books).

Harder to repair their bindings, though. 

> 
> The Web architecture that I first learned, was explained to me
> (HTTP-NG WG era) in terms familiar from the "Object Oriented" style of
> thinking about computing (and a minor religion at the time too).  The
> idea is that the Web interface is a kind of encapsulation. External
> parties don't get direct access to the insides, it's always mediated
> by HTTP GET and other requests.
> 
> Just as in Java, you can expose an object's data internals directly, or
> you can hide them behind getters and setters, same with Web content.
> So a Web site might encapsulate a coffee machine, teapot or toaster; a
> CSV file, SGML repository, perl script or whatever. That pattern
> allowed the Web to get very big, very fast; you could wrap it around
> anything.
> 
> In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on
> this view described, in which the hidden innards of a Web object are
> constrained to be 'data'.
> "When we think of the Web today, the idea of a 'resource' comes to
> mind. In general, a resource is an Object that has some methods (e.g.
> in HTTP, Get Head and Post) that can be invoked on it. Objects may be
> stateful in that they have some sort of opaque 'native data' that
> influences their behavior. The nature of this native data is unknown
> to the outside, unless the object explicitly makes it known somehow. "
> (note, this is from the failed HTTP-NG initiative, not the
> HTTP/webarch we currently enjoy)
> 
> So on this thinking, "Dan's homepage" is an item of Web content, that
> is encapsulated inside the standard Web interface. It has http-based
> getters and (potentially) setters, so you can ask for the default
> bytestream rendering of it, or perhaps content-negotiate with
> a different getter and get a PDF, or a version in another language.
> 
> But on this OO-style of thinking about Web content, you *never get the
> thing itself*. Only (possibly lossy, possibly on-the-fly generated)
> serializations of it.

You never "get" it, but you do make contact with it. The thing itself: your request gets to it, itself, and then it, itself,  emits something which is sent back to you. OK, the content of that send-back thingie never reveals the full truth about the nature of this thing behind the curtain. I never said it did. Still, it really does exist, and it really is accessed by the HTTP transaction. And it must therefore be the kind of thing that can partake in an HTTP transaction. And most things in heaven and earth, that we want to refer to using URIs, are not this kind of thing. 
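
The encapsulation picture described above can be sketched in Python. This is only an illustration, assuming a hypothetical `Homepage` class; none of these names come from HTTP or from this thread. A caller can reach the object and get a rendering back, but is never handed its internals.

```python
class Homepage:
    """Hypothetical encapsulated resource: reachable, but never handed over."""

    def __init__(self):
        # Private internals; callers only ever see what get() chooses to emit.
        self._state = {"title": "Dan's homepage", "drafts": ["secret notes"]}

    def get(self, accept="text/html"):
        # Each response is a (possibly lossy) rendering produced on request.
        title = self._state["title"]
        if accept == "text/plain":
            return title
        return f"<html><title>{title}</title></html>"


page = Homepage()
html = page.get()                      # default rendering
text = page.get(accept="text/plain")   # a differently negotiated rendering
```

Both renderings come from the same object, and neither exposes the drafts held inside it.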

> 
> The notion of 'serialization' (also familiar to many coders) doesn't
> get used much in discussing http-range-14, yet it seems to be very
> close to our concerns here.
> 
> Perhaps all the different public serializations of my homepage are so
> rich that they constitute full (potentially round-trippable)
> serializations of the secret internal state. Or perhaps they're all
> lossy, because enough internals are never actually sent out over the
> wire. The Web design (as I understand/understood it) means that you'll
> never 100% know what's "on the inside".

Fine. I don't want to know.

> My homepage might be generated
> by 1000 typing monkeys; or by pulling zeros and ones from a filesystem,
> or composed from a bunch of SQL database lookups. It might be
> generated by different methods in 2010 to 2012; or from minute to
> minute. All of this is my private webmasterly business: as far as the
> rest of the world is concerned, it's all the same thing, ... my
> homepage.

Quite. And it, itself, is a real thing. Look, you are preaching to the choir here. I use logics with terms that refer to possible future worlds, imaginary possible entities, fictional characters, events that should never occur, etc... I have no trouble with entities whose insides I cannot see. 

> I can move the internals from filesystem-based to wordpress
> to mediawiki, and from provider to provider. I can choose to serve US
> IP addresses from a mediawiki in Boston, and Japanese IP addresses
> from a customised MoinMoin wiki in Tokyo. Why? That's my business! But
> it's still my homepage. And you - the outside world - don't get to
> know how it's made.
> 
> On that thinking, it might be sometimes useful to have clues as to
> whether sufficient of the secret internals of some Web page could be
> fully reconstituted from the content that was shared. Whether the HTML
> page I sent (e.g. by virtue of hidden rdfa data islands) also
> contained enough info that the PDF and French negotiable version of
> the page could be reconstituted. In other words, how complete a
> serialization it was of the 'full thing'. But in the general case, we
> have to assume it's lossy, because my homepage could acquire for e.g.
> an SVG format, or Bosnian language negotiable version of itself at any
> point in time. Maybe my homepage includes sections of content from
> database queries, ... but they're not very important (and not included
> in versions sent to ipads). Who knows or cares if that's an essential
> vs disposable part of it?
> 
> A much stronger thing to know about some bunch of bytes, is whether it
> was ever an authorised, official, "not messed with"
> rendering/serialization of a piece of Web content. SSL and PGP and so
> on can help with this.
> 
> If "Dan's homepage" is a Web page, it's very reasonable to say "this
> bunch of mimetyped, datestamped bytes I have here in my hand ... is
> that a legitimate official rendering/representation/serialization of
> it?". But don't mistake the bytes-in-hand for the thing itself.

I agree, and I wasn't. I was referring to the thing itself, not the bunch of bytes. I think we have kind of agreed that URIs never refer to HTTP message payloads like byte streams. 

> They'll almost always be a shadow of the original, a frozen snapshot
> sent in response to some specific request at some point in time.
> 
> The OO style of thinking about Web GETs helps us remember this
> picture. You might be 'accessing' my homepage (essentially exchanging
> messages with it, or something that proxies it into the Web). But
> since you're not getting the thing itself, even in case of a homepage,
> ... why should an ebook, or a Web representation of a physical book,
> ... or of a 'net-connected toaster, or Web cam, .... why should those
> be modeled in a radically different way? "The thing itself" is a kind
> of convenient fiction.

No, wait. I can never "get at" the thing itself in the sense that I never can know that I have fully explored inside it, but I certainly can "get at" it in the sense that I can interact with it and refer to it. (I can "get at" you, and refer to you as an individual, without knowing how your liver and kidneys work.) It and I can have a real transaction, and have causal influences upon one another via the internet. And it is that latter sense of "get to" (or "being on" the Web) which matters, not the first; and it is this latter sense in which this thing differs from most things, such as dead Roman emperors, distant galaxies and human beings.

> 
> We all know you can't send a person across the wires. But ... even
> with the most canonically information-oriented web content, a simple
> Web homepage, ... in Webarch terms you're not sending the homepage
> across the wire either. Just a message that (likely lossily) sends
> some representation of it.

Yes, but the key point is not what gets sent on the wire, but the fact that **the thing attached to the wire that took part in the transaction** is indeed attached to a "wire" which eventually leads to my computer. THAT is what puts it into a special category. Of course we can have lossy representations of galaxies (say) zapping around the Web, but for sure they weren't *sent* by a galaxy responding to an HTTP GET. Similarly for dead Roman emperors and the weather in Oaxaca.

> So often we slip into saying here "well, a
> Web page can be sent across the wire, whereas a person cannot be,
> obviously". And on some discussion, this retreats to "you can send a
> pretty full representation of a Web page across the wire, whereas with
> a person obviously you can send just a mere description".

This is exactly the kind of muddle that we get into by treating this terminology of "information resource" too seriously. As I believe I said at the beginning, it really does not matter how we characterize the nature of the things that take part in http transactions, or the kinds of representations that we are tossing around. But what is surely obvious, even to an OO programmer, is that most things in the past, future, imagination and even the actual physical universe are not, and never will be, causally involved in any kind of HTTP transaction. 

> 
> I don't believe we'll ever come up with a clear distinction between
> 'description' and 'representation', such that we can say "look, Dan's
> homepage, you get a proper representation of it across the wire, ...
> whereas a physical book, you're merely getting a description".
> 
> When you do an HTTP GET on the Postmodernism Generator, ...
> http://www.elsewhere.org/pomo/ ... the response you get, ... is that a
> representation, a description, or what? From the HTTP-as-OO
> perspective, ... it's a message, a rendering, .. a response.  It's
> computer generated, different every time, ...
> 
> Why should http://www.elsewhere.org/pomo/ be OK with a 200 response,
> and a 'book' not be? If we get a 200,
> where "200-level code is a signal that the URI *denotes* whatever it
> *accesses* "
> ... then the URI 'http://www.elsewhere.org/pomo/' denotes what
> exactly, on today's http-range-14 reading? the generator
> script/service?

Something like that, yes. I don't really care, other than that, whatever it is, it is physically attached to an internet device and capable of emitting byte streams.
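
To make the rule under discussion concrete, here is a minimal Python sketch of the reading quoted earlier in this thread: a 200-level code signals that the URI denotes whatever it accesses. The function name and the returned labels are hypothetical, and the handling of 303 and of all other codes is a simplifying assumption, not the text of the httpRange-14 resolution.

```python
def denotation_hint(status: int) -> str:
    """Hypothetical helper: what a response code suggests about a URI's referent."""
    if 200 <= status <= 299:
        # 200-level: the URI denotes the thing the HTTP transaction accessed.
        return "accessed-thing"
    if status == 303:
        # Redirect to a description: the URI may denote anything at all
        # (a book, a galaxy, a sodium atom).
        return "unconstrained"
    # Assumption: any other code tells us nothing about the referent.
    return "no-information"


pomo = denotation_hint(200)   # e.g. the generator behind a 200 response
book = denotation_hint(303)   # e.g. a URI meant to denote a physical book
```

Note that the sketch says nothing about what kind of thing was accessed, only that the 200 case ties the referent to whatever took part in the transaction.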

> Surely not the varying messages that are sent out;
> they're never the same twice. Maybe the hidden innards of the site
> that nobody sees?

You keep referring to this thing. You call it a "site" or a "service" or a "script". Not my problem which is right: but that thing that YOU are referring to here is what I am also referring to, OK?

> 
> Would it be so different if a real postmodernist were wired up to the
> Web server and wrote a custom response each time?

Not at all. But most people aren't wired up in this way; and no dead people or galaxies or sodium atoms (or numbers or insects or classes or properties or average rainfalls or...) will ever be. 

> The outside world
> needn't know how an HTTP response is generated.

Indeed, and neither do I, nor do I care. 

> 
> I think we worry too much about corner cases and don't even fairly
> describe pretty basic cases: homepages that can change in
> format/language, ... Web pages that are generated by script and are
> never the same twice. Never mind people with ebooks tattooed on them
> or Web servers with people inside. I don't see a healthy description
> of even the basic machinery of the Web, let alone the fancy stuff.


I feel like someone wanting to drive a car and you keep telling me I needn't know about spark plugs. I KNOW I don't need to know about them, but I do need a steering wheel.

Pat

> 
> Dan
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 24 March 2012 20:41:09 UTC