Re: Change Proposal for HttpRange-14 from Dan Brickley on 2012-03-24 (public-lod@w3.org from March 2012)

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 24 Mar 2012 10:28:47 +0000
To: Pat Hayes <phayes@ihmc.us>
Cc: Jonathan A Rees <rees@mumble.net>, Jeni Tennison <jeni@jenitennison.com>, public-lod community <public-lod@w3.org>, Leigh Dodds <leigh@ldodds.com>, Dave Reynolds <dave@epimorphics.com>, Ian Davis <me@iandavis.com>
Message-ID: <CAFfrAFr_c1_5v5140=KLKk6oEnL4SLpLOH7TwqSBdOz7SsAU6A@mail.gmail.com>
On 23 March 2012 14:33, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote:
>
>> I am a bit dismayed that nobody seems to be picking up on the point
>> I've been hammering on (TimBL and others have also pointed it out),
>> that, as shown by the Flickr and Jamendo examples, the real issue is
>> not an IR/NIR type distinction, but rather a distinction in the
>> *manner* in which a URI gets its meaning, via instantiation (of some
>> generic IR) on the one hand, vs. description (of *any* resource,
>> perhaps even an IR) on the other. The whole
>> information-resource-as-type issue is a total red herring, perhaps the
>> most destructive mistake made by the httpRange-14 resolution.
>
> +1000. There is no need for anyone to even talk about "information resources". The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response.

Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived
('ebooks' will soon just be 'books', just as 'email' became 'mail'),
and slipping into general opinion here that's not particularly
directed at Pat.

I assume you're emphasising the physical notion of book. Perhaps
'person' is even more obviously physical (though heavily tattooed
people have some commonalities with books).

The Web architecture that I first learned was explained to me
(HTTP-NG WG era) in terms familiar from the "Object Oriented" style of
thinking about computing (and a minor religion at the time too).  The
idea is that the Web interface is a kind of encapsulation. External
parties don't get direct access to the insides, it's always mediated
by HTTP GET and other requests.

Just as in Java you can expose an object's data internals directly, or
you can hide them behind getters and setters, so it is with Web content.
So a Web site might encapsulate a coffee machine, teapot or toaster (or a
CSV file, SGML repository, perl script or whatever). That pattern
allowed the Web to get very big, very fast; you could wrap it around
anything.
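
A minimal sketch of that encapsulation idea, in Python rather than Java
for brevity. The class, its fields and its method names are hypothetical
illustrations, not anything defined by HTTP, HTTP-NG or webarch; the
point is only that callers see messages about the state, never the
state itself.

```python
class ToasterResource:
    """Hypothetical Web resource wrapping a toaster; all names are illustrative."""

    def __init__(self):
        # Private state: outside parties are not meant to touch this directly.
        self._temperature_c = 21
        self._slots_in_use = 0

    def get(self):
        """Stand-in for HTTP GET: return a message about the state, not the state."""
        return "toaster: {} C, {} slot(s) in use".format(
            self._temperature_c, self._slots_in_use
        )

    def post(self, slots):
        """Stand-in for HTTP POST: ask for a change; the resource decides how to apply it."""
        self._slots_in_use = max(0, min(2, slots))
        self._temperature_c = 180 if self._slots_in_use else 21
        return self.get()


toaster = ToasterResource()
first_view = toaster.get()
second_view = toaster.post(5)  # the request is clamped to the two slots that exist
```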

In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on
this view described, in which the hidden innards of a Web object are
constrained to be 'data'.
"When we think of the Web today, the idea of a 'resource' comes to
mind. In general, a resource is an Object that has some methods (e.g.
in HTTP, Get Head and Post) that can be invoked on it. Objects may be
stateful in that they have some sort of opaque 'native data' that
influences their behavior. The nature of this native data is unknown
to the outside, unless the object explicitly makes it known somehow."
(note, this is from the failed HTTP-NG initiative, not the
HTTP/webarch we currently enjoy)

So on this thinking, "Dan's homepage" is an item of Web content, that
is encapsulated inside the standard Web interface. It has http-based
getters and (potentially) setters, so you can ask for the default
bytestream rendering of it, or perhaps content-negotiate with a
different getter and get a PDF, or a version in another language.
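
As a rough sketch of "different getters" over one piece of Web content:
the class below is hypothetical, and the exact-match lookup on media
type and language is a deliberate simplification (real HTTP content
negotiation weighs Accept and Accept-Language preferences).

```python
class Homepage:
    """Hypothetical homepage resource; media types and titles are illustrative."""

    def __init__(self):
        # Internal state is never handed out as-is, only rendered on request.
        self._titles = {"en": "Dan's homepage", "fr": "Page d'accueil de Dan"}

    def get(self, accept="text/html", language="en"):
        """Return (content_type, body) for the requested media type and language."""
        title = self._titles.get(language, self._titles["en"])
        if accept == "text/plain":
            return "text/plain", title
        # Default bytestream rendering when nothing more specific was asked for.
        return "text/html", "<html><head><title>{}</title></head></html>".format(title)


page = Homepage()
html_type, html_body = page.get()
text_type, text_body = page.get(accept="text/plain", language="fr")
```

Both calls address the same resource; only the rendering that comes back
differs.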

But on this OO-style of thinking about Web content, you *never get the
thing itself*. Only (possibly lossy, possibly on-the-fly generated)
serializations of it.

The notion of 'serialization' (also familiar to many coders) doesn't
get used much in discussing http-range-14, yet it seems to be very
close to our concerns here.

Perhaps all the different public serializations of my homepage are so
rich that they constitute full (potentially round-trippable)
serializations of the secret internal state. Or perhaps they're all
lossy, because enough internals are never actually sent out over the
wire. The Web design (as I understand/understood it) means that you'll
never 100% know what's "on the inside". My homepage might be generated
by 1000 typing monkeys; or by pulling zeros and ones from a filesystem,
or composed from a bunch of SQL database lookups. It might be
generated by different methods in 2010 and in 2012; or from minute to
minute. All of this is my private webmasterly business: as far as the
rest of the world is concerned, it's all the same thing, ... my
homepage. I can move the internals from filesystem-based to wordpress
to mediawiki, and from provider to provider. I can choose to serve US
IP addresses from a mediawiki in Boston, and Japanese IP addresses
from a customised MoinMoin wiki in Tokyo. Why? That's my business! But
it's still my homepage. And you - the outside world - don't get to
know how it's made.
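
A small sketch of that point, assuming two made-up backends
(FileBackend and WikiBackend are hypothetical names, and neither talks
to a real filesystem or wiki). Swapping the internals leaves what the
outside world sees unchanged.

```python
class FileBackend:
    """Hypothetical backend: pretends to read the page from a filesystem."""

    def render(self):
        return "<h1>Dan's homepage</h1>"


class WikiBackend:
    """Hypothetical backend: pretends to build the same page from wiki markup."""

    def render(self):
        markup = "= Dan's homepage ="
        # Strip the heading markers and surrounding spaces from both ends.
        return "<h1>{}</h1>".format(markup.strip("= "))


class Homepage:
    """The public face; callers cannot tell which backend is behind it."""

    def __init__(self, backend):
        self._backend = backend

    def migrate(self, backend):
        # Private webmasterly business: swap the internals at will.
        self._backend = backend

    def get(self):
        return self._backend.render()


page = Homepage(FileBackend())
before = page.get()
page.migrate(WikiBackend())
after = page.get()
```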

On that thinking, it might sometimes be useful to have clues as to
whether enough of the secret internals of some Web page could be
fully reconstituted from the content that was shared. Whether the HTML
page I sent (e.g. by virtue of hidden RDFa data islands) also
contained enough info that the PDF and French negotiable versions of
the page could be reconstituted. In other words, how complete a
serialization it was of the 'full thing'. But in the general case, we
have to assume it's lossy, because my homepage could acquire, e.g.,
an SVG format, or Bosnian language negotiable version of itself at any
point in time. Maybe my homepage includes sections of content from
database queries, ... but they're not very important (and not included
in versions sent to iPads). Who knows or cares if that's an essential
vs disposable part of it?

A much stronger thing to know about some bunch of bytes is whether it
was ever an authorised, official, "not messed with"
rendering/serialization of a piece of Web content. SSL and PGP and so
on can help with this.
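
To make the "not messed with" check concrete, here is a minimal sketch
using a keyed hash (HMAC) from the Python standard library. This is a
stand-in technique, not SSL or PGP: it assumes a shared secret between
publisher and verifier, whereas PGP-style signatures use public keys.
The key and page bytes are made up.

```python
import hashlib
import hmac

# Assumption: a secret known to both the publisher and the verifier.
SECRET_KEY = b"hypothetical-shared-secret"


def sign(body):
    """Publisher side: compute an authentication tag over the served bytes."""
    return hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()


def is_authentic(body, tag):
    """Verifier side: check the bytes in hand against the published tag."""
    return hmac.compare_digest(sign(body), tag)


official = b"<h1>Dan's homepage</h1>"
tag = sign(official)
tampered = official.replace(b"Dan", b"Eve")
```

Here is_authentic(official, tag) holds while is_authentic(tampered, tag)
does not; note that this says the bytes were an authorised rendering,
not that they are the page itself.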

If "Dan's homepage" is a Web page, it's very reasonable to say "this
bunch of mimetyped, datestamped bytes I have here in my hand ... is
that a legitimate official rendering/representation/serialization of
it?". But don't mistake the bytes-in-hand for the thing itself.
They'll almost always be a shadow of the original, a frozen snapshot
sent in response to some specific request at some point in time.

The OO style of thinking about Web GETs helps us remember this
picture. You might be 'accessing' my homepage (essentially exchanging
messages with it, or something that proxies it into the Web). But
since you're not getting the thing itself, even in case of a homepage,
... why should an ebook, or a Web representation of a physical book,
... or of a 'net-connected toaster, or Web cam, .... why should those
be modeled in a radically different way? 'The thing itself' is a kind
of convenient fiction.

We all know you can't send a person across the wires. But ... even
with the most canonically information-oriented web content, a simple
Web homepage, ... in Webarch terms you're not sending the homepage
across the wire either. Just a message that (likely lossily) sends
some representation of it. So often we slip into saying here "well, a
Web page can be sent across the wire, whereas a person cannot be,
obviously". And on some discussion, this retreats to "you can send a
pretty full representation of a Web page across the wire, whereas with
a person obviously you can send just a mere description".

I don't believe we'll ever come up with a clear distinction between
'description' and 'representation', such that we can say "look, Dan's
homepage, you get a proper representation of it across the wire, ...
whereas a physical book, you're merely getting a description".

When you do an HTTP GET on the Postmodernism Generator, ...
http://www.elsewhere.org/pomo/ ... the response you get, ... is that a
representation, a description, or what? From the HTTP-as-OO
perspective, ... it's a message, a rendering, .. a response.  It's
computer generated, different every time, ...

Why should http://www.elsewhere.org/pomo/ be OK with a 200 response,
and a 'book' not be? If we get a 200,
where "200-level code is a signal that the URI *denotes* whatever it
*accesses* "
... then the URI 'http://www.elsewhere.org/pomo/' denotes what
exactly, on today's http-range-14 reading? The generator
script/service? Surely not the varying messages that are sent out;
they're never the same twice. Maybe the hidden innards of the site
that nobody sees?

Would it be so different if a real postmodernist were wired up to the
Web server and wrote a custom response each time? The outside world
needn't know how an HTTP response is generated.

I think we worry too much about corner cases and don't even fairly
describe pretty basic cases: homepages that can change in
format/language, ... Web pages that are generated by script and are
never the same twice. Never mind people with ebooks tattooed on them
or Web servers with people inside. I don't see a healthy description
of even the basic machinery of the Web, let alone the fancy stuff.

Dan
Received on Saturday, 24 March 2012 10:29:16 UTC