Re: Change Proposal for HttpRange-14 from Niklas Lindström on 2012-03-24 (public-lod@w3.org from March 2012)

From: Niklas Lindström <lindstream@gmail.com>
Date: Sat, 24 Mar 2012 15:15:21 +0100
To: Dan Brickley <danbri@danbri.org>
Cc: Pat Hayes <phayes@ihmc.us>, Jonathan A Rees <rees@mumble.net>, Jeni Tennison <jeni@jenitennison.com>, public-lod community <public-lod@w3.org>, Leigh Dodds <leigh@ldodds.com>, Dave Reynolds <dave@epimorphics.com>, Ian Davis <me@iandavis.com>
Message-ID: <CADjV5jeh+knxP1drQDx1CV712m3LWAjKYY-bza+9XuuRL0kmuA@mail.gmail.com>
Hi Dan,

Brilliantly expressed! You pointed out a lot that's been fleetingly in
my mind regarding this issue. Also, I think you have pointed out the
core of the issue. What is the difference between a representation and
a description? Is the former something intrinsic to the thing
identified, whereas the latter is a quoted form, an observation, or
similar?

The question of a "thing itself" and its description is indeed hard
and multifaceted. (Consider Magritte's "The Treachery of Images" [1],
and ask yourself what the actual pipe being referred to is.)

What's the fundamental difference, apart from fidelity and amount, in
the information you get when your retina collects photons and forms an
image of me in your mind when you meet me, from what you can gather
from a collection of photographs of me, and works by me, which you may
gather from getting data about me from my identity address (i.e. some
information retrievable via our shared lookup system the web (external
to our own minds), addressed by a symbol of me, my IRI)? The shared
notion of "me" is hard to definitively set down, and in either case
seems intrinsically linked to information.

We should not base the architecture of data delivery on assumptions
about reality. What we should do is emphasize the importance of being
clear and precise when we describe things. That is, to focus on "a
difference which makes a difference" (to quote Gregory Bateson [2]).

Of course there are many valid and important needs for distinguishing
between me and a description of me. For instance to attribute
different creation times, or a license to the latter. If I publish
information at <http://neverspace.net/id> but claim it to be an
identifier of *me*, I can still identify distinct documents
representing this information, like <http://neverspace.net/id.html> or
<http://neverspace.net/id.json>. Granted, if I do this, I may not
clearly distinguish between my *self* and information about this self,
possible to represent in different formats. But I might not subscribe
to such a philosophical distinction. I can very well claim that my
biological body is also just a physical representation of me. The
"vessel of my soul", or an "instance in time of my unique pattern of
energy", or whatever. (And as you Dan pointed out, these bodies are
indeed canvases able to represent other information, like in tattoos.)
Admittedly, by conflating like this, it might be implied that I make
no distinction between my body and the ceramic, aluminium, copper,
gold and silicon carrying the bits of my identity document. Or that I
just don't see the need to do this in this context. As said, I can
mint separate IRIs for these entities if I need to make statements
about them.
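
To make the /id versus /id.html and /id.json arrangement concrete, here
is a minimal sketch of how a server might pick one of the distinct
documents for a request to the generic IRI. The media type
"application/json" for id.json, the DOCUMENTS table and both function
names are assumptions for illustration, not a description of how
neverspace.net is actually served.

```python
# Hypothetical table: media type -> IRI of the document carrying it.
DOCUMENTS = {
    "text/html": "http://neverspace.net/id.html",
    "application/json": "http://neverspace.net/id.json",  # assumed media type
}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((q, media_type))
    # sorted() is stable, so equal q-values keep their header order.
    ranked = sorted(entries, key=lambda e: -e[0])
    return [m for q, m in ranked if q > 0]


def choose_document(accept_header, default="text/html"):
    """Pick the document IRI for a request to the generic IRI, or None."""
    for media_type in parse_accept(accept_header):
        if media_type in DOCUMENTS:
            return DOCUMENTS[media_type]
        if media_type == "*/*":
            return DOCUMENTS[default]
    return None
```

Whichever document is chosen, each one keeps its own IRI, so statements
about a particular format can still be made later if the need arises.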

Nor might I necessarily consider it theoretically correct to
conceptually group various representations together as manifestations
of the same, intrinsic information but still distinct from a conceived
primary subject matter. It depends on my needs for expression. I
should be aware of the limitations the various choices of expression
impose, and avoid conflations as far as is reasonable. But if there
are any means of making distinctions as late as possible, when needs
arise, I think we should ease up on the pressure on publishers to make
these deep decisions at initial deployment time. This is why I agree
with the change proposal.

That is, I decide myself as a publisher of data what kind of
descriptions it expresses, and I decide what kinds of differences are
considered important (those which will, in practice, make a
difference). I do not expect an HTTP status code to be the final,
meaningful arbiter in these matters. It ultimately hinges on what
information is encoded in the response data. That's what's going to be
interpreted.

For the record, I *do* believe in an important distinction between the
information captured in my identity page and me. (And I *have* minted
<http://neverspace.net/id#self> to identify the latter.) In any case,
regardless of how you publish, make sure that statements in the data,
especially those which use references to (or relative to) the base,
actually express what you intend. If you serve the *same* data from
</id> and </id.html> but consider them to be separate in nature (the
former a person, the latter a document), ensure that you use the
intended subject or object in the various triples (in RDFa by using
@about or @resource).
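
A small sketch of why base-relative references need care when the same
data is served from two addresses. It uses Python's standard
urllib.parse.urljoin to resolve a reference the way a parser would
resolve it against the document base; the helper name resolve_subject
is made up for illustration.

```python
from urllib.parse import urljoin


def resolve_subject(base, reference):
    """Resolve a (possibly relative) subject reference against a base IRI."""
    return urljoin(base, reference)


# The same relative reference names different things depending on where
# the data was retrieved from.
from_id = resolve_subject("http://neverspace.net/id", "#self")
from_html = resolve_subject("http://neverspace.net/id.html", "#self")

# An absolute IRI names the same subject regardless of the base.
stable = resolve_subject("http://neverspace.net/id.html",
                         "http://neverspace.net/id#self")

print(from_id)    # http://neverspace.net/id#self
print(from_html)  # http://neverspace.net/id.html#self
print(stable)     # http://neverspace.net/id#self
```

So if </id> and </id.html> are meant to differ in nature, spelling out
the intended subject in full is the safer choice.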

Finally, for those with a taste for theoretical physics, I'd recommend
considering the holographic principle [3]. It has been suggested that
scientists may "regard the physical world as made of information, with
energy and matter as incidentals." This may or may not have bearing on
the veracity of the "non-information resource" concept. :)

Best regards,
Niklas

[1]: http://en.wikipedia.org/wiki/The_Treachery_of_Images
[2]: http://en.wikipedia.org/wiki/Gregory_Bateson#Other_terms_used_by_Bateson
[3]: http://en.wikipedia.org/wiki/Holographic_principle#High_level_summary



On Sat, Mar 24, 2012 at 11:28 AM, Dan Brickley <danbri@danbri.org> wrote:
> On 23 March 2012 14:33, Pat Hayes <phayes@ihmc.us> wrote:
>>
>> On Mar 23, 2012, at 8:52 AM, Jonathan A Rees wrote:
>>
>>> I am a bit dismayed that nobody seems to be picking up on the point
>>> I've been hammering on (TimBL and others have also pointed it out),
>>> that, as shown by the Flickr and Jamendo examples, the real issue is
>>> not an IR/NIR type distinction, but rather a distinction in the
>>> *manner* in which a URI gets its meaning, via instantiation (of some
>>> generic IR) on the one hand, vs. description (of *any* resource,
>>> perhaps even an IR) on the other. The whole
>>> information-resource-as-type issue is a total red herring, perhaps the
>>> most destructive mistake made by the httpRange-14 resolution.
>>
>> +1000. There is no need for anyone to even talk about "information resources". The important point about http-range-14, which unfortunately it itself does not make clear, is that the 200-level code is a signal that the URI *denotes* whatever it *accesses* via the HTTP internet architecture. We don't need to get into the metaphysics of HTTP in order to see that a book (say) can't be accessed by HTTP, so if you want to denote it (the book) with an IRI and stay in conformance with this rule, then you have to use something other than a 200-level response.
>
> Setting aside http://www.fastcompany.com/1754259/amazon-declares-the-e-book-era-has-arrived
> ('ebooks' will soon just be 'books', just as 'email' became 'mail'),
> and slipping into general opinion here that's not particularly
> directed at Pat.
>
> I assume you're emphasising the physical notion of book. Perhaps
> 'person' is even more obviously physical (though heavily tattoo'd
> people have some commonalities with books).
>
> The Web architecture that I first learned, was explained to me
> (HTTP-NG WG era) in terms familiar from the "Object Oriented" style of
> thinking about computing (and a minor religion at the time too).  The
> idea is that the Web interface is a kind of encapsulation. External
> parties don't get direct access to the insides, it's always mediated
> by HTTP GET and other requests.
>
> Just as in Java, you can expose an object's data internals directly, or
> you can hide them behind getters and setters, same with Web content.
> So a Web site might encapsulate a coffee machine, teapot or toaster; a
> CSV file, SGML repository, perl script or whatever. That pattern
> allowed the Web to get very big, very fast; you could wrap it around
> anything.
>
> In http://www.w3.org/TR/WD-HTTP-NG-interfaces/ we see a variant on
> this view described, in which the hidden innards of a Web object are
> constrained to be 'data'.
> "When we think of the Web today, the idea of a 'resource' comes to
> mind. In general, a resource is an Object that has some methods (e.g.
> in HTTP, Get Head and Post) that can be invoked on it. Objects may be
> stateful in that they have some sort of opaque 'native data' that
> influences their behavior. The nature of this native data is unknown
> to the outside, unless the object explicitly makes it known somehow. "
> (note, this is from the failed HTTP-NG initiative, not the
> HTTP/webarch we currently enjoy)
>
> So on this thinking, "Dan's homepage" is an item of Web content, that
> is encapsulated inside the standard Web interface. It has http-based
> getters and (potentially) setters, so you can ask for the default
> bytestream rendering of it, or perhaps content-negotiate with a
> different getter and get a PDF, or a version in another language.
>
> But on this OO-style of thinking about Web content, you *never get the
> thing itself*. Only (possibly lossy, possibly on-the-fly generated)
> serializations of it.
>
> The notion of 'serialization' (also familiar to many coders) doesn't
> get used much in discussing http-range-14, yet it seems to be very
> close to our concerns here.
>
> Perhaps all the different public serializations of my homepage are so
> rich that they constitute full (potentially round-trippable)
> serializations of the secret internal state. Or perhaps they're all
> lossy, because enough internals are never actually sent out over the
> wire. The Web design (as I understand/understood it) means that you'll
> never 100% know what's "on the inside". My homepage might be generated
> by 1000 typing monkeys; or by pulling zeros and ones from a filesystem,
> or composed from a bunch of SQL database lookups. It might be
> generated by different methods in 2010 to 2012; or from minute to
> minute. All of this is my private webmasterly business: as far as the
> rest of the world is concerned, it's all the same thing, ... my
> homepage. I can move the internals from filesystem-based to wordpress
> to mediawiki, and from provider to provider. I can choose to serve US
> IP addresses from a mediawiki in Boston, and Japanese IP addresses
> from a customised MoinMoin wiki in Tokyo. Why? That's my business! But
> it's still my homepage. And you - the outside world - don't get to
> know how it's made.
>
> On that thinking, it might be sometimes useful to have clues as to
> whether sufficient of the secret internals of some Web page could be
> fully reconstituted from the content that was shared. Whether the HTML
> page I sent (e.g. by virtue of hidden rdfa data islands) also
> contained enough info that the PDF and French negotiable version of
> the page could be reconstituted. In other words, how complete a
> serialization it was of the 'full thing'. But in the general case, we
> have to assume it's lossy, because my homepage could acquire e.g.
> an SVG format, or Bosnian language negotiable version of itself at any
> point in time. Maybe my homepage includes sections of content from
> database queries, ... but they're not very important (and not included
> in versions sent to ipads). Who knows or cares if that's an essential
> vs disposable part of it?
>
> A much stronger thing to know about some bunch of bytes, is whether it
> was ever an authorised, official, "not messed with"
> rendering/serialization of a piece of Web content. SSL and PGP and so
> on can help with this.
>
> If "Dan's homepage" is a Web page, it's very reasonable to say "this
> bunch of mimetyped, datestamped bytes I have here in my hand ... is
> that a legitimate official rendering/representation/serialization of
> it?". But don't mistake the bytes-in-hand for the thing itself.
> They'll almost always be a shadow of the original, a frozen snapshot
> sent in response to some specific request at some point in time.
>
> The OO style of thinking about Web GETs helps us remember this
> picture. You might be 'accessing' my homepage (essentially exchanging
> messages with it, or something that proxies it into the Web). But
> since you're not getting the thing itself, even in case of a homepage,
> ... why should an ebook, or a Web representation of a physical book,
> ... or of a 'net-connected toaster, or Web cam, .... why should those
> be modeled in a radically different way? "The thing itself" is a kind
> of convenient fiction.
>
> We all know you can't send a person across the wires. But ... even
> with the most canonically information-oriented web content, a simple
> Web homepage, ... in Webarch terms you're not sending the homepage
> across the wire either. Just a message that (likely lossily) sends
> some representation of it. So often we slip into saying here "well, a
> Web page can be sent across the wire, whereas a person cannot be,
> obviously". And on some discussion, this retreats to "you can send a
> pretty full representation of a Web page across the wire, whereas with
> a person obviously you can send just a mere description".
>
> I don't believe we'll ever come up with a clear distinction between
> 'description' and 'representation', such that we can say "look, Dan's
> homepage, you get a proper representation of it across the wire, ...
> whereas a physical book, you're merely getting a description".
>
> When you do an HTTP GET on the Postmodernism Generator, ...
> http://www.elsewhere.org/pomo/ ... the response you get, ... is that a
> representation, a description, or what? From the HTTP-as-OO
> perspective, ... it's a message, a rendering, .. a response.  It's
> computer generated, different every time, ...
>
> Why should http://www.elsewhere.org/pomo/ be OK with a 200 response,
> and a 'book' not be? If we get a 200,
> where "200-level code is a signal that the URI *denotes* whatever it
> *accesses* "
> ... then the URI 'http://www.elsewhere.org/pomo/' denotes what
> exactly, on today's http-range-14 reading? the generator
> script/service? Surely not the varying messages that are sent out;
> they're never the same twice. Maybe the hidden innards of the site
> that nobody sees?
>
> Would it be so different if a real postmodernist were wired up to the
> Web server and wrote a custom response each time? The outside world
> needn't know how an HTTP response is generated.
>
> I think we worry too much about corner cases and don't even fairly
> describe pretty basic cases: homepages that can change in
> format/language, ... Web pages that are generated by script and are
> never the same twice. Never mind people with ebooks tattooed on them
> or Web servers with people inside. I don't see a healthy description
> of even the basic machinery of the Web, let alone the fancy stuff.
>
> Dan
>
Received on Saturday, 24 March 2012 14:16:22 UTC