Re: [httpRange-14] What is an Information Resource?

On 2007-12-18, Noah Mendelsohn writes:
> Pat Hayes writes:
>> No, an HTML *page* is a series of characters.
> Crucially, when transmitted as a representation using HTTP, it is not just
> a series of characters.  It is a series of octets >and< a Content-type.
> That content type licenses the receiver to interpret the octets not merely
> as characters, but as representing a tree of elements with some
> semantics.  So, if I ask, "Do HTML pages with the following bodies have
> the same number of paragraphs?":
> Example 1:
> <html><body><p>para</p><p>para</p></body></html>
> Example 2:
> <html><body><P>para</P><P>para</P></body></html>
> The answer is "yes", even though the two documents are not composed of
> the same characters (note the uppercase "P"s in the 2nd).  The pertinent
> specifications, including the specification for the text/html media type,
> allow us to determine that the bodies of these documents both
> consist of two "paragraphs", as defined in HTML.  The same paragraph
> abstraction can be conveyed in at least two encodings: <P> and <p>.
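
The paragraph count claimed for the two example bodies can be checked
mechanically. What follows is only a minimal sketch, assuming Python's
standard html.parser module as a stand-in for a text/html processor; the
names ParagraphCounter and count_paragraphs are hypothetical and do not
come from the thread.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts <p> start tags. HTMLParser reports tag names in lower case,
    so <P> and <p> arrive here as the same element name."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs += 1


def count_paragraphs(body: str) -> int:
    # Hypothetical helper: parse one message body and return its <p> count.
    parser = ParagraphCounter()
    parser.feed(body)
    parser.close()
    return parser.paragraphs


example_1 = "<html><body><p>para</p><p>para</p></body></html>"
example_2 = "<html><body><P>para</P><P>para</P></body></html>"

print(example_1 == example_2)  # False: the character sequences differ
print(count_paragraphs(example_1), count_paragraphs(example_2))  # 2 2
```

Comparing the strings directly says the two bodies differ, while reading
them under text/html rules gives the same count for both.
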
> Shannon's theory starts with the assumption that a receiver knows that a
> sender is in one of N possible states.

Isn't this assumption false with respect to the web? In general, I have no 
idea what states the resources I visit may be in. Any given resource may be 
a novel, a song, a picture, an email message, a movie, a blog post, etc., 
etc. How could I possibly know the N possible states that 8 billion 
resources can be in? I think we must strike this assumption. If you agree, 
can you restate your argument without it? Does Shannon's theory apply when a 
receiver *does not* know what possible states the sender may be in?

John Black

> Having received a correctly
> encoded message, and agreeing with the sender on the encoding, the
> receiver can determine which of the states that sender is in.  That does
> not mean the "state" has been transmitted in any other sense.
> In the case of RDF and media type application/rdf+xml, we can assume that
> the state of the sender is some RDF graph.  The sender transmits a series
> of characters, along with the Content-type indicator application/rdf+xml.
> With this, a receiver can determine which RDF graph the sender had.  In
> that particular sense, the graph has been transmitted.
> The only difference in the case where the source resource happens itself
> to be a document that is known to be a sequence of characters is that some
> of the possible representation encodings are very straightforward.  As far
> as I'm aware, though, there is even in those cases no requirement that the
> encoding be straightforward.  If for some reason I invented a bizarre media
> type that sent all the even numbered characters first, and then followed
> with all the odd, I don't think that would break Web architecture at all.
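
The "bizarre media type" described above is easy to sketch, and doing so
shows why it is harmless: as long as sender and receiver agree on the
encoding, the original sequence is fully recoverable. This is an
illustration only. It assumes zero-based positions for "even numbered",
works on characters rather than octets, and the function names encode
and decode are hypothetical.

```python
def encode(text: str) -> str:
    """Send the characters at even positions first, then the odd ones."""
    return text[0::2] + text[1::2]


def decode(wire: str) -> str:
    """Invert encode() by re-interleaving the two halves."""
    half = (len(wire) + 1) // 2  # the even half is longer for odd lengths
    evens, odds = wire[:half], wire[half:]
    out = []
    for i, ch in enumerate(evens):
        out.append(ch)
        if i < len(odds):
            out.append(odds[i])
    return "".join(out)


original = "<p>para</p>"
wire = encode(original)

print(wire != original)          # True: the wire form is scrambled
print(decode(wire) == original)  # True: the receiver recovers the text
```

The characters on the wire look nothing like the document, yet a receiver
that knows the media type's rules reconstructs it exactly.
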
> So, I think that RDF graphs, numbers, and HTML document trees are all
> information resources.  The particular characters, such as whitespace,
> comprising the representation of a resource on the wire are only
> significant if the media type says they are.
> Noah
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------

Received on Tuesday, 8 January 2008 01:48:07 UTC