Telcon of the W3C study group on semantics of HTTP -- 11 Dec 2007

We are discussing the "brainstorm page" at http://esw.w3.org/topic/AwwswTopicsBrainstormPage

What can we infer from a 200 response

<scribe> scribenick: dbooth

Noah: What occurred to me is that we've gone to some trouble to invent the notion of information resource. It's clear that the sense of it is that they are different because a high fidelity rep can be transmitted in a computer message. Then we say those are the ones you should use a 200 with, clearly implying that there is no rep of something like me the person. It's clearly implied that if I have an info resource I should try to send hi fi reps except in cases where I need to send a media type that's lossy. All I was pointing out is that I don't think there's anything normative that says that. So we're putting a lot of effort into the case where it's a person. But we haven't set the bar high for info resources either. What's good practice? Should the webarch be updated to say "if you have an IR you should send enough info to reconstruct the state"?

<scribe> scribenick: noah

AR: I've heard Richard Cyganiak saying "the representation is whatever the owner of the resource wants it to be". I hear people saying the representation is ephemeral, only exists on the wire.

DB: So, I'm not sure 200 conveys anything other than it's "not not an information resource"

JR: I find the definition of information resource (IR) troubling, and have some sympathy with those who say that the architecture shouldn't be too prescriptive

HT: Thinking about conneg, which is related...there is an old move in formal semantics which is "I can't tell you what something is, but I can tell you when two things are the same." The bar for conneg is not external measure of similarity, but URI owner warrants similar for his or her purposes.
... We can still give guidelines as to what people SHOULD do. There will still not be an objective measure you can hold up.

<jar> we lost noah again

<Zakim> ht, you wanted to put the minter on the spot

<Zakim> dbooth, you wanted to say that the webarch def of IR is just plain wrong and we need to start by fixing it

<alanr> jonathan and I name this "200-responder"

DB: I keep hearing discussions about fidelity of the representation or its ability to reproduce the resource. Seems to me on the wrong track, because current definition is plain wrong. Correct definition would be "it is something that can give a 200 response." Having its essence conveyed is secondary. Lots of things can give 200's that cannot be conveyed in messages. E.g. current weather in xxxxx.

<Zakim> alanr, you wanted to question whether owner deciding whatever they want is any basis for an architecture. Also questions whether we should use http because of this baggage

<alanr> but fairly useless: "200 responder" == responds 200. Tautology.

AR: Some of this discussion takes us back to legacy issues. Considering the legacy stuff immutable is problematic. We can't have a protocol that lets anyone send anything, because other party needs to understand what's sent.
... Responding to David...IR is something that can respond 200 doesn't help, because it's a tautology. We're going to have to define something to give an interesting rule.

<Zakim> noah, you wanted to defend information resource concept

<scribe> scribenick: dbooth

NM: I was somewhat involved in the term IR. History is that TimBL and others thought people were thinking they were linking to documents. If you have a URI for me, the person, anything you can send back in a network stands in a different relation than what you can send in a document. I think you have to start with being careful with how the owner of the resource could/should define what the resource is. Suppose I want to define a resource as a specific sequence of chars. No question I can use a computer message to allow you to reconstruct the sequence as I've defined it. You closely get to Shannon's info theory. To me, that fits very well in nailing the distinction between what I will informally call a document versus Noah the person. You do have to be careful. We got pushback on a clock resource in a way we just got pushback on the weather example. But if we view it as a City, cloudy/sunny, etc, it can be reconstructed.
... Time of day is an IR; the clock itself is not. What do you mean by a blog? Is it the location on disk? I think it's okay for the info on the blog.

<scribe> scribenick: noah

<alanr> re: essence is a sequence of characters: I believe that md5 digest would be considered by some to be a reasonable representation.

<alanr> Can we rule this out?

<alanr> question: if the essence is that it is "about" something, that is very different than not

<alanr> particularly in terms of inference

<Zakim> alanr, you wanted to discuss "documents" as sensible, but generalizations are problematic

AR: Documents are much more intuitive as objects than most things. Most people seem comfortable with documents and their representations. Then they go off and generalize to things that don't have to do with documents. Clock and time of day are good examples. We don't have documents that constantly change with time of day.

<noah_> I think we do..front page of New York Times (NM)

<jar> jar wants to steer this back to what can we infer. talking about what something is is much slipperier

AR: People want to make predictions. In making the definition of time of day you will necessarily refer to something like physics, and that's not needed for more traditional documents
... We need to have URIs for documents, and for the things they are about.

<Zakim> dbooth, you wanted to point out that noah is confusing one instance of the weather report for the weather report in general over all time

AR: I'm comfortable talking about bits, which are what we send through the wire.

<dbooth> dbooth: How would you convey the essence of the blog in a message? The blog content may change in unknown ways in the future. The blog URI denotes the whole blog -- not merely the blog at one instant in time. The blog at one instant is what Jeffrey Mogul calls an "instance".

DB: I don't think what you're asserting holds up. You are confusing an instance of the blog in time with the blog in general.

<alanr> +1 to jar/inferences

<Zakim> ht, you wanted to suggest we avoid issues already known to be deep, c.f. FRBR (http://www.ifla.org/VII/s13/frbr/frbr.pdf)

<jar> sorry dbooth i was a bit early on the ack

DB: Responding to Alan, that's just the starting point.

<alanr> re: FRBR. Need to really avoid them. i.e. not just not talk about them, but know enough about the territory that we don't get people thinking that we *are* talking about it when we don't want to.

<alanr> is time a work of art?

<alanr> fyi, I'm very familiar with it.

HT: It would be good if we could avoid having to solve a previously unsolved problem, i.e. the problem sometimes called the "nature of the work of art". Consider the book Moby Dick. Strongly recommend people look at the frbr [sic] document. They have answers to very similar questions.

<alanr> frbr (and only incidentally time)

<alanr> frbr almost works, but misses, in the end.

See FRBR (http://www.ifla.org/VII/s13/frbr/frbr.pdf)

HT: Discusses the range of what people might mean by the unqualified term "Moby Dick"

<alanr> and http://www.frbr.org/

<alanr> semantic web needs to support them

HT: Recommend we not wrestle with all that. We better not have to answer those questions or we're unlikely to succeed here.

<alanr> +1 to idea of good and bad citizens. But we don't tell people how to be good citizens currently

HT: Also, I did not mean to strongly imply it was good for any resource owner to return anything he/she likes. Some players will be better than others. We should say what respectful use of 200 means.

<alanr> -1 up to the judgement of the user. Needs to involve the community more.

<alanr> something objective

<dbooth> +1 to the idea of good/bad citizens

<dbooth> +1 to avoiding the quagmire of defining IR (this is one reason why i advocate a simple "200 potential" def)

HT: I would rather focus on coding, that is, an IR conveys things via a coding such as English, JPEG, etc.

<dbooth> -1 to the judgement of the user

<Zakim> noah, you wanted to defend IR vs. documents

<scribe> scribenick: dbooth

NM: I think people are right to probe on things that vary with time. It's clear the varying with time is part of http, because we have time-out concept, caching, etc. It's intended that they can change. So I think it would be reasonable to say that the def of IR should be changed from "essence of the resource can be conveyed" to "essence of a snapshot/current state of the resource can be conveyed". TimBL likes to talk about documents, which have a beginning and an end. My claim is that if I had a relational table, and I can tell you the names of the columns, that fits my def of an info resource (leaving time off the table). But the resource is the abstract table -- not the encoding on the wire. But when I send it on the wire, we'll agree that the encoding is not an essential part of the message.

<scribe> scribenick: noah

<alanr> state is better than essence. But how exactly is in the details.

<dbooth> ... That's an example of something I call an IR, but not happy with referring to it as a "document".

<alanr> I guess a process issue is whether we will talk about inferences. Or not.

<alanr> consequences = inferences, btw

<jar> +1 to "state" instead of "essence"...

<Zakim> alanr, you wanted to discuss consequences of considering IR as "coding". But then in what sense can a coding be "generic"

<ht> Graphs, trees and databases are certainly within the ambit of what I mean by 'linguistic artefacts'

AR: We keep talking about descriptions of things. I would rather talk about inferences. On the semweb, we need to talk about what follows from what. Is the output of our work here to describe what happens in terms of inferences and consequences?

JR: httprange14 gets a lot of flak because people don't see what bad inferences will happen if 200 is returned.
... Maybe that's also (in)separable from the question of what's an information resource.
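
For reference, the inference rule in the TAG's httpRange-14 resolution, which JR mentions, can be sketched as a small function. This is only an illustrative sketch: the function name and the returned strings are hypothetical, and it paraphrases the TAG resolution rather than anything agreed on this call.

```python
def infer_from_status(status: int) -> str:
    """Sketch of what a client may conclude about a resource from the
    status code of a GET on its http URI, per the httpRange-14 resolution.
    The name and the return strings are hypothetical, for illustration."""
    if 200 <= status < 300:
        # 2xx: the resource identified by the URI is an information resource.
        return "information resource"
    if status == 303:
        # 303 See Other: the resource could be any resource.
        return "any resource"
    if 400 <= status < 500:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    # The resolution says nothing about other codes.
    return "no conclusion"
```

Note that the rule says nothing about how faithful the returned representation is, which is the gap the discussion above keeps returning to.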

<dbooth> Noah's suggested change to the IR def is quite similar to what I'm proposing as the def. There would be a difference if noah's def would require the complete state to be sent.

JR: Might be good to settle this without noodling on the edge cases of "what's an IR"?

<jar> i want to settle the question: what inferences can we make, what mistakes are made if we don't follow "good practice"? not the question of "what's an IR"

<dbooth> +1 to community being able to judge

AR: Leaving the choices up to the sender doesn't work. We need rules the sender and receiver will agree on. Outside observers need to be able to judge what's a mistake.

<Zakim> dbooth, you wanted to ask why it matters if the essence of a snapshot/state of a resource can be conveyed in a message?

DB: Drilling on Information Resource. I like changing things along the lines of what Noah suggested, which suggests that the instantaneous state of a resource is transmissible in a message.

<alanr> range retrievals are described in terms of "content" which I understand as "representation". Not resource.

<alanr> i.e. can't describe "content" headers in ways that don't change across conneg types.

DB: I think that, with that change, it has the same impact as the change I was proposing.

<noah_> I'm happy with the change, but I too found the "it returns a 200 if it can return a 200" circular, so I accept they're in the same spirit, but not that they're the same.

<alanr> no. jar

JR: If you look at "what can you infer?", and assume people are behaving well, what inferences can you draw from a 200? Can I learn anything? As far as I can tell, all you can infer is that the owner of the resource >could have< given you a full fidelity representation. Perhaps s/he SHOULD have. Right now, you don't know if they did.

<Zakim> dbooth, you wanted to say that actually what primarily matters in the first order is that you might get a 200 back

JR: I'd like to go in the direction of finding out what you can infer.

DB: At risk of appearing to contradict myself-- the ability to get a 200 back is the first order of what we want to know, because it tells you about whether a GET is useful to attempt. Then we can follow up from there.

<alanr> cause I see subject of our work to figure out what's needed for "S" use on the "W"

<Zakim> alanr, you wanted to ask whether it might be a good strategy to approach the problem by designing a new protocol. Then see if what we find useful can be ported back to http.

<jar> thanks dbooth, i forgot that it ends at 10 instead of 10:30. let's plan next steps next

AR: I wonder if it's better starting fresh here. Should we consider designing a new protocol that shows us what we really would want for the semantic web if we had a clean slate, learn from that, then use it to inform our analysis of how to use HTTP well?

<dbooth> I think we are making progress, BTW.

How long to meet?

HT: Prefer one hour

JR: Need to figure out next meeting date.

??: January 8th

JR: We will meet for one hour on the 8th.
... Now we need to find out what our homework is for next meeting. Noah did some looking at RFC 2616, as we agreed. Maybe next goal is to have some RDF written down?

<ht> 2396 is where the notions of non-retrievable and time-varying resource come in: http://www.ietf.org/rfc/rfc2396.txt

AR: Is our goal to get to inferences? In any case, we need to be clear on our goals for the work in this group. Can discuss in email.

<noah_> Henry, that's very helpful.

<ht> Compare that to 1738

HT: I'm in favor of aiming for being able to make inferences. 1738 is original URL spec.

<ht> Having said that (about 2396), I believe that _idea_ comes from Roy Fielding's thesis

NM: I think inferences is the right goal, but I know less about semweb than everyone else here, so am adding zero in saying that.

<alanr> we could certainly conclude that http response codes are not relevant for SW interactions.

HT: The HTTP RFC is up for rewriting. If there are things we need, then we need to be able to tell the drafters that. Hence, an important output of this group should be input to the redrafting of HTTP.

<alanr> current response codes

<dbooth> Another goal is to clarify conventions needed that build upon the old-fashioned Web arch to make Semantic Web work.

JR: We also talked about something like a layered protocol on top of HTTP. We should keep that option open.

HT: Yes.

<jar> ht, we have the option of making or recommending new "law", not necessarily the http revision

<alanr> bye (after dbooth)

AR: I think we need to clarify conventions for applying the old web for semweb -- that's a bit more than interpreting it.
... Also, on the question of whether everything in the universe "has" a URI, I'm doing some work on the question of what a URI means. One way to look at it is that a URI denotes a set of assertions.

<alanr> no uris for people, then.

JR: Adjourned.

<dbooth> http://dbooth.org/2007/uri-decl/
