Something's I might have said if I'd been at the meeting. from Williams, Stuart (HP Labs, Bristol) on 2007-12-14 (public-awwsw@w3.org from December 2007)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Fri, 14 Dec 2007 17:12:59 +0000
To: "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-ID: <9674EA156DA93A4F855379AABDA4A5C60FBD018786@G5W0277.americas.hpqcorp.net>

Since I wasn't at the meeting on Tuesday I've annotated a few of my thoughts arising from your discussion into a copy of the minutes, below.

There was a repeating theme in the minutes of Jonathan trying to get the discussion back to... "what inferences can be made?"...

I find myself wanting to turn that around - lifting a comment from further down wrt homework:

"I'd like to suggest that Alan and Jonathan spend some time enumerating some examples of the sort of inferences that they would like to be able to make - ie come at the problem from the other end. Then we can explore whether there are plausible lines of reasoning to reach those kinds of conclusions. I suspect that any chain of reasoning would take in request and response headers as well as response codes. Start with the conclusions you'd like to be able to justify and backward-chain rather than forward - that would be my suggestion."

Regards,

Stuart
--

________________________________


- DRAFT -
Telcon of the W3C study group on semantics of HTTP
11 Dec 2007
Attendees

Present
David Booth, Noah Mendelsohn, Jonathan Rees, Alan Ruttenberg, Henry Thompson
Regrets
Chair
Jonathan Rees
Scribe
Noah Mendelsohn
Contents

* Topics
* What can we infer from a 200 response
* How long to meet?
* Summary of Action Items

________________________________

We are discussing the "brainstorm page" at http://esw.w3.org/topic/AwwswTopicsBrainstormPage

What can we infer from a 200 response

<scribe> scribenick: dbooth

Noah: What occurred to me is that we've gone to some trouble to invent the notion of information resource. It's clear that the sense of it is that they are different because a high fidelity rep can be transmitted in a computer message. Then we say those are the ones you should use a 200 with, clearly implying that there is no rep of something like me the person. It's clearly implied that if I have an info resource i should try to send hi fi reps except in cases i need to send a media type that's lossy. All I was pointing out is that i don't think there's anything normative that says that. So we're putting a lot of effort into the case where it's a person. But we haven't set the bar high for info resources either. what's good practice? should the webarch be updated to say "if you have an IR you should send enough info to reconstruct the state".

<scribe> scribenick: noah

AR: I've heard Richard Cyganiak saying "the representation is whatever the owner of the resource wants it to be". I hear people saying the representation is ephemeral, only exists on the wire.

Well one only experiences the exchange of representations in messages. There is a type/token question as to whether representations are ephemeral things on a wire (tokens) or a type/class of all the messages that convey an identical set of bits.

Whilst I have previously taken the view of representations as messages (ephemeral tokens), I think that there is good reason to regard them (representations) as a type characterised by the bit/byte sequence those messages convey. That probably still leaves representations as somewhat ephemeral - in that they can still change over time, however representations conveyed in response to two different requests of the same resource are not necessarily different. I think that view is probably reinforced by things like the HTTP last-modified and etag headers (modulo Mogul's paper and the concept which it calls 'instance').
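
To make the type/token reading concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the Message class, its request_id field and the representation_key function are hypothetical names, and using the media type plus a SHA-256 digest of the body as the identity of a representation is my own choice, not something taken from webarch or HTTP.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Message:
    """One HTTP response as seen on the wire: an ephemeral token."""
    request_id: int     # hypothetical label for the exchange that carried it
    content_type: str
    body: bytes


def representation_key(msg: Message) -> Tuple[str, str]:
    """Identify the representation (the type) that a message conveys.

    Assumption: two messages convey the same representation when their
    media type and byte sequence are identical.
    """
    return (msg.content_type, hashlib.sha256(msg.body).hexdigest())


# Two distinct messages (tokens) that convey identical bytes.
first = Message(1, "text/html", b"<p>hello</p>")
second = Message(2, "text/html", b"<p>hello</p>")
# A later message after the resource has changed.
later = Message(3, "text/html", b"<p>hello again</p>")

same_representation = representation_key(first) == representation_key(second)
changed_representation = representation_key(first) != representation_key(later)
```

Under this reading the first two messages are different tokens of one representation, while the third conveys a different representation of the same resource.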

DB: So, I'm not sure 200 conveys anything other than it's "not not an information resource"

JR: I find the definition of information resource (IR) troubling, and have some sympathy with those who say that the architecture shouldn't be too prescriptive

The term IR in webarch had a long and tricky birth. It was clear during its genesis that there was a category of things that at least some TAG members felt strongly that it was important to distinguish from others. We tried "web resources" being roughly those that return '200 OK' - that didn't stick. An early position for some was centred on the presence/absence of a '#' in an http URI (this is and always has been only about HTTP URI, not URI in general) - but that seemed to be making the distinction on the basis of spelling. Upholding this view would have meant that the architecture would place restrictions on what could or could not be named with HTTP URI, sans '#'. It would have meant that naming (referring to, denoting...) a person or a dog or... any material thing, with an HTTP URI sans '#' would have been wrong per webarch. What the TAG's httpRange-14 resolution does is remove that restriction - you can choose to name, refer-to, denote any kind of thing with an http: URI sans '#' if you choose to. It means that you *don't* have to consider its "essential" nature in giving it a name.

However, that came with a cost. Those for whom the distinction was important, having yielded on a syntactic marker, still needed a means to know when they were being served a representation of the thing that they asked for rather than a description/depiction of it (possibly amongst other things).

I tried to press TimBL recently on why he regarded the distinction as so vitally important - but he answered somewhat differently. He answered more along the lines that what was important was that when deploying a URI one considers the kind of response one would be able to make to a GET. I have a hard time tying this down in a thoroughly objective fashion, but qualitatively I'm quite comfortable with it. It hinges really on the difference between description/depiction and representation (in the sense of webarch:Representations). If you are only able to describe/depict the thing being named/referenced/denoted then you really cannot properly provide a representation of it - you can merely provide a representation of something descriptive/depictive of the thing. The thing and a description/depiction of it are distinct. There is an implicit sense also in which a representation originates from the represented thing. I think that it's quite clear that a library record, or a web page, could have a webarch:Representation. I think that it's also quite clear that the planet Mars couldn't. Some would argue... sure it can, I could regard a picture of the planet Mars as an adequate representation and serve a serialised jpeg image, and I suppose that you could argue that the image even in some sense originated from the planet though it wouldn't really be very current. And one could even feed the image from a webcam directed at the planet in order to keep it more up to date. However, to my mind Mars itself is incapable of responding to web requests; what is being conveyed are the results of a particular observation of the planet rather than a representation of the planet itself.

Does anyone have a razor sharp way to distinguish a webarch:Representation of a thing from a (webarch:Representation of a) description of a thing (possibly amongst other things)? I'd venture that (modulo conneg), at a given instant a resource has a single represented state whereas it may have multiple, distinct and possibly conflicting descriptions/depictions which may also be stale.

All that said, I don't think there is dispute about there being a distinction between a thing and a description/depiction of the same thing. There is more a question about why a representation of a description/depiction could/should not serve double duty as a representation of the thing itself - ie. for Mars, clearly assign distinct URI to denote the planet and each of multiple distinct depictions/descriptions, but why not allow yourself to serve a representation of a description/depiction as a representation of the planet under a 200 response? The best answer I have is that at least some would regard such a 200 response as having come from the planet, as having involved interaction with the planet - which would be patently false - it was not the planet responding to the request, it is not possible for it to do so - and as Pat has pointed out on other occasions, it would be patently absurd to expect Mars to alter its state in response to a PUT, POST or DELETE.

I also want to come at this from a slightly different tack. Wind the clock back; Remove the IR/non-IR categorisation; Agonise no longer about whether a thing is an information resource or not; consider the kind of things you are interested in: proteins, samples, drugs, organs, cell-types, medical conditions...whatever. What kind of HTTP names would you allow yourself to use for them? Would you feel constrained or not to use fragment identifiers (because none of these are web documents or things that are attached to the web)? Wouldn't you have the same set of problems/issues? ie. the httpRange resolution is really not the source of the problem. AFAICT there are a great many things that need to be spoken of (described/depicted) rather than represented - and what gets represented is information *about* those things rather than *from* those things.

HT: Thinking about conneg, which is related...there is an old move in formal semantics which is "I can't tell you what something is, but I can tell you when two things are the same." The bar for conneg is not external measure of similarity, but URI owner warrants similar for his or her purposes.
... We can still give guidelines as to what people SHOULD do. There will still not be an objective measure you can hold up.

<jar> we lost noah again

<Zakim> ht, you wanted to put the minter on the spot

<Zakim> dbooth, you wanted to say that the webarch def of IR is just plain wrong and we need to start by fixing it

<alanr> jonathan and I name this "200-responder"

<alanr> but fairly useless: "200 responder

DB: I keep hearing discussions about fidelity of the representation or its ability to reproduce the resource. Seems to me on the wrong track, because current definition is plain wrong. Correct definition would be "it is something that can give a 200 response." Having its essence conveyed is secondary. Lots of things can give 200's that cannot be conveyed in messages. E.g. current weather in xxxxx.

Well... that's really a 200 for the weather report or forecast, *not* the weather itself.

<Zakim> alanr, you wanted to question whether owner deciding whatever they want is any basis for an architecture. Also questions whether we should use http because of this baggage

It's always been the way for the GOFW. URI owners get to say what resource the URI identifies/denotes. There is a counter view, due at least in part to Larry Masinter... at least wrt http: and ftp: URI that actually it is the URI scheme specification which tells you. The ftp: scheme is a good example of this in that it gives a very operationalised account of what resource an ftp: scheme URI refers to. What it does not do is give any account of the significance of the resources - so I think these accounts operate at different levels. In the case of ftp: what is identified (denoted) is the thing that would be accessed by performing a series of FTP protocol operations. Likewise, for http.

Earlier specs also seemed to distinguish URI from URI references, the former having no potential to carry a fragment, whilst the latter could. However, more recent versions cast URI as absolute URI references - so URI can in fact have fragments. Nevertheless, the older styles of use have affected people's world views and the positions which they have built around them.

For quite some time Mark Baker has asserted (eg in emails) that the URI http://www.markbaker.ca/ identifies/denotes the person Mark Baker on the basis that he is the relevant authority for that URI. Pre-httpRange resolution - folks would argue about whether that was permitted (let alone sane) by web architecture. Post-httpRange resolution it is certainly permitted, however, by the resolution, in order for that to conform there should be a 303 redirect in place. There isn't (at present) so the current deployment of that page is inconsistent with the conjunction of the TAG's resolution and Mark's claim. Actually, I am probably being unfair to Mark, in that the claim he may have been making is that the given URI *can* be used to identify him the person, and he'd then possibly elaborate by speaking of indirect identification along the lines of himself being the maintainer of that resource (much like foaf uses an inverse functional foaf:mbox as an 'indirect' identifier for a person).
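
The conformance argument above can be sketched as a small Python check. This is a hedged illustration, not a normative reading: the function names and the returned strings are hypothetical, the mapping paraphrases the three cases of the httpRange-14 resolution (2xx, 303, 4xx), and the status codes passed in are supplied by hand rather than fetched from any real server.

```python
def httprange14_inference(status: int) -> str:
    """What the httpRange-14 resolution lets a client conclude from a GET.

    The strings are informal paraphrases chosen for this sketch.
    """
    if 200 <= status < 300:
        return "information resource"
    if status == 303:
        return "could be any resource"
    if 400 <= status < 500:
        return "nature unknown"
    return "not addressed by the resolution"


def claim_conforms(status: int, claimed_to_be_information_resource: bool) -> bool:
    """Check an owner's claim about a URI against the response actually served.

    Only a 2xx response constrains the claim: it says the resource is an
    information resource, so a claim that the URI denotes (say) a person
    does not conform. Every other status leaves the claim open.
    """
    if httprange14_inference(status) == "information resource":
        return claimed_to_be_information_resource
    return True


# A URI claimed to denote a person, served with a plain 200 (assumed here).
person_with_200 = claim_conforms(200, claimed_to_be_information_resource=False)
# The same claim, served with a 303 redirect to a describing document.
person_with_303 = claim_conforms(303, claimed_to_be_information_resource=False)
```

Note that the check only ever detects an inconsistency between a claim and a 2xx response; it never tells you what a URI denotes.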

<alanr> but fairly useless: "200 responder" == responds 200. Tautology.

AR: Some of this discussion takes us back to legacy issues. Considering the legacy stuff immutable is problematic. We can't have a protocol that lets anyone send anything, because other party needs to understand what's sent.
... Responding to David...IR is something that can respond 200 doesn't help, because it's a tautology. We're going to have to define something to give an interesting rule.

<Zakim> noah, you wanted to defend information resource concept

<scribe> scribenick: dbooth

NM: I was somewhat involved in the term IR. History is that TimBL and others thought people were thinking they were linking to documents. if you have a uri for me, the person, anything you can send back in a network stands in a different relation than what you can send in a document. I think you have to start with being careful with how the owner of the resource could/should define what the resource is. Suppose i want to define a resource as a specific sequence of chars. No question i can use a computer message to allow you to reconstruct the sequence as I've defined it. You closely get to Shannon's info theory. To me, that fits very well in nailing the distinction between what I will informally call a document versus Noah the person. You do have to be careful. We got pushback on a clock resource in a way we just got pushback on the weather example. But if we view it as a City, cloudy/sunny, etc, it can be reconstructed.
... Time of day is an IR; the clock itself is not. What do you mean by a blog? Is it the location on disk? I think it's okay for the info on the blog.

<scribe> scribenick: noah

<alanr> re: essence is a sequence of characters: I believe that md5 digest would be considered by some to be a reasonable representation.

..of the time of day? that doesn't seem plausible really. MD5's are not reversible IIRC... it is at least v.hard to recover a time of day from an MD5.

<alanr> Can we rule this out?

To what does 'this' refer?

<alanr> question: if the essence is that it is "about " something, that is very different than not

<alanr> particularly in terms of inference

<Zakim> alanr, you wanted to discuss "documents" as sensible, but generalizations are problematic

AR: Documents are much more intuitive as objects than most things. Most people seem comfortable with documents and their representations. Then they go off and generalize to things that don't have to do with documents. Clock and time of day are good examples. We don't have documents that constantly change with time of day.

<noah_> I think we do..front page of New York Times (NM)

<jar> jar wants to steer this back to what can we infer. talking about what something is is much slipperier

Try: Can infer:

1) that the returned representation originated from interaction with the resource (may have been sourced from a cache - but originated from an interaction with the resource).
2) that the representation is a 'faithful' representation up to the upper bound of any reported cache lifetime (assuming that the resource is not being malicious).
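
As a rough illustration of inference 2), the sketch below computes how long a received representation may still be treated as faithful, using only the Cache-Control max-age directive and the Age header. It is deliberately simplified and rests on assumptions: the function name is hypothetical, header names are looked up with exact capitalisation, and Expires, heuristic freshness and the other cache directives are ignored.

```python
import re
from typing import Dict, Optional


def remaining_freshness(headers: Dict[str, str]) -> Optional[int]:
    """Seconds for which the representation may still be assumed faithful.

    Returns None when the response reports no max-age, i.e. when this
    sketch has no reported lifetime to reason from.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match is None:
        return None
    # Age is the time the response has already spent in caches.
    age = int(headers.get("Age", "0"))
    return max(int(match.group(1)) - age, 0)


fresh = remaining_freshness({"Cache-Control": "public, max-age=600", "Age": "120"})
stale = remaining_freshness({"Cache-Control": "max-age=60", "Age": "100"})
unreported = remaining_freshness({"Cache-Control": "no-store"})
```

A result of None means that no lifetime was reported, so no such inference is licensed; it does not mean that the representation is unfaithful.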

I would find it helpful if you could give some examples of the sort of inferences that you would like to be able to make. Whilst we seem to be hung up on the IR/nonIR distinction I don't think that's a distinction that you find useful, so I don't think that it is representative of the sort of inference that you want to be able to make.

I suspect that you would like to be able to infer things like:

- that a resource is invariant;
- that two URI refer to the same thing;
- what resource a URI refers to;

I don't think that HTTP status codes can tell you any of those things. So... examples of credible inferences that you'd like to be able to make from the http response codes and response headers would be useful.

AR: People want to make predictions. In making the definition of time of day you will necessarily refer to something like physics, and that's not needed for more traditional documents
... We need to have URIs for documents, and for the things they are about.

Yes... but documents can change (though... yes they don't have to).

<Zakim> dbooth, you wanted to point out that noah is confusing one instance of the weather report for the weather report in general over all time

AR: I'm comfortable talking about bits, which are what we send through the wire.

<dbooth> dbooth: How would you convey the essence of the blog in a message? The blog content may change in unknown ways in the future? The blog URI denotes the whole blog -- not merely the blog at one instant in time. The blog at one instant is what Jeffrey Mogul calls an "instance".

I have two responses. One is to say that we misspoke and what was intended was that the current state of the resource could be conveyed in a message (which accords with where the discussion went).

The second is to say, that in principle, at any point in the future the entire history of the blog could be serialised in a message (it's just not a message you would expect to get back when dereferencing the blog at any given moment) - and if you're really going to quibble I'd cite that serialisation at the future instant when the universe implodes - or maybe just prior to the death of the planet (assuming that we haven't moved off to colonise other planets and taken the WWW with us).

DB: I don't think what you're asserting holds up. You are confusing an instance of the blog in time with the blog in general.

<alanr> +1 to jar/inferences

As above... I'd like to turn this around and have examples of the sort of inferences you'd like to be able to make.

<Zakim> ht, you wanted to suggest we avoid issues already known to be deep, c.f. FRBR (http://www.ifla.org/VII/s13/frbr/frbr.pdf)

<jar> sorry dbooth i was a bit early on the ack

DB: Responding to Alan, that's just the starting point.

<alanr> re: FRBR. Need to really avoid them. i.e. not just not talk about them, but know enough about the territory that we don't get people thinking that we *are* talking about it when we don't want to.

<alanr> is time a work of art?

<alanr> fyi, I'm very familiar with it.

HT: It would be good if we could avoid having to solve previously unsolved problem, I.e. the problem sometimes called the "nature of the work of art". Consider the book Moby Dick. Strongly recommend people look at the frbr [sic] document. They have answers to very similar questions.

<alanr> frbr (and only incidentally time)

<alanr> frbr almost works, but misses, in the end.

See FRBR (http://www.ifla.org/VII/s13/frbr/frbr.pdf)

HT: Discusses the range of what people might mean by the unqualified term "Moby Dick"

<alanr> and http://www.frbr.org/

<alanr> semantic web needs to support them

HT: Recommend we not wrestle with all that. We better not have to answer those questions or we're unlikely to succeed here.

<alanr> +1 to idea of good and bad citizens. But we don't tell people how to be good citizens currently

HT: Also, I did not mean to strongly imply it was good for any resource owner to return anything he/she likes. Some players will be better than others. We should say what respectful use of 200 means.

<alanr> -1 up to the judgement of the user. Needs to involve the community more.

<alanr> something objective

<dbooth> +1 to the idea of good/bad citizens

<dbooth> +1 to avoiding the quagmire of defining IR (this is one reason why i advocate a simple "200 potential" def)

HT: I would rather focus on coding, that is, an IR conveys things via a coding such as English, JPEG, etc.

<dbooth> -1 to the judgement of the user

<Zakim> noah, you wanted to defend IR vs. documents

<scribe> scribenick: dbooth

NM: I think people are right to probe on things that vary with time. It's clear the varying with time is part of http, because we have time-out concept, caching, etc. It's intended that they can change. So i think it would be reasonable to say that the def of IR should be changed from "essence of the resource can be conveyed" to "essence of a snapshot/current state of the resource can be conveyed". TimBL likes to talk about documents, which have a beginning and an end. My claim is that if i had a relational table, and I can tell you the names of the columns, that fits my def of an info resource (leaving time off the table). But the resource is the abstract table -- not the encoding on the wire. But when I send it on the wire, we'll agree that the encoding is not an essential part of the message.

I like Fielding's model of a resource as a mapping from time to sets of available representations. The resource is that construct over all time, not just at an instant. At an instant the available set of representations is a coherent set of representations where the conneg makes a selection in the event of an access at that instant.
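
A minimal Python sketch of that model, offered as an illustration rather than as Fielding's own formalisation: the Representation class, the greeting resource and the negotiate function are hypothetical, time is just a number, and conneg is reduced to an exact match on a single media type.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Representation:
    media_type: str
    body: bytes


# The resource is the whole mapping from time to sets of representations,
# not the set that happens to be available at one instant.
Resource = Callable[[float], FrozenSet[Representation]]


def greeting(t: float) -> FrozenSet[Representation]:
    """Hypothetical resource whose state changes at t = 100."""
    word = "hello" if t < 100 else "goodbye"
    return frozenset({
        Representation("text/plain", word.encode()),
        Representation("text/html", f"<p>{word}</p>".encode()),
    })


def negotiate(resource: Resource, t: float, accept: str) -> Optional[Representation]:
    """Select one representation from the set available at instant t."""
    for rep in resource(t):
        if rep.media_type == accept:
            return rep
    return None


early_plain = negotiate(greeting, 0, "text/plain")
late_html = negotiate(greeting, 200, "text/html")
no_match = negotiate(greeting, 0, "image/png")
```

At any one instant the two representations are coherent views of the same state; the selection made by conneg changes which one is conveyed, not which resource was accessed.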

<scribe> scribenick: noah

<alanr> state is better than essence. But how exactly is in the details.

<dbooth> ... That's an example of something i call an IR, but not happy with referring to it as a "document".

<alanr> I guess a process issue is whether we will talk about inferences. Or not.

<alanr> consequences = inferences, btw

<jar> +1 to "state" instead of "essence"...

<Zakim> alanr, you wanted to discuss consequences of considering IR as "coding". But then in what sense can a coding be "generic"

<ht> Graphs, trees and databases are certainly within the ambit of what I mean by 'linguistic artefacts'

AR: We keep talking about descriptions of things. I would rather talk about inferences. On the semweb, we need to talk about what follows from what. Is the output of our work here to describe what happens in terms of inferences and consequences.

JR: httprange14 gets a lot of flack because people don't see what bad inferences will happen if 200 is returned.
... Maybe that's also (in)separable from the question of what's an information resource.

<dbooth> Noah's suggested change to the IR def is quite similar to what I'm proposing as the def. There would be a difference if noah's def would require the complete state to be sent.

There's a significant difference in that the potential to serialize state in a message and return it with a 200 is different from the actuality of whether that happens or not.

JR: Might be good to settle this without noodling on the edge cases of "what's an IR"?

<jar> i want to settle the question: what inferences can we make, what mistakes are made if we don't follow "good practice"? not the question of "what's an IR"

What inferences do you want to make? What inferences would lead you to disaster?

<dbooth> +1 to community being able to judge

AR: Leaving the choices up to the sender doesn't work. We need rules the sender and receiver will agree on. Outside observers need to be able to judge what's a mistake.

<Zakim> dbooth, you wanted to ask why it matters if the essence of a snapshot/state of a resource can be conveyed in a message?

DB: Drilling on Information Resource. I like changing things along the lines of what Noah suggested, which suggests that the instantaneous state of a resource is transmissible in a message.

<alanr> range retrievals are described in terms of "content" which I understand as "representation". Not resource.

<alanr> i.e. can't describe "content" headers in ways that don't change across conneg types.

DB: I think that, with that change, it has the same impact as the change I was proposing.

<noah_> I'm happy with the change, but I too found the "it returns a 200 if it can return a 200" circular, so I accept they're in the same spirit, but not that they're the same.

<alanr> no. jar

JR: If you look at "what can you infer?", and assume people are behaving well, what inferences can you draw from a 200? Can I learn anything? As far as I can tell, all you can infer is that the owner of the resource >could have< given you a full fidelity representation. Perhaps s/he SHOULD have. Right now, you don't know if they did.

<Zakim> dbooth, you wanted to say that actually what primarily matters in the first order is that you might get a 200 back

JR: I'd like to go in the direction of finding out what you can infer.

DB: At risk of appearing to contradict myself-- the ability to get a 200 back is the first order of what we want to know, because it tells you about whether a GET is useful to attempt. Then we can follow up from there.

<alanr> cause I see subject of our work to figure out what's needed for "S" use on the "W"

<Zakim> alanr, you wanted to whether it might be a good strategy to approach the problem by designing a new protocol. Then see if what we find useful can be ported back to http.

<jar> thanks dbooth, i forgot that it ends at 10 instead of 10:30. let's plan next steps next

AR: I wonder if it's better starting fresh here. Should we consider designing a new protocol that shows us what we really would want for the semantic web if we had a clean slate, learn from that, then use it to inform our analysis of how to use HTTP well.

<dbooth> I think we are making progress, BTW.

How long to meet?

HT: Prefer one hour

JR: Need to figure out next meeting date.

??: January 8th

JR: We will meet for one hour on the 8th.
... Now we need to find out what our homework is for next meeting. Noah did some looking at RFC 2616, as we agreed. Maybe next goal is to have some RDF written down?

I'd like to suggest that Alan and Jonathan spend some time enumerating some examples of the sort of inferences that they would like to be able to make - ie come at the problem from the other end. Then we can explore whether there are plausible lines of reasoning to reach those kinds of conclusions. I suspect that any chain of reasoning would take in request and response headers as well as response codes. Start with the conclusions you'd like to be able to justify and backward-chain rather than forward - that would be my suggestion.

<ht> 2396 is where the notions of non-retrievable and time-varying resource comes in: http://www.ietf.org/rfc/rfc2396.txt

AR: Is our goal to get to inferences? In any case, we need to be clear on our goals for the work in this group. Can discuss in email.

<noah_> Henry, that's very helpful.

<ht> Compare that to 1738

HT: I'm in favor of aiming for being able to make inferences. 1738 is original URL spec.

<ht> Having said that (about 2396), I believe that _idea_ comes from Roy Fielding's thesis

NM: I think inferences is the right goal, but I know less about semweb than everyone else here, so am adding zero in saying that.

<alanr> we could certainly conclude that http response codes are not relevant for SW interactions.

HT: The HTTP RFC is up for rewriting. If there are things we need, then we need to be able to tell the drafters that. Hence, an important output of this group should be input to the redrafting of HTTP.

<alanr> current response codes

<dbooth> Another goal is to clarify conventions needed that build upon the old-fashioned Web arch to make Semantic Web work.

JR: We also talked about something like a layered protocol on top of HTTP. We should keep that option open.

HT: Yes.

<jar> ht, we have the option of making or recommending new "law", not necessarily the http revision

<alanr> bye (after dbooth)

AR: I think we need to clarify conventions for applying the old web for semweb -- that's a bit more than interpreting it.
... Also, on the question of whether everything in the universe "has" a URI, I'm doing some work on the question of what a URI means. One way to look at it is that a URI denotes a set of assertions.

<alanr> no uris for people, then.

JR: Adjourned.

<dbooth> http://dbooth.org/2007/uri-decl/

Summary of Action Items

[End of minutes]

________________________________
Minutes formatted by David Booth's scribe.perl<http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm> version 1.128 (CVS log<http://dev.w3.org/cvsweb/2002/scribe/>)
$Date: 2007/02/23 21:38:13 $

Received on Friday, 14 December 2007 17:16:45 UTC