From: Jonathan Rees <jar@creativecommons.org>
Date: Sat, 14 Mar 2009 18:07:35 -0400
To: Larry Masinter <masinter@adobe.com>
Cc: "www-archive@w3.org" <www-archive@w3.org>
On Sat, Mar 14, 2009 at 12:14 PM, Larry Masinter <masinter@adobe.com> wrote:

>> The httpRange-14 rule can be reinterpreted: If you make an RDF theory of HTTP or of some 200-yielding HTTP resource,
>
> I don't understand the category of "200-yielding HTTP resource" in terms of the temporal behavior.

HTTP talks about some resource and says what the various methods do with it. It's vague on what GET does, but I think it's not too much of a stretch that the intent is for a 200 response to a GET to be either a REST-representation of the current state of the resource (if a "network data object") or information that comes from the resource (if a "network service").

If pushed I would say I meant "a server that located the resource in response to a GET responded with a 200", and that this 200 had whatever meaning was conferred by the GET method - which I would say is that the response carries some information that came from the resource (a representation of its state, or information provided by the service). Thus the category is that of resources for which well-intentioned servers might yield a 200 in response to a GET (at any point in the resource's history).

Non-operational, of course, and perhaps vacuous, since there is no experiment you could do to distinguish the case where the response was coming-from some located resource from the case where the server was doing something else (such as simulating such behavior). But that's what makes it a model. I was just using a shorthand that I thought you'd understand; sorry for being inaccurate.

> There are resources which are identified by URIs, and there are HTTP servers that respond to requests. Some requests will get a 200 response and some requests will get some other response, or no response at all, or will fail to open a TCP connection, or other behavior will be evidenced, and the behavior varies over time. I don't understand how "200-yielding" is used as a category distinction.

I guess a better category distinction would be between the sorts of "resources" that are treated in 2616 and those "resources" that are covered by 3986 but not 2616. All I meant was that if you get a 200, then according to HTTP the URI identifies (well, locates) a resource of the kind that 2616 is about. If I get a 404, that says the HTTP server doesn't locate any resource using the request-URI (at this time). But some other 3986 resource (perhaps one that *couldn't* be treated by 2616 because it's not a "network resource") might be meant if you were to use the URI with another protocol or language. I already agreed that this is at the non-normative, non-operational level. I think there is still something to be said even if we agree not to talk in this way (see below).

> Some URIs have never produced a 200 in the past but might in the future. Others have consistently produced 200 responses in the past, but might stop any minute now.

Yes, but if you know what the resources are (how the server is implemented), you might be able to make some plausible predictions or credible commitments, especially if the context is bounded somehow. (HTTP doesn't tell you much about the resource, so you would have to learn about it in some other way, or gamble.)
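(To make the shorthand concrete anyway: here, as a rough sketch in Python using only the standard library, with an invented URI, is the kind of observation at issue. Per the above, nothing in it can distinguish "coming-from" from simulation.)

    # Sketch: what a single GET licenses, on the reading above.
    # The URI is invented; the verdicts are time-dependent and fallible.
    import urllib.error
    import urllib.request

    def classify_by_get(uri):
        try:
            with urllib.request.urlopen(uri) as resp:  # follows redirects
                if resp.status == 200:
                    # 2616's (implied) intent: the response carries information
                    # that came from some located resource -- a representation
                    # of its state, or output of a service.
                    return "server located a 2616-style resource (just now)"
                return "other success behavior: %d" % resp.status
        except urllib.error.HTTPError as e:
            if e.code == 404:
                # No resource located with this request-URI, at this time;
                # a 3986 resource might still be meant in another context.
                return "no resource located here (at this time)"
            return "HTTP error %d" % e.code

    print(classify_by_get("http://example.org/some-document"))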
> ("duri" provides a way of at least fixing the temporal behavior if you want to talk about the 'expected operational behavior' at a point in time. My view is that all web servers had a time of initial operation, and have (or will have) a time when they cease to function, and so temporal variance of operational behavior is a serious problem for assertions made about time-independent concepts.)

I agree that http: URIs, in the HTTP protocol, identify things that are physically contingent, not entities determined by any ideal or principle (such as a document having some checksum, or a series of versions of a document, or even a web service possessing any regular properties). But I don't think time dependence is the killer here; I think it's physical grounding. Non-operational, time-dependent "concepts" are useful too.

>> please try to put the RDF referent of the URI in classes that are fairly closely tied to the way HTTP is used - e.g. documents, web pages, REST resources, web services, and so on.
>
> What does it mean to "put the RDF reference of the URI in classes"?

To put something in a class = assert that the thing rdf:type the class, or make some other assertion that entails such a type assertion. You might say, in RDF, that the URI refers to operational behavior, or to a REST resource, or to a TimBL "generic resource", or to a FRBR "expression", or to an IAO "information content entity". Although mutually inconsistent, I'd say they can all be related to the way HTTP is used (or, more generically, to what we do with the web).

That is, we put documents (or more elaborate entities such as conneg representation sets, version sequences, services, etc.) on the web. When we want to talk about the documents, it is expedient to use http: URIs to mean the documents, not what happens to be on the web. If the two get out of sync - well, that's some kind of bug, but for HTTP we get what's on the web, while in RDF we continue to talk about the document. (To make this work we have to have information adequate for identifying the document, such as statements about checksums, write dates, mirror locations, and so on. Something more elaborate is needed for time-varying documents or idealized services.) There is no avoiding this other than to "break the web" by using duri: or lsid:.

> If you have an assertion expressed in RDF using a URI, how do you "put the RDF reference" in a class? There's a model in which an assertion written in terms of RDF and URIs is mapped to an assertion about the real world. Is your request ("please try") in the context of "when constructing the model"?

No, it's "when specifying the theory". The advice can be elaborated as recommended practice for writing RDF (and therefore for theory / ontology design). I see no particular problem with this - it's like a style guide. The "please try" is in the context of writing RDF and prose that might guide others in the formation of their models. I guess I'm not saying this very well.
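(Concretely, the sort of thing I'm recommending might look like the following rough sketch, which happens to use the rdflib library; the URI, the Document class, and the checksum are invented for illustration.)

    # Sketch: "putting the RDF referent of a URI in a class", plus the sort
    # of document-identifying statements mentioned above. The URI, class,
    # and checksum are invented for illustration.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, DCTERMS

    EX = Namespace("http://example.org/vocab#")
    doc = URIRef("http://example.org/moby-dick")

    g = Graph()
    # Assert that the referent rdf:type a document-like class.
    g.add((doc, RDF.type, EX.Document))
    # Information adequate for identifying the document, independent of
    # whatever happens to be on the web at that URI at the moment.
    g.add((doc, EX.sha1, Literal("da39a3ee5e6b4b0d3255bfef95601890afd80709")))
    g.add((doc, DCTERMS.modified, Literal("2009-03-14")))

    print(g.serialize(format="turtle"))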
> I think I understand what you're asking if it's in terms of constructing a model, but I'm less sure what's involved in constructing a model. I like the idea that a "model" is (abstractly) the way in which you map the identifiers in a language to concepts or entities in the real world, but I can't imagine how one would "write down" a model, since the only languages we have for writing anything down require a model themselves to understand them.

I take model-building as a story about what people do with axiom sets (or theories) and the non-logical paraphernalia that accompanies them, such as the strings comprising URIs, labels, comments, and so on. What people do is to locate or create mental, written, and physical things that behave consistently with the theory. Talking about such "modeling" is of course just another bunch of talk, but it's a way of talking that has been successful for 70 years or so. The idea is that modeling is essentially private, while logic is public and has agreed-on ground rules (like baseball). The paraphernalia is a gray area (one that I've sometimes argued with Pat Hayes about). If someone decides to ignore the documentation you write about what you intend for models of your theory to look like, in some sense that's OK (if sad), because only the logic is "normative". In Pat's view, if the logic doesn't almost force you to create either something close to the intended models or no model at all, then the axioms are inadequate and you need to make them more detailed.

>> Make "identification" under HTTP as close as you can to "identification" in a model of your RDF theory.
>
> HTTP doesn't have a theory of "identification" -- at least "identification" was not a topic of the HTTP working group.

It has a theory of "location", and I thought it was clear that the resource that the server locates given the request-URI would be the one that is "identified" according to 3986. Can you give me an example of a situation in which they're different? I don't mean "as close as you can" to be taken too strongly - it was probably a misstatement. One could guarantee closeness by saying that http: URIs have to mean in RDF what they locate in HTTP. I don't think that's necessary or desirable. But that doesn't mean there should be no relation.

> httpRange might be proposing a model of "identification", and then proposing that RDF model makers use it? Which I suppose is useful to those thinking about using RDF and making RDF models?

Yes. I think the intent was for identification to be the same everywhere, as in AWWW, but for httpRange-14 the relevant (but unstated) context is RDF.

>> If we could relate HTTP to RDF, this could set an example for relating other protocol pairs, and if the approach became methodical, we might be able to make suggestions about some general theory of the (recommended) meanings of URIs - identifying equivalences, types, and relations in one domain and showing how they correspond to equivalences, types, and relations in another.
>>
>> Personally I think HTTP/RDF is the only case worth pursuing, and it should be useful to limit the hunt to this one quarry.
>
> I don't understand your optimism that any possible relationship between the HTTP protocol and the RDF representation system would be useful enough in general to be worth pursuing.

We're talking about an idea that goes back to Tim's writings in 1989, and to the W3C metadata activity that spawned RDF. Originally RDF realized Tim's idea of having descriptions of things on the web (resources). Things on the web were identified by their URLs, and you would say things about them such as their author, publication date, and so on. The "semantic web" idea was a retrofit. The idea has problems, as we both well know. There is certainly a difference between the things that RDF writers are naming using HTTP URIs and the resources located (according to the HTTP protocol) using those URIs. Sure, there is no normative way to link these two things, so a priori there is no connection between the way a URI is used in the two situations. But I think that current practice is mostly consistent with some kind of alignment, most of the time.
Just because the notion of metadata for something named by an http: URI is not and cannot be rigorous doesn't mean we should embrace either a total free-for-all or a complete separation of namespaces. If HTTP gives us wildly varying results over time, or always 404s, then that URI probably won't be used in RDF to name a stable document. Contrariwise, httpRange-14 says that if the HTTP behavior might lead one to think that the HTTP resource is an implementation of something that has been put on the web, then don't use the URI in RDF to name something that can't be implemented on the web. (The gap in my story is a good definition of "implemented". Work in progress.)

Obviously the web and web servers are imperfect, and no matter how hard we try we will fail to infallibly put something like Moby Dick or the US Constitution "on the web" at some URI. That is, the correspondence between HTTP and RDF can never be perfect (when we use RDF to talk about something that is not *by definition* on the web, and Moby Dick is not). But the key to uniting the web and the semweb is to exploit the correspondence when it works, and this in turn depends on denying the correspondence when it is known not to be there. A more principled design would be to require assertions, in RDF, that a resource (e.g. Moby Dick), or its current state, is on the web at some http: URI, or to give equivalent resolution information.

A very similar discussion is taking place on public-awwsw, by the way. See e.g. the thread that contains this message: http://w3.org/mid/9D77C204-5E7E-4DFA-9447-01FA243E7C5E@creativecommons.org But there, I'm taking the tack of laying out all available "models" of so-called "information resources" in an attempt to show that they are mutually inconsistent.

I think there is a middle ground here, between the squishy AWWW position (including httpRange-14) and your hardball doesn't-work-so-give-up-already. I'm being pushed toward it from both sides, which is good. If I end up with no ground to stand on, then I will retreat to some more easily explained position.

> I don't think there's any generic model of http URIs that can serve to give RDF the representational power it needs to make assertions about anything other than web servers and how they behave.

I agree with this. http: URIs are used non-generically in RDF. That's a flaw in the way the httpRange-14 rule is formulated. That doesn't mean the idea is necessarily a bad one. I think there ought to be a way to distinguish page-and-document things (network resources and the things they are meant to model, implement, or be modeled by) from other things, and to say you are asked not to use an http: URI in RDF if there is a corresponding network resource that is not a model/implementation/modeling target of it. There also ought to be a reason for this, so that you'll be motivated to follow such a rule; but making the reason concise and convincing has been difficult (notwithstanding a certain individual's strong assertions that this makes a huge difference).
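(For what it's worth, the operational side of the rule as the TAG resolved it in 2005 - 200 licenses "information resource", 303 defers to a description - can be sketched heuristically. Python again, invented URI, and all the earlier caveats about instability and conneg apply.)

    # Heuristic sketch of the httpRange-14 rule in operation: what one
    # GET (redirects not followed) licenses about the referent. The URI
    # is invented; verdicts are time-dependent and fallible, as above.
    import http.client
    from urllib.parse import urlparse

    def httprange14_hint(uri):
        parts = urlparse(uri)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("GET", parts.path or "/")
        status = conn.getresponse().status
        conn.close()
        if 200 <= status < 300:
            # 2xx: some network resource implements the referent, so don't
            # use this URI in RDF for something that can't be on the web.
            return "referent is (implemented by) a network resource"
        if status == 303:
            # 303 See Other: the referent may be anything; the redirect
            # target is offered as a description of it.
            return "referent unconstrained; follow redirect for a description"
        return "status %d: nothing licensed about the referent" % status

    print(httprange14_hint("http://example.org/my-dog"))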
> ("tdb" and HTTP URIs might be used with such a model to allow assertions about other things, of course, at the cost of the extra syntax necessary.)
>
>> Whether anyone else will like this, I'm not sure - certainly the semantic web view is different from the above, since it thinks the world is just one big happy ontology. I don't buy that.

(RDF != semantic web)

> I don't think the world is "just one big happy ontology", but I'm not sure I've seen anyone stand up directly for that point of view -- when pushed on it, most seem to think that someone else made that assumption when RDF was designed, and they're happy that someone else has figured this out. So I think it's a disconnect, not a difference of opinion.

I guess I equate the use of "identifies" in AWWW with what I called the "one big happy ontology" view. If, like AWWW, you think "identification" is objective - that URIs have meanings determined by well-defined nose-following rules ("self-describing web") - you are pretty much saying the same thing as that there is a globally consistent ontology. In my reading, AWWW and Noah's recent finding stand up for this view. I think this is very similar to the idea of context-free identification that you've complained about. Perhaps I'm being harsh.

I guess I've been skirting around this, but the proposal to replace httpRange-14 can be sketched as follows (I haven't written this down before, so it's likely to be rife with errors):

RDF can be written using an http: URI in a way that constrains the interpretation of the URI (the way it is modeled), depending of course on how other URIs are interpreted, and so on. This RDF may be written by the "URI owner" or by someone else.

If your RDF suggests that the URI is to name something that is "on the web" or "can be put on the web", then you are asked to make your best effort to make operational HTTP behavior agree with what your RDF says, in a common-sense way (i.e. if the RDF says it's stateless, don't vary the HTTP representations over time; if it says the author is Millard Fillmore, make sure the HTTP representations are by him; etc.). Of course, if you don't control the HTTP behavior of the URI, getting agreement may be difficult, so ordinarily the RDF telling you how the URI is to be interpreted (in RDF contexts) should come from the "URI owner".

If the resource doesn't have a corresponding HTTP resource that implements it (and in particular if such implementation doesn't make sense, e.g. if the resource is a dog), then refrain from issuing 2xx responses. "Implementation" will be difficult to define rigorously, but see "best effort" above.

If you encounter an http: URI in some RDF, then the only principled thing to do is to look for some credible (or useful) RDF or other intelligence that bears on the intended interpretation, since without a theory of the referent you haven't a clue how to model it. Doing GETs is like playing with dynamite, both because of instability and content negotiation, and because there are multiple mutually inconsistent modeling techniques that may apply. The productive interpretation of the URI may depend as much on the context in which you found it as on any operational HTTP behavior.

I'm not putting forth this theory on AWWSW quite yet... I'm trying to lead others to it by stages. Please don't take me as dogmatic about httpRange-14; I have to argue its side since you're arguing the opposite. The forces of current practice and installed base are quite strong, and they will need to be reckoned with sooner or later; and if I'm going to abandon the rule I need to be persuaded by a cost/benefit analysis, because the cost of flushing it (as opposed to modifying it) at this point would be quite high.

Thanks for helping me explore this territory.

Best
Jonathan