Re: What the meaning of "is" is

On Sat, Mar 14, 2009 at 12:14 PM, Larry Masinter <masinter@adobe.com> wrote:
>> The httpRange-14 rule can be reinterpreted: If you make an RDF theory
>> of HTTP or of  some 200-yielding HTTP resource,
>
> I don't understand the category of "200-yielding HTTP resource"
> in terms of the temporal behavior.

HTTP talks about some resource and says what the various methods
do with it. It's vague on what GET does, but I think it's not too
much of a stretch to say that the intent is for a 200 response to a GET to be
either a REST-representation of the current state of the resource (if a
"network data object") or information that comes from the resource
(if a "network service"). If pushed, I would say I meant "a server that
located the resource in response to a GET responded with a 200," and
that this 200 had whatever meaning was conferred by the GET method,
which I would say is that the response carries some information that came
from the resource (a representation of its state or information
provided by the service).

Thus the category is that of resources for which well-intentioned
servers might yield a 200 in response to a GET (at any point in
the resource's history).

Non-operational, of course, and perhaps vacuous, since there is
no experiment you could do to distinguish the case where the
response was coming-from some located resource
from the case where the server was doing something else (such
as simulating such behavior). But that's what makes it a model.

I was just using a shorthand that I thought you'd understand; sorry
for being inaccurate.

> There are resources which are
> identified by URIs, and there are HTTP servers that respond to
> requests. Some requests will get a 200 response and some requests
> will get some other response, or no response at all, or will fail
> to open a TCP connection or other behavior will be evidenced, and
> the behavior varies over time.  I don't understand how
> "200-yielding" is used as a category distinction.

I guess a better category distinction would be between
the sorts of "resources" that are treated in 2616 and those "resources" that
are covered by 3986 but not 2616. All I meant was that if
you get a 200, then according to HTTP the URI identifies (well, locates) a
resource of the kind that 2616 is about. If you get a 404, that says
the HTTP server doesn't locate any resource using the
request-URI (at this time).
But some other 3986 resource (perhaps one that *couldn't*
be treated by 2616 because it's not a "network resource")
might be meant if you were to use the URI with another protocol or language.

I already agreed that this is at the non-normative non-operational
level. I think there is still something to be said even if we agree
not to talk in this way (see below).

> Some URIs
> have never produced a 200 in the past but might in the future.
> Others have consistently produced 200 responses in the past,
> but might stop any minute now.

Yes, but if you know what the resources are (how the server is
implemented), you might be able to
make some plausible predictions or credible commitments,
especially if the context is bounded somehow.
(HTTP doesn't tell you much about the resource, so you would
have to learn about it in some other way, or gamble.)

> ("duri" provides a way of at least fixing the temporal behavior
> if you want to talk about the 'expected operational behavior'
> at a point in time. My view is that all web servers had a
> time of initial operation, and have (or will have) a time when
> they cease to function, and so temporal variance of operational
> behavior is a serious problem for assertions made about
> time-independent concepts.)

I agree that http: URIs, in the HTTP protocol, identify
things that are physically contingent, not entities determined
by any ideal or principle (such as a document having some checksum,
or a series of versions of a document, or even a web service
possessing any regular properties). But I don't think time dependence
is the killer here; I think it's physical grounding. Non-operational
time-dependent "concepts" are useful too.

>>  please try to put
>> the RDF referent of the URI in classes that are fairly closely tied to the way
>> HTTP is used - e.g. documents, web pages, REST resources, web services,
>> and so on.
>
> What does it mean to "put the RDF reference of the URI in classes"?

To put something in a class is to assert the triple "thing rdf:type class", or
to make some other assertion that entails such a type assertion. You
might say, in RDF, that the URI refers to operational behavior, or to a REST
resource, or to a TimBL "generic resource", or to a FRBR "expression",
or to an IAO "information content entity". Although these are mutually inconsistent,
I'd say they can all be related to the way HTTP is used (or, more generically,
to what we do with the web). That is, we put documents (or more
elaborate entities such as conneg representation sets, version sequences,
services, etc.) on the web.
When we want to talk about the documents, it is expedient to use
http: URIs to mean the documents, not whatever happens to be on the web.
If the two get out of sync, well, that's some kind of bug, but in HTTP
we get what's on the web, while in RDF we continue to talk about the
document. (To make this work we need information
adequate for identifying the document, such as statements about
checksums, write dates, mirror locations, and so on, and something
more elaborate for time-varying documents or idealized services.)
There's no avoiding this other than to "break the web" by using duri: or lsid:.
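For anyone who wants the "assertion that entails such a type assertion" case spelled out, here is a minimal Python sketch, not a real RDF toolkit. It assumes triples are plain string tuples using prefixed names, implements only two RDFS entailment rules (rdfs2 for rdfs:domain and rdfs9 for rdfs:subClassOf), and uses hypothetical ex: names and a hypothetical example.org URI. The point is that saying who the author is puts the URI's referent in the class ex:Document (and its superclass) without any explicit rdf:type statement.

```python
# Simplification: prefixed names stand in for full URIs.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def types_of(subject, triples):
    """Return every class the subject is placed in, directly or by entailment.

    Applies two RDFS rules until nothing new is derived:
      rdfs2: (p rdfs:domain C) and (s p o)             => (s rdf:type C)
      rdfs9: (C rdfs:subClassOf D) and (s rdf:type C)  => (s rdf:type D)
    """
    graph = set(triples)
    while True:
        domains = [(p, c) for (p, rel, c) in graph if rel == RDFS_DOMAIN]
        subclasses = [(c, d) for (c, rel, d) in graph if rel == RDFS_SUBCLASS]
        derived = set()
        for (s, p, o) in graph:
            for (dp, c) in domains:
                if p == dp:
                    derived.add((s, RDF_TYPE, c))
            if p == RDF_TYPE:
                for (c, d) in subclasses:
                    if o == c:
                        derived.add((s, RDF_TYPE, d))
        if derived <= graph:  # fixpoint reached
            break
        graph |= derived
    return {o for (s, p, o) in graph if s == subject and p == RDF_TYPE}


# Hypothetical vocabulary and URI, for illustration only.
URI = "http://example.org/moby-dick"
triples = [
    ("ex:author", RDFS_DOMAIN, "ex:Document"),
    ("ex:Document", RDFS_SUBCLASS, "ex:InformationContentEntity"),
    (URI, "ex:author", "ex:HermanMelville"),
]
print(sorted(types_of(URI, triples)))
# ['ex:Document', 'ex:InformationContentEntity']
```

Note that nothing here consults HTTP: the classes come entirely from what the RDF says.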

> If you have an assertion expressed in RDF using a URI, how do you
> "put the RDF reference" in a class? There's a model in which an
> assertion written in terms of RDF and URIs is mapped to an assertion
> about the real world. Is your request ("please try") in the context
> of "when constructing the model"?

No, it's "when specifying the theory". The advice can be
elaborated as recommended practice about
writing RDF (and therefore about theory / ontology
design). I see no particular problem with this - it's like a style
guide. The "please try" is in the context of writing RDF and prose
that might guide others in the formation of their models.

I guess I'm not saying this very well.

> I think I understand what you're asking if it's in terms
> of constructing a model, but I’m less sure what's involved
> in constructing a model. I like the idea that a "model" is
> (abstractly) the way in which you map the identifiers in a
> language to concepts or entities in the real world, but I
> can't imagine how one would "write down" a model, since the
> only languages we have for writing anything down require
> a model themselves to understand them.

I take model-building as a story about what people
do with axiom sets (or theories) and the non-logical paraphernalia
that accompanies them, such as the strings comprising URIs,
labels, comments, and so on. What people do is to locate or
create mental, written, and physical things that behave
consistently with the theory. Talking about such "modeling"
is of course just another bunch of talk, but it's a way of
talking that has been successful for 70 years or so. The idea
is that modeling is essentially private,
while logic is public and has agreed-on ground rules (like
baseball). The paraphernalia is a gray area (that I've
sometimes argued with Pat Hayes about). If someone
decides to ignore the documentation you write about
what you intend for models of your theory to look like, in some sense
that's ok (if sad) because only the logic is "normative".
In Pat's view, if the logic doesn't almost force you to
create either something close to the intended models or no model, then the
axioms are inadequate and you need to make them more
detailed.
>
>>  Make "identification" under HTTP as close as you can to
>> "identification" in a model of your RDF theory.
>
> HTTP doesn't have a theory of "identification" -- at least
> "identification" was not a topic of the HTTP working group.

It has a theory of "location" and I thought it was clear that
the resource that the server locates given the request-URI
would be the one that was "identified" according to 3986.
Can you give me an example of a situation in which they're
different?

I don't mean "as close as you can" to be taken too strongly -
it was probably a misstatement.
One could guarantee closeness by saying that http: URIs have
to mean in RDF what they locate in HTTP. I don't think that's
necessary or desirable. But that doesn't mean there should
be no relation.

> httpRange might be proposing a model of "identification", and
> then proposing that RDF model makers use it? Which I suppose
> is useful to those thinking about using RDF and making RDF
> models?

Yes. I think the intent was for identification to be the same
everywhere, as in AWWW, but for httpRange-14 the relevant
(but unstated) context is RDF.

>> If we could relate HTTP to RDF, this could set an example for relating
>> other protocol pairs, and if the approach became methodical, we might
>> be able to make suggestions about some general theory of the
>> (recommended) meanings of URIs - identifying equivalences, types, and
>> relations in one domain and showing how they correspond to
>> equivalences, types, and relations in another.
>
>> Personally I think HTTP/RDF is the only case worth pursuing, and
>> it should be useful to limit the hunt to this one quarry.
>
> I don't understand your optimism that any possible relationship
> between the HTTP protocol and the RDF representation system
> would be useful enough in general to be worth pursuing.

We're talking about an idea that goes back to Tim's writings
in 1989, and the W3C metadata activity that spawned RDF.
Originally RDF realized Tim's idea of having descriptions of
things on the web (resources). Things on the web were
identified by their URLs, and you would say things about them
like their author, publication date, and so on. The "semantic web"
idea was a retrofit.

The idea has problems, as we both well know.
There is certainly a difference between the things
that RDF writers are naming using HTTP URIs, and the
resources located (according to the HTTP protocol) using
those URIs.

Sure, there is no normative way to link these two things,
so a priori there is no connection between the way a URI
is used in the two situations. But I think that current
practice is mostly consistent with some kind of alignment,
most of the time. Just because the notion of metadata
for something named by an http: URI is not and cannot
be rigorous doesn't mean we should embrace either a
total free-for-all or a complete separation of namespaces.

If HTTP gives us wildly varying results over time, or always 404s,
then that URI probably won't be used in RDF to name a stable document.
Contrariwise, httpRange-14 says that if the HTTP behavior
might lead one to think that the HTTP resource is an
implementation of something that has been put on the
web, then don't use the URI in RDF to name something
that can't be implemented on the web.

(The gap in my story is a good definition of "implemented".
Work in progress.)

Obviously the web and web servers are imperfect, and no
matter how hard we try, we will never infallibly put something like
Moby Dick or the US Constitution "on the web" at some
URI. That is, the correspondence between HTTP and RDF
can never be perfect (when we use RDF to talk about something
that is not *by definition* on the web, and Moby Dick is not). But
the key to uniting the web and the semweb is to exploit the
correspondence, when it works, and this in turn depends
on denying the correspondence, when it is known not to
be there.

A more principled design is to require assertions, in RDF, that a
resource (e.g. Moby Dick), or its current state, is on the
web at some http: URI, or to give equivalent resolution
information.

A very similar discussion is taking place on public-awwsw, by
the way. See e.g. the thread that contains this message
http://w3.org/mid/9D77C204-5E7E-4DFA-9447-01FA243E7C5E@creativecommons.org
But there, I'm taking the tack of laying out all available "models"
of so-called "information resources" in an attempt to show that
they are mutually inconsistent. I think there is a middle ground
here, between the squishy AWWW position (including httpRange-14)
and your hardball doesn't-work-so-give-up-already. I'm being
pushed toward it from both sides, which is good. If I end up with
no ground to stand on, then I will retreat to some more
easily explained position.

> I don't think there's any generic model of http URIs that
> can serve to give RDF the representational power it needs to
> make assertions about anything other than web servers and
> how they behave.

I agree with this. http: URIs are used non-generically in RDF.
That's a flaw in the way the httpRange-14 rule is formulated. That
doesn't mean the idea is necessarily a bad one. I think there ought
to be a way to distinguish page-and-document things (network
resources and the things they are meant to model, implement,
or be modeled by) from other things, together with a rule asking you
not to use an http: URI in RDF if there is a corresponding network
resource that is not a model/implementation/modeling target of it.
There also ought to be a reason for this, so that you'll be
motivated to follow such a rule; but making the reason concise
and convincing has been difficult (notwithstanding a certain
individual's strong assertions that this makes a huge difference).

> ("tdb" and HTTP URIs might be used with such a model to
> allow assertions about other things, of course, at the cost
> of the extra syntax necessary.)
>
>> Whether anyone else will like this, I'm not sure - certainly the semantic web
>> view is different from the above, since it thinks the world is just
>> one big happy
>> ontology. I don't buy that. (RDF != semantic web)
>
> I don't think the world is "just one big happy ontology", but
> I'm not sure I've seen anyone stand up directly for that point
> of view -- when pushed on it, most seem to think that someone
> else made that assumption when RDF was designed, and they're
> happy that someone else has figured this out. So I think it's
> a disconnect, not a difference of opinion.

I guess I equate the use of "identifies" in AWWW with what I
called the "one big happy ontology" view. If like AWWW you
think "identification" is objective - that URIs have
meanings determined by well-defined
nose-following rules ("self-describing web") - you are
pretty much saying the same thing as that there is a globally
consistent ontology. In my reading AWWW and Noah's recent
finding stand up for this view. I think this is very similar
to the idea of context-free identification that you've complained
about.

Perhaps I'm being harsh.

I guess I've been skirting around this, but the proposal to
replace httpRange-14 can be sketched as follows (I haven't written
this down before, so it's likely to be rife with errors):
RDF that uses an http: URI can constrain the interpretation
of the URI (the way it is modeled),
depending of course on how other URIs are interpreted and
so on. This RDF may be written by the "URI owner" or by
someone else. If your RDF suggests that
the URI is to name something that is "on the web" or "can be
put on the web", then you are asked to make your best effort
to make operational HTTP behavior agree with what your RDF
says, in a common sense way (i.e. if the RDF says it's stateless,
don't vary the HTTP representations over time;
if it says the author is Millard Fillmore, make sure the
HTTP representations are by him; etc.). Of course if you
don't control the HTTP behavior of the URI, getting
agreement may be difficult, so ordinarily RDF telling you
how it's to be interpreted (in RDF contexts) should
come from the "URI owner".

If the resource doesn't have a corresponding HTTP resource
that implements it (and in particular if such implementation
doesn't make sense, e.g. if the resource is a dog),
then refrain from issuing 2xx responses. "Implementation"
will be difficult to define rigorously, but see "best effort"
above.
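The "best effort" agreement above can be illustrated, very loosely, as a check of logged HTTP behavior against what the RDF says. This is a minimal Python sketch under loud assumptions: the RdfClaims and Observation structures are hypothetical, "implemented on the web" is reduced to a single boolean (exactly the part that is hard to define), and comparing body digests ignores content negotiation. Only two of the common-sense checks are modeled: no 2xx responses for a referent that isn't on the web (the dog case), and no varying representations for a referent the RDF says is stateless. An authorship check is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One logged GET (hypothetical structure): time, status code, body digest."""
    time: int
    status: int
    body_digest: Optional[str]  # None when no representation came back


@dataclass(frozen=True)
class RdfClaims:
    """Hypothetical summary of what the RDF says about the URI's referent."""
    on_the_web: bool          # referent is (or can be) put on the web
    stateless: bool = False   # referent is said not to change over time


def disagreements(claims: RdfClaims, log: List[Observation]) -> List[str]:
    """Return common-sense conflicts between the RDF claims and HTTP behavior."""
    problems = []
    successes = [o for o in log if 200 <= o.status < 300]
    # If the referent isn't implemented on the web (e.g. a dog), expect no 2xx.
    if not claims.on_the_web and successes:
        problems.append("2xx issued for a referent not implemented on the web")
    # A stateless referent should not get different representations over time.
    if claims.stateless and len({o.body_digest for o in successes}) > 1:
        problems.append("representations vary over time for a stateless referent")
    return problems


dog = RdfClaims(on_the_web=False)
print(disagreements(dog, [Observation(1, 200, "a1")]))
# ['2xx issued for a referent not implemented on the web']

page = RdfClaims(on_the_web=True, stateless=True)
log = [Observation(1, 200, "a1"), Observation(2, 200, "b2"), Observation(3, 404, None)]
print(disagreements(page, log))
# ['representations vary over time for a stateless referent']
```

A 404 in the log is not flagged for a stateless page here; whether a lapse in availability counts as disagreement is a judgment call the sketch does not try to make.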

If you encounter an http: URI in some RDF, then the
only principled thing to do is to look for some credible
(or useful) RDF or other intelligence that bears on the intended
interpretation, since without a theory of the referent you haven't a
clue how to model it. Doing GETs is like playing with dynamite
both because of instability and content negotiation, and
because there are multiple mutually inconsistent modeling
techniques that may apply.
The productive interpretation of the URI may
depend as much on the context in which you found it
as on any operational HTTP behavior.

I'm not putting forth this theory on AWWSW quite yet...
I'm trying to lead others to it by stages.

Please don't take me as dogmatic about httpRange-14;
I have to argue its side since you're arguing the opposite.
The forces of current practice and installed base are
quite strong and they will need to be reckoned with sooner
or later; and if I'm going to abandon it, I need to be
persuaded by a cost/benefit analysis, because the cost
of flushing it (as opposed to modifying it) at this point
will be quite high.

Thanks for helping me explore this territory.

Best
Jonathan

Received on Saturday, 14 March 2009 22:08:15 UTC