Re: Towards resolution of httpRange-14

Patrick.

Thank you for your thoughtful comments.  I'll copy this to a public
archive so that either of us can refer to it if necessary.

You propose an alternative architecture, and you provide a critique of
the one I propose.  Both largely make sense in general.  A pragmatic
stumbling block is that you use the word "identify" for a different
relation from the one I use it for, so your comments and mine seem
completely at odds.
This sort of problem is key to so many discussions in this area,
so I'll do my best to do what my old physics prof would have done and
work with your terms.  I'll prefix your use with p: in the now
XML-honored fashion, and mine with t:. I have taken the liberty of
putting in what I think you meant in your message, generally of course p:.

Bear in mind, then, that in my architecture "t:identify"
relates the URI to an Information Resource, which
is an abstract concept you don't have in your architecture,
a thing having meaning or abstract information content.
Information Resource is a term I chose to define.

(An Information Resource is connected with, though not the same as,
the time-varying set of representations one might expect to get for a
given URI. Sometimes, but not always:
     ?u  t:identifies  ?ir.
     ?u  p:identifies  ?s.
     ?ir dc:subject    ?s.
)
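The three-triple pattern above can be read as a query over a small triple store. A minimal Python sketch, assuming triples are plain string tuples and using hypothetical example data (a URI whose t:identified resource is a poem about a car); it finds the bindings of ?u, ?ir and ?s that satisfy all three patterns at once:

```python
# Hypothetical example data; the URI, poem and car are illustrative only.
triples = {
    ("http://example.com/myCar", "t:identifies", "poem-about-my-car"),
    ("http://example.com/myCar", "p:identifies", "my-car"),
    ("poem-about-my-car", "dc:subject", "my-car"),
}

def find_bindings(triples):
    """Return (u, ir, s) bindings that satisfy all three patterns:
    ?u t:identifies ?ir.  ?u p:identifies ?s.  ?ir dc:subject ?s."""
    results = []
    for (u, p1, ir) in triples:
        if p1 != "t:identifies":
            continue
        for (u2, p2, s) in triples:
            # Same URI, p:identifies edge, and the IR must be about s.
            if u2 == u and p2 == "p:identifies" and (ir, "dc:subject", s) in triples:
                results.append((u, ir, s))
    return results
```

When the pattern does not hold (the "not always" case), the function simply returns an empty list.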

On Mar 9, 2005, at 10:50, Patrick Stickler wrote:

>
> Tim,
>
> I wanted to offer a few comments regarding your summary of alternative
> approaches outlined in http://www.w3.org/DesignIssues/HTTP-URI which
> may (hopefully) show how the more generalized any-kind-of-resource
> view on this issue may in fact be more reasonable and coherent
> than you perhaps now see it.
>
> I don't attempt (nor would I expect you'd want me) to address your
> document point by point. I merely touch on a few key points that I
> see as pivotal to this discussion.
>
> If these comments are not helpful, feel free to disregard them. Also
> feel free to forward/share them however you like. I'm sharing them
> specifically with you, but have no problem with them being made public.
>
> --
>
> 1. In section 1 you state "[if] we look purely at HTTP URIs, they
> define a web of information objects".
>
> Depending on how you see that web being structured/organized, I think
> that statement is compatible with http: URIs p:identifying cars.
>

> The cars are not directly part of that "web of information objects"
> but they are within the scope of information interchange and utility
> provided by that web.
>
> Consider that the http: web is a set of information objects (streams of
> bits) which are representations of resources (any resources).

Ok, here your information object is a Representation: strictly,
a stream of bits plus an Internet Content Type.
I'll use a capital R for the class, whose definition I think we
agree about.
(This is not InformationResource.)
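As a minimal sketch, a Representation in this sense can be modelled in Python as nothing more than the bytes plus the content type. The class and field names are illustrative assumptions; note that the class deliberately carries no URI:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Representation:
    """A stream of bits paired with an Internet Content Type.

    Hypothetical model: there is intentionally no URI field, since a
    Representation is not itself what a URI t:identifies."""
    body: bytes
    content_type: str

# Two different Representations (hypothetical bytes) of one poem.
html = Representation(b"<html><body><pre>My car.</pre></body></html>", "text/html")
png = Representation(b"\x89PNG\r\n\x1a\n", "image/png")
```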


> A given
> representation may include a link which is expressed by referring to
> a resource, using the URI p:identifying that resource. That link
> encoded in the content of the representation relates both (a) the
> resource represented by that representation to the resource referred
> to in the link, and also (b) the representation containing the link
> to any/all representations of the resource referred to in the link.

So in N3,

    @prefix : <#> .
    {
        ?s1 :representation ?r1.
        ?s2 :representation ?r2.
        ?r1 :linksTo ?r2.
    } => {
        ?s1 :relatedTo ?s2.
    }.
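The rule above can be forward-chained over a set of triples in a few lines of Python. This is a sketch only, assuming triples are string tuples and using hypothetical car/owner data; it does not use cwm or any N3 library:

```python
def apply_related_to_rule(facts):
    """Forward-chain: if s1 representation r1, s2 representation r2,
    and r1 linksTo r2, then infer s1 relatedTo s2."""
    # Index: representation -> set of resources it is a representation of.
    represented_by = {}
    for (s, p, o) in facts:
        if p == "representation":
            represented_by.setdefault(o, set()).add(s)
    inferred = set()
    for (r1, p, r2) in facts:
        if p != "linksTo":
            continue
        for s1 in represented_by.get(r1, ()):
            for s2 in represented_by.get(r2, ()):
                inferred.add((s1, "relatedTo", s2))
    return inferred

# Hypothetical facts: a page about a car links to a page about its owner.
facts = {
    ("car", "representation", "car.html"),
    ("owner", "representation", "owner.html"),
    ("car.html", "linksTo", "owner.html"),
}
```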


> Thus, the web consists of Representations which are linked together
> due to the functionality of the HTTP protocol, such that, from one
> representation, one may "traverse" to another representation. Browsing
> the web is simply jumping from representation to representation. Yet
> the web machinery does not really care what those URIs actually
> p:identify.

True.
(However, it does care what the URIs t:identify)

> The REST architecture essentially can be distilled into:
>    - you have a URI, it p:identifies "something" (and REST doesn't care)
>    - you dereference the URI to get a p:representation of that "something"

I've added a p: on this relation because, in my architecture, the
relationship is between the IR and the Representation, not between
the something and the Representation.

>    - the representation of that "something" may refer to other "somethings"
>    - one may utilize the links and the dereferencing process to move
>      from a p:representation of "something" to a p:representation of
>      "someotherthing"
>    - at no time does REST care, nor is it relevant to REST functionality,
>      what "something" a given URI p:identifies
>    - users benefit when there is consistency in the nature of
>      representations and their link-defined (indirect) relationships

I think here you are saying that one requires some consistency in the
mapping of a URI to a Representation.
If you are saying something else, it is still true!
If the mapping of URIs to Representations is randomized, then the
web loses any usefulness.
However, the set of representations which one would expect to get from
a URI can be big (and can change with time).
My point is that the fundamental property on which the web depends is
that the information content (in a Shannon sense) is more or less the
same, or, if not constant, is a function which is clear to both readers
and writers (like the current front page of the Times).
Here is where the InformationResource comes in.
If you don't model that in the architecture, then you can't require or
talk about the consistency which actually makes the web work.


> Thus, if a URI p:identifies a car, and one dereferences that URI and
> gets a representation of the car, and that representation has a link
> which refers to the owner of the car, and one dereferences that URI
> and gets a representation of the owner of the car, etc. etc., the fact
> that the URIs p:identify the car and the owner (person) of the car
> makes no difference to the user experience.

Yes.  That is why p:identifies is not part of the Web architecture.
From the point of view of the Web as an information space, p:identifies
is not relevant.

> In each case, a user is moving
> from representation (stream of bits) to representation, and the utility
> of those representations is not (for the web) really tied to what those
> URIs actually p:identify.

Exactly.
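The point that the web machinery never consults what a URI p:identifies can be seen in a toy model of browsing. A minimal Python sketch, assuming an in-memory dictionary with hypothetical URIs and content stands in for HTTP dereferencing:

```python
# Toy "web": URI -> (content type, body, outgoing links). Hypothetical data.
# Note that nothing here records what any URI p:identifies.
WEB = {
    "http://example.com/aCar": ("text/html", "A red car.", ["http://example.com/owner"]),
    "http://example.com/owner": ("text/html", "Its owner.", []),
}

def dereference(uri):
    """Return the Representation (content type, body) and its links."""
    content_type, body, links = WEB[uri]
    return (content_type, body), links

def browse(start, max_steps=10):
    """Follow the first link from representation to representation."""
    visited = []
    uri = start
    for _ in range(max_steps):
        representation, links = dereference(uri)
        visited.append((uri, representation))
        if not links:
            break
        uri = links[0]
    return visited
```

Only the mapping from URI to Representation matters to `browse`; whether a URI is "about" a car or "is" a web page never enters into it.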

> Taking this view, neither cars nor poems are part of that web of
> information objects, yet both are "on the web" because they can be
> effectively described and related by information objects on the web.

For your definition of "on the web", not mine.

> --
>
> 2. In section 2.1.2 you conclude (apparently) that if a URI
> p:identifies a car, then you have no URI to p:identify and refer to
> the web page.
>
> I fail, though, to see how you can come to that conclusion. I see no
> reason why one cannot have clear, unambiguous, and distinct URIs for
> both the car and the web page about the car, as well as for each
> representation, and use those distinct URIs effectively.
>
> And one can use redirection to share/reuse the same representations
> amongst distinct resources. For example:
>
> http://example.com/aCar       p:identifies a particular car
> http://example.com/aCar.html  p:identifies a web page about the car
> http://example.com/aCar.jpg   p:identifies an image of the car
> http://example.com/aCar.rdf   p:identifies an RDF description of the car
>
> and redirection is used such that when one dereferences the
> URI http://example.com/aCar, one is (by default) redirected to
> http://example.com/aCar.html. If one uses content negotiation
> and requests either a JPEG representation or RDF/XML
> representation, one would be redirected accordingly to
> http://example.com/aCar.jpg or http://example.com/aCar.rdf,
> etc.

Ok, so you are saying that if

        ?u1  http:RedirectTo302 ?u2

then ?u1 and ?u2 can p:identify completely different levels of thing.
That could work.
I don't like it, because I actually think that when HTTP is used in name
server mode, for example, users are entitled to use the pre-redirection
URI as a valid URI for the web page.
In current web usage, the pre- and post-redirection URIs are largely
interchangeable, and this idea that they p:identify totally different
types of thing seems to revise history. But let's go along with it for now.
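Patrick's redirection scheme can be sketched as a server-side decision function. A minimal Python sketch, assuming the example.com URIs from his message and a deliberately simplified Accept-header parser (it ignores q-values and wildcards and takes the first listed type it knows); it returns the status code and Location for a GET on http://example.com/aCar:

```python
# Hypothetical table: media type -> URI of a resource sharing that representation.
VARIANTS = {
    "text/html": "http://example.com/aCar.html",
    "image/jpeg": "http://example.com/aCar.jpg",
    "application/rdf+xml": "http://example.com/aCar.rdf",
}
DEFAULT_TYPE = "text/html"

def redirect_for(accept_header):
    """Return (status, Location) for a GET on http://example.com/aCar."""
    # Simplified parsing: strip parameters such as q=0.9, keep listed order.
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in VARIANTS:
            return 302, VARIANTS[media_type]
    # Nothing matched (e.g. "*/*"): fall back to the web page.
    return 302, VARIANTS[DEFAULT_TYPE]
```

A real server would honour q-values as HTTP content negotiation requires; the simplification here only keeps the example short.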


> Thus, there is an intersection between the representations
> accessible for the car with those of the web page, image,
> and RDF description; yet no ambiguity about which URI
> p:identifies which resource, and no impact to the web
> behavior, since dereferencing any of the above URIs results
> in obtaining a suitable/reasonable representation.

Well, when you say there is no ambiguity as to which
URI p:identifies which resource: there is ambiguity, in the sense that
an HTTP client cannot tell using existing protocols which URIs are
supposed to p:identify cars and which are supposed to p:identify web pages.

> The representations themselves can also be p:identified by URI,
> e.g. the server could assign urn:uuid: URIs to each, such that
> one would then be able to make clear and unambiguous statements
> about the car, the web page, the image, the RDF description,
> and any actual representation (stream of bits) ever received when
> dereferencing the URIs p:identifying those resources; including
> the ability to talk about how representations change over time.

Ummm ... giving the representations URIs is possible, yes, and
it allows metadata to be given, but no, it doesn't per se give you
a way to give the types of the various things p:identified.
But you could do that, then, with RDF, for example.

> Thus, using an http: URI to p:identify a car does not preclude being
> able to refer unambiguously to a web page about the car. Those
> are two distinct resources and as such deserve to be p:identified
> by two distinct URIs. The important point is to be clear what
> each particular URI actually p:identifies, and not be careless
> or sloppy in guessing or presuming what it p:identifies based
> simply on the nature of the representations accessible.

Here is a major problem I have with the p:identifies architecture.
You ask people not to make assumptions about what is p:identified.
However, whenever a URI is quoted and people stick it in a web browser,
and maybe bookmark it, they are using the architectural point
that there will be an expected consistency of representations
for that URI.  And the consistency they expect on the web,
looking at it either as an engineering question or
more philosophically, is that the information content will
be consistent.

Witness the fact that if you bookmark something (before or after
redirection) and one time you get back an HTML page and the next time
a PNG of that page, you are not upset. But if you bookmark a picture of
a car (whose URI you actually say p:identifies the car)
and the next day you get back the parts inventory of the car
(also a valid representation of the car, but totally different
information content), then the web user has just cause to be upset.
So, I think t:identifies is essential to the web architecture
and p:identifies is, as you point out, irrelevant.
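The expectation described here can be phrased as a simple check over successive fetches of one bookmarked URI. A minimal Python sketch, assuming each fetched Representation has already been labelled with the information content it conveys (something no client can determine mechanically today; the labels below are hypothetical):

```python
def consistent_information(fetches):
    """True if every fetch conveyed the same information content.

    fetches: list of (content_type, information_label) pairs, where the
    label is an assumed, externally supplied description of the content."""
    return len({info for _, info in fetches}) <= 1

# HTML one day, PNG of the same page the next: format changed, content did not.
same_content = [("text/html", "poem about my car"), ("image/png", "poem about my car")]
# Picture of the car, then the parts inventory: both p:represent the car.
changed_content = [("image/jpeg", "picture of the car"),
                   ("text/plain", "parts inventory of the car")]
```

The content type is ignored on purpose: a change of format is acceptable, a change of information content is not.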


> (this is one of the key motivations for URIQA, to be able to
> provide an efficient and reliable means to ask what a given
> URI actually p:identifies, to allow URI owners to explicitly
> and formally publish such information in a manner that
> automated agents can utilize with no further knowledge than
> the URI itself)

Here you have had to extend the protocol to remove the ambiguity.
It is a workable design.  We could move the web to it.
I don't like it because it specifically undermines use of the
Semantic Web tools to talk about the existing web.
I wouldn't be able to use RDF to talk about the authors and
dates of web pages, unless I had already used URIQA to determine that
I was actually dealing with the URI of a web page, not a car.

In other words, your architecture would work, as a new design.
It could be an improvement. But it is not compatible with what's out there.

> --
>
> 3. In section 2.2.2 you discuss the approach of using redirection
> whereby when dereferencing a URI p:identifying e.g. a car the client
> would be (perhaps with multiple hops) redirected to another URI
> which p:identifies an information resource that would ultimately
> result in obtaining a representation. Fine. That is in fact the
> approach that much (even most) of the community uses (those who
> use http: URIs to p:identify resources such as cars).

Well, there are not a lot of people I have come across using the URIQA
architecture. I do know DC does a redirect.  I don't know of many
reasoning systems which use it automatically.

I do know lots of ontologies where you can get useful reusable
information by dereferencing the document x which defines the terms
of the form x#y.


> A distinct redirect response for such cases, such as 343, is not
> however necessary. The present semantics of 302 is, I think, quite
> sufficient, equating to "representations for the resource p:identified
> by the request URI can be accessed via this alternate URI". No
> equivalence between the resources p:identified by the request URI
> and the redirect URI is to be presumed. All that can be inferred
> is that these two resources share some number of representations.
>
> Both DC and Nokia (and I'm sure many others) use this approach
> with great success.

I don't think the DC users actually dereference the URIs at all
in the course of automated processing.

> C.f. http://sw.nokia.com/WebArch-1/resolvesAs
>

(cwm http://sw.nokia.com/WebArch-1/resolvesAs
doesn't parse it, as it seems to return text/html when
cwm *should* be asking it for RDF/XML and/or N3.
But it could. And then
cwm --closure=p http://www.w3.org/2000/10/swap/test/uriqa/t1
would work too, maybe, or something like it.)
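For illustration, a client asking for RDF rather than HTML can say so in the Accept header and check the Content-Type it gets back. A minimal Python sketch using only urllib from the standard library; the preference order and q-values are assumptions, and this does not reflect how cwm itself is implemented:

```python
import urllib.request

# Assumed preference: RDF/XML first, then N3, HTML only as a last resort.
RDF_ACCEPT = "application/rdf+xml, text/n3;q=0.9, text/html;q=0.1"
RDF_TYPES = {"application/rdf+xml", "text/n3"}

def build_rdf_request(uri):
    """Build (but do not send) a GET request that prefers RDF."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})

def is_rdf_content_type(content_type):
    """Check the media type, ignoring parameters such as charset."""
    return content_type.split(";")[0].strip().lower() in RDF_TYPES

def fetch_rdf(uri, timeout=10):
    """Fetch uri and fail loudly if the server answers with non-RDF."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if not is_rdf_content_type(content_type):
            raise ValueError(f"expected RDF, got {content_type!r}")
        return resp.read()
```

With this split, a server that ignores the Accept header and returns text/html is reported as an error instead of being handed to the RDF parser.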


> (note also that the above URI p:identifies a property, and
> dereferencing it results in redirection to a web page
> describing that property -- i.e. a representation of the
> property, and the web page)

In fact, a p:representation of the Property, and a t:representation of
the web page.

Yes, I see that this could work.  I just think it is squatting on
existing WWW architecture in an inappropriate way.

> --
>
> 4. In section 2.2.2 (and elsewhere) you seem to suggest that using
> http: URIs to p:identify e.g. cars introduces an ambiguity and
> usability problem for those wishing to annotate/describe/refer to
> web pages, such that they will be unable to, or unsure of how to, refer
> accurately to the resource in question (your specific example
> referred to ambiguity between a web page about the Vietnam war
> vs. the Vietnam war).

Yes.

> Yet, in fact, this form of ambiguity has existed since the
> very beginning of the web, such that there is no clear way
> to determine whether a given URI p:identifies some abstract
> information resource, a particular form of expression of
> that information resource, or a specific representation of
> that resource.

However, there is a consistency for t:identifies.
Hence the superiority of the relation for describing web architecture.

> E.g. if one has the URI http://example.com/myCar
>
> which resolves to the following text/html encoded data stream
>
> [
> <html>
> <body>
> <pre>
> My car.
> It is blue.
> When I am not in it,
> I am blue too.
> </pre>
> </body>
> </html>
> ]
>
> Assuming, for the sake of discussion, your position that the
> URI http://example.com/myCar must p:identify an information
> resource and thus we can exclude it from t:identifying a car,

(thank you!)  we exclude it from t:identifying a car certainly.

> how is a
> user to know if that URI t:identifies
>
> (a) a poem about a car (the abstract body of information)
> (b) a particular edition of the poem, with particular line breaks
> (c) a particular translation of the poem (e.g. in English)

Good point. However, I would point out that
these are all in a sense "a poem", just a poem specified
more or less generically.  I think generic t:identification
is really important.
http://www.w3.org/DesignIssues/Generic

As for an answer, well, you'll find on W3C tech reports a list of the
URIs and which they are (latest version, this version, etc.).
You'll also find links in blogs and online magazines to
a persistent link for a given article rather than the time-varying
"current" one.
You'll see little flags linking to different languages, etc.


> (d) a web page containing a poem

I don't think that the distinction here is fine.
I would be more inclined to say that the document is a poem,
and it is a web page. This web page is a poem about a car.
There is no level difference, certainly nothing worth extracting
in the architecture.

> (e) an HTML encoding of a web page containing a poem

That is *not* identified. The architecture does not have to give a URI
for everything under the hood. The HTML encoding is
an octet stream which, when paired with the content type "text/html",
forms a t:representation of the poem.  Neither the representation nor
the octet stream nor the HTTP response nor the HTTP transaction is given
a URI in general, and certainly not that URI.


> (f) some other information resource

The web relies on it *not* being a different one.
If I see the poem and send you the URI, I generally expect you to
see the poem.  You must be able to use the name of the thing for the thing.
I don't expect you to get a different poem.
I don't expect you to get a different resource (say a picture) of the
same thing just because the server has deemed that the URI p:identifies
something and both Representations are p:representations of that thing.

> ???
>
> All of the above options are compatible with your view.

Well, no I have gone over them above.

>  Yet
> some user wishing to e.g. use Annotea or make RDF assertions
> pertaining to whatever it is they are experiencing when
> dereferencing that URI cannot be clear about what they are
> actually talking about, even if http: URIs are constrained
> to p:identify "information resources".

When users annotate things with human language, they are not
semantic web engines.  In natural language, it is quite
normal to convert between levels implicitly. This
is not a guide for the architecture.

> If one were to make a recommendation (or warning) based on
> the user experience of dereferencing that URI, they still
> would have no way of doing so unambiguously.
>
> This is because the web/REST architecture simply doesn't
> care what the URIs actually p:identify, or need to care,
> because it works just fine serving representations via
> URIs regardless of what those URIs p:identify.


However, it really depends on consistency in what they t:identify.

> There is no fundamental difference between the ambiguity
> "poem or web page about poem" versus "car or web page about car".

That wasn't a web page *about* a poem, it was a web page which was a poem.
Let us not split hairs as to what we mean by "web page".
But note that giving a poem in a different font or different character
encoding is a whole lot different from the "about" relationship between
the subject of a document and the document.


> The web architecture simply does not provide the machinery
> to be clear about what URIs p:identify. That's why we need
> the semantic web (and IMO why we need solutions such
> as URIQA).

Certainly to use p:identify, you have a good argument for needing 
something extra, perhaps URIQA.

But the web architecture already requires a concept of t:identify.
I argue that you can't mess with that.

You don't *have* to mess with it if you use the time-honored way
of identifying them by the document in which they are defined.
Like "US citizen for the purposes of article 1234 of the Act".
It may be clumsy for a few pathological cases like WordNet,
and we may have foaf: and dc: which we would have to transition.

So maybe we need some sort of compromise.
A new HTTP redirect could separate the distinction between
a document and its subject from that between a generic document and a
specific one.
DC and FOAF could be fitted out with that.
It could maybe be made into something more general along the lines
of "I can't just give you a t:representation of that, but here is
something which tells you about it and how to access it".
For example "that is a huge document -- suggest you query it with sparql"
or "that is an abstract concept, definitive ontology is in this file".

> --
>
> 5. You have often stated, in various forms and at various
> times, as you do in this document, "that wasn't the model I
> had when URIs were invented and HTTP was written".
>
> OK. Fair enough. We should certainly give strong value to
> original design considerations and be very hesitant to
> question and/or diverge from them.
>
> Yet, is it not reasonable to consider that perhaps a very large
> and diverse segment of the web community have all noticed and
> beneficially exploited a fairly intuitive generalization of
> the original design and usage of http: URI in order to substantially
> broaden the scope and coverage of the web? and have done so
> in a way that maximizes the benefit of a globally deployed
> and proven infrastructure without negatively impacting
> (or substantially impacting at all) the user experience
> traditionally associated with the web?

While I think many people have done the normal human thing and
used documents, the things they describe, and their URIs rather
interchangeably in language (this is normal), I think it would be
misleading to suggest that many people apart from you are building
semantic web things which work using the URIQA architecture.
And I don't think you address the expectation among web users and
application designers for the architectural constraint of consistency
of information content as a function of URI.

In a way this expectation is so obvious that it goes without saying.

> With rare exceptions, I think it is fair to say that those
> using http: URIs to p:identify cars, properties, people, etc.
> are not acting carelessly or with disregard to the tradition,
> history, and standards-grounded definition of the web. Most
> of these folks have thought long and hard about why they
> chose the approach they did -- many suffering (and continuing
> to suffer) angst over the potential, real, or perceived
> conflict with your original conception. Yet the benefits
> are sufficiently great to motivate increasingly wide
> adoption of this more generalized view.

I grant you that from the semantic web point of view it
is much nicer to just use the HTTP space with gay abandon.
And clearly a possibility would be to make a new
HTTP-like space (a bit like Larry's tdb: space, That Defined By,
or maybe a completely separate protocol)
which has the sort of properties you describe.
Documents could then in fact be just concepts in the web,
and asking about them would return one or more representations,
just expressed in RDF instead of HTTP. The whole protocol on
the wire could in fact be RDF.

Given the fact that daap: servers and so on sprout across the net
at great speed, maybe that would not be such a stupid idea
for a semantic web space first and foremost.
web2:2005/com/nokia/WebArch1.1/resolvesAs

Let me tell you the general basis for a concern I have with redefining
the shared expectations of HTTP.
An architecture allows growth by making constraints.
http: (IMNSHO) is constrained to be a space of documents.
mailto: is constrained to be a space of message destinations.
These constraints give the architecture its form.
They define http as a simple service which can be migrated to
a different implementation (say a nifty peer-to-peer version)
with time, because the features delivered are constrained.
Similarly with mailto: -- one could replace all of SMTP
bit by bit while keeping the URIs, because the service
is just message delivery. No lookup.

    "The unyielding medium is not only endured,
     it's that upon which Art depends:
    For who can perform on a tightrope secured
     at only one of its ends?" -- Piet Hein


> Perhaps there's a gem of an idea buried under all this debate
> which offers enough benefit to enough of the web community
> to justify embracing this evolution of the original design.

Certainly let's evolve it.  But let's not break it.

> Just a thought...

Thanks.

> Warmest regards,
>

Likewise,

Tim

> Patrick
>
>
> --
>
> Patrick Stickler
> Senior Architect
> Forum Nokia Online
> Tampere, Finland
> patrick.stickler@nokia.com

Received on Thursday, 10 March 2005 21:40:06 UTC