Re: Towards resolution of httpRange-14

On Mar 9, 2005, at 17:16, ext Tim Berners-Lee wrote:

> Patrick.
>
> Thank you for your thoughtful comments.  I'll copy this to a public
> archive so that we can refer to it if necessary.

OK.

>
> You propose an alternative architecture, and you provide a critique of
> the one I propose.  Both largely make sense.  A pragmatic stumbling
> block is that you use the word "identify" for a different relation than
> the one I use it for, so your comments and mine seem completely at odds.

I'm not sure I do. In the FAQ section of your summary, you
state that "identify" is being used in the sense of "denotes"
per the RDF MT. That is also how I use "identify". So perhaps
either I have misunderstood your usage of "identify" or
you have misunderstood mine (or we are not agreed on the
meaning of "denotes" ;-)

> This sort of problem is a key to so many discussions in this sort of 
> area,
> so I'll do my best to do what my old physics prof would have done and 
> work
> with your terms.  I'll prefix your use with p: in the now XML-honored
> fashion, and mine with t:. I have taken the liberty of putting in what
> I think you meant in your message, generally of course p:.

OK. Let's try that.

Though it may very well be that

   p:identifies owl:sameAs t:identifies .


>
> Bear in mind, then, that in my architecture "t:identify"
> relates the URI to an Information Resource, which
> is an abstract concept you don't have in your architecture,
> a thing having meaning or abstract information content.
> Information Resource is a term I chose to define.

The architecture I propose does not *exclude* information
resources. It simply does not rely on that concept. I think
it's good to clarify that point.

It also occurs to me that where our use of the term "identify"
differs is simply in its permitted range. I.e.

   p:identifies rdfs:range rdfs:Resource .
   t:identifies rdfs:range t:InformationResource .

Yet in both cases, we are talking about denotation per the
RDF MT.
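
Spelled out as one small, self-contained N3 sketch (the p: and t:
namespace URIs are hypothetical placeholders, and owl:equivalentProperty
is arguably the more precise form of the owl:sameAs above):

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix p:    <http://example.org/p#> .   # placeholder
    @prefix t:    <http://example.org/t#> .   # placeholder

    # the same relation -- denotation per the RDF MT ...
    p:identifies  owl:equivalentProperty  t:identifies .

    # ... but with differently constrained ranges
    p:identifies  rdfs:range  rdfs:Resource .
    t:identifies  rdfs:range  t:InformationResource .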

???

>
> (An Information Resource is connected with, though not the same as, the
> time-varying set of representations one might expect to get for a given URI.
> Sometimes but not always
>     ?u  t:identifies  ?ir.
>     ?u  p:identifies  ?s.
>     ?ir dc:subject    ?s.
> )

The above is compatible with my architecture, but reflects
a particular use case. I.e. the following is fully acceptable
to me:

    ?u  p:identifies ?ir .
    ?y  p:identifies ?s .
    ?ir dc:subject   ?s .

What counts is that ?u and ?y are identified distinctly,
and that you don't introduce ambiguity.

I.e. in your example above, the real problem (per my
architecture) is that you are using the same URI ?u
to identify two different things ?ir and ?s (presuming
that they are in fact different things). But I appreciate
what the example is trying to illustrate, and it's
a useful example regardless.

>
> On Mar 9, 2005, at 10:50, Patrick Stickler wrote:
>
>>
>> Tim,
>>
>> I wanted to offer a few comments regarding your summary of alternative
>> approaches outlined in http://www.w3.org/DesignIssues/HTTP-URI which
>> may (hopefully) show how the more generalized any-kind-of-resource
>> view on this issue may in fact be more reasonable and coherent
>> than you perhaps now see it.
>>
>> I don't attempt (nor would I expect you'd want me) to address your
>> document point by point. I merely touch on a few key points that I
>> see as pivotal to this discussion.
>>
>> If these comments are not helpful, feel free to disregard them. Also
>> feel free to forward/share them however you like. I'm sharing them
>> specifically with you, but have no problem with them being made 
>> public.
>>
>> --
>>
>> 1. In section 1 you state "[if] we look purely at HTTP URIs, they 
>> define
>> a web of information objects".
>>
>> Depending on how you see that web being structured/organized, I think
>> that statement is compatible with http: URIs p:identifying cars.
>>
>
>> The cars are not directly part of that "web of information objects"
>> but they are within the scope of information interchange and utility
>> provided by that web.
>>
>> Consider that the http: web is a set of information objects (streams 
>> of
>> bits) which are representations of resources (any resources).
>
> Ok, here your information object is a Representation: strictly,
> a stream of bits plus an Internet Content Type.
> I'll use a capital R for the class, whose definition I think we
> agree about.
> (This is not InformationResource)

I take that to mean that we agree about what a Representation is
(a stream of bits).

(I see the Internet Content Type as information about that
stream of bits, not an inherent part of the Representation itself,
since, after all, one stream of bytes can be interpreted validly
as corresponding to many different Internet Content Types)

As an aside, I also find it useful to explicitly view Representations
as the atomic components of the web, such that the representation
of a Representation is a bit-equal copy of itself.

>
>
>> A given
>> representation may include a link which is expressed by referring to
>> a resource, using the URI p:identifying that resource. That link 
>> encoded
>> in the content of the representation relates both (a) the resource
>> represented by that representation with the resource referred to in
>> the link, and also (b) the representation containing the link to 
>> any/all
>> representations of the resource referred to in the link.
>
> So in N3,
> {
>     ?s1 representation ?r1.
>     ?s2 representation ?r2.
>     ?r1 linksTo ?r2.
> } => {
>     ?s1 relatedTo ?s2.
> }

Not exactly. I see it as

{	
    ?s1 representation ?r1.
    ?s2 representation ?r2.
    ?s2 representation ?r3.
    ?s2 representation ?r4.
    ?r1 refersTo ?s2.
} => {
    ?s1 relatedTo ?s2 .
    ?r1 linksTo ?r2 .
    ?r1 linksTo ?r3 .
    ?r1 linksTo ?r4 .
}

such that the link expressed in ?r1 is simply referring to ?s2,
and the hypertext link relationships between ?r1 and each of ?r2,
?r3, and ?r4 derive from the fact that ?r2, ?r3, and ?r4 are
representations of the resource ?s2 referred to in ?r1.

The realization/traversal of those links can depend on a number
of variables relating to client/server interaction governed by
HTTP, conneg, access control, preferences, location, etc.

Thus those links are potential links, not guaranteed links, yet
they can be inferred nonetheless, and given the right context,
they all can be traversed.
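
To make that concrete, here is a purely hypothetical instance (the URIs
and the prefix are invented, and the property names simply reuse the
ad hoc terms from the rule above):

    @prefix : <http://example.org/terms#> .

    <http://example.com/aPage>  :representation  :r1 .               # ?s1, ?r1
    <http://example.com/aCar>   :representation  :r2 , :r3 , :r4 .   # ?s2, ?r2..?r4
    :r1  :refersTo  <http://example.com/aCar> .

    # the rule then yields:
    #   <http://example.com/aPage>  :relatedTo  <http://example.com/aCar> .
    #   :r1  :linksTo  :r2 , :r3 , :r4 .
    # which of :r2, :r3, :r4 is actually reached when the link is followed
    # depends on conneg, access control, etc., as noted above.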

>
>
>> Thus, the web consists of Representations which are linked together
>> due to the functionality of the HTTP protocol, such that, from one
>> representation, one may "traverse" to another representation. Browsing
>> the web is simply jumping from representation to representation. Yet
>> the web machinery does not really care what those URIs actually
>> p:identify.
>
> True.
> (However, it does care what the URIs t:identify)

Why? I think that the web machinery only cares that one may interact
with representations given a particular URI, yet that interaction
is agnostic as to what the URI identifies (denotes).

How do you see the web architecture (i.e. the protocols/models, not
human expectations) caring about what URIs identify (either
p:identify, t:identify, or denote per the RDF MT)?

>
>> The REST architecture essentially can be distilled into:
>>    - you have a URI, it p:identifies "something" (and REST doesn't 
>> care)
>>    - you dereference the URI to get a p:representation of that 
>> "something"
>
> I've added a p: on this relation, as in my architecture, the
> relationship is between the IR and the Representation, not between
> the something and the Representation.

Well, again, I don't see that the meaning of p:identifies and
t:identifies is fundamentally different -- only their range of usage.

Thus, if in this case, the "something" is an Information Resource, then
p:identifies is functionally equivalent to t:identifies.

But if that "something" were e.g. the planet Neptune, then p:identifies
would be OK to use, but t:identifies would have a range conflict.

Right?

>
>>    - the representation of that "somthing" may refer to other 
>> "somethings"
>>    - one may utilize the links and the dereferencing process to move
>>      from p:representation of "something" to a p:representation of 
>> "someotherthing"
>>    - at no time does REST care, nor is it relevant to REST 
>> functionality
>>      what "something" a given URI p:identifies
>>    - users benefit when there is consistency in the nature of 
>> representations
>>      and their link-defined (indirect) relationships
>
> I think here you are saying that one requires some consistency in the
> mapping of a URI to a Representation.

Absolutely. But not as a characteristic of the architecture. Consistency
is simply a usability issue. My architecture (even yours I think) doesn't
depend on consistency. You can have random representations served for
each request. But true utility and benefit are proportional to the 
degree
of consistency in resolving URIs to representations in the same context.

Do you agree with that statement?

> If you are saying something else, it is still true!
> If the mapping of URIs to Representation is randomized, then the
> web loses any usefulness.

Right. But the web doesn't "break", from an architectural point of view.

> However, the set of representations which one would expect to get from 
> a URI
> can be big (and can change with time).

Agreed.

> My point is that the fundamental point on which the web depends is that
> the information content (in a Shannon sense) is more or less the same,
> or if not constant is a function which is clear to both readers and
> writers (like the current front page of the Times).

I agree, and see this as true for both architectures.

> Here is where the InformationResource comes in.
> If you don't model that in the architecture, then you can't require or
> talk about the consistency which actually makes the web work.

There I disagree. I think we can just as effectively talk about
consistency of representation, because the consistency has to
do with a consistency of user experience -- and there is a well
established and mature field of study addressing exactly such
usability issues, which does not succeed or fail on the existence
of some concept of "Information Resource".

I assert that it is straightforward to discuss consistency of
user experience as it pertains to the resolution of URIs to
representations without having to refer to a concept such as
"Information Resource", much less constrain the range of http:
URIs to such a class of resources.

That said, I'm also not opposed to defining a class of "Information
Resources", if folks find such a concept useful.

In fact, statements about consistency of user experience on the
web can honor both the generalized, unrestricted range of http:
URIs and the identification of Information Resources, such that
the following could be a useful and valid best practice:

    When an http: URI is used to identify a resource which is not
    an Information Resource, resolution of that URI should employ
    a redirection to another URI which identifies an Information Resource
    via which representations of the original resource can be accessed.

or some such wording... but hopefully you get the point.
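
A minimal sketch of that practice for a hypothetical car URI (the
vocabulary terms here are invented for illustration; only the redirect
behavior is the point):

    @prefix ex: <http://example.com/vocab#> .   # hypothetical vocabulary

    <http://example.com/aCar>       a  ex:Car .                  # not an Information Resource
    <http://example.com/aCar.html>  a  ex:InformationResource .

    # the best-practice behavior itself is at the HTTP level, not in RDF:
    #   GET /aCar  -->  302 (or 303), Location: http://example.com/aCar.html
    # i.e. representations of the car are accessed via the page's URI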

>
>
>> Thus, if a URI p:identifies a car, and one dereferences that URI and 
>> gets
>> a representation of the car, and that representation has a link which
>> refers to the owner of the car, and one dereferences that URI and
>> gets a representation of the owner of the car, etc. etc. the fact
>> that the URIs p:identify the car and the owner (person) of the car 
>> makes
>> no difference to the user experience.
>
> Yes.  That is why p:identifies is not part of the Web architecture.

Well, hey, that is the crux of this debate, right?

p:identifies *is* a part of the architecture that I, and a wide range
of others, use every day, and with great success.

> From the point of view of the Web as an information space, p:identifies
> is not relevant.

But neither is t:identifies IMO.

The web architecture as an information space simply uses URIs
as access keys regardless of any higher level semantics we wish
to attribute to those URIs.

*Humans* traditionally presume that those URIs identify something
(and often get confused or guess wrong about what the owner thinks)
but the web machinery itself doesn't care one way or another.

The rub comes only now when we want to add the semantic web layer
atop the web layer and see that there is a gap, and there are two
(primary) views about how to bridge that gap.

Both views appear to be coherent and self-consistent. But one view
would invalidate a broad range of successfully deployed and popular
applications.

I honestly see this httpRange-14 debate boiling down to which view
would be the least disruptive to adopt -- and I would hope you
would at least appreciate that the generalized, non-constrained view
results in the least pain insofar as re-tooling and rework of
solutions and methodology is concerned.

>
>> In each case, a user is moving
>> from representation (stream of bits) to representation, and the 
>> utility
>> of those representations is not (for the web) really tied to what 
>> those
>> URIs actually p:identify.
>
> Exactly.

Lovely.

>
>> Taking this view, neither cars nor poems are part of that web of
>> information objects, yet both are "on the web" because they can be
>> effectively described and related by information objects on the web.
>
> For your definition of "on the web", not mine.

Fair enough.

>
>> --
>>
>> 2. In section 2.1.2 you conclude (apparently) that if a URI 
>> p:identifies
>> a car, then you have no URI to p:identify and refer to the web page.
>>
>> I fail, though, to see how you can come to that conclusion. I see no 
>> reason
>> why one cannot have clear, unambiguous, and distinct URIs for both 
>> the car
>> and the web page about the car, as well as for each representation, 
>> and
>> use those distinct URIs effectively.
>>
>> And one can use redirection to share/reuse the same representations
>> amongst distinct resources. For example:
>>
>> http://example.com/aCar       p:identifies a particular car
>> http://example.com/aCar.html  p:identifies a web page about the car
>> http://example.com/aCar.jpg   p:identifies an image of the car
>> http://example.com/aCar.rdf   p:identifies an RDF description of the 
>> car
>>
>> and redirection is used such that when one dereferences the
>> URI http://example.com/aCar, one is (by default) redirected to
>> http://example.com/aCar.html. If one uses content negotiation
>> and requests either a JPEG representation or RDF/XML
>> representation, they would be redirected accordingly to
>> http://example.com/aCar.jpg or http://example.com/aCar.rdf,
>> etc.
>
> Ok, so you are saying that if one has
>
> 		?u1  http:RedirectTo302 ?u2
>
> then ?u1 and ?u2 can identify completely different levels of thing.

Sure. Why not.

> That could work.

It *does* work. It's proven and used daily in numerous applications.

> I don't like it, because I actually think that when HTTP is used in
> name server mode, for example, users are entitled to use the
> pre-redirection URI as a valid URI for the web page.

But that's not licensed by the HTTP spec. Clients, caches, etc. should
not presume that a 302 redirect is permanent, and it is clear that
all references to the original URI should remain unchanged.

(arguably, though, 303 would perhaps be better than 302, but I still
consider 302 to be just fine)

That said, it really depends on what you mean by "a valid URI".

The redirect implies a relationship between two resources, such
that they share representations -- and thus, one could "validly"
access representations of the first resource via the URI of
the second, but it would be careless, dangerous, and ill advised
to rely on the second URI as the primary means by which
representations of the resource identified by the first URI
would be regularly accessed.

Hence the note in the HTTP spec not to take the redirect URI
as permanent.
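
In N3, reusing your http:RedirectTo302 term and the ad hoc p: prefix
from above (sharesRepresentationsWith is a made-up property, named only
to capture the "share representations" relationship just described):

    # all that the redirect licenses, on my reading:
    { ?u1  http:RedirectTo302  ?u2 .
      ?u1  p:identifies        ?x .
      ?u2  p:identifies        ?y . }
    => { ?x  p:sharesRepresentationsWith  ?y . } .

    # whereas nothing like
    #   => { ?x  owl:sameAs  ?y . }
    # follows from the redirect alone.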

Again, see the description of

http://sw.nokia.com/WebArch-1/resolvesAs


> In current web usage, the pre- and post-redirection URIs can largely be
> interchanged, and this idea that they p:identify totally different types of
> thing seems to revise history. But let's go along with it for now.
> thing seems to revise history. But let's go along with it for now.
>

I'm not convinced of that. But if enough folks are, we could
use 303 as a matter of "best practice" (or, even define yet another
redirect response code).

But I see this as a minor side issue to the core of this debate,
not a flaw, per se, in the architecture I employ.

>
>> Thus, there is an intersection between the representations
>> accessible for the car with those of the web page, image,
>> and RDF description; yet no ambiguity about which URI
>> p:identifies which resource, and no impact to the web
>> behavior, since dereferencing any of the above URIs results
>> on obtaining a suitable/reasonable representation.
>
> Well, when you say there is no ambiguity as to which
> URI p:identifies which resource, there is in the sense that an HTTP
> client cannot tell using existing protocols which URIs are supposed
> to p:identify cars and which are supposed to p:identify web pages.

True, but that's true for both architectures. I could have more
correctly stated that avoidance of ambiguity is facilitated
by each distinct resource having a distinct URI such that each
can be referred to unambiguously.

(and with solutions like URIQA, clients can achieve clarity
rather than being left to guess about the meaning of URIs)

>
>> The representations themselves can also be p:identified by URI,
>> e.g. the server could assign urn:uuid: URIs to each, such that
>> one would then be able to make clear and unambiguous statements
>> about the car, the web page, the image, the RDF description,
>> and any actual representation (stream of bits) ever received when
>> dereferencing the URIs p:identifying those resources; including
>> the ability to talk about how representations change over time.
>
> Ummm ... giving the representations URIs yes is possible, and
> yes allows metadata to be given but no doesn't per se give you
> the way to give the types of the various things p:identified.
> But you could, then, with RDF, for example.

Right. I.e. using URIQA or something similar.

The point is that we can give all these distinct resources
distinct (http:) URIs and both describe them as well as
provide representations for them.
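
For instance, a purely hypothetical sketch (the uuid is made up, the
p: prefix stands for the ad hoc terms in this thread, and dc: is
Dublin Core):

    @prefix p:  <http://example.org/p#> .                # placeholder
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    # a server-assigned name for one particular stream of bits ever
    # served as a representation of the web page about the car
    <http://example.com/aCar.html>
        p:representation  <urn:uuid:0a3e6f50-94a2-11d9-b3a0-0002fca00001> .

    <urn:uuid:0a3e6f50-94a2-11d9-b3a0-0002fca00001>
        dc:format  "text/html" ;
        dc:date    "2005-03-14" .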

>
>> Thus, using an http: URI to p:identify a car does not preclude being
>> able to refer unambiguously to a web page about the car. Those
>> are two distinct resources and as such deserve to be p:identified
>> by two distinct URIs. The important point is to be clear what
>> each particular URI actually p:identifies, and not be careless
>> or sloppy in guessing or presuming what it p:identifies based
>> simply by the nature of the representations accessible.
>
> Here is a major problem I have with the p:identifies architecture.
> You ask people not to make assumptions about what is p:identified.
> However, whenever a URI is quoted and people stick it in a web browser,
> and maybe bookmark it, they are using the architectural point
> that there will be an expected consistency of representations
> for that URI.

Yes, but that is not incompatible with the p:identifies architecture.
In fact, such consistency is considered just as important.

> And the consistency they expect on the web is,
> looking at it either as an engineering question, or
> more philosophically, is that the information content will
> be consistent.

Yes, and again, I see no difference on this point between
the two architectures.

>
> Witness the fact that if you bookmark something (before or after
> redirection) and one time you get back an HTML page and the next time
> a PNG of that page, you are not upset.  But if you bookmark a picture
> of a car (whose URI you actually say p:identifies the car)
> and the next day you get back the parts inventory of the car
> (also a valid representation of the car, but totally different
> information content) then the web user has just cause to be upset.

Yes, but such inconsistent behavior is neither more likely
nor more tolerated by the p:identifies architecture.

And such inconsistent behavior is neither less likely nor
less tolerated by the t:identifies architecture.

On this point, there simply is *no difference*.

> So, I think t:identifies is essential to the web architecture
> and p:identifies is, as you point out, irrelevant.
>

Again, I don't see any difference.

The web machinery is just as indifferent to "Information Resources"
as it is to arbitrary resources insofar as what http: URIs actually
identify.

It really only matters at the semantic web layer, and whatever solution
we adopt should be as broad and encompassing as the semantic web.

I think the p:identifies architecture achieves that elegantly,
and is also a deployed and proven approach. If the web was to
crumble to dust because of the p:identifies architectural view,
it would have done so long ago, but it keeps on purring without
the least burp.

>
>> (this is one of the key motivations for URIQA, to be able to
>> provide an efficient and reliable means to ask what a given
>> URI actually p:identifies, to allow URI owners to explicitly
>> and formally publish such information in a manner that
>> automated agents can utilize with no further knowledge than
>> the URI itself)
>
> Here you have had to extend the protocol to remove the ambiguity.

Which you would have to do regardless, as the ambiguity is
just as acute with the t:identifies architecture.

> It is a workable design.  We could move the web to it.
> I don't like it because it specifically undermines use of the
> Semantic Web tools to talk about the existing web.

????

I've found that it *empowers* semantic web tools, and that relying
on RDF/XML instances, or on 3rd-party repositories (e.g. SPARQL
servers) which must be known in advance, as the foundational
means to interchange and discover knowledge runs into very
substantial bootstrapping issues.

Both RDF/XML instances and SPARQL are *VERY* important,
and not to be devalued in any way by the above statement.

My point is that being able to ask the web authority for
an authoritative machine-processible description of a
specific resource identified by a specific URI is just
as fundamental and essential as being able to ask the
web authority for a representation of a resource via
its identifying URI.

I see the web and semantic web as two sides of the same
coin, where that coin is the set of URIs identifying
resources; on one side, you have the traditional HTTP
machinery for interacting with representations, and on
the other side you have URIQA for interacting with
descriptions -- and it is via those descriptions that
you identify and subsequently exploit more comprehensive
services such as SPARQL engines, etc.

But just as the web would not work if you had to first
know from where to ask for a representation given a
particular URI, likewise the semantic web will not
work (bootstrap) if you must first know where (e.g.
a given SPARQL server) to ask for a description of
the resource identified by that URI.

> I wouldn't be able to use RDF to talk about the authors and
> dates of web pages, unless I had already used URIQA to determine that
> I was actually dealing with the URI of a web page, not a car.

Exactly!

But even with your architecture, the same kinds of ambiguity
exist. How can you know whether a URI identifies a poem,
a translation of a poem, a web page about the poem, etc.
etc. etc.

You seem to presume a lack of ambiguity with your architecture
that is present in mine -- yet both architectures face the
very same problem.

URIQA is just as relevant, necessary, and beneficial for your
architecture!


>
> In other words, your architecture would work, as a new design.
> Could be an improvement. But it is not compatible with what's out 
> there.

On the contrary, it is precisely because the p:identifies
architecture *is* compatible with "what's out there", as well as with
all the new semantic web solutions being deployed, that so many
folks prefer, and utilize, that architecture.

Insofar as compatibility with "what's out there" is concerned, I think
it's a very fair comment to note that the t:identifies architecture
is in fact *not* compatible with numerous deployed solutions
that are "out there" (DC, RSS, XMP, Creative Commons, ...).

If you really care about being compatible with "what's out there"...

>
>> --
>>
>> 3. In section 2.2.2 you discuss the approach of using redirection
>> whereby when dereferencing a URI p:identifying e.g. a car the client
>> would be (perhaps with multiple hops) redirected to another URI
>> which p:identifies an information resource that would ultimately
>> result in obtaining a representation. Fine. That is in fact the
>> approach that much (even most) of the community uses (those who
>> use http: URIs to p:identify resources such as cars).
>
> Well, there are not a lot of people I have come across using the URIQA
> architecture. I do know DC does a redirect.  I don't know a lot of
> reasoning systems which use it automatically.

The redirection approach has nothing whatsoever to do with URIQA,
no more so than Flash or WebDAV or SOAP or any other technology
would have to do with redirection.

Yes, we at Nokia redirect a lot of requests to a URIQA server,
but that's a local solution (a good one, and one folks are
invited/encouraged to consider), and only one of countless
options for this redirection approach.

The key is that representations of any resource can be served
with a minimal amount of burden insofar as management of
actual representations is concerned, as multiple resources
can (via redirection) share a common set of representations.

>
> I do know lots of ontologies where you can get useful reusable 
> information
> by dereferencing the document x which defines the terms for the form 
> x#y.
>

Sigh....

I really, really don't want to go into the gory details of
all the scalability/bandwidth issues that one faces when
having to deal with secondary access via fragment identifiers.

I've posted more than enough information about this to the TAG list.

Yes, you can do it. Yes, it can work in certain applications. And
yes, it is even compatible and acceptable with the p:identifies
architecture -- but it does not scale up to very large documents
and does not scale down to limited capability devices (which
have limited ability to process arbitrary MIME types to properly
interpret fragment identifiers according to the MIME type).

In short, if mobile applications are limited to secondary
access to representations of non-information-resources, it
will *substantially* reduce the utility of the web infrastructure
for mobile applications -- possibly to the point of Nokia
looking at other alternatives for mobile information interchange
than directly via the web.

It's that serious a problem. If you have not reached clarity
about these scalability/bandwidth issues, I encourage you to
revisit the TAG mailing list archives and re-read my posts
on this topic.


>
>> A distinct redirect response for such cases, such as 343, is not
>> however necessary. The present semantics of 302 is, I think, quite
>> sufficient, equating to "representations for the resource p:identified
>> by the request URI can be accessed via this alternate URI". No
>> equivalence between the resources p:identified by the request URI
>> and the redirect URI is to be presumed. All that can be inferred
>> is that these two resources share some number of representations.
>>
>> Both DC and Nokia (and I'm sure many others) use this approach
>> with great success.
>
> I don't think the DC users actually dereference the URIs at all
> in the course of automated processing.

Whether they do or not is irrelevant.

The fact is that one can dereference a DC property URI
and access representations of that resource -- without
recourse to *anything* but the core web machinery.

That is *hugely* powerful and useful functionality.

You are going to be very hard pressed to convince folks who
appreciate just how powerful and useful that is to give it up.

>
>> C.f. http://sw.nokia.com/WebArch-1/resolvesAs
>>
>
> (cwm  http://sw.nokia.com/WebArch-1/resolvesAs
> doesn't parse it as it seems to return text/html when
> cwm *should* be asking it for rdf/xml and/or N3.
> But it could. And then cwm --closure=p 
> http://www.w3.org/2000/10/swap/test/uriqa/t1
> would work too maybe or something like it)

Right. The default representation is for humans, and
is presented as HTML.

And you could, ahem, also just have cwm ask

MGET /WebArch-1/resolvesAs HTTP/1.1
Host: sw.nokia.com

e.g.

curl -X MGET "http://sw.nokia.com/WebArch-1/resolvesAs"

;-)


>
>
>> (note also that the above URI p:identifies a property, and
>> dereferencing it results in redirection to a web page
>> describing that property -- i.e. a representation of the
>> property, and the web page)
>
> In fact, a p:representation of the Property, and a t:representation of 
> the
> web page.

I don't see the difference. Unless you are also positing that
only Information Resources can have representations. I.e.

    p:representation rdfs:domain rdfs:Resource .
    t:representation rdfs:domain t:InformationResource .

???

>
> Yes, I see that this could work.  I just think it is squatting on
> existing WWW architecture in an inappropriate way.

Well, I guess the core of this debate is about convincing folks
of that point.

I see it as correctly and effectively utilizing
the existing web architecture.

>
>> --
>>
>> 4. In section 2.2.2 (and elsewhere) you seem to suggest that using
>> http: URIs to p:identify e.g. cars introduces an ambiguity and 
>> usability
>> problem for those wishing to annotate/describe/refer to web pages,
>> such that they will be unable to or unsure of how to refer
>> accurately to the resource in question (your specific example
>> referred to ambiguity between a web page about the Vietnam war
>> vs. the Vietnam war).
>
> Yes.
>
>> Yet, in fact, this form of ambiguity has existed since the
>> very beginning of the web, such that there is no clear way
>> to determine whether a given URI p:identifies some abstract
>> information resource, a particular form of expression of
>> that information resource, or a specific representation of
>> that resource.
>
> However, there is a consistency for t:identifies.

I don't see how you have demonstrated this. You have frequently
asserted this, but have not actually provided any motivating
arguments.

> Hence the superiority of the relation for describing web architecture.

Again, you are basing this conclusion on a premise that you
have not proven.

I consider that premise to be false, and thus disagree with
your conclusion.

>
>> E.g. if one has the URI http://example.com/myCar
>>
>> which resolves to the following text/html encoded data stream
>>
>> [
>> <html>
>> <body>
>> <pre>
>> My car.
>> It is blue.
>> When I am not in it,
>> I am blue too.
>> </pre>
>> </body>
>> </html>
>> ]
>>
>> Assuming, for the sake of discussion, your position that the
>> URI http://example.com/myCar must p:identify an information
>> resource and thus we can exclude it t:identifying a car,
>
> (thank you!)  we exclude it from t:identifying a car certainly.
>
>> how is a
>> user to know if that URI t:identifies
>>
>> (a) a poem about a car (the abstract body of information)
>> (b) a particular edition of the poem, with particular line breaks
>> (c) a particular translation of the poem (e.g. in English)
>
> Good point. However, I would point out that
> these are all in a sense "a poem", just a poem specified
> more or less generically.  I think generic t:identification
> is really important.
> http://www.w3.org/DesignIssues/Generic
>
> For answer, well, you'll find on W3C tech reports a list of the URIs
> and which they are (latest version, this version, etc).
> You'll also find links in blogs and online magazines to
> a persistent link for a given article rather than the time-varying
> "current" one.
> You'll see little flags linking to different languages, etc.

Fair enough. But the question is about a specific URI, and
how, knowing only that URI, do you disambiguate between the
many possible information resources it could identify.

I.e. the problem of ambiguity is just as acute with your
architecture -- its scope is simply more constrained because
the set of possible resources which are information resources
is a subset of all possible resources -- but the problem
is *identical* and just as significant.

>
>
>> (d) a web page containing a poem
>
> I don't think that the distinction here is fine.
> I would be more inclined to say that the document is a poem,
> and it is a web page. This web page is a poem about a car.
> There is no level difference, certainly nothing worth extracting
> in the architecture.

I disagree. I see a level distinction between the
poem and a web page via which the poem is communicated.

And here is the rub: the definition of what is or is not
an Information Resource is slippery at best. I've asked the
TAG to provide even a short list of "typical" examples,
and I expect that producing even a list of a dozen
examples of "typical" information resources would be an
exercise in futility. Because folks will disagree about
degree. Because the conceptual distinction is too fuzzy,
and the boundary between Information Resources and
non-Information Resources too imprecise.

And thus, it is not possible to base an architectural
distinction on such an imprecise classification.

That doesn't mean such a classification can't be useful,
but it will always be a matter of debate whether some
resource X is or is not a valid information resource.

>
>> (e) an HTML encoding of a web page containing a poem
>
> That is *not* identified. The architecture does not have to give a URI
> for everything under the hood.

Agreed. The *architecture* does not *have* to, but it should *allow* it.

> The HTML encoding is
> an octet stream

Really? I was thinking about the abstraction, such as might
be processed by a DOM API or XSLT script (let's say XHTML
rather than HTML, for the sake of discussion).

A particular HTML instance could be serialized in multiple
distinct octet streams yet still constitute the very same
document.

But again, such debates are irrelevant given the p:identifies
architecture. Identify whatever you like, and (ideally) say
what the URI identifies.

And let OWL reasoners deal with the social disagreement and
resulting contradictions ;-)

> which, when paired with the content type "text/html",
> forms a t:representation of the poem.  Neither the representation nor
> the octet stream nor the HTTP response nor the HTTP transaction is
> given a URI in general, and certainly not that URI.
>

Why not? If I say that

    http://example.com/myCar rdf:type p:HTMLInstance .

and always return the same text/html encoded octet stream as a 
representation,
I see no architectural problems.

I'm clear about what my URI identifies, and I consistently serve
a valid representation.

I just don't see where there is an *architectural* issue there.

There might be a *philosophical* issue, or an issue of style or
methodology, but the bottom line is that *nothing breaks* and
it is clear to users (humans or machines) what I mean by that
URI and what information I associate/publish via that URI.

I see that as the web working very effectively.

>
>> (f) some other information resource
>
> The web relies on it *not* being a different one.

???

> If I see the poem and send you the URI, I generally expect you to
> see the poem.

Yes, per the principle of consistent behavior, you would
expect me to have the same experience as you.

I agree.

> You must be able to use the name of the thing for the thing.

Sure.

> I don't expect you to get a different poem.

Or perhaps, you don't expect me to have a different web experience.

> I don't expect you to get a different resource (say a picture of the
> same thing) because the server has deemed that the URI p:identifies
> something and both Representations are p:representations of that thing.

Why not? Perhaps the representation *you* received was a Flash
animation of the poem, which presented each line in a timed
sequence, with pauses in between and some nice background
music. But I am browsing the web on my phone, and so I
receive an XHTML MP encoded representation which is fairly
basic, but efficient.

In no way does either of us, given either of the debated
architectures, know what the URI actually identifies.

What counts is that our experience -- across all variability
of representation -- is sufficiently consistent (according to
the goals/wishes of the URI owner) such that our experiences are
similarly meaningful (according to the goals/wishes of the
URI owner).

You may have simply said to me: "Nice, eh?" and I was left
wondering what the heck you were talking about. The poem?
The animation (which I can't see)? The music (which I can't
hear)? The choice of font (which may or may not be the
same for both of us)? The total overall experience (which is
not exactly the same for both of us)?

You are touching on usability issues that extend far, far
beyond the boundaries of this particular debate and which
are issues of equal significance for *both* architectures.

These are not issues introduced or exacerbated by the
p:identifies architecture -- and I would strongly argue
that the p:identifies architecture more cleanly fits with
solutions such as URIQA which *do* help to address these
kinds of usability and social meaning issues.

>
>> ???
>>
>> All of the above options are compatible with your view.
>
> Well, no, I have gone over them above.
>
>>  Yet
>> some user wishing to e.g. use Annotea or make RDF assertions
>> pertaining to whatever it is they are experiencing when
>> dereferencing that URI cannot be clear about what they are
>> actually talking about, even if http: URIs are constrained
>> to p:identify "information resources".
>
> When users annotate things with human language, they are not
> semantic web engines.  In natural language, it is quite
> normal to convert between levels implicitly. This
> is not a guide for the architecture.

My point is that ambiguity between possible Information Resources
identified by a given URI is just as acute a problem as ambiguity
between possible resources of any kind identified by a given URI.

Your architecture does not preclude or alleviate this ambiguity,
which is what I understood you to be suggesting.

>
>> If one were to make a recommendation (or warning) based on
>> the user experience of dereferencing that URI, they still
>> would have no way of doing so unambiguously.
>>
>> This is because the web/REST architecture simply doesn't
>> care what the URIs actually p:identify, or need to care,
>> because it works just fine serving representations via
>> URIs regardless of what those URIs p:identify.
>
>
> However, it really depends on consistency in what they t:identify.

I just don't understand what you mean by that statement.

Again, you seem to be suggesting that t:identify has no
such ambiguity. I assert (and think I've demonstrated)
that it does.

Perhaps we're just talking past one another on this point.

???

>
>> There is no fundamental difference between the ambiguity
>> "poem or web page about poem" versus "car or web page about car".
>
> That wasn't a web page *about* a poem, it was a web page which was a 
> poem.

How the heck do *you* know? How does anyone know? You *can't*,
based on the URI alone, or by deciphering any representations
accessible via that URI.

It *could* identify a web page about a poem, if that's what
the URI owner thinks it identifies.

And both a poem and a web page about a poem are distinct
Information Resources -- therefore http: URIs constrained
to only identify Information Resources are still ambiguous.


> Let us not split hairs as to what we mean by "web page".
> But note that giving a poem in a different font or different character
> encoding is a whole lot different from the "about" relationship between
> a subject of a document and the document.

It is only a difference of degree, and folks will
debate ad nauseam about where lines should be drawn
along that continuum.

The p:identifies architecture alleviates any necessity
to have that debate, insofar as the architecture is
concerned.

The t:identifies architecture will have to live with an
endless debate about the nature and membership of the class of
Information Resources.

>
>
>> The web architecture simply does not provide the machinery
>> to be clear about what URIs p:identify. That's why we need
>> the semantic web (and IMO why we need solutions such
>> as URIQA).
>
> Certainly to use p:identify, you have a good argument for needing 
> something extra, perhaps URIQA.

And, as I hope I've clarified, for t:identify as well.

>
> But the web architecture already requires a concept of t:identify.
> I argue that you can't mess with that.

I just don't see how it is essential (or that I've understood
what you really mean by that).

>
> You don't *have* to mess with it if you use the time-honored way
> of identifying them by the document in which they are defined.
> Like "US citizen for the purposes of article 1234 of the Act".
> It may be clumsy for a few pathological cases like wordnet,
> and we may have foaf: and dc: which we would have to transition.

I'm not sure you grasp just how disruptive, and unrealistic
such a "transition" would be.

Prediction: it will never happen. And trying to force it will
be the beginning of the end of the semantic web as we know it.

>
> So maybe we need some sort of compromise.
> A new HTTP redirect could separate the distinction between
> a document and its subject from that between a generic document and a 
> specific one.

What about a redirect from a non-information resource
to another non-information resource?

I.e. a URI identifying the planet Neptune redirects
to a URI identifying an image of the planet Neptune
which redirects to a URI identifying a particular
scaling and resolution of that image which eventually
gets served as a JPEG encoded stream of bits.

302 works fine for all of that. What's broken? And
again, how do you reliably and without confusion
decide when to use 302 or your new redirect?

> DC and FOAF could be fitted out with that.
> It could maybe be made into something more general along the lines
> of "I can't just give you a t:representation of that, but here is
> something which tells you about it and how to access it".

Hmmm.... I would say that a 302 does not mean "I can't give you
a representation" but rather "I can't *directly* give you a
representation, but you can access a representation via this
other URI".

> For example "that is a huge document -- suggest you query it with 
> sparql"

???

What is a huge document?

Query it with SPARQL where?

Why be tied to a particular (albeit excellent) solution?

Why not a relational database, why not a perl-cgi script,
why not a static representation?

You are needlessly limiting the range of possible solutions.

The success of the web is because it is simple and decentralized,
and clients need a minimal amount of special knowledge to
achieve a substantial amount of functionality.

You are hobbling the semantic web by forcing centralized, specific
solutions.

The generalized architecture leaves it just as free and open
to users to decide how *they* wish to provide access to representations
of resources identified by *their* URIs.

    - A URI identifies a resource. Any resource. Period.

    - The owner of the URI decides how to serve representations of
      that resource.

    - Redirection is a useful way to minimize the overhead of
      representation management (for either architecture).

    - URI owners may choose to use SPARQL, or URIQA, or WebDAV, or MySQL,
      or any of countless tools and solutions (existing or still to
      appear) to serve representations *as they choose*. The
      (p:identifies) architecture does not care. It neither constrains
      nor favors any approach. There is total and complete freedom for
      URI owners to decide however they want to serve representations --
      *including* e.g. redirection to a URIref with fragment identifier!


> or "that is an abstract concept, definitive ontology is in this file".
>
>> --
>>
>> 4. You have often stated, in various forms and at various
>> times, as you do in this document, "that wasn't the model I
>> had when URIs were invented and HTTP was written".
>>
>> OK. Fair enough. We should certainly give strong value to
>> original design considerations and be very hesitant to
>> question and/or diverge from them.
>>
>> Yet, is it not reasonable to consider that perhaps a very large
>> and diverse segment of the web community have all noticed and
>> beneficially exploited a fairly intuitive generalization of
>> the original design and usage of http: URI in order to substantially
>> broaden the scope and coverage of the web? and have done so
>> in a way that maximizes the benefit of a globally deployed
>> and proven infrastructure without negatively impacting
>> (or substantially impacting at all) the user experience
>> traditionally associated with the web?
>
> While I think many people have done the normal human thing and
> used, rather interchangeably in language, documents and the things
> they describe and their URIs - this is normal - I think you are
> misleading if you mean to suggest that many people apart from you
> are building semantic web things which work using the URIQA
> architecture.

That's not what I meant at all. In fact, I regret having
even mentioned URIQA since (a) it is not directly relevant
to this discussion and (b) the arguments for the p:identifies
architecture are far broader and more fundamental than that particular
tool.

Yes, URIQA nicely demonstrates the benefits of the p:identifies
architecture, and presumes that architecture, but this is not
an argument for URIQA, and if URIQA did not exist, I would still
be arguing the very same position.

And per your comment above, I meant to state that there are many
folks using the p:identifies architecture (not URIQA) very
effectively.

> And I don't think you address the expectation among web users and 
> application
> designers for the architectural constraint of consistency
> of information content as a function of URI.

1. I think I've addressed it repeatedly, both in this interchange
as well as in numerous posts to the TAG and elsewhere. I'm a
strong proponent of consistent user experience, and as I note
above, the issue of consistent user experience is not significant
to this debate as it is (or should be) equally embraced by both
(all) competing web architectures.

2. You seem to presume that the t:identifies architecture somehow
ensures or at least increases, by some mystical (to me) manner,
the consistency of user experience -- yet the t:identifies
web can be just as inconsistent.

>
> In a way this expectation is so obvious that it goes without saying.

And as I agree with this expectation, and that both architectures
should, and do, embrace it, let's go without saying anything further
on this particular point ;-)

>
>> With rare exceptions, I think it is fair to say that those
>> using http: URIs to p:identify cars, properties, people, etc.
>> are not acting carelessly or with disregard to the tradition,
>> history, and standards-grounded definition of the web. Most
>> of these folks have thought long and hard about why they
>> chose the approach they did -- many suffering (and continuing
>> to suffer) angst over the potential, real, or perceived
>> conflict with your original conception. Yet the benefits
>> are sufficiently great to motivate increasingly wide
>> adoption of this more generalized view.
>
> I grant you that from the semantic web point of view it
> is much nicer to just use the HTTP space with gay abandon.
> And clearly a possibility would be to make a new
> HTTP-like space (a bit like Larry's tdb: space, That Defined By,
> or maybe a completely separate protocol)
> which has the sort of properties you describe.
> Documents in fact could then just be concepts in the web,
> and asking about them would return one or more representations
> just expressed in RDF instead of HTTP. The whole protocol on
> the wire could in fact be RDF.

Yet this is unnecessary. One web is enough, and it works
just fine using the p:identifies architecture.

I've long asked you or anyone to present any evidence of
how the p:identifies architecture actually causes something
to *break* or is in any way detrimental, in practice, to any
specific application, and I've never seen any form of response
that I could recognize as meeting that challenge.

That challenge remains.

I'm a very practical guy. I'm very much swayed by hard evidence.

All the evidence I can see simply proves that the p:identifies
architecture is (a) useful, (b) efficient, (c) reliable,
(d) quite intuitive, and (e) successful -- with no negative
impact to any existing web application.


>
> Given the fact that daap: servers and so on sprout across the net
> at great speed, maybe that would not be such a stupid idea
> for a semantic web space first and foremost.
> web2:2005/com/nokia/WebArch1.1/resolvesAs

Well, if we were starting from scratch, sure, but since we are
not, and can easily exploit the existing globally ubiquitous
web, why start over?

>
> Let me tell you a general basis for a concern I have with redefining
> the shared expectations of HTTP.

Before you start, let me stress that a key point of this debate
is that I (and probably others) do not see this as "redefining
the shared expectations of HTTP" or in any way conflicting with
common user expectations about the web experience. So, again,
you're arguing on the basis of what I consider to be a false
premise...

> An architecture allows growth by making constraints.
> http: (IMNSHO) is constrained to be a space of documents.
> mailto: is constrained to be a space of message destinations.
> These constraints give the architecture its form.
> They define http as a simple service which can be migrated to
> a different implementation (say a nifty peer-peer version)
> with time.  Because the features delivered are constrained.
> Similarly with mailto: -- one could replace all of SMTP
> bit by bit while keeping the URIs because the service
> is just message delivery. No lookup.
>
>    "The unyielding medium is not only endured,
>     it's that upon which Art depends:
>    For who can perform on a tightrope secured
>     at only one of its ends?" -- Piet Hein
>

On principle, I agree 100% with what you state above about
constraints. Absolutely.

This is not about constraints vs. no constraints. It is about
the nature and degree of those constraints.

A lot of folks have found that a slight and intuitive (to them)
generalization (broadening) of certain (arguable) historical
constraints on the usage of http: URIs provides substantial
benefits without significant (or any) impact to current web
usability.

It's that simple.

We can take the "historical" view and say, that's not how
it was designed or intended to be used, and not advance. Or
we can take the "evolutionary" view and say, OK, some
"mutations" from the original are in fact beneficial, and
as the world changes, so too should things evolve.

The advent of the semantic web is a major (some would say
cataclysmic ;-) change to the world wide web. The evolution
from t:identifies to p:identifies is (for the sake of argument)
a mutation that is widely encountered on the web today, and it
has been found to be a very useful evolutionary step.

I would hope that we wouldn't have to just wait and see which
variant species becomes extinct ;-)  That could take *a lot*
longer than most interested parties are, I think, prepared
to endure.

>
>> Perhaps there's a gem of an idea buried under all this debate
>> which offers enough benefit to enough of the web community
>> to justify embracing this evolution of the original design.
>
> Certainly let's evolve it.  But let's not break it.

Wow. Hmmm...   Since there is no hard evidence that this "mutation"
actually does break anything, perhaps this is a seed of potential
reconciliation on this issue -- such that, so long as there is no
clear evidence of breakage, such an evolution is acceptable?

>
>> Just a thought...
>
> Thanks.
>
>> Warmest regards,
>>
>
> Likewise,
>

Too bad we didn't have time or inclination to have this chat
in real-time in Boston...  Oh well...

Cheers,

Patrick


> Tim
>
>> Patrick
>>
>>
>> --
>>
>> Patrick Stickler
>> Senior Architect
>> Forum Nokia Online
>> Tampere, Finland
>> patrick.stickler@nokia.com
>

Received on Monday, 14 March 2005 13:23:23 UTC