
RE: referendum on httpRange-14 (was RE: "information resource")

From: <Patrick.Stickler@nokia.com>
Date: Wed, 20 Oct 2004 14:42:04 +0300
Message-ID: <1E4A0AC134884349A21955574A90A7A50ADD50@trebe051.ntc.nokia.com>
To: <timbl@w3.org>
Cc: <www-tag@w3.org>, <sandro@w3.org>, <Norman.Walsh@Sun.COM>



***------------------------------------------------------------***
*** NOTE: The comments contained herein are my own, and do not ***
***       (necessarily) reflect the official views of Nokia    ***
***------------------------------------------------------------***


> -----Original Message-----
> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> Sent: 20 October, 2004 04:19
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: www-tag@w3.org; sandro@w3.org; Norman.Walsh@Sun.COM
> Subject: Re: referendum on httpRange-14 (was RE: "information 
> resource")
> 
> 
> 
> On Oct 19, 2004, at 4:09, <Patrick.Stickler@nokia.com> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: www-tag-request@w3.org
> >> [mailto:www-tag-request@w3.org]On Behalf Of
> >> ext Tim Berners-Lee
> >> Sent: 18 October, 2004 22:03
> >> To: Sandro Hawke
> >> Cc: www-tag@w3.org; Norman Walsh
> >> Subject: Re: referendum on httpRange-14 (was RE: "information
> >> resource")
> >>
> >>
> >>
> >> The range of HTTP is not a question of belief; it is a question of
> >> design.
> >> The Web was designed such that the Universal Document Identifiers
> >> identified documents.
> >> This was refined to generalize the word "Document" to the
> >> unfortunately
> >> rather information-free "Resource".
> >> The design is still the same.
> >> The web works when person (a) publishes a picture of a dog,
> >> person (b)
> >> bookmarks it, mails the URI to person (c) assuming that 
> they will see
> >> more or less the same picture, not the weight of the dog.
> >>
> >> That is why, while the dog is closely related to the picture,
> >> it is not
> >> what is identified, in the web architecture, by the URI.
> >>
> >> There is a reason.
> >>
> >> Tim
> >
> > Fine. And if the URI used to publish the *picture* of the dog
> > identifies the *picture* of the dog, then one would presume to
> > GET a representation of the *picture* of the dog. No argument
> > there, obviously.
> >
> > Getting the weight of the dog via a URI identifying a picture of
> > the dog would be unexpected (arguably incorrect) behavior per
> > *either* view of this debate. So your example does not argue for
> > or against either view.
> >
> > Also, using a particular URI to identify the *picture* of a dog
> > does *not* preclude someone using some *other* URI to identify the
> > *actual* dog and to publish various representations of that dog via
> > the URI of the actual dog itself; and someone bookmarking the
> > URI of the *actual* dog should derive just as much benefit
> > from someone bookmarking the URI of the *picture* of the dog,
> > even if the representations published via either URI differ
> > (as one would expect, since they identify different things).
> 
> No, they would *not* gain as much benefit.
> They would, under this different design, not have any expectation of
> the same information being conveyed to (b) as was conveyed to (a).
> What would happen when (b) dereferences the bookmark? Who knows
> what he will get?  Something which is *about* the dog. Could be
> anything.  That way the web doesn't work.

I strongly disagree. And your statements directly contradict AWWW.

It is a best practice that there be some degree of consistency
in the representations provided via a given URI. Per

http://www.w3.org/2001/tag/2004/webarch-20041014/#URI-persistence

[
   Good practice: Consistent representation

   A URI owner SHOULD provide representations of the identified 
   resource consistently and predictably.
]

That applies *both* when a URI identifies a picture of 
a dog *and* when a URI identifies the dog itself.

*All* URIs which offer consistent, predictable representations will be 
*equally* beneficial to users, no matter what they identify.


> The current web relies on people getting the same information from 
> reuse of the same URI.

I agree. And there is a best practice to reinforce and promote this.

And nothing pertaining to the practice that I and others employ, by
using http: URIs to identify non-information resources, in any way
conflicts with that.

> The system relies on the URI being associated with information of 
> consistent
> content, not of consistent subject.

I disagree. And I do not see how you have offered any arguments
to substantiate this claim.

In fact, this seems contradictory to even your own position, as
illustrated below (see the example below of the speech versus the audio
clip versus the transcript).

> You can make new URI schemes for arbitrary objects, but a very
> convenient method is to use identifiers with a

???

Not sure where you were going there, but I hope it was not a suggestion
that one either (a) use URIs other than http: URIs to identify
non-information resources or (b) use URIrefs with fragment identifiers
to identify non-information resources; both approaches have substantial
practical drawbacks compared with using http: URIs to denote any
arbitrary resource whatsoever.


> > I think it is a major, significant, and beneficial breakthrough
> > in the evolution of the web that the architecture *was* generalized
> > to the more general class of resources -- so that users can
> > name, talk about, and provide access to representations of, any
> > thing whatsoever.
> 
> 1. The URI itself was never constrained -- only HTTP URIs.

Hmmm... that is the very crux of this debate. Whether http: URIs
were actually originally thus constrained, whether any such constraint
was clear from the specs, whether it is even a reasonable constraint,
and whether significant, non-harmful utility can be obtained by
assuming no such constraint: that is what this issue is all about.

Even if such a constraint was presumed for the original, intended 
use of the http URI scheme, I've yet to see any substantial evidence 
that using http URIs to identify non-information resources, such 
that representations of those resources are directly accessible via 
the web, is harmful or in any way problematic to any existing web 
application.

And the existing widespread practice of using http: URIs to identify
non-information resources is evidence that either the specs were
not clear regarding that constraint and/or the utility of abandoning
any such constraint is sufficiently great. I think it is a bit of
both. Yet even if the specs had been clearer, I think folks would
still have used http: URIs to denote arbitrary resources, because
the utility is so great.

In either case, shall we chain ourselves to the historical record,
or continue to move forward where there are proven benefits and 
(so far) no proven drawbacks?

> 2. A great way is to write RDF files so you refer to a concept as 
> described in a document, a la foo#bar

Great? Perhaps for tightly controlled, monolithic systems where one
has total control over every aspect of every application. But, with all 
due respect, I've found that methodology to be highly constraining, 
difficult to use for modular knowledge management, and, most significantly,
found that it introduces nontrivial practical problems with regard to 
efficient access to representations of such secondary resources.

(and I don't appear to be alone in that experience)

Just because it might work great for you does not mean it will
work great for everyone else, much less a majority of users.

(and to be fair, that argument applies equally well to my own experiences)

> > To ask a pointed question, Tim, do you believe that the web cannot
> > evolve beneficially in a direction beyond your original design?
> 
> Of course I don't believe that. The web is a seething mass of 
> flexibility points,
> designed to allow large chunks to be replaced.
> 
> However, to extend it is one thing,

And yet not even valid *extensions* such as URIQA (or WebDAV) are 
welcome, are they ;-)

>  but to "evolve" it in a way which
> destroys the basic assumptions of the current web may make 
> nice working
> prototypes, but is really destructive.

Firstly, we're not talking about prototypes here. Perhaps you missed
all those places where I talked about "broadly deployed applications".

Secondly, can you provide actual evidence that such a practice is 
destructive? (So far, neither you nor anyone else has.)

Thirdly, the more general, agnostic model allowing any resource 
to be identified by an http: (or any) URI does not destroy
[all of] the basic assumptions of the web, but only removes a 
single assumption, and one that is in any case of debatable
clarity and utility. 

So please stop painting this general, agnostic model as somehow
"deviant" or "subversive". It is not. In fact, for many,
it reflects the natural state of the *presently* deployed
web. It is not something that some of us wish the web to
become. It is what the web already *is*, today, and we are
very happy with it that way, and do not wish to see it forcefully
reverted to a previous, and less useful, form.

You, personally, may not be particularly happy with how this
more general model is *already* part of the web today, but
that does not change the fact that it is a *reality* of current
web applications.

As I've said before, this is really a closed issue that remains
open simply because some folks refuse to accept what already *is*
part of the presently deployed web.

Unless you can point to explicit, unambiguous text in the specs
which is clearly violated by particular practices, and/or can
identify and demonstrate actual harm to existing or even currently
envisioned web applications, then you cannot reasonably argue that 
the current state of the web, including applications employing the
more general model, should be declared invalid or incorrect.

> Here we are trying to get the semantic web, 

Please don't suggest that I am not, also, trying to get to
the semantic web; much less imply that I am in any way
ignorant of what it means to get to the semantic web.

I am *deploying* the semantic web. I know a lot better than many
what it means to actually *deploy* the semantic web and produce
successful, scalable, manageable, and affordable solutions based
on semantic web technologies.

Please do not presume to lecture me about the goals and visions
of the semantic web as if I don't "get it".

It is precisely because I *do* "get it" that I am concerned about
these architectural issues and take the time to go round and round
and round about issues that, on their *technical* merits, should
have been resolved a long time ago.

If every *toaster* is to eventually be on the semantic web, if my
PDA is going to tell the car stereo in my rental car what my music
preferences are, if my mobile phone is going to suggest a nearby
restaurant that I'll probably like because it's lunchtime, taking
my preferences and their menus into account, etc. etc. then the 
interchange of knowledge between semantic web agents *must* be 
efficient and scalable.

Per your model it is *not*. And that has been *demonstrated*. 

Forced indirect access to representations via URIrefs with fragment
identifiers is inefficient and non-scalable. Yes, it can work in
some cases, for some applications, for some data. But it is *not*
a scalable solution for the future of the web and semantic web.

If we are to actually "get to the semantic web", we need to be
able to have direct access to representations of any arbitrary
resource, and that access must be as efficient as possible.
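The inefficiency is visible at the protocol level: a fragment identifier never reaches the server, so a client wanting information about a hash-identified resource must retrieve and parse the entire containing document. A minimal sketch of that client-side stripping (the vocabulary URI is hypothetical):

```python
from urllib.parse import urldefrag

# A URIref denoting a concept described *within* an RDF document
# (hypothetical URI, in the foo#bar style discussed above).
uriref = "http://example.org/vocab#Dog"

# HTTP clients strip the fragment before issuing a request; only
# the document URI ever goes over the wire.
request_uri, fragment = urldefrag(uriref)

print(request_uri)  # http://example.org/vocab
print(fragment)     # Dog

# So to learn anything about #Dog, a client must GET the *entire*
# vocabulary document and sift through it locally -- the indirect,
# non-scalable access this message objects to.
```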

> which really 
> cares about the
> difference between a dog and a picture of a dog, to operate over
> and also to model the HTTP web, which doesn't care about
> dogs at all.   The http://.../foo#bar design uses the same 
> flexibility 
> point
> as the hypertext design uses: to take a language, and convert local 
> identifiers
> in documents in that language into global identifiers using the 
> document URI and "#".

I've never questioned the coherence of your model. I simply 
question its practicality, based on proven scalability problems.

I also do not consider your more restrictive model any less compatible 
with the specs than the general model. There is ambiguity there. The
choice must be based on which offers greater benefit. I think the greater
benefit of the general model has been demonstrated.

If I need to access information (a representation) of that dog
using your approach, I am forced to always do so indirectly, via
the identity of some other resource, rather than directly via
the identity of the dog itself.

As an engineer, and also from real-world experience deploying
real semantic web applications, I find that unacceptable. 

Yes, the semantic web does indeed care about the difference between
a dog and a picture of a dog, *BUT* the semantic web does not care
one bit about the nature of the URIs used to identify those resources!

The semantic web does not in any way *prefer* that URIs are only
ever used to identify information resources whereas URIrefs with
fragment identifiers are used to identify other kinds of resources.

All that matters to the semantic web is that (a) distinct URIs are
used to identify distinct resources and (b) knowledge about resources
is available in some efficient and trustworthy manner. 

The web is certainly expected to be a primary means to
publish/access knowledge about resources, and fortunately,
semantic web agents can presume that a given URI is taken to identify
the same resource both in RDF statements and for accessing representations
of that resource on the web. 

Thus, insofar as the semantic web is concerned, the hash vs. slash
debate is mostly *irrelevant*, and is only relevant regarding the
efficient interchange of knowledge via the web. The more general
model is simply more efficient, flexible, and scalable than the
more restricted model; both for publication/accessing as well as
for modular knowledge management.

I've outlined the problems with using URIrefs with fragment identifiers
in numerous ways in numerous forums. I won't repeat them here. I
continue, though, to wait for you or anyone to actually address those
identified problems, or to demonstrate either that the more general
agnostic approach already in use is in any way worse than what you
advocate, or that your approach is better.

Your comments thus far on these issues seem to merely (a) restate your 
view (which I think we all understand) or (b) recast the examples used
in discussions to reflect the presumptions of your view (e.g. someone
says some URI identifies a dog, you say "well, if that URI identifies
a picture of a dog", etc. and the discussion deteriorates from there). 
Neither form of response actually addresses the issues and challenges 
presented.

> One can certainly design different protocols, in which the URIs 
> (without hashes)
> denote arbitrary objects, and one fetches some sort of information 
> about them.
> I know you have been designing such systems -- you described them in
> the RDF face-face meeting in Boston.  These are a different system:
> similar to HTTP, but you added more methods, and you don't 
> have URIs for 
> the
> documents.  

You are blurring two issues. The use of http: URIs to identify any
arbitrary resource is a distinct issue from the HTTP extensions
offered by URIQA.

Either provides benefit independent of the other, though together,
they do indeed offer a tremendous amount of utility (IMO a 
potential, fundamental building block for the semantic web).

But please leave URIQA out of this particular discussion. It is not a 
component of this particular issue (httpRange-14).

> But it is a different design to the current web. 

No more so than WebDAV would constitute a different design to the 
current web. But again, let's leave URIQA out of this particular
discussion.

> You claim 
> utility
> for it.  Maybe it would be useful.  But please don't call it HTTP.

Firstly, while I may be more vocal about these issues than others,
that does not mean that (a) I am the only one who holds these views,
or that (b) I was among the first to see the utility of using
http: URIs to identify non-information resources.

Secondly, this issue exists because the specs are not clear, and
there are existing practices which reflect both views. If the specs
were clear on this issue (httpRange-14) then it would not be a TAG
issue that has remained unresolved for a very, very long time. The
TAG would simply say "Spec X says Y, so don't do that" and that would
be that.

The TAG issue httpRange-14, and the related "hash vs. slash" debate,
exists precisely *because* of the fact that I, and others, consider it 
acceptable (and highly beneficial) to use http: URIs to denote non-information
resources, and also consider such practice a fully valid use of HTTP.

Thus, your statement above is not addressing the issue, but merely
dismissing it.

> But I claim great benefit in designing the semantic web 
> cleanly on top 
> of the HTTP web so that the facilities of each support each other and 
> become one large consistent system.

And I do not suggest doing anything differently. 

You are suggesting, though, that the more general model is not as
clean a design for integrating the web and semantic web, yet you
provide no evidence of why. And it is my view that the more general
model actually provides the simpler, more balanced, cleaner integration 
because it allows both the web and the semantic web layers to maintain 
the same exact agnostic view about what URIs identify, i.e. to share
the very same "range" of resources. Thus, stated in terms specific to HTTP:

Simple, balanced (clean) integration per the general, fully agnostic model:

  * On the web, http: URIs can identify any resource.
  * On the semantic web, any URI can identify any resource.
  * A given http: URI identifies the same resource on both the web and semantic web.
  * The web provides for direct access to representations of any http: URI identified resource.
  * The semantic web provides for making statements about any URI identified resource.

  Range of Web:          any resource
  Range of Semantic Web: any resource

Versus the more complex, imbalanced (less clean) integration per the restrictive model:

  * On the web, http: URIs can only identify information resources.
  * On the semantic web, any URI can identify any resource.
  * A given URI identifies the same resource on both the web and semantic web.
  * The web provides direct access to representations of only http: URI identified information resources.
  * The semantic web provides for making statements about any URI identified resource.

  Range of Web:          information resource
  Range of Semantic Web: any resource

Now, which model really provides the cleaner, more balanced integration?

> You ask what utility there is in this rule.
> 
> There is great utility in the fact that any person, on seeing a web 
> page,
> can use the URI instead of the content as a shorthand for 
> that content.

Really? I think what they *see* is a presentation of a representation
of whatever resource is identified by that URI, which may be a partial view of
that resource.

Just because all of the substance of an information resource *can* be
transferred in a message does not mean that all of the substance of
an information resource *must* be transferred in a message. (This
is a point that AWWW could also make explicitly clear.)

Furthermore, it would be a mistake on the part of the user to try
to equate the request URI with what might be successfully presented
by their browser per a successful response.

The URI may identify a speech, yet the browser may play an audio
stream of the speech, because an MP3 representation exists, the
user prefers MP3 over HTML, and the user's browser supports MP3
audio streams. The user may then mistake the URI as identifying the
audio stream rather than the speech itself. They email that URI to
a friend saying "listen to this", but the friend's browser doesn't
support MP3 audio streams, so the friend gets a transcript instead,
encoded in HTML, and is subsequently confused about being expected
to "listen" to a textual transcript.
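That scenario is plain HTTP content negotiation at work. A toy sketch of the server side (the media types, preference handling, and fallback are illustrative assumptions, not any particular server's behavior):

```python
def select_representation(accept_header: str) -> str:
    """Toy content negotiation for a URI identifying a *speech*
    (the event itself, not any one encoding of it)."""
    available = ["audio/mpeg", "text/html"]  # MP3 stream, HTML transcript
    for entry in accept_header.split(","):
        media_type = entry.split(";")[0].strip()
        if media_type in available:
            return media_type
    return "text/html"  # fall back to the transcript

# User (a) prefers MP3 and hears the speech...
assert select_representation("audio/mpeg, text/html") == "audio/mpeg"
# ...while the friend's browser only handles HTML and gets the transcript.
assert select_representation("text/html") == "text/html"
```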

Yet the above scenario is entirely possible per your restricted 
model of what an http URI can identify, and shows that this is not
a problem inherent in the more general model, but a more fundamental
problem at the very foundation of the web and also reflected by
the principle of URI opacity.

Thus, here is yet another example of how one cannot ever presume
anything about what resource is identified by a given URI or about
the nature of a resource identified by a given URI solely
based on any arbitrary representation(s).

Your restricted model does *nothing* to avoid that scenario. It is
a challenge that the semantic web must solve. And restricting
http URIs to identifying only information resources will not
help the semantic web one way or another to meet that challenge.

At the end of the day, the creator/owner of *every* URI has to
say what that URI identifies, and ideally, also tell us something
about the nature of the resource in question.

> This is so simple that people often haven't thought about it.
> (And thinking about it leads to the aspects of version, language, and 
> content type.)
> This is done in all the hypertext links and bookmarks and billions of
> places where the web is used.   Your proposed "evolution" would break 
> that.

NO. It would not. And IMO the onus is on you to prove it.

And BTW, if this evolution would break such things, then such things
would *already* have broken, since the web has *already* evolved
into the more general model.

Web links refer to resources, not to representations (unless the resource
referred to is, by coincidence, a representation). And links referring
to non-information resources which consistently resolve to representations
in a predictable manner *ALREADY* work *just* as well as any other link
to any other type of resource.

Here's one for you: consider the following link which refers
to a property (which is presumed not to be an information resource):

   <a href="http://sw.nokia.com/FN-1/published">Publication Date</a>

Go ahead. Follow the link. Email it to anyone. Have them resolve the
link. Was your experience the same? Did you both get a consistent
representation of that property? I bet you did.

There, I've *proven* that the general approach does not break the web.

(at least, I didn't notice the web crashing from way up here in Finland,
 perhaps I should give it a minute or two...  nope, still nothing...)

> I hope that this is now clear.

Your position is clear. It has been reasonably clear to me for some time.

Yet the *superiority* of your position has not been demonstrated.

> > The core of your argument seems to be "Because the web was not
> > originally designed to do that, it cannot and should not do that".
> 
> No, it is that what you propose is inconsistent with the way the web 
> works now.

Please demonstrate how it is inconsistent. E.g.

Per your restricted model: 

   * An http: URI always identifies an information resource
   * Links referring to resources identified by http: URIs provide access to
     representations of those resources.
   * Resolution of URIs to representations should be consistent.

   Successfully traversing the link <a href="http://example.com/foo">Blargh</a> 
   results in being presented with a representation of the resource identified
   by the URI <http://example.com/foo>.

Per the general model:

   * An http: URI identifies any resource
   * Links referring to resources identified by http: URIs provide access to
     representations of those resources.
   * Resolution of URIs to representations should be consistent.

   Successfully traversing the link <a href="http://example.com/foo">Blargh</a> 
   results in being presented with a representation of the resource identified
   by the URI <http://example.com/foo>.

Now, presuming that <http://example.com/foo> actually does identify an information
resource, exactly *what* breaks given the more general, agnostic model? In fact,
no matter what kind of resource it is, how does the general model change the
way that users use web links to access information? It doesn't.
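To make that concrete, here is a minimal, purely illustrative sketch of link traversal (a lookup table stands in for an HTTP GET): nothing in the procedure inspects, or depends on, what kind of resource the URI identifies.

```python
def traverse(uri: str, published: dict) -> str:
    """Toy dereference: return the representation published for a
    URI (the dict stands in for an actual HTTP GET)."""
    return published[uri]

published = {
    # Whether this URI identifies an information resource or a dog
    # is invisible to -- and irrelevant for -- the traversal logic.
    "http://example.com/foo": "<html>a representation of foo</html>",
}

assert traverse("http://example.com/foo", published).startswith("<html>")
```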

We've already clarified that users should not conclude anything about what
a given URI actually identifies based on accessible representations. What
the web user is primarily (or only) concerned about is the consistency of 
the representations accessible via a given URI. They don't usually care
at all what the URI actually identifies. It's the automated systems that
really care, and it's the automated systems that will rely on the
semantic web machinery to clarify what those URIs actually identify
and what the nature of those identified resources actually is.

The general model, and the integration of the semantic web with 
the web based on that general model, will have no significant
impact on the way users use web links to access information. 

> > Yet actual practice and deployed solutions demonstrate that there
> > is clear benefit to the more general model; and there does
> > not appear to be any substantial evidence that applying that
> more generalized model is harmful or problematic to the actual
> > real-world functioning of the web, or that the narrower, more
> > restricted (original) model is clearly better.
> 
> That is because you have not really looked at the implications of what
> you are saying -- 

Thank you for paying me the compliment of being short-sighted. 

Forgive me for not returning the compliment.

> you are assuming, I suspect, that web users 
> will go on
> using URIs as they do, and your software will use them differently,
> and that the two won't bother each other.  But I am aiming higher  -
> for one consistent design across WWW and SW.

My aim, I assure you, is just as high as (hmmm, perhaps even higher 
than) yours.

Though I would hope and expect that final resolution of this issue
would not lie in just my or your personal view.

Nevertheless, I consider the more general agnostic model to provide 
a more consistent, seamless integration of the web and semantic
web layers than your restricted model. 

Allowing for efficient web access (including *mobile* web access) to 
representations of any arbitrary resource, including their descriptions, 
and employing the semantic web machinery to talk about and reason about 
any arbitrary resource reflects not only the present, but
also the future.

Restricting direct access to representations to a particular subclass
of resources will simply hobble the web and semantic web and exclude
a significant amount of utility, much of which is already demonstrated.

> > If you, or anyone, feels that there *is* evidence either showing
> > how the more generalized view is harmful, or how the narrower
> > (original) view is better, then I would love to see it.
> 
> Maybe that explanation will help, maybe it won't

I appreciate you making the effort, but it does not. Sorry. 

You have not presented any actual evidence that your restricted model is 
better than the more general model, or how the general model actually causes 
any real problems for either systems or users.
  
> Best Wishes,

Likewise,

Patrick


> Tim BL
> 
> > Regards,
> >
> > Patrick
> 
> 
Received on Wednesday, 20 October 2004 11:43:47 UTC
