RE: referendum on httpRange-14 (was RE: "information resource") from Patrick.Stickler@nokia.com on 2004-10-26 (www-tag@w3.org from October 2004)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 26 Oct 2004 12:00:45 +0300
To: <timbl@w3.org>
Cc: <www-tag@w3.org>, <sandro@w3.org>, <Norman.Walsh@Sun.COM>
Message-ID: <1E4A0AC134884349A21955574A90A7A5647273@trebe051.ntc.nokia.com>
> -----Original Message-----
> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> Sent: 25 October, 2004 18:33
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: www-tag@w3.org; sandro@w3.org; Norman.Walsh@Sun.COM
> Subject: Re: referendum on httpRange-14 (was RE: "information resource")
> 
> 
> 
> On Oct 20, 2004, at 7:42, <Patrick.Stickler@nokia.com> wrote:
> 
> >> -----Original Message-----
> >> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> >> Sent: 20 October, 2004 04:19
> >>
> >> On Oct 19, 2004, at 4:09, <Patrick.Stickler@nokia.com> wrote:
> >>> [...]
> >>> Also, using a particular URI to identify the *picture* of a dog
> >>> does *not* preclude someone using some *other* URI to identify the
> >>> *actual* dog and to publish various representations of that dog via
> >>> the URI of the actual dog itself; and someone bookmarking the
> >>> URI of the *actual* dog should derive just as much benefit
> >>> from someone bookmarking the URI of the *picture* of the dog,
> >>> even if the representations published via either URI differ
> >>> (as one would expect, since they identify different things).
> >>
> >> No, they would *not* gain as much benefit.
> >> They would, under this different design, not have any expectation of
> >> the same information being conveyed to (b) as was conveyed to (a).
> >> What would happen when (b) dereferences the bookmark? Who knows
> >> what he will get?  Something which is *about* the dog. Could be
> >> anything.  That way the web doesn't work.
> >
> > I strongly disagree. And your statements directly contradict AWWW.
> 
> Precisely.
> The hypothesis you proposed (using a particular URI to identify the
> *picture* of a dog
> does *not* preclude someone using some *other* URI to identify the
> *actual* dog) led to the conclusion (that the representations would
> not carry consistent content) you strongly disagree with.
> The hypothesis fails.

I honestly don't follow you here.

You claimed, I believe, that if someone uses a URI to identify a dog, 
and someone else bookmarks a link based on that URI, that they
cannot get consistent behavior from that bookmark -- that they won't
ever know what they might get when they follow that link.

It was that specific claim that I disagree with.

If that's not what you were claiming, then please clarify.

> > It is a best practice that there be some degree of consistency
> > in the representations provided via a given URI.
> 
> Absolutely.
> 
> > That applies *both* when a URI identifies a picture of
> > a dog *and* when a URI identifies the dog itself.
> >
> > *All* URIs which offer consistent, predictable representations will be
> > *equally* beneficial to users, no matter what they identify.
> 
> Now here seems to be the crunch.
> The web architecture relies, we agree I think, on this consistency
> or predictability of representations of a given URI.

I agree with some aspects of that statement, but not all.

The *utility* of the web architecture *does* rely on consistency 
of representations.

The web *architecture* itself does not *rely* on consistency of
representations.

Consistency/predictability of representations is a measure of the 
"quality" of the links, rather than an essential requirement of the 
web machinery. The HTTP protocol itself does not care one bit whether
representations accessible via a given URI are perfectly static or
chaotically random. Users care, but that doesn't mean consistency
is a feature of the web architecture proper.

That's why, presumably, consistency of representations is a
best practice, rather than a functional requirement.

Yet, being a best practice, that also means that no web client
or user can presume that all links will exhibit consistent behavior
and applications must take potential variability into account,
and not take consistent behavior for granted.
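
As a rough illustration of treating consistency as a measured quality
rather than a guarantee, here is a minimal sketch. The function name
consistency_score and the fetch callable are assumptions made for this
example, not part of HTTP or any library; a real client would plug in
its own retrieval code and would probably compare normalized content
rather than raw bytes.

```python
from collections import Counter
from typing import Callable


def consistency_score(fetch: Callable[[str], bytes], uri: str, samples: int = 5) -> float:
    """Return the fraction of sampled representations that are identical
    to the most frequently returned one (1.0 means perfectly static)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    # Dereference the same URI several times; the protocol itself makes
    # no promise that the bodies will match.
    bodies = [fetch(uri) for _ in range(samples)]
    most_common_count = Counter(bodies).most_common(1)[0][1]
    return most_common_count / samples
```

A client could use such a score to decide how much to trust a bookmark,
without needing to know what the URI identifies.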

> The use of the URI in the web is precisely that it is associated
> with that class of representations which could be returned for it.
>
> Because the "class of representations which could be returned"
> is a rather clumsy notion, we define a conceptual thing
> which is related to any valid representation associated with the URI,
> and as the essential property of the class is a similarity in
> information content, we call the thing an Information Resource.
>
> So a URI is a string whose sole use in the web architecture
> is to denote that information resource.

Again, your position is clear. Your model is coherent
and logical. I've said that time and time again. There is
no need to continue to explain your model. I get it. I
think most everyone else also gets it.

It is simply not agreed that it is the *best* model to 
apply for the future of the web and semantic web.

The issue is whether the design choice to constrain representations
to a particular class of resources is (a) necessary, (b) optimal,
or (c) sufficiently clear and determinable. I believe the evidence
strongly supports an answer of 'no' for all three points.

It is not necessary to the functioning of the web that a representation
be constrained to representing solely some body of information. This
has been demonstrated.

It is not optimal that a representation be constrained to representing
solely some body of information. This has been demonstrated.

It is not clear from any given representation either (a) that the entire
body of information of the resource is included in the representation
or (b) that there is no additional information (e.g. links and other
markup which conveys information) which is not part of the information
resource. Hence, one cannot know from any representation which bits
are part of the information resource rather than part of the
representation alone, or which bits might be missing from the
representation. IMO this nullifies any utility that might be had from
any presumed architectural relationship between representations and
information resources, since fidelity cannot be measured, nor can full
and complete fidelity be relied upon.

Thus, the design constraint you advocate (however clear, coherent,
and useful for your own particular mental processes) has not been
demonstrated to be necessary, optimal, or even reliably determinable
in the real context of deployed web applications.

> Now if you say in the semantic web architecture that the same URI will
> identify a dog, you have a conflict.

Sigh...  only if you presume the restricted model... 

Your argument above appears to be:

   IF every http URI identifies an information resource
   AND you use an http URI to identify a non-information resource
   THEN you have a conflict

Since this very debate is about the initial premise of that
argument, the argument fails to actually address the issue
at hand.

IMO, the initial premise is false, therefore the argument fails.

> >> The current web relies on people getting the same information from
> >> reuse of the same URI.
> >
> > I agree. And there is a best practice to reinforce and promote this.
> >
> > And nothing pertaining to the practice that I and others employ, by
> > using http: URIs to identify non-information resources, in any way
> > conflicts with that.
> 
> Well, it does if the semantic web can talk about the web, as the
> semantic web can't be ambiguous about what an identifier identifies in
> the way that one can in English.

Give me one concrete example of any problem introduced by the
general, agnostic model. Just one. 

I've provided hard evidence from real world, deployed applications
that the general, agnostic model works, and that the restricted 
model has severe scalability and efficiency problems.

Claims without hard evidence to back them up do not help this
debate, but merely waste time and energy. This issue needs to
be decided on demonstrated benefit of one model over the other
for real world web and semantic web applications.

My evidence is on the table. I don't see any evidence of any sort
either supporting the restricted model or reflecting any drawbacks
to the general model.

Until there is actual, concrete evidence either in support of
the restricted model or showing real, practical drawbacks to the 
general model, I don't see any point in continuing this discussion.

> I want my agent to be able to access a web page, and then use the URI
> to refer to the information resource without having to go and find
> some RDF somewhere to tell it whether in fact it would be mistaken.

Sigh.

Tim, there have been numerous examples presented during the course
of this debate showing that the above goal cannot be achieved reliably,
even with the restricted model, even if all http URIs are 
constrained to identify only information resources.

I will offer at least one example here:

Consider the following five distinct information resources: 

1. A novel

2. A particular edition of a novel, with distinct wording discrepancies
   from other editions

3. A specific publication of a particular edition of the novel, 
   containing the full textual content of the edition, along with an 
   introduction specific to that particular publication and a glossary
   not part of the original novel

4. An eBook version of the above publication of that edition of the novel,
   modularized by inserting section divisions particular to that 
   eBook publication and a table of contents with references to the
   individual sections, and with copyright statements included at
   the end of each section

5. The initial section of the above eBook publication containing
   only the front matter and table of contents with references to 
   its subsequent sections, and with a copyright statement at the end

*ALL* of the above resources are distinct, and could be identified by a 
distinct http URI according to the restricted model.

Now, you encounter some URI and via that URI you obtain some 
representation in the form of an HTML document, and it appears
that the representation concerns the novel in question.

Yet which information resource exactly does that URI actually identify? 

They all are concerned with the novel in question.

They all are information resources.

OK, so you guess (applying full human cognitive abilities; let's
forget about automated agents guessing about such things...)

How can you know for sure, since you have no idea about the number 
of possible information resources that the received HTML document could 
be a representation of (you are not privy to that list of resources above
or to every publisher's intentions, conceptions, or practices)?

From the URI alone you cannot know *which* actual resource the URI 
identifies such that you could make any *reliable* statements about it, 
e.g. when it was created, or who the creator was, or how many words
it contains, etc.

Even if you somehow guess correctly about which resource the URI
identifies, how can you know which bits of information conveyed in the
representation are part of the substance of the information resource
and which are simply part of the representation alone? Even if the URI
denotes the novel alone, the representation of the novel might still
contain a copyright notice, which is certainly not part of the novel
itself.

Or how do you know which bits of the information resource might
be missing? Since a representation of e.g. the novel provided via that
URI may still be the modular first section of an eBook publication of 
that novel, and thus does not contain the entire substance of the
novel!

There is nowhere any requirement or expectation that there be a 1:1
correspondence of information between an information resource and
any one of its possible representations.
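
The ambiguity can be sketched in a few lines. This is only an
illustration under stated assumptions: the Resource class, the labels,
and the identifier "novel-X" are hypothetical, and matching on the
apparent subject is a deliberately crude stand-in for whatever
inference a human or agent would actually perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    label: str     # informal description of the resource
    concerns: str  # the work this resource is concerned with


# The five distinct information resources from the example above.
CANDIDATES = [
    Resource("the novel", "novel-X"),
    Resource("a particular edition of the novel", "novel-X"),
    Resource("a specific publication of that edition", "novel-X"),
    Resource("an eBook version of that publication", "novel-X"),
    Resource("the initial section of that eBook", "novel-X"),
]


def plausible_referents(apparent_subject: str, candidates: list[Resource]) -> list[Resource]:
    """Return every candidate that a representation 'about' the given
    subject could be a representation of."""
    return [r for r in candidates if r.concerns == apparent_subject]
```

Calling plausible_referents("novel-X", CANDIDATES) returns all five
candidates: the representation alone does not narrow the choice, and a
real client does not even have the candidate list to begin with.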

--

This example alone should be sufficient to illustrate that you
cannot *reliably* conclude what resource a given URI identifies
or anything about that resource based solely on the representations
accessible via that URI. You can guess. You might guess right.
But you cannot ever know for sure. The web architecture *cannot*
provide that for you.

I consider that to be as fundamental and reliable a fact about the
nature of the web as there is.

And it is that fact that makes the semantic web so important, as
without the semantic web, we cannot be clear and sure about what 
any particular URI actually identifies and about the true nature
of the resource identified.

--

*** HOWEVER ***

(and *please* give full consideration to the following)

Even though you cannot know for sure what a given URI identifies,
or know for certain about the true nature of the identified resource,
you *can* at least hope (even expect) that the best practice of 
consistent representation has been followed, such that you can
reliably link to that resource (whatever it is) and derive 
consistent benefit from the web behavior provided by that link
due to the predictability/consistency of representations accessible
via that link.

You still cannot reliably make any conclusions about the identity
or nature of the actual resource, but you can nevertheless
derive real benefit from consistent web behavior afforded by
its accessible representations.

If you want to be more precise, and know exactly what the URI
identifies and what the nature of that resource is, then you
*have* to bring the semantic web machinery into play, and you
*have* to have a way to ask precise questions about the resource,
specifically in terms of the URI in question.

--

The utility of a web link is dependent on the consistent
representation of the resource accessible via the link URI, 
not on the identity or nature of the resource identified
by that URI.

The web layer does not (and should not) care about the identity or 
nature of the resources identified by particular URIs; only about 
consistency of representation/accessibility per those URIs.

It is the semantic web layer that cares about the identity and 
nature of the resources identified by particular URIs.

And the clean integration between the web and semantic web is that
for any given URI, it is presumed at both layers that it identifies
the same resource, so that the semantic web layer can describe
those resources of which representations are provided at the web layer.
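
A minimal sketch of that layering, assuming hypothetical example.org
URIs and an invented "ex:" vocabulary (none of these names come from
any deployed system): the web layer maps a URI to a representation and
never asks what the URI identifies, while the semantic web layer holds
statements about the resources named by the very same URIs.

```python
# Hypothetical URIs and vocabulary, used only for illustration.
DOG = "http://example.org/dog"
DOG_PICTURE = "http://example.org/dog-picture"

# Web layer: URI -> representation. It is agnostic about what is identified.
WEB_LAYER = {
    DOG: b"<html>Rex is a three-year-old collie.</html>",
    DOG_PICTURE: b"<html><img src='rex.jpg'></html>",
}

# Semantic web layer: (subject, predicate, object) statements that use
# the same URIs to denote the same resources.
SEMANTIC_LAYER = {
    (DOG, "rdf:type", "ex:Dog"),
    (DOG_PICTURE, "rdf:type", "ex:Image"),
    (DOG_PICTURE, "ex:depicts", DOG),
}


def dereference(uri: str) -> bytes:
    """Web-layer access: return a representation for the URI."""
    return WEB_LAYER[uri]


def describe(uri: str) -> dict[str, str]:
    """Semantic-layer access: return the statements made about the URI.
    (A predicate used twice for one subject would collide in this toy dict.)"""
    return {p: o for s, p, o in SEMANTIC_LAYER if s == uri}
```

Here dereference(DOG) works exactly as it does for DOG_PICTURE, and
only describe() reveals that one URI identifies a dog and the other an
image of it.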

The restricted model in fact blurs the distinction between the web
and semantic web layers, by making the web care about a particular
class of resource yet not others; in fact, discriminating against
all other classes of resource by relegating them to the more
complex, expensive mechanisms necessary for indirect access.

> I want to be able to model lots and lots of uses of URIs in existing
> technology in RDF. This means importing them wholesale,
> it needs the ability to use a URI as a URI for the web page without
> asking anyone else.

I expect then that you will be disappointed, and your applications 
untrustworthy (unless you restrict your applications to data and 
tools that you have absolute and complete control over).

Regards,

Patrick
Received on Tuesday, 26 October 2004 09:01:55 UTC