RE: Summary: Section 2: What does a URI identify? from Williams, Stuart on 2002-03-16 (www-tag@w3.org from March 2002)

From: Williams, Stuart <skw@hplb.hpl.hp.com>
Date: Sat, 16 Mar 2002 15:34:48 -0000
To: "McBride, Brian" <bwm@hplb.hpl.hp.com>
Cc: www-tag@w3.org, Norman Walsh <Norman.Walsh@sun.com>
Message-ID: <5E13A1874524D411A876006008CD059F192A2E@0-mail-1.hpl.hp.com>

Hi Brian,

> -----Original Message-----
> From: Brian McBride [mailto:bwm@hplb.hpl.hp.com]
> Sent: 16 March 2002 13:24
> To: Norman Walsh; www-tag@w3.org; skw@hplb.hpl.hp.com
> Subject: Re: Summary: Section 2: What does a URI identify?
> 
> 
> At 12:57 15/03/2002 -0500, Norman Walsh wrote:
> >At a recent telcon, Stuart Williams and I agreed to publish our one
> >page summary of section 2 of the architecture document this week. We
> >are aware of a few comments that have not been addressed yet, and I
> >expect this publication will generate a whole lot more, so please
> >remember that this is a work in progress. (In fact, discussion of this
> >document is on the agenda for the *next* TAG meeting, so this document
> >cannot even purport to represent the consensus of the TAG :-).
> >
> >   http://www.w3.org/2001/tag/doc/identify.html
> >
> >                                         Be seeing you,
> >                                           norm
> 
> I'm really glad to see the TAG taking a look at this issue.  RDFCore has
> some dependencies on the outcome, so I hope to follow this discussion with
> interest.
> 
> 
> Some comments:
> 
> [[
> 2 What Does a URI Identify?
> 
> On the web URIs identify resources. "Any information that can be named can
> be a resource." [RFC2396]. In fact, this relationship can be taken as
> axiomatic: if a resource has a URI, it is identifiable on the web. If it
> does not, it is not.
> ]]
> 
> I cannot find the quoted text ("Any information ...") in RFC 2396.

Mea culpa... the quote in RFC 2396 is "A resource can be anything that has
identity." Will substitute. The other quote was from a paper by Roy
Fielding.
 
> [[
> 
> 2.2 Resources
> ...
> The set of values mapped by a resource are equivalent resource
> representations and/or resource identifiers (giving further
> indirection or redirection). Dereferencing a resource identifier yields a
> representation of the current value of the referenced resource. At some
> time, t, the set of values that a resource maps to may be empty, which
> allows a concept to be identified before a realisation of the concept
> exists (or indeed after it has been retired).
> ]]
> 
> What notion of equivalence is meant here?  How can I determine whether two
> values are equivalent?  This para talks about "the current value".  Is
> this the same sense of the term 'value' used in "At some time, t, the set
> of values that ..." or is there some notion that a resource has state, and
> it is the value of that state that is referred to?

Roy may care to comment. We were trying to draw on some of the things he has
written (the referenced paper).

I think the idea here is that, with something like content negotiation,
dereferencing a URI yields one of a number of possible representations.
However, the multiple representations are regarded as equivalent because,
in some sense, they are representations of the same resource.
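
A minimal sketch of that idea, assuming a hypothetical table of
representations for a single resource and a deliberately simplified reading
of the Accept header (no q-values, no partial wildcards). The table, the
placeholder bodies and the function name are illustrative only:

```python
# Hypothetical representations of ONE resource, keyed by media type.
# The bodies are placeholders, not real image data.
REPRESENTATIONS = {
    "image/png": b"<png bytes>",
    "image/svg+xml": b"<svg/>",
}


def negotiate(accept, available=REPRESENTATIONS):
    """Return (media_type, body) for the first acceptable type, else None.

    Simplification: q-values are ignored and only the full wildcard
    "*/*" is understood.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type == "*/*":
            # Any representation will do; take the first one listed.
            return next(iter(available.items()), None)
        if media_type in available:
            return media_type, available[media_type]
    return None
```

Whichever entry is returned, the caller is simply told that it represents
the same resource; nothing in the sketch compares the png and svg bodies.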

Regarding your last question above. It seems to be a question about the
relationship between the state of a resource and a representation of that
state. I think of representations as being derived from the state of the
resource; however, it's not clear to me whether these representations
actually encapsulate the full state of the resource - so we probably have
more work to do here :-)

As regards how you can determine whether two values are equivalent: I don't
think you can... I think you are being told that they are (at least within
Web Architecture) - how can I tell that a png representation and an svg
representation of a resource are equivalent?

Another thought that occurs here is with respect to the notion of state
shared between a user-agent such as a browser and a resource, in that the
resource (perhaps) makes some assumptions about what the UA is presenting to
its user. If these two get out of sync (back-button problems) unexpected
things can happen.

> [[
> RDF provides the ability to describe resources by their relationship to
> one another which leads to the notion of existentially qualified resources.
> For example, there exists a person whose internet mailbox is identified by
> the URI mailto:timbl@w3.org. This identifies the person of Tim Berners-Lee
> by reference to the URI of his internet mailbox without it being necessary
> to assign a URI to identify the concept of the person Tim Berners-Lee.
> ]]
> 
> It is not the resource that is existentially qualified.  RDF has the
> notion of a b-node which performs a role similar to that of existentially
> qualified variables in first order logic.  Just as in:
> 
>    x + 3 = 4
> 
> x is not the number 1, x is a variable, so b-nodes in RDF are not 
> resources, they are variables.  Any of the values a b-node 
> can take can be assigned a URI.

I understand your point... how would you like it expressed that RDF enables
you to identify things by description rather than directly by URI?
 
> [[
> 2.3 Properties of Resources
> ...
> 
> Two different URI's may identify the same resource, but it is only the 
> authorities that assign those URIs that can make the commitment to them 
> identifying the same resource.
> ]]
> 
> Can they?  Is that  a proposal?

It's certainly up for discussion - this is a first-cut after all.

> The alternative notion, is that each different URI denotes a different
> resource, and to define a notion of equivalence between
> resources.  Different notions of equivalence are possible;
> resources A and B denote the same set of values at time t, for a set of
> time intervals  {[t1,t2]} or over all time.

Ok... but given that this is over future time... any commitment to
equivalence seems to me to remain in the purview of the authorities that
assign URI's to resources.

> Consider for example, http://www.w3.org/.  This web page is mirrored; I
> don't know what the url's of the mirrors are; let's say
> http://www.w3.inria.fr/ is one for the purpose of discussion.  There is
> presumably a propagation delay between updating the master version and
> that change propagating, so there is a period when an HTTP GET on the two
> different URL's will return different values.  Does this mean
> that these two URI's denote different resources, or is it that the
> implementation is an imperfect realization of the ideal?

I think of those as two different resources... one is a mirror, one is
primary... 

> More importantly, how can we know that these two URL's will always denote 
> the same set of values.  We cannot predict the future.  The French 
> government could choose next month, to require that all web pages served 
> from French web servers contain some metadata which depends on the origin 
> of the page.  How can we say today, that two URL's will, for all time, 
> denote the same mapping to values.

I think that the important point that you're making is that in fact not even
the authorities that assign URI's to resources can make a commitment that
dereferencing two different URIs will yield equivalent results over all
time. Which I think takes us to the point that given two URI's in general:

	- you cannot tell whether they identify the same (single) resource.
	- you cannot tell whether they reference equivalent resources.
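
A rough sketch of why observation alone cannot settle this, assuming a toy
model in which a resource is a function from a time step to a set of values.
The master/mirror functions, the one-step propagation delay and the value
names are all invented for illustration:

```python
def master(t):
    """Hypothetical primary resource: its value changes at every time step."""
    return frozenset({f"home-v{t}"})


def mirror(t, delay=1):
    """Hypothetical mirror: the same values, `delay` steps behind."""
    return frozenset({f"home-v{max(t - delay, 0)}"})


def equivalent_over(a, b, times):
    """True if a and b map to the same set of values at every sampled time."""
    return all(a(t) == b(t) for t in times)
```

Sampling only t = 0 reports the two as equivalent, while sampling t = 0 and
t = 1 does not. Any finite set of observations leaves the question open for
the times that were not sampled, including all future ones.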

> [[
> We are dealing here with two time dependent mappings. Firstly a time 
> dependent mapping between an identifier and a resource ...
> ]]
> 
> Oh that's horrible!  Later in the document it states:

The document is not finished and has been composed from different bits of
writing. It's an artifact of the TAG working in public.

> [[
> An absolute URI always means the same thing, regardless of 
> the context in which it occurs.
> ]]
> 
> and
> 
> [[
> The resource identified by a particular URI should always be "the same", 
> when it is identified by that URI.
> ]]
> 
> That seems a little contradictory.

Agreed... but there is the reality of things like the dot-com collapse, the
re-allocation of DNS names and the emergence of new web-sites and the
reassignment (by new authorities) of identifiers to new resources - which is
in part back to your comment that you cannot predict the future - and it
also in part reinforces a separation between identifiers and the resources
they identify.

> [[3.1 What about Fragment Identifiers?
> 
> If a URI contains a sharp character (a " # "), the string that follows
> the " # " is a fragment identifier. Fragment identifiers are a mechanism
> for identifying part of a resource.
> ]]
> 
> Are resources atomic, or can the parts of a resource also be resources?

I think we will find opinion divided on this. We'll have to see where the
discussion gets us.

My own view is not to regard URI references with fragment identifiers as
identifying resources. I'm not inflexible about that, I just haven't found a
way to go round the circle and make it join up at the end.

> [[
> This means that in general, it's not possible to determine what a fragment
> identifier means without retrieving the resource into which it points.
> ]]
> 
> This sentence uses the term 'means' which is rather ill defined here.
> 
> If this sentence is trying to say that it is not possible to determine the
> bytes which represent the fragment without retrieving a representation of
> the whole resource, then that is true given current web practise.  But if
> that is the sense in which the word 'means' is used here, then it is also
> not possible to determine what http://www.w3.org/  *means* without
> retrieving it.

I think we will have to work on what we're trying to express here. This is
also bound up with "what does a document mean".

Loosely... what this is trying to get at is that the interpretation (and
no doubt you'll pick on the word interpretation as ill-defined) of a
fragment id is scoped by the MIME type of a representation.
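
As a sketch of what "scoped by the MIME type" could look like in practice,
assuming a hypothetical resolver that only knows how to interpret fragments
for text/html (by looking for an element with a matching id attribute) and
gives up for every other media type. The function and class names are
illustrative, not part of any specification:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _IdFinder(HTMLParser):
    """Records the tag name of the first element whose id matches."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = None

    def handle_starttag(self, tag, attrs):
        if self.found is None and dict(attrs).get("id") == self.wanted:
            self.found = tag


def resolve_fragment(uri_ref, media_type, body):
    """Return the tag name the fragment points at, or None if unknown.

    Assumption: only text/html has an interpretation in this sketch;
    any other media type yields None.
    """
    _, fragment = urldefrag(uri_ref)
    if not fragment:
        return None
    if media_type == "text/html":
        finder = _IdFinder(fragment)
        finder.feed(body)
        return finder.found
    return None
```

The same URI reference therefore gets an answer for one representation and
none for another, which is the scoping described above.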

> [[
> The fragment identifier identifies some sub-part of a resource
> representation.
> ]]
> 
> I don't follow this.  Consider the resource identified by
> http://example.org/doc/.  Consider that there are two representations of
> this resource, one in say xhtml and the other in svg, and that each
> contains a fragment '#chapter1'.  Can we not say that
> http://example.org/doc/#chapter1 names chapter one of the
> document?  Can we not say that http://example.org/doc/#chapter1 names a
> resource, and that to display that resource a browser has to retrieve the
> resource http://example.org/doc/ and then interpret the value returned in
> a way that is dependent on the mimetype to compute the representation of
> chapter 1?

I think that's something that can be organised with care... but I don't
think that it's generally true. What if the representation is "text/plain"?

> The fact that computing the representation of a fragment is mimetype
> dependent, does not mean that a URI with a fragment identifier cannot name
> an abstraction which has multiple representations with different
> mime-types.

Not sure what you mean by "...computing the representation of a
fragment...".

If I understand what you are suggesting, then I think that there are cases
where what you are suggesting can be accomplished. But that is different
from it being the case in general.

> [[
> A URI that consists of only a fragment identifier (i.e., one
> that begins with a " # ") always points into the document that contains
> the URI, irrespective of the effective base URI.
> ]]
> 
> This statement is presumably based on RFC 2396:

Yes...

> 
> [[
> 4.2. Same-document References
> 
>     A URI reference that does not contain a URI is a reference to the
>     current document.  In other words, an empty URI reference within a
>     document is interpreted as a reference to the start of that document,
>     and a reference containing only a fragment identifier is a reference
>     to the identified fragment of that document.
> ]]
> 
> However, there is an escape clause.  The same paragraph goes 
> on to say:
> 
> [[
> 4.2. Same-document References
> [...]
>     However, if the URI reference occurs in a context that is always
>     intended to result in a new request, as in the case of HTML's FORM
>     element, then an empty URI reference represents the base URI of the
>     current document and should be replaced by that URI when transformed
>     into a request.
> ]]

I think a narrow read of this only covers the case where the URI reference
is empty (no '#'). 
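
To make the distinction concrete, a small sketch that separates the two
same-document cases from everything else, using Python's standard
urllib.parse. The function name and the three labels are assumptions made
for illustration; the sketch does not model the FORM exception at all:

```python
from urllib.parse import urlsplit


def classify_reference(ref):
    """Classify a URI reference as 'empty', 'same-document fragment' or 'other'."""
    parts = urlsplit(ref)
    if parts.scheme or parts.netloc or parts.path or parts.query:
        # The reference contains some URI component before any "#".
        return "other"
    if parts.fragment:
        # Only a fragment identifier: points into the current document.
        return "same-document fragment"
    # Nothing at all: on the narrow reading, the only case the
    # "new request" clause talks about.
    return "empty"
```

On that reading, only references classified as "empty" would be replaced by
the base URI in a context such as FORM, and a reference like "#chapter1"
would be unaffected.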

Do you know what this "escape clause" is intended to mean and how it is
intended to be applied?

> 
> Brian
> 

Thanks for your interest.

Best regards

Stuart
Received on Saturday, 16 March 2002 10:35:07 UTC