RE: introducing URIs [was: 13 Aug Arch Doc...] from Williams, Stuart on 2002-08-16 (www-tag@w3.org from August 2002)

From: Williams, Stuart <skw@hplb.hpl.hp.com>
Date: Fri, 16 Aug 2002 15:43:37 +0100
To: "'Dan Connolly'" <connolly@w3.org>, "Ian B. Jacobs" <ij@w3.org>
Cc: www-tag@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F04A06FCC@0-mail-1.hpl.hp.com>
Hi Dan,

In [1] you asked me:

> Please take a look at my "introducing URIs" message to see
> if it really looks like folly.

Well... the short answer is yes... I continue to think that redefinition of
the terminology around URI would be folly (more at the very end below).

On the plus side (I think)... I believe I have come to be comfortable with
the notion that URI composed with fragment id can indeed be used to identify
Resources. That is elaborated below - I'd appreciate your thoughts, because
the reasoning below may be flawed, and I would like to be agreeing on the
reasoning as well as the conclusion. I think I also came close to
understanding TBL's position on httpRange-14, but tripped at the last hurdle.

regards

Stuart
[1] http://lists.w3.org/Archives/Public/www-tag/2002Aug/0149.html

> -----Original Message-----
> From: Dan Connolly [mailto:connolly@w3.org]
> Sent: 13 August 2002 22:55
> To: Ian B. Jacobs
> Cc: www-tag@w3.org
> Subject: introducing URIs [was: 13 Aug Arch Doc...]
> 
> 
> 
> On Tue, 2002-08-13 at 15:29, Ian B. Jacobs wrote:
> [...]
> > [1] http://www.w3.org/2001/tag/2002/0813-archdoc
> 
> "1.1 Use of terms URI and URI reference in this document
> 
> RFC 2396 divides the world ..."
> 
> Hmm... that starts from a perspective that we intend to
> obsolete. How about starting from the other perspective:
> 
> =======
> 
> Chapter 1: Identifiers and resources
> 
> The Web is a universe of resources; resources are a generalization
> over documents, files, menu items, machines, and services, as
> well as people, organizations, concepts, etc. Web architecture
> starts with a uniform syntax of identifiers for resources, so
> that we can refer to them, access them, describe them, share
> them, etc. The syntax employs an extensible set of schemes. Several of
> the schemes incorporate established identification mechanisms
> into this syntax:
> 
> 	mailto:nobody@example.org
> 		mailbox names (including DNS domain names)
> 	ftp://example.org/aDirectory/aFile
> 		ftp file names (including DNS domain names)
> 	news:comp.infosystems.www
> 		newsgroup names
> 	tel:+1-816-555-1212
> 		telephone numbers
> 	urn:uuid:@@look-up-syntax
> 		UUIDs, from Apollo/DCE/COM
> 
> and others incorporate new naming schemes, including those
> introduced as a consequence of new protocols:
> 
> 	http://www.example.org/something?with=arg1;and=arg2
> 		HTTP resources
> 	ldap:@@look-up-ldap-syntax
> 		LDAP entries
> 	urn:oasis:SAML:1.0 (@@double-check)
> 		a namespace from an Oasis specification

No problem up to here... really nice.

> Indentifiers in any of these schemes can be composed with
   ^sp
> a fragment identifier to yield an identifier for a
> resource that is a part of, or view on, another resource:

This begins to wriggle (maybe)... at the very least this suggests that the
sort of thing identified by a URI composed with a fragment identifier is a
specialisation of the sort of thing that a plain URI identifies. The former
is 'constrained' to identify a resource that is either "part of" another
resource or a resource which is a "view on" another resource. For there to
be no specialisation here, for the resource identified by the URI#frag
composition to be fully a first class resource, all resources would have to
be "part of" or a "view on" some other resource (and maybe that is also
the case... up to some 'Top').

I tried to discuss some of this with a colleague locally, and we got round to
using cars as an example to see where we get to... so bear with me... I'm
not sure where this will conclude...

Maybe we can think of the #1 piston of the engine of Dan's car. I've chosen
"Dan's" car because we can perhaps also explore this in the context of
httpRange-14, which questions whether or not it is legitimate to identify a
car with an absolute HTTP URI. We can consider both cases below. A car is
also interesting because we can think of it in real-world terms with VINs on
chassis, engine numbers on engine blocks, and maybe batch and serial markings
on individual pistons.

Case 1
------
http://example.com/myCar
	identifies a particular Car.

http://example.com/myCar#engine
	identifies the engine in the car identified
	by http://example.com/myCar.

http://example.com/myCar#piston1
	identifies the #1 piston (in the engine) in
	the car identified by http://example.com/myCar.


That the piston and the engine are part of the car is evident from their
respective identifiers. That the piston is also part of the engine is not
evident from the identifiers... but I could 'mint' another identifier for
the engine, and identify the piston as part of the engine. Hence:

http://example.com/myCar/engine
	also identifies the engine in the car identified
	by http://example.com/myCar.

http://example.com/myCar/engine#piston1
	also identifies the #1 piston in the engine in
	the car identified by http://example.com/myCar.


Similarly, if there were parts of a piston to speak of then we could
'promote' the piston to having a URI such that:

http://example.com/myCar/engine/piston1
	also identifies the #1 piston in the engine in
	the car identified by http://example.com/myCar.

Note that the 'equivalence' of the multiple identifiers of the piston is
only asserted by the authority that assigns the URI. It is not a general
property of URI syntax that replacement of a '#' with a '/' results in two
identifiers that identify the same resource, but it is a convention used in
this example.
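
A minimal Python sketch of the Case 1 point, under stated assumptions: the
ALIASES table and both helper names are hypothetical, invented for
illustration. The only thing taken from URI syntax is that the fragment
begins at the first '#'.

```python
# Hypothetical alias table published by the assigning authority (example.com).
# Each set lists identifiers that the authority says name the same thing.
ALIASES = [
    {"http://example.com/myCar#engine", "http://example.com/myCar/engine"},
    {
        "http://example.com/myCar#piston1",
        "http://example.com/myCar/engine#piston1",
        "http://example.com/myCar/engine/piston1",
    },
]


def split_fragment(uri_ref):
    """Split a URI reference at the first '#' into (uri, fragment)."""
    uri, _, fragment = uri_ref.partition("#")
    return uri, fragment


def same_resource(a, b):
    """True only if the strings are equal or the authority lists them together."""
    return a == b or any(a in group and b in group for group in ALIASES)


# The containing resource can be read off the identifier...
print(split_fragment("http://example.com/myCar/engine#piston1"))
# ...but equivalence comes from the authority's table, not from URI syntax.
print(same_resource("http://example.com/myCar#piston1",
                    "http://example.com/myCar/engine/piston1"))
# No assertion from the authority, so swapping '#' for '/' proves nothing.
print(same_resource("http://example.com/myCar#wheel",
                    "http://example.com/myCar/wheel"))
```

The last call returns False even though the two strings differ only by '#'
versus '/', which is the sense in which the convention is not a property of
the syntax.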

Case 2
------
http://example.com#myCar
	identifies a particular Car.

Now I'm stuck...! How do I name the engine in a way that reflects that it is
part of the Car? So... scratch the above and go postfix (shouldn't be hard
for someone from HP).... Start again:

http://example.com/myCar#
	identifies a particular Car.

http://example.com/myCar#engine
	identifies the engine in the car identified 
	by http://example.com/myCar#

http://example.com/myCar#piston1
	identifies the #1 piston (in the engine) in 
	the car identified by http://example.com/myCar#.


And as before we can 'promote' fragments into a URI so that we can further
fragment the fragment, hence:

http://example.com/myCar/engine#
	also identifies the engine.

http://example.com/myCar/piston1#
	also identifies the #1 piston.

and

http://example.com/myCar/engine/piston1#
	also identifies the #1 piston.

The 'equivalence' of various of these identifiers again is assured by the
assignment authority and *not* the URI syntax.

This also provides scope to ask... so what do http://example.com/myCar,
http://example.com/myCar/engine etc. identify... and the assignment
authority might reply... well that's easy... they identify documents that
describe the various things identified by extending each identifier with a
'#'. 

However, this does then leave the ambiguity about whether
http://example.com/myCar#engine identifies a part of a car or a part of a
document. Hmmm I think that the balloon just bulged somewhere else.
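
The ambiguity can be sketched in a few lines of Python. The IDENTIFIES table
is a hypothetical record of the Case 2 assignments above, and
candidate_containers is an invented helper name; neither is part of any
specification.

```python
# Hypothetical Case 2 assignments made by the authority.
IDENTIFIES = {
    "http://example.com/myCar": "a document describing the car",
    "http://example.com/myCar#": "the car",
    "http://example.com/myCar#engine": "the engine",
}


def candidate_containers(uri_ref):
    """List what the fragment could be a part of, judging by the identifier alone."""
    uri, _, _ = uri_ref.partition("#")
    # Dropping the fragment leaves the plain URI; keeping the '#' with an
    # empty fragment gives the postfix form. Both are plausible containers.
    return [IDENTIFIES[uri], IDENTIFIES[uri + "#"]]


print(candidate_containers("http://example.com/myCar#engine"))
```

Both a document and a car come back as possible containers for the same
fragment, and nothing in the identifier picks between them.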

Conclusion
----------
With case 2 I was beginning to feel that I understand where TBL is coming
from on httpRange-14... however I was disappointed to end in an ambiguity.

With case 1, a part, SP, of a part, P, of a resource, R, can readily be
identified as a part of R. But to refer to SP as a part of P requires that P
also be identified by an independent URI.  The equivalence of the URI of P
and the composition of the URI of R with the fragment identifier of P is
assured only by the assignment authority and cannot be deduced from the URI
syntax of the identifiers.... which are opaque in general (by design).

--

Sorry that was so long... I found it useful... apologies if you do not. I'd
be interested in your thoughts. Clearly a car, its engine and a piston
within that engine are all resources... case 1 does seem to work... and the
simple conclusion is that in order to use a URI reference to identify a part
of something as a part of that something, the something needs to be
identified with a URI (no fragment).

I would have liked case 2 to have worked... it was doing fine until I answered
the question of what the plain URIs (might) identify. Maybe there is a
different answer, e.g. they don't identify anything, which makes case 2 work.
But I think that then makes the two cases equivalent, differing only in infix
versus postfix use of the '#', with a need to administratively assign distinct
URIs to resources and to the documents (also resources) that describe those
resources.

--

Ok... real bottom line is that I have convinced myself that a URI+fragment
can reference a resource (case 1). Cue the counter arguments.... :-)

> 	ftp://example.org/aDirectory/aDocument#section1
> 	http://www.example.org/aList#item1
> 	http://www.example.org/states#texas
> 
> Note that while this composition is syntactically fully general,
> many cases such as mailto:nobody@example.org#abc
> don't make much sense to any deployed software or
> specifications.
> 
> To summarize, a <dfn>Uniform Resource Identifier</dfn>, or
> <dfn>URI</dfn>, is a
> character sequence starting with a scheme name, followed by
> a number of scheme-specific fields, optionally
> followed by a fragment identifier.

Personally, I continue to prefer that we align our terminology with RFC2396
and use that terminology precisely and carefully. And... despite the above I
do believe that it would be folly to redefine the term URI in a way that is
inconsistent with RFC 2396 or its successor. *IF* a successor to RFC2396 were
to make such a terminological change, I would be 'ok'ish for the TAG's
Architecture to follow suit... BUT I think that such a change would be a
huge disservice to those who in the past have been careful in their use of the
URI/URI reference (and same-document reference) terms. I think the likely
maintenance impact on existing specifications that have carefully got it
right should be objectively evaluated by those who advocate such a change (I
was going to say doesn't bear thinking about, but that's more emotional than
rational).

So... yes I continue to think such a change in the use of terms is indeed
folly.

> This URI syntax is accompanied by a shorthand
> <dfn>URI reference</dfn> syntax.
> A URI reference is an abbreviation of a URI that can be expanded
> by combining it with a base URI. For example, in a document
> whose base URI is http://example/dir1/dir2/file1 ,
> the URI reference ../file2 abbreviates http://example/dir1/file2
> and the URI reference #abc
> abbreviates http://example/dir1/dir2/file1#abc.

The latter I think is wrong: #abc is a same-document reference and is *not*
evaluated with respect to a base URI [RFC2396 see Sections 4.2 and 5.2].

Again, I would request the use of the RFC2396 terms "Relative URI" and "Same
Document Reference"... yes the identifiers in your example are URI
References too... but it is an "Absolute URI" that you used as a base URI.
Mixing these up going forward will, IMO, add confusion rather than
clarification.
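
For comparison, here is how one resolver treats the two references in the
quoted example. This is only a sketch of one implementation's behaviour:
Python's standard urllib.parse.urljoin is documented as following RFC 3986
rather than RFC 2396, so it says nothing about how RFC 2396 itself classifies
a fragment-only reference.

```python
from urllib.parse import urljoin

# Base URI taken from the quoted example.
base = "http://example/dir1/dir2/file1"

# A relative reference is merged with the base path, and '..' is resolved.
print(urljoin(base, "../file2"))

# A fragment-only reference: this resolver keeps the whole base and
# appends the fragment to it.
print(urljoin(base, "#abc"))
```

The second call produces the same string as the expansion given in the quoted
text, so the disagreement is about which terms describe the operation, not
about the resulting characters.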

> [[NOTE: The current URI specification, RFC2396, uses a more
> constrained definition of the term URI; by that definition,
> identifiers that include fragment identifiers are not URIs.
> The TAG intends to request a revision to RFC 2396 to adopt
> the less constrained definition used here.]]
> 
> =======
> 
> then continue with 1.2 Resources and URIs.
> 
> 
> > [2] http://www.w3.org/2001/tag/#tag-attn
> > -- 
> > Ian Jacobs (ij@w3.org)   http://www.w3.org/People/Jacobs
> > Tel:                     +1 718 260-9447
> -- 
> Dan Connolly, W3C http://www.w3.org/People/Connolly/


regards

Stuart
Received on Friday, 16 August 2002 10:57:26 UTC