RE: [metaDataInURI-31]: Initial draft finding for public review/comment. from Patrick.Stickler@nokia.com on 2003-07-10 (www-tag@w3.org from July 2003)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 10 Jul 2003 14:41:56 +0300
To: <skw@hp.com>, <MDaconta@aol.com>
Cc: <www-tag@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBBFF@trebe006.europe.nokia.com>
 

-----Original Message-----
From: ext Williams, Stuart [mailto:skw@hp.com]
Sent: 09 July, 2003 15:08
To: Stickler Patrick (NMP/Tampere); MDaconta@aol.com
Cc: www-tag@w3.org
Subject: RE: [metaDataInURI-31]: Initial draft finding for public review/comment.


For me, part of the question is: when I (or a piece of software I might write) look at a URI (assigned by someone else), what do I allow myself to know (intrinsically, from examining the URI, rather than by going and asking an authority)? (BTW, "Nothing" is an acceptable answer.) Do I allow myself access to the knowledge embedded in a bunch of normative specifications (and possibly published assignment policies)?
 

At one time, I was a big proponent of "peeking" into URIs to deduce 
knowledge about the resource denoted, as there did not appear any
other reliable, standardized, and globally ubiquitous manner of 
obtaining fundamental knowledge about resources.
 
You may even recall back a couple of years when I was exploring
various means to do this via a more regular and well defined ontology
for classifying and relating URI schemes.
 
After a good bit of thought and work, it became evident that
a more general solution was needed. And that work led to
URIQA.
 
It is very true that at some stage, an agent is going to have
to examine the lexical properties of the URI to apply any
protocols or processes defined in terms of the URI scheme,
any subscheme, and its structure.
 
But given the function of URIs as identifiers, I am now of the
opinion that any knowledge that might be encapsulated in
the URI should be justified by the needs of identification, not
of description.
 
For description, more open methods such as URIQA are preferable,
as they are limited neither by URI structural constraints nor by
the need (or at least desire) for persistent URIs.
 

 
Do I allow myself to know that the URI mailto:skw@hp.com identifies an "Internet mailing address", because RFC2396 (and successors) allow me to identify a scheme component, and RFC2368, as the registered scheme specification for the mailto URI scheme, tells me "The mailto URL scheme is used to designate the Internet mailing address of an individual or service"? IMO, if I allow myself to know that mailto:skw@hp.com identifies an "Internet mailing address" then I have 'peeked', albeit at only the scheme component. Likewise for other URI schemes that state the sort of thing that they are used to identify. Some would say, "No, you don't allow yourself to know such things... don't peek, URIs are opaque (to a web client)." Others want to build rule-driven systems that very much depend on allowing such inferences... and it was only a little peek... hardly a peek at all... (maybe).
 

There is certainly a gray area, in the case where URI schemes impose
restrictions on the kinds of resources that should be denoted by instances
of those URI schemes. Likewise for any subschemes, etc.
 
That someone would be able to infer an rdf:type assertion based on
a URI scheme is logical, and I wouldn't necessarily fault a particular
application for doing so.
 
But the number of URI schemes that have such explicit extensions
is small, and so it does not seem worthwhile to make such inferences
a regular part of the overall web architecture.
 
And since such cases can be addressed just as well by URIQA,
why introduce yet another, less general, less flexible, less
scalable, and less robust mechanism for publishing knowledge
about resources?
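
As a rough illustration of the scheme-based "peek" discussed above, here is a minimal sketch in Python. The lookup table and the function name peek_type are hypothetical; the only entry is the mailto wording quoted from RFC2368 in this thread. It is not a proposal for how such inference ought to be done, only a picture of what is being debated.

```python
from urllib.parse import urlsplit

# Hypothetical table mapping a URI scheme to the kind of thing its
# specification says it identifies. Only "mailto" is taken from the
# thread (RFC2368 wording); most schemes make no such statement.
SCHEME_TYPES = {
    "mailto": "Internet mailing address",
}


def peek_type(uri):
    """Return a type label inferred from the scheme alone, or None."""
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_TYPES.get(scheme)


# The mailto scheme yields a label; the http scheme yields nothing,
# because the scheme by itself says little about what is denoted.
mailto_label = peek_type("mailto:skw@hp.com")
http_label = peek_type("http://people.example.com/stuart")
```

The sketch needs a hand-maintained table per scheme, and it returns nothing for most URIs, which is the scaling concern raised above.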

 
Another example, one that links with the httpRange-14 debate. One position in that debate is that http scheme URIs without fragment identifiers may only be used to identify network accessible resources, and may not be used to identify abstract concepts (e.g. a particular emotion) or a real-world object (like DanC's car or a person); e.g. using http://people.example.com/stuart to identify me would be frowned upon.
 

I personally take the opposing view, and find the use of URIs with fragment
identifiers to be unwise and problematic, and see no reason, technical,
philosophical, or practical, which would warrant any such restriction on
using http: URIs to denote any entity whatsoever that can be named and
thus referred to.
 
One of the greatest contributions provided by REST is the abstraction
away from "files" or "streams of bytes" allowing URIs to consistently
denote a resource (any resource, even Dan C's car) irrespective of 
whether such resources are bit equal to their representations.
 

However, within this particular position, it is ok to identify abstract concepts and real-world artifacts with http scheme URIs that include a fragment identifier, i.e. http://people.example.com#stuart would be fine. If we were to accept this particular position, then the presence or absence of a fragment component (even a null fragment) in an http URI allows an inference to be made about whether the referenced resource is a network accessible resource or an abstract concept or real-world thing. Some folks want to build systems that rely on such distinctions... and IMO this again is peeking.
 

I agree. And I would assert that the position that only URIs having fragment
identifiers can denote non-digitized resources is (a) already rejected by common
usage, (b) unnecessarily and IMO unjustifiably restrictive, and (c) unsupported
by and contrary to the present definition of HTTP, which requires web clients
to omit the fragment ID from requests. After all, if the fragment ID is disposable
in a transaction between client and server, how can it be the basis for the
semantic web, since the identity of the actual resource denoted by the
URIref with fragid is lost?
 
So any inferences that one may presume to derive from the presence or
absence of a fragid will be suspect at best, and with the advent of URIQA,
unnecessary in any case.
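
To make point (c) concrete, here is a small Python sketch of what a client would put on the wire for a URIref with and without a fragment identifier. The helper name request_target is hypothetical, and the sketch only models the host and request path; it does not perform any network access.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uriref):
    """Return (host, path) as a client would send them in an HTTP request.

    The fragment identifier is removed first, since it is not part of
    the request sent to the server.
    """
    url, _fragment = urldefrag(uriref)
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is requested as "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


# Both URIrefs from the example above produce the same request, so the
# server cannot tell which of the two the client started from.
with_fragid = request_target("http://people.example.com#stuart")
without_fragid = request_target("http://people.example.com")
```

Both calls give the same host and path, so any distinction carried by the fragment exists only on the client side.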

 
"Don't peek inside URIs" is a very simple thing to say and to understand, but I think that even amongst those that might say it there is a temptation to peek - with good reason - so as a principle it may be a little too simplistic.
 

I'm not going to take an absolutist view about peeking into URIs. I'm a
pretty pragmatic and practical person. It's not a question of recommending
"Don't do this" so much as it is a question of actually recommending "Do it"
as a general methodology.
 
Peeking inside URIs will IMO always be plagued with problems,
because many/most of the generalities that are perceived in the
structure of URIs are not absolute, and hence not reliable.
 
Better to emphasize/promote/optimize methods of publishing
and accessing knowledge that is expressed in an explicit manner
and governed by a consistent and well defined model theory.
 
Over time, I think, the need for specialized URI schemes will decrease,
as will the amount of knowledge packed into URIs in general, as
technologies such as URIQA become ubiquitous.
 
Cheers,
 
Patrick
 

 
Stuart
--
 
-----Original Message-----
From: Patrick.Stickler@nokia.com [mailto:Patrick.Stickler@nokia.com] 
Sent: 9 July 2003 10:13
To: MDaconta@aol.com; skw@hp.com; www-tag@w3.org
Subject: RE: [metaDataInURI-31]: Initial draft finding for public review/comment.



 

-----Original Message-----
From: ext MDaconta@aol.com [mailto:MDaconta@aol.com]
Sent: 08 July, 2003 21:11
To: skw@hp.com; www-tag@w3.org
Subject: Re: [metaDataInURI-31]: Initial draft finding for public review/comment.


In a message dated 7/8/2003 6:44:46 AM US Mountain Standard Time, skw@hp.com writes:



I would appreciate some feedback on this draft. Whether a simpler, shorter,
finding is a better path to take? Whether "Don't peek inside URIs" is all
that need be said?




Hi Stuart,

First, to answer your questions:
1. A simpler and shorter finding is only better for the "don't peek inside" position.
2. I disagree with the "Don't peek inside URIs" sentiment.  

The "Don't peek inside" position stresses the use of identification as an assertion of 
uniqueness and possibly a mechanism to locate that unique thing.  In essence, 
an opaque "pointer".  While those are necessary functions of a URI, 
imbuing an identifier with additional metadata should be 
encouraged.  First, additional metadata in a URI makes it 
easier to keep the URI "cool" (as in http://www.w3.org/Provider/Style/URI.html) by
adding classification metadata to the identifier (as with the W3C URLs in your 
finding). 
 

What if the metadata changes? Then you have a different URI, and things break.
 
URIs with metadata embedded in them which might change are hardly "cool".


Second, additional metadata in a URI enables a higher-level
of efficient processing on resources by applications that *just* want 
to process URIs.  Opaque URIs would eliminate that increasing possibility.
 

There are better (i.e. generalized, scalable, flexible) ways to provide access
to resource descriptions than embedding such knowledge in the URIs that
denote them.
 
C.f. http://sw.nokia.com/URIQA.html
 
Cheers,
 
Patrick



--
Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
  


Best wishes,

- Mike
---------------------------------------------------
Michael C. Daconta
Chief Scientist, APG, McDonald Bradley, Inc.
www.daconta.net
Received on Thursday, 10 July 2003 08:41:48 UTC