Re: URx Questions from Patrick Stickler on 2002-01-25 (www-archive@w3.org from January 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 25 Jan 2002 12:55:03 +0200
To: ext Mark Baker <distobj@acm.org>, <www-archive@w3.org>
Message-ID: <B8770627.C4C3%patrick.stickler@nokia.com>
On 2002-01-24 6:13, "ext Mark Baker" <distobj@acm.org> wrote:


>> Furthermore, I find that far more people that I encounter,
>> particularly those working on semantic web applications,
>> subscribe to the classical view, particularly software
>> engineers building web applications that use URIs extensively,
>> and who want/need a logical and formal taxonomy of URI classes.
> 
> That's unfortunate, because now that the W3C and IETF have agreed to
> use the contemporary view, the nomenclature, and the direction of
> future work on URIs will follow it.

Well, I think that there is sufficient contention about the recent
"clarification" that both the W3C and IETF will be hard pressed to
impose it upon the world at large. It may, unfortunately, flavor
some impending work, but I think that there's going to be a lot
of friction in moving in the classical view's direction.

>>> I see it this way (which is also the way that Tim Berners-Lee,
>>> Dan Connolly, and Roy Fielding see it - the three people most
>>> responsible for the Web as we know it);
>> 
>> With all due respect to all three individuals, and many others,
>> I think that they are missing something fundamentally important.
>> (Even the gods can be wrong now and then)
> 
> That's certainly possible.  But when these three agree, I have not yet
> *ever* found them to be wrong.

There's always a first time ;-)

And I find that the contemporary view (and classical view)
disregards any notion of non-resolvable identifier (URP) as
well as the need by software (not humans) to differentiate
between true access problems and non-accessible resources.

I.e., the contemporary view does not address the needs of
semantic web agents -- which is amazing, since the SW is
Tim BL's latest "thing".

>> I see the "contemporary view" as a transient fad that
>> will pass, leaving the classical view to continue on
>> its merry way towards a global semantic web.
> 
> It is doubtful, because all future IETF/W3C work will use it.

Only if the "customers" of IETF and W3C agree, and there
seems to be a growing population who, now that they are
beginning to understand the debate, are not too happy with
the fuzzy, ambiguous, and high-overhead approach outlined
by the contemporary view.

It seems that the only folks these days who are true proponents
of the view are either "the three" or their devoted disciples.
Hours of discussion at e.g. XML 2001 showed either total
ignorance of the issue (just eat what the W3C/IETF feed you)
or preference for the classical view -- and numerous times
folks were "shocked" or "amazed" when I explained what the
contemporary view really means for web applications.

>> Just create your names using UUIDs, describe them as you like,
>> and for those that denote web resources, use DDDS to map them to
>> some address. Eh?
>> 
>> After all, a name is just an opaque identifier that "is" what
>> you say it is.
> 
> That's right.  The only issue is that http: has most of the same
> properties as uuid:, plus it's so well deployed and understood.
> I don't see a reason to go with an alternative that has no
> advantages over the incumbent.

Because 'http:' URLs are not temporally unique. It's that
simple. If you want non-fragile, persistent identifiers, you
can't rely on 'http:' URLs.

Also, it's an issue of whether the resolution agency is
explicit in the URI or whether it is left undefined, for
determination at time of interpretation based on contextual
information.

So, those are two very valid and deciding factors for
choosing something other than 'http:' URLs for many
applications -- and there are other factors besides.
 
>> Furthermore, are the proponents of the contemporary view completely
>> *blind* to the confusion that exists in the larger masses of web
>> users regarding 'http:' and other URLs that don't resolve because
>> they don't denote digital resources (e.g. XML Namespaces, vocabulary
>> terms, etc. etc.) as well as the confusion between vocabulary URLs
>> and schema URLs and the total incompatibility with such an approach
>> for multiple schemas using the same vocabulary?! It appears so.
> 
> I'm part of that community.  There are definitely lots of folks wondering
> why it is that they're using HTTP URIs, but I've only ever seen one
> example of people having problems because of it (that was when
> Netscape removed their RSS DTD and the RSS processors were all
> validating and didn't cache the DTD - duh!).

In the XML world, regarding the (perceived) relation between namespace
URIs, schema identity, schema location, vocabularies, etc. it is a
*huge* mess. Likewise, in the RDF world, differentiation between web
page, owner of web page, and reification of URIs is a big mess.

It seems (and this may be way off base) that most of the
participants in the classical versus contemporary debate are
working only with browsers -- and that there is little input from
folks working in other application areas which have very different
(even if partially intersecting) needs.

 
>>> I can do this for more than just markbaker.ca URIs.  You and I can
>>> have a conversation about http://www.ibm.com without invoking GET.
>> 
>> No. You can only have a conversation about the web resource accessible
>> at http://www.ibm.com, which is neither the URI 'http://www.ibm.com'
>> nor the company 'IBM Inc.'. You can achieve this fundamental
>> distinction by using URI schemes that embody the key semantics:
>> 
>>    http://www.ibm.com      = a web resource
>>    uri:http://www.ibm.com  = the URI for a web resource
>>    auth://ibm.com          = a (semantic) web authority/entity
>> 
>> Now, and only now, can we actually discuss these three things
>> in a clear and consistent manner.
> 
> In plain English, you and I can presumably manage to agree on what we're
> talking about.  I wasn't clear above, but we could have decided upon
> either of those things you listed there.
> 
> Now, the big issue is how is *software* supposed to know what we're
> talking about.  Using a new URI scheme for these concepts is one way, of
> course.  The problem with it is that this knowledge has to be hard coded
> into URI processors.

Absolutely not. That knowledge can be defined externally, by schema
(e.g. RDF or otherwise) which can be retrieved and consulted at whatever
frequency the application (or human controlling the application) feels
is optimal.

> So what happens when we want to make additions or
> changes to the taxonomy?  How do we deploy them?  Do we require that
> everybody download new software?

No. Just fetch the latest schema(s).
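
As a rough illustration of consulting an externally defined schema rather than hard-coding the taxonomy, the sketch below reads scheme-to-class statements from an RDF/XML fragment. It assumes the schema uses the rdf:Description / rdfs:subClassOf shape with a bare scheme such as "foo:" in rdf:about; the SCHEMA string stands in for a fetched document, and load_scheme_classes is a hypothetical helper, not a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Stand-in for a schema document retrieved from wherever it is published
# (assumption: in practice this text would be fetched and refreshed as needed).
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="foo:">
    <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URN"/>
  </rdf:Description>
</rdf:RDF>"""

def load_scheme_classes(schema_xml: str) -> dict:
    """Map each described URI scheme to the class it is declared a subclass of."""
    root = ET.fromstring(schema_xml)
    classes = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about", "")
        parent = desc.find(f"{{{RDFS}}}subClassOf")
        # Only descriptions of a bare scheme ("foo:") are treated as scheme statements.
        if about.endswith(":") and parent is not None:
            classes[about[:-1].lower()] = parent.get(f"{{{RDF}}}resource")
    return classes

scheme_classes = load_scheme_classes(SCHEMA)
```

Changing the taxonomy then means publishing a new schema and reloading the table; the processing code itself stays the same.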
 
> What TimBL suggests is that we keep the URI as opaque as possible, and
> treat any extended information (assertions) about them as just another
> resource on the Web.

But without explicit differentiation between URI schemes by common
semantics and purpose, such global economy of definition can never
be attained -- rather, such knowledge has to be defined for *every*
instance of a URI.

> Yes, all of this is quite accurate.  It's just not practically
> extensible.

I completely disagree. In fact, the whole point is that it *is*
practically extensible and scalable.

The contemporary view is like saying that the semantics of every
XML instance should be defined for each instance, rather than
for a document type for which there is a single global schema.

I.e., it's madness and terrible engineering.
 
>> Now, playing devil's advocate to my own arguments, I will
>> concede that one could have different 'http:' URLs to
>> capture the distinctions provided by my separate URI schemes,
>> but there still remains the problem that in such a scenario
>> URLs would be used for non-digital, non-accessible resources,
>> and thus, the fair and reasonable expectation by both a human
>> and an application that a URL provides access to a web
>> resource is violated.

See my definition of the "Neo-Classical View" posted to the URN
and URI lists. I think it provides a satisfactory solution for
this.
 
>> Again, how is an application (or person) supposed to know
>> that a failure to resolve is intended/expected rather than
>> due to some actual problem accessing a web resource?
> 
> HTTP response codes say all that needs saying, I believe.

But only *if* a resource is expected to be retrievable. There
is no response code I am aware of that says "this resource
is not a retrievable resource" -- and even if there were, you
would have to define that response for *every* non-retrievable
URL, a completely unacceptable and untenable overhead.

> W3C policy on
> this is reasonable; any w3.org URI used in a specification *must* be
> resolvable to something that defines what it means.

But then you are getting something that is *not* the resource itself,
and the application cannot know that -- unless it understands what
the returned resource says.

I.e., the contemporary view and the W3C policy are intended
for humans, not for software agents. It does not and cannot
meet the needs of semantic web agents.
 
> For others, this is a good policy, because you're right, what if
> somebody is told that URI means something, but it returns a 404 when
> resolved?  There's an inconsistency there, but it is easily resolved
> by putting a few words at the other end.  It's quite a cheap process.

No, it is a terribly expensive process -- as you must then define
some standardized ontology that expresses the fact that a non-retrievable
resource was not returned, but what was returned was not the resource
but another resource that explains the non-retrievability of that
first resource -- but then, what if you want an agent to actually
retrieve that explanation, but not interpret it as an explanation
about the first resource!

Just define URI Classes for retrievable versus non-retrievable
resources, and then an application knows right up front, before
even trying to resolve, whether that URI denotes a retrievable
resource or not. *That* is much cheaper.
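
A minimal sketch of that up-front check, assuming a table that assigns each URI scheme to a class. The table contents, the class labels, and the function names are illustrative assumptions, not an agreed taxonomy.

```python
from urllib.parse import urlsplit

# Hypothetical scheme-to-class table; the 'voc' and 'auth' entries and the
# class labels are assumptions made only for this illustration.
SCHEME_CLASS = {
    "http": "retrievable",
    "ftp": "retrievable",
    "voc": "non-retrievable",
    "auth": "non-retrievable",
}

def uri_class(uri: str) -> str:
    """Return the class declared for the URI's scheme, or 'unknown'."""
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_CLASS.get(scheme, "unknown")

def should_dereference(uri: str) -> bool:
    """Attempt resolution only for schemes declared retrievable."""
    return uri_class(uri) == "retrievable"
```

With such a table, an agent given auth://ibm.com would never issue a request at all, while a failure on http://www.ibm.com could be treated as a genuine access problem.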

>>> Is "foo://www-markbaker-ca/James/" an URL or an URN?  You don't know,
>> 
>> You *could* know, if you said something like
>> 
>> <rdf:Description rdf:about="foo:">
>>    <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URN"/>
>> </rdf:Description>
> 
> Yes.  But until you know more about it, what do you call it?  A URI.
> And you only know something is an URN if it's in the urn: scheme.

I don't accept that. I can state that any URI scheme is a URN
scheme, not only the 'urn:' scheme.

Neither the W3C nor IETF can forbid the definition of URI schemes
that are ascribed the semantics and behavior of URNs.


>> Of course, since your son actually *isn't* a web resource, that
> 
> Sure he is.  He's a "thing with identity", which is the only
> requirement for something to be a Web resource in the http: URI
> scheme.

No. A resource may be anything. But a *web* resource must be
digital and accessible. Of course, the RFCs make this
neither clear nor consistent, contradicting themselves even in
the very same RFC, but the distinction is valid and necessary.
 
>> *You* may understand that URI to denote your son. And some other
>> human may be able to discern from its mnemonic characteristics
>> that it likely denotes a human, but a computer just sees bits.
> 
> Right.  And those bits may express an assertion that he's a human:
> 
> <http://www.markbaker.ca/James/> rdfs:subClassOf :Person

Again, I'm not saying that you can't use a URL to denote a
non-digital resource, but you will confuse both humans and
applications which expect URLs to be dereferenceable to
the resource that is named by them, and you cannot (at least
for the present) dereference your son from a URL -- only
a description about him, or some other resource that is
not your son.

Folks misuse URLs every day, but that is why there are so
many problems in areas such as XML namespaces and RDF.

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Friday, 25 January 2002 05:59:55 UTC