Re: which layer for URI processing?

At 04:02 PM 5/24/00 -0400, John Cowan wrote:
>"Simon St.Laurent" wrote:
>
>> I'd appreciate it if you could explain why it is so critical that lower
>> layers of processing handle the considerable amount of effort involved in
>> treating URIs _as URIs_ rather than as strings for purposes of comparison,
>
>What "considerable amount of effort"?  Here's some Perl code to do the whole
>RFC 2396 resolution.  Given the base URI as an argument, it reads URI
>references from the standard input and sends resolved forms to the standard 
>output.
>[...]
>
>This would be easy to translate into C or any other assembly language.  :-)
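For comparison, here is a minimal sketch of the same job in Python using the standard library instead of John's (elided) Perl. Like his script, it takes the base URI as its one argument and reads references from standard input. One assumption to flag: `urllib.parse.urljoin` follows RFC 3986 (which obsoleted RFC 2396) in current Python versions, so a few edge cases may differ from a strict 2396 implementation. The function name `resolve_stream` is made up for illustration.

```python
import sys
from urllib.parse import urljoin


def resolve_stream(base, lines):
    """Yield each non-blank URI reference in `lines`, resolved against `base`."""
    for line in lines:
        ref = line.strip()
        if ref:  # skip blank input lines
            yield urljoin(base, ref)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python resolve.py BASE-URI < references.txt
    for resolved in resolve_stream(sys.argv[1], sys.stdin):
        print(resolved)
```

The resolution itself is a one-liner here; the integration question Simon raises is where that call lives in the processing stack, not how hard it is to write.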

Thanks for the code, John.  I don't think anyone looks forward to
integrating that with their existing parsers.  It also leaves open
questions like Larry Masinter's:
LM>This would suggest that you avoid having two namespaces,
LM>one http://www.w3.org/blah and another http://WWW.W3.ORG/blah
LM>since even though the two URIs are equivalent when treated
LM>as uniform resource locators, they're not equivalent as
LM>namespace names.  It isn't really practical to enumerate
LM>all of the 'allowed' vs. 'disallowed' forms, or even to mandate
LM>that all URIs used should be 'canonicalized' in some form.
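To make Larry's example concrete: the two names differ as strings but match once the scheme and host are lower-cased, which RFC 2396 allows because both are case-insensitive. This is a deliberately partial sketch (the function name is hypothetical). It does nothing about default ports or %-escapes, and it naively lower-cases the whole authority, including any userinfo, which a real normalizer should not do. That gap is the enumeration problem he describes.

```python
from urllib.parse import urlsplit, urlunsplit


def url_equivalent_form(uri):
    """Partial normalization: lower-case scheme and authority only."""
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    netloc = parts.netloc.lower()  # naive: also lower-cases any userinfo
    return urlunsplit((parts.scheme.lower(), netloc,
                       parts.path, parts.query, parts.fragment))


a = "http://www.w3.org/blah"
b = "http://WWW.W3.ORG/blah"
print(a == b)                                            # False: as namespace names
print(url_equivalent_form(a) == url_equivalent_form(b))  # True: as URLs
```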

>Let's suppose that we have an XML 1.0 + Namespaces
>parser that interns all namespace names; in other words, the strings
>returned as namespace names are guaranteed to be the same object iff they
>have the same text.  This satisfies the Namespace Rec as written.
>
>Now suppose that an RDF decoder is layered over this parser.  It uses
>namespace names to locate RDF schemas for the RDF vocabularies in its
>input.  (This need not mean that it just accesses the namespace name
>as an URL to fetch the schema; there may be some kind of indirection here
>without affecting my point.)  It would like to store the schemas in a
>hashtable keyed on the namespace names, to minimize schema-fetching.
>
>This will not work under the status quo, because the namespace name
>"foo" used in two different documents will correspond to two different
>RDF schemas, but the XML parser will intern "foo" as a single string.

I'd have to describe that as an RDF layer that isn't doing the work
(absolutization) that it needs to do, not a flaw in the proposed approach.
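A minimal sketch of an RDF layer doing that work, assuming it knows each document's base URI: it absolutizes the namespace name before using it as a hashtable key, so whatever interning the parser underneath does stops mattering. `SchemaCache` and the `fetch` callable are hypothetical names, not part of any RDF toolkit.

```python
from urllib.parse import urljoin


class SchemaCache:
    """Hypothetical RDF-layer schema cache keyed on absolutized namespace names."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable: absolute URI -> schema
        self._schemas = {}

    def schema_for(self, namespace_name, document_base):
        # Absolutize first, so the relative name "foo" in two documents
        # with different bases yields two different keys.
        key = urljoin(document_base, namespace_name)
        if key not in self._schemas:
            self._schemas[key] = self._fetch(key)
        return self._schemas[key]
```

With bases `http://example.org/a/doc.xml` and `http://example.org/b/doc.xml`, the name "foo" becomes `http://example.org/a/foo` and `http://example.org/b/foo` respectively, so John's two schemas get two cache entries even though the parser handed up the same interned string.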

Again, the example presents a broken upper layer, not a broken lower layer.
This may be an argument against interning names, but that doesn't feel
like nearly the same question.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

Received on Wednesday, 24 May 2000 19:39:39 UTC