- From: John Cowan <jcowan@reutershealth.com>
- Date: Wed, 24 May 2000 16:02:56 -0400
- To: "Simon St.Laurent" <simonstl@simonstl.com>, "xml-uri@w3.org" <xml-uri@w3.org>
"Simon St.Laurent" wrote:

> I'd appreciate it if you could explain why it is so critical that lower
> layers of processing handle the considerable amount of effort involved in
> treating URIs _as URIs_ rather than as strings for purposes of comparison,

What "considerable amount of effort"? Here's some Perl code to do the
whole RFC 2396 resolution. Given the base URI as an argument, it reads
URI references from the standard input and sends resolved forms to the
standard output.

#!/usr/bin/perl
use strict;
use warnings;

# Split a URI reference into (scheme, authority, path, query, fragment);
# components that are absent come back as empty strings.
sub split_uri {
    my @parts = $_[0] =~
        m%^([A-Za-z][A-Za-z0-9+.-]*:)?(//[^/?#]*)?([^?#]*)(\?[^#]*)?(#.*)?$%;
    return map { defined $_ ? $_ : "" } @parts;
}

my $base = shift @ARGV;
die "usage: $0 base-uri\n" unless defined $base;
my ($bscheme, $bauth, $bpath) = split_uri($base);
(my $bpath2 = $bpath) =~ s%[^/]*$%%;    # base path without final component

while (<>) {
    chomp;
    if ($_ eq "" || /^#/) {
        print "[current document (not necessarily $base)]$_\n";
        next;
    }
    my ($scheme, $auth, $path, $query, $frag) = split_uri($_);
    if ($scheme) {
        print $_, "\n";                 # absolute URI
        next;
    }
    $scheme = $bscheme;
    if ($auth eq "") {                  # not a network-path reference
        $auth = $bauth;
        if (substr($path, 0, 1) ne "/") {   # relative-path reference
            $path = $bpath2 . $path;
            # remove "." segments (whole segments only, so ".." survives)
            1 while $path =~ s%(^|/)\./%$1%;
            $path =~ s%(^|/)\.$%$1%;
            # remove "<segment>/.." pairs, leftmost first
            1 while $path =~ s%(^|/)(?!\.\./)[^/]+/\.\./%$1%;
            $path =~ s%(^|/)(?!\.\./)[^/]+/\.\.$%$1%;
        }
    }
    print $scheme, $auth, $path, $query, $frag, "\n";
}

This would be easy to translate into C or any other assembly language. :-)

> and why higher layers (like RDF and other models) can't be trusted with
> that responsibility.

Here's a concrete example. Let's suppose that we have an XML 1.0 +
Namespaces parser that interns all namespace names; in other words, the
strings returned as namespace names are guaranteed to be the same object
iff they have the same text. This satisfies the Namespace Rec as written.
Now suppose that an RDF decoder is layered over this parser. It uses
namespace names to locate RDF schemas for the RDF vocabularies in its input.
(This need not mean that it just accesses the namespace name as an URL to
fetch the schema; there may be some kind of indirection here without
affecting my point.) It would like to store the schemas in a hashtable
keyed on the namespace names, to minimize schema-fetching. This will not
work under the status quo, because the namespace name "foo" used in two
different documents will correspond to two different RDF schemas, but the
XML parser will intern "foo" as a single string.

-- 
Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
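
The hashtable problem described in the message can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two base URIs, the "schema for ..." placeholder values, and the `resolve` helper, which is only a stand-in that handles a bare relative name such as "foo" and is not a full RFC 2396 resolver. A cache keyed on the raw namespace name ends up with one entry for two distinct schemas, while a cache keyed on the resolved URI keeps them apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical documents: each has its own base URI, and both declare
# the same relative namespace name "foo".
my @docs = (
    { base => "http://example.org/a/doc.xml", ns => "foo" },
    { base => "http://example.com/b/doc.xml", ns => "foo" },
);

# Stand-in for relative resolution: replace the last path segment of
# the base with the reference. Enough for a bare name like "foo" only.
sub resolve {
    my ($base, $ref) = @_;
    (my $dir = $base) =~ s%[^/]*$%%;
    return $dir . $ref;
}

my (%by_name, %by_uri);
for my $doc (@docs) {
    my $abs = resolve($doc->{base}, $doc->{ns});
    # Keyed on the literal namespace name: the second document
    # overwrites the first, although the schemas differ.
    $by_name{ $doc->{ns} } = "schema for $abs";
    # Keyed on the resolved URI: one entry per distinct schema.
    $by_uri{$abs} = "schema for $abs";
}

printf "entries keyed on namespace name: %d\n", scalar keys %by_name;
printf "entries keyed on resolved URI:   %d\n", scalar keys %by_uri;
```

Run as written, the first count is 1 and the second is 2, which is the collision the message describes: literal comparison treats the two "foo" names as identical even though they identify different resources.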
Received on Wednesday, 24 May 2000 16:03:32 UTC