Re: detailed critique?

Sam X. Sun (ssun@CNRI.Reston.VA.US)
Sat, 28 Feb 1998 04:04:12 -0500


Message-ID: <003801bd4427$d89ca600$d7019784@ssun2.CNRI.Reston.Va.US>
From: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
To: "Larry Masinter" <masinter@parc.xerox.com>,
Cc: <uri@Bunyip.Com>
Date: Sat, 28 Feb 1998 04:04:12 -0500
Subject: Re: detailed critique?

>...
>
>LD:
>>I believe everything up to here could be agreed on in a uniform syntax
>>document.  However, as last this was presented (Roy's document, unedited),
>>this was presented in the light of a specific implementation path, not
>>an architecture.    What seems to be clearer is that the architecture
>>is common across all URI schemes, not necessarily the implementation path
>>laid out in that document, and I'd argue that, in the larger scheme of
>>things, is what makes it an inadequate document at this time.
>
>Could you please give some examples of where draft-fielding-uri-syntax-02
>is "in the light of a specific implementation path" and how that might be
>corrected?
>


I would like to jump in and give an example of why I think it is "in the
light of a specific implementation".

As discussed earlier, I believe that the treatment of the "#segment" portion
in the URI syntax is defined following the "http:" URL implementation, as
implemented in libwww.lib, that is:

1. Get the URI reference.

2. Strip out the "#segment" part, which takes place in the Core module of
libwww.lib (http://www.w3.org/Library/src/HTParse.c).

3. Hand the "simplified" result to the scheme-specific handler, or Access
Module in libwww.lib terms.
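
The three steps above can be sketched roughly as follows. This is only an
illustration in Python, not libwww's actual code: the function names, the
handler table, and the "foo" scheme are all hypothetical.

```python
# Hypothetical sketch of the three steps; it does not reproduce HTParse.c.

def split_fragment(uri_reference):
    """Step 2: separate whatever follows the first '#' from the rest."""
    base, _, fragment = uri_reference.partition("#")
    return base, fragment


def dispatch_uniform(uri_reference, handlers):
    """Steps 1-3: strip "#segment" centrally, then call the scheme handler."""
    base, fragment = split_fragment(uri_reference)
    scheme, _, rest = base.partition(":")
    # The scheme-specific handler only ever receives the part before '#'.
    return handlers[scheme](rest), fragment


seen_by_handler = []


def foo_handler(rest):
    """Stand-in for an Access Module; records what it was given."""
    seen_by_handler.append(rest)
    return "resource for " + rest


result, fragment = dispatch_uniform("foo:aaa#bbb", {"foo": foo_handler})
```

In this sketch the "foo" handler is called with "aaa" only; the "bbb" part
stays behind in the generic layer.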

Essentially, this means that the processing of the "#segment" portion is not
scheme dependent, but is handled in a "uniform" way, namely the way it is
handled for the "http:" URL. Other scheme-specific handlers will never see
the "#segment" portion at all.

My suggestion is to loosen the restriction in the URI syntax, and leave the
"#segment" to be handled by the scheme-specific handler. In terms of
libwww.lib, this means that HTParse.c in the Core module should not cut off
the "#segment" part, but pass it on for the Access Modules to decide what to
do with it.
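
A minimal sketch of this suggestion, again in Python and again with
hypothetical names (this is not how libwww is structured today): the generic
layer splits off only the scheme, and each handler decides for itself whether
"#" means anything.

```python
# Hypothetical sketch: the generic layer no longer touches "#segment".

def dispatch_scheme_specific(uri_reference, handlers):
    """Split off the scheme only; pass everything else to its handler."""
    scheme, _, rest = uri_reference.partition(":")
    return handlers[scheme.lower()](rest)


def http_like_handler(rest):
    """A handler that chooses to treat '#' as an anchor separator."""
    target, _, anchor = rest.partition("#")
    return {"retrieve": target, "anchor": anchor}


def foo_handler(rest):
    """A handler for an imaginary scheme where '#' is part of the name."""
    return {"retrieve": rest, "anchor": None}


handlers = {"http": http_like_handler, "foo": foo_handler}

http_result = dispatch_scheme_specific("http://host/doc#sec", handlers)
foo_result = dispatch_scheme_specific("foo:aaa#bbb", handlers)
```

Here the "http"-style handler still separates the anchor on its own, while
the "foo" handler receives "aaa#bbb" intact.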

I get the feeling that the current URI specification closely follows the URL
specification, which in turn closely follows the "http:" URL implementation,
as represented by the libwww.lib implementation
(http://www.w3.org/Library/src/). The way the current URI draft suggests
processing the "#segment" portion is one of many things inherited from the
"http:" URL implementation. New URI schemes may want their "#segment" portion
to be processed differently, and the current URI specification doesn't allow
that.

Further, I guess the reason the "http:" URL implementation chops off the
"#segment" portion is that an "http:" URL is assumed to result in an HTML
document, which may have the Anchor defined in it. This is not true for
telnet or email URLs, nor will it be true for many other URI schemes in the
future. For them, the idea of "#segment" or Anchor may never mean anything.

In other words, when we enter a URI "foo:aaa#bbb", we expect the entire
"aaa#bbb" to be processed by the "foo" module, not just the "aaa" part of it.
And there is real-world demand for this. For example, when we were trying to
define a URI namespace for SICI, which uses the "#" character heavily in its
naming convention, we found that not only could we not map it into the
"http:" URL namespace, we could not map it "legally" to any new URI namespace
either, because they are all defined following the "http:" convention.

Regards,
Sam
ssun@cnri.reston.va.us