Re: URI API design [was: URI Test Suite] from Al Gilman on 2001-08-14 (uri@w3.org from August 2001)

From: Al Gilman <asgilman@iamdigex.net>
Date: Tue, 14 Aug 2001 09:20:27 -0400
To: Dan Connolly <connolly@w3.org>, Mark Nottingham <mnot@akamai.com>
Cc: Aaron Swartz <aswartz@upclink.com>, uri@w3.org
Message-Id: <Version.32.20010812121753.03fd6f00@pop.iamdigex.net>
At 10:47 AM 2001-08-12 , Dan Connolly wrote:
>Mark Nottingham wrote:
>> 
>> I've started sketching out a class-based URI module to replace the
>> function-based urlparse one distributed with Python... don't know how
>> much time I'll have to work on it, but if you (or anyone else) is
>> interested, we could give it a go.
>
>I've got a few thoughts on URI API design that I haven't managed
>to code up. But while we're talking about it...
>
>Developers tend to learn about URIs from APIs, and I'd like
>to clarify some things from that perspective.
>
>For example, a URI object shouldn't have any state. Several APIs
>bundle URI parsing with network access, putting GET and POST
>methods on the same object as getFragID. Bad news.

Please say getFrag.  You don't know that the Frag is an ID until you have
recovered the resource and determined its type by inspection.  The 'fragment'
that is the heuristic reason for the naming of this terminal in the parsing
model is _a fragment of the URI-reference string_, not a "fragment" of the
resource.  It is just what follows the '#'.  In general.  It is commonly used
for an ID to indicate a proper sub-object (not general fragment) of the
recovered value of the indicated resource.  But that's not definitive, i.e. not
universal.

So in an OO context getFrag is stateful because the class of the object
returned -- what you can do with it -- depends on the state variable
knowResourceRecoveredValueType.
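
To make the point concrete, here is a minimal sketch in Python, assuming the
standard urllib.parse module is an acceptable stand-in for the parser under
discussion.  The name get_frag and the example.org references are mine, purely
for illustration.  All it does is hand back the part of the URI-reference
string after the '#', without claiming that it is an ID.

```python
from urllib.parse import urldefrag


def get_frag(uri_ref: str) -> str:
    """Return the text after the first '#' as an uninterpreted string.

    Hypothetical helper: nothing here decides whether the fragment is an
    ID, an XPointer, or something else. That depends on the type of the
    recovered resource, which a pure string operation cannot know.
    """
    return urldefrag(uri_ref).fragment


if __name__ == "__main__":
    for ref in ("http://example.org/doc.html#sec2",
                "chapter.xml#xpointer(/a/b)",
                "http://example.org/doc.html"):
        print(repr(ref), "->", repr(get_frag(ref)))
```

The return type is always a plain string; turning it into an ID or any other
typed object would be a separate step taken after the resource type is known.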

>So I'd prefer a URIOracle class that knows how to parse
>URIs; its interface is exposed with static methods. (this
>is pretty much the same thing as a python module with functions).
>
>Another opportunity I'd like to exploit is explaining the
>difference between when it's OK to peek into which parts of a URI.
>
>At one level, the only methods are:
> URIOracle.getFragID(aURIRef): # returns fragid
> URIOracle.combine(absBaseURI, aURIRef) # returns absolute URIref
> URIOracle.refTo(fromHere, toThere) # URI "subtraction"

This level is pertinent to the topic of URIref methods, not URI methods,
precisely.  These are two closely related classes, but the abstract URI,
comprising the equivalence class of all strings provable to indicate the same
resource (by equivalence under the escaping rules, etc.), is worth regarding as
a separate class from the URIref that one finds in the HREF of a hyperlink, for
example.  The URI is fully qualified and needs no context.  The URIref appears
in context and may be relative, depending for its interpretation on a BASE
available from the context.
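
A rough sketch of that distinction in Python, again assuming urllib.parse as a
stand-in.  The function name resolve and the example.org addresses are
hypothetical.  It takes a URIref plus the BASE from its context and returns
the fully qualified URI together with whatever followed the '#'.  It does not
attempt the escaping-rule normalization mentioned above.

```python
from urllib.parse import urldefrag, urljoin


def resolve(uri_ref: str, base: str) -> tuple[str, str]:
    """Contextualize a URIref against a base.

    Returns (absolute URI without fragment, fragment string).
    Hypothetical helper; escaping-based equivalence is not handled here.
    """
    absolute = urljoin(base, uri_ref)      # relative refs need the base
    uri, fragment = urldefrag(absolute)    # split off what follows '#'
    return uri, fragment


if __name__ == "__main__":
    base = "http://example.org/a/b.html"
    print(resolve("../c.html#top", base))
    print(resolve("mailto:someone@example.org", base))
```

An already absolute reference, such as the mailto: one, passes through
unchanged because it needs nothing from the context.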

> (and maybe some escaping/unescaping methods...
> and maybe something for encoding form arguments...
> gotta think about that).
>
>At this level, you can't peek in enough to tell the difference
>between one scheme and another. ...

If you can't tell the scheme, you are not dealing with URIs.  GetScheme is
perhaps the sole universal proper method for URIs.  Everything else hangs on
it.

URIs expose the class of their indicated resources by means of the scheme
component.  That is the first, most important production in the reference model
for defining the world in which we use URIs.  If you haven't captured that,
start over.

ftp: URLs indicate resources which have a GET method.  [no PUT]

mailto: URLs indicate resources which have a PUT method.  [no GET]

data: URLs indicate resources which have a PARSE method.  [no GET or PUT]

To handle a URIref one contextualizes and normalizes to obtain the associated
URI and cases on scheme to determine the applicable proper methods of the
resource indicated.  More information on which of these methods is indicated by
some activation may be available from the context of the URIref, as for example
when it appears as the ACTION for a FORM.
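
As a sketch only, the "contextualize, then case on scheme" step might look like
this in Python, assuming urllib.parse.  The table METHODS_BY_SCHEME and the
function applicable_methods are hypothetical names, and the table holds only
the three schemes listed above, with the methods given there.

```python
from urllib.parse import urljoin, urlsplit

# Illustrative table, limited to the three schemes named in this message.
METHODS_BY_SCHEME = {
    "ftp": {"GET"},
    "mailto": {"PUT"},
    "data": {"PARSE"},
}


def applicable_methods(uri_ref: str, base: str) -> set[str]:
    """Resolve a URIref against its base, then case on the scheme.

    Returns an empty set for schemes this sketch does not know about.
    """
    uri = urljoin(base, uri_ref)       # contextualize
    scheme = urlsplit(uri).scheme      # urlsplit lowercases the scheme
    return METHODS_BY_SCHEME.get(scheme, set())


if __name__ == "__main__":
    base = "ftp://ftp.example.org/pub/"
    for ref in ("notes.txt", "mailto:someone@example.org", "data:,hello"):
        print(ref, "->", sorted(applicable_methods(ref, base)))
```

Choosing among several applicable methods, as with a FORM ACTION, would need
extra input from the context that this sketch leaves out.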

Al

>...              This level corresponds to
>the application and/or presentation objects in TimBL's
>diagrams of the web model
>
>  <http://www.w3.org/DesignIssues/Model>
>
>Then there's a separate interface for use by code that does
>network access; at this level, you can parse the scheme,
>the host, the username/password, the path segments, etc.
>
>Anyway... as I say, I haven't worked out the details. I
>have a formal specification of these interfaces in progress...
>
>  <http://www.w3.org/XML/9711theory/URIclient.lsl>
>  <http://www.w3.org/XML/9711theory/URI.lsl>
>
>
>-- 
>Dan Connolly, W3C <http://www.w3.org/People/Connolly/>
>
Received on Tuesday, 14 August 2001 09:02:57 UTC