RE: URI components question

Thanks to all,

so, if I am right:

1 A generic URI parser can at most identify the components and expose
  them in escaped form.
2 Query is opaque to the URI parser; following the general rule, exposing
  it in escaped form is the most that can be done.
  Additional issues are the separator, which might change, and the
  encoding of the parameters (application/x-www-form-urlencoded, for example).
3 A scheme-specific API is needed to compose and decompose query component
  instances; for example, an HttpQuery class might help in composing and
  decomposing an HTTP query string.
4 The meaning of the components depends on the URI scheme and its
  implementations, except perhaps for the scheme component, which
  associates a URI with an implementation.
5 Path is not totally opaque, since a relative reference can be resolved
  against it and it participates in normalization under some rules that
  are generic for all URIs.
6 Path is made of a non-empty sequence of segments. Segments can be empty.
  Path segments are separated by the character [/]; the escaped form can be
  used to encode the char [/] inside a path segment. Segment content is
  opaque to the URI specification.
7 Besides [/], some other characters are allowed inside paths in unescaped
  form without having a meaning for a generic URI; these are left to scheme
  implementations for implementing scheme-specific rules.
9 Scheme-specific implementations are needed to further syntax-check the
  sub-components of the URI, at least for all but the scheme.
  The same can be said for normalization.
10 Also, path segments cannot generally be made available in unescaped form;
  a scheme-specific implementation is needed to build each segment in a form
  that obeys the URI syntax rules, possibly using some of the characters
  that are never escaped as scheme-specific parameters.
  The meaning of these special characters will differ from that of their
  escaped form.
  For example, if through FTP we want to access a folder named [a;b]
  (from RFC 1738, and I think this is possible) we must write
  <ftp://example.org/foo/a%3Bb;type=d>
  If we now unescape the last segment it becomes [a;b;type=d], so after
  unescaping we can no longer distinguish the [;] that is part of the FTP
  path from the [;] that introduces the FTP type command. So an API with
  unescaped segments must surely be scheme-specific. The FTP API to build
  an FTP path described by Jeremy in 4) of [1] is a good example.
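
The FTP example in point 10 can be sketched in Python. This is only an
illustration under assumptions: split_ftp_last_segment is a hypothetical
helper, not part of any FTP library, and it only handles the ";type="
parameter. It splits on the unescaped [;] first and unescapes afterwards,
which is the order a scheme-specific API would have to follow.

```python
from urllib.parse import unquote

def split_ftp_last_segment(escaped_segment):
    """Split an escaped FTP path segment into (name, typecode).

    Hypothetical helper: the unescaped ';' is treated as the FTP
    parameter delimiter, and unescaping happens only after the split.
    """
    name, _, param = escaped_segment.partition(";")
    typecode = param[len("type="):] if param.startswith("type=") else None
    return unquote(name), typecode

escaped = "a%3Bb;type=d"

# Unescaping the whole segment first loses the distinction between the two ';'.
naive = unquote(escaped)

# Splitting first keeps the folder name and the type command apart.
name, typecode = split_ftp_last_segment(escaped)
```

Here naive is the ambiguous string "a;b;type=d", while name and typecode
keep the folder name "a;b" separate from the type "d".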

To confirm the above, I would like to ask: can a colon in the first segment
of a relative reference without a slash be written as <./a:b> in *any case*,
and as <a%3ab> *if it is not meant to be used as a control char in the
scheme-specific implementation being considered*?

I thought a generic URI parser could be a component separator feeding a
further scheme-specific processing of the components. The exception in [2],
the FTP URI with a '?' in the path in unescaped form, breaks this assumption
too. What remains is recognizing the scheme, and whether the URI reference
is a relative reference or not.
After this analysis I think this is the only thing that a generic URI parser
should do before passing the ball to a scheme-specific implementation:
finding out whether there is a scheme, so that a suitable URI parser
implementation can be selected, and/or finding out whether the URI is
instead a relative reference, so that the same implementation as that of
the current base URI can be selected.
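
To make the problem with the unescaped '?' concrete, here is a small Python
sketch that applies the generic regular expression from RFC 3986 Appendix B.
The FTP URI below is a made-up example (not the one from [2]), and
split_generic is just an illustrative name.

```python
import re

# Generic component-splitting regular expression from RFC 3986, Appendix B.
GENERIC = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_generic(ref):
    """Split a URI reference into its generic components, still escaped."""
    m = GENERIC.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# Hypothetical FTP URI where the '?' is meant to be part of a file name.
parts = split_generic("ftp://example.org/dir/what?.txt")
```

The generic split reports the path as "/dir/what" and a query ".txt", which
is not what an FTP-aware parser would want, while the scheme "ftp" is still
identified correctly.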

The algorithm is already there:

RFC 3986 4.1 [..] If the URI-reference's prefix does not match the syntax of
a scheme followed by its colon separator, then the URI-reference
is a relative reference [..]
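
A minimal Python sketch of that rule, assuming the scheme syntax of RFC 3986
section 3.1 (a letter followed by letters, digits, "+", "-" or "."). The
function names are hypothetical. It also shows how the colon examples asked
about earlier are classified.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

def scheme_of(ref):
    """Return the lower-cased scheme, or None for a relative reference."""
    m = SCHEME_PREFIX.match(ref)
    return m.group(1).lower() if m else None

def is_relative_reference(ref):
    """True when the prefix does not match 'scheme:' (RFC 3986, 4.1)."""
    return scheme_of(ref) is None

examples = ["ftp://example.org/foo", "a:b", "./a:b", "a%3ab", "#frag"]
classified = {ref: scheme_of(ref) for ref in examples}
```

With this check <a:b> is read as a URI with scheme "a", while <./a:b> and
<a%3ab> are both relative references, so a dispatcher could hand them to the
implementation selected for the base URI.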

Maybe the fragment and an isFragmentOnly attribute (no dereference) could
also be extracted. The result becomes similar to an opaque URI. I haven't
thought about authority and opacity yet. Do schemes exist that might use
opaque URIs in some situations and hierarchical ones in others?
Or is opacity really specific to the URI scheme implementation?
Are there any exceptions regarding authority (like that of the ? in the FTP
path) that make parsing scheme-specific?

Normalization, to be effective, is scheme-specific. Might the implementation
of relative reference resolution be URI-generic once the authority is known
and the sequence of segments has been created (that is, after the [?] inside
the segment of the FTP path of [2] has been hidden in the opaque value of a
segment by an FTP URL parser)?
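
As a rough sketch of what such a generic step could look like, the Python
below merges and removes dot segments on lists of already separated, opaque
segments. It is a simplification of RFC 3986 sections 5.2.3 and 5.2.4 under
these assumptions: the base path is absolute, the reference is a plain
relative path (no scheme, authority or leading slash), and resolve_segments
is a hypothetical name.

```python
def resolve_segments(base_segments, ref_segments):
    """Resolve a relative path against a base path, both given as lists
    of opaque (escaped) segments of an absolute path."""
    # Merge: drop the last segment of the base, then append the reference.
    merged = base_segments[:-1] + ref_segments
    out = []
    for i, seg in enumerate(merged):
        last = i == len(merged) - 1
        if seg == ".":
            if last:
                out.append("")  # keep the trailing slash
        elif seg == "..":
            if out:
                out.pop()       # go up one level, never above the root
            if last:
                out.append("")  # keep the trailing slash
        else:
            out.append(seg)     # content is opaque, even if it holds '?'
    return out

def to_path(segments):
    """Join segments back into an absolute path string."""
    return "/" + "/".join(segments)

base = ["b", "c", "d;p"]                   # path of the base in RFC 3986, 5.4
up_one = to_path(resolve_segments(base, ["..", "g"]))
ftp_like = to_path(resolve_segments(["dir", "what?.txt"], ["other"]))
```

With the base path of the RFC 3986 section 5.4 examples, "../g" resolves to
"/b/g". The segment "what?.txt" is hypothetical and is simply carried as an
opaque value, so the resolution itself needs no knowledge of the scheme.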

Michele Vivoda

----

[1] http://lists.w3.org/Archives/Public/uri/2006Jan/0029.html
[2] http://lists.w3.org/Archives/Public/uri/2006Jan/0010.html

Received on Monday, 30 January 2006 02:15:44 UTC