- From: michele vivoda <michelevivoda@hotmail.com>
- Date: Mon, 30 Jan 2006 02:15:40 +0000
- To: uri@w3.org
Thanks to all. So, if I am right:

1. A generic URI parser can at most identify the components and expose them in escaped form.

2. The query is opaque to the URI parser; following the general rule, exposing it in escaped form is the most that can be done. Additional issues are that the separator may vary and that the parameters have their own encoding (application/x-www-form-urlencoded, for example).

3. A scheme-specific API is needed to compose and decompose query components; for example, an HttpQuery class could help in composing and decomposing an HTTP query string.

4. The meaning of the components depends on the URI scheme and its implementations, except perhaps for the scheme component, which associates a URI with an implementation.

5. The path is not totally opaque, since it takes part in relative reference resolution and in normalization, under some rules that are generic for all URIs.

6. A path is a non-empty sequence of segments. Segments can be empty. Segments are separated by the character [/]; the escaped form can be used to encode [/] inside a segment. Segment content is opaque to the URI specification.

7. Besides [/], some other characters are allowed unescaped inside paths without having a meaning for a generic URI; these are left to scheme implementations for scheme-specific rules.

9. Scheme-specific implementations are needed to further syntax-check the sub-components of the URI, at least for all but the scheme. The same can be said for normalization.

10. Path segments also cannot generally be made available in unescaped form: a scheme-specific implementation is needed to build each segment in a form that obeys the URI syntax rules, possibly using some of the characters that are never escaped as scheme-specific parameter delimiters. The meaning of these special characters differs from that of their escaped form.
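Points 1, 2 and 6 can be sketched in Python as follows. This is only an illustration, assuming the component-splitting regular expression given in RFC 3986, Appendix B; the names split_components and path_segments are made up for this sketch and belong to no library. Every value is returned still in escaped form, and the query is not interpreted at all.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_components(ref):
    """Split a URI reference into its five generic components.

    Nothing is unescaped. Components that are absent are None;
    the path is always a string, possibly empty.
    """
    m = _URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),      # opaque to the generic parser
        "fragment": m.group(9),
    }


def path_segments(path):
    """Split an escaped path on the literal [/] only.

    An escaped slash (%2F) stays inside its segment. For a path that
    starts with [/], the first element of the result is an empty string.
    """
    return path.split("/")
```

For <http://example.org/a%2Fb/c?x=1&y=%26#frag> the path comes back as /a%2Fb/c and the query as x=1&y=%26, both untouched; deciding what [&] or %26 mean is left to an HTTP-specific layer.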
As an example of point 10: if we want to access over FTP a folder named [a;b] (following RFC 1738, and I think it is possible), we must write <ftp://example.org/foo/a%3Bb;type=d>. If we now unescape the last segment it becomes [a;b;type=d], so after unescaping we can no longer distinguish the [;] that is part of the FTP path from the [;] that is part of the FTP type command. So an API with unescaped segments must surely be scheme-specific. The FTP API to build an FTP path described by Jeremy in 4) of [1] is a good example.

As a confirmation of the above, I would like to ask: can a colon in the first segment of a relative reference without a slash be written as <./a:b> in *any case*, and as <a%3ab> *only if it is not meant to be used as a control character in the scheme-specific implementation considered*?

I thought a generic URI parser could act as a component separator for further scheme-specific processing of the components. The exception in [2], an FTP URI with an unescaped '?' in the path, breaks this assumption too. What remains is recognizing the scheme, and whether the reference is a URI or a relative reference.

After this analysis I think this is the only thing that a generic URI parser should do before passing the ball to a scheme-specific implementation: find whether there is a scheme, so that a suitable URI parser implementation can be selected, and/or find out whether the reference is instead a relative reference, so that the implementation used for the current base URI can be selected. The algorithm is already there, in RFC 3986 section 4.1:

  [..] If the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI-reference is a relative reference [..]

Maybe the fragment and an isFragmentOnly attribute (no dereference) could also be extracted. The result becomes similar to an opaque URI. I haven't thought about authority and opacity yet. Are there schemes that use opaque URIs in some situations and hierarchical ones in others? Or is opacity really specific to the URI scheme implementation?
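Two of the ideas above can be sketched in Python: the RFC 3986 section 4.1 test used to pick an implementation, and the FTP example where the segment must be split on the raw [;] before anything is unescaped. This is a rough sketch under assumptions: the function names are invented here, and the handling of [;type=] is a simplification of RFC 1738 for illustration, not a complete FTP URL parser.

```python
import re
from urllib.parse import unquote

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def scheme_of(ref):
    """Return the lower-cased scheme, or None for a relative reference."""
    m = _SCHEME_RE.match(ref)
    return m.group(1).lower() if m else None


def select_scheme(ref, base_scheme):
    """Choose which scheme-specific implementation should parse `ref`.

    A relative reference is handed to the implementation of the base URI.
    """
    return scheme_of(ref) or base_scheme


def split_ftp_segment(raw_segment):
    """Split the raw (still escaped) last FTP segment into (name, typecode).

    The split on the literal ";type=" happens first; only the name is
    unescaped afterwards, so an escaped %3B in the name is never mistaken
    for the delimiter. typecode is None when no ";type=" is present.
    """
    name, sep, param = raw_segment.partition(";type=")
    return unquote(name), (param if sep else None)
```

With these definitions, split_ftp_segment("a%3Bb;type=d") keeps the folder name [a;b] apart from the type [d], while unescaping the whole segment first would yield the ambiguous [a;b;type=d]. Likewise scheme_of("./a:b") and scheme_of("a%3ab") both find no scheme, whereas scheme_of("a:b") reports the scheme [a].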
Are there any exceptions concerning the authority (like that of the [?] in the FTP path) that make its parsing scheme-specific? To be effective, normalization is scheme-specific. Could relative reference resolution be implemented generically once the authority is known and the sequence of segments has been created (that is, after an FTP URL parser has hidden the [?] inside the FTP path segment of [2] in the opaque value of a segment)?

Michele Vivoda

----
[1] http://lists.w3.org/Archives/Public/uri/2006Jan/0029.html
[2] http://lists.w3.org/Archives/Public/uri/2006Jan/0010.html
Received on Monday, 30 January 2006 02:15:44 UTC