- From: Mykyta Yevstifeyev <evnikita2@gmail.com>
- Date: Tue, 09 Aug 2011 12:56:32 +0300
- To: Randall Sawyer <srandallsawyer@gmail.com>
- CC: public-iri@w3.org
- Message-ID: <4E410450.2060609@gmail.com>
08.08.2011 22:02, Randall Sawyer wrote:
> First, I will answer question 2: No, this idea leaves RFC 3986
> unaffected.

Randall, I think you can get more feedback on the uri@w3.org list.
Now, if you think RFC 3986 shouldn't be changed to accommodate your
facility, where do you plan to put it?

> That being said, I can answer question 1 thus: use the rules for
> normalization in Section 6 of RFC 3986 as a basis for classification,
> as in:
>
>   <All valid paths are case-insensitive>, or
>   <All valid paths end in '.html'>, or
>   <All valid paths contain only the characters [A-Za-z0-9_]>, etc.
>
> The whole point is to minimize speculation in the normalization
> procedure - increasing accuracy. This will also facilitate
> canonicalization - reducing redundancy.

How will this facilitate the actual handling of the path?

Mykyta
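To make the exchange above concrete: declared rules of this kind
could, in principle, be applied mechanically during normalization.
Below is a minimal Python sketch; the template format and key names
are invented for illustration and are not part of any proposal or
specification.

    import re

    # Hypothetical template encoding the three example rules quoted
    # above. ('.' is admitted so that the suffix rule can be met.)
    TEMPLATE = {
        "case_insensitive": True,           # paths are case-insensitive
        "required_suffix": ".html",         # all paths end in '.html'
        "segment_chars": r"[A-Za-z0-9_.]+", # allowed segment characters
    }

    def normalize_path(path, template=TEMPLATE):
        """Apply the site's declared rules instead of speculating."""
        if template["case_insensitive"]:
            # Safe only because the site declared it; RFC 3986 itself
            # treats the path component as case-sensitive.
            path = path.lower()
        segments = [s for s in path.split("/") if s]
        for seg in segments:
            if not re.fullmatch(template["segment_chars"], seg):
                raise ValueError("segment %r violates the declared syntax" % seg)
        if segments and not segments[-1].endswith(template["required_suffix"]):
            raise ValueError("path does not end in %r" % template["required_suffix"])
        return "/" + "/".join(segments)

    print(normalize_path("/Docs//Intro.HTML"))   # -> /docs/intro.html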
> Randall
>
> On Aug 8, 2011 1:53 AM, "Mykyta Yevstifeyev" <evnikita2@gmail.com> wrote:
> > Randall,
> >
> > Just two questions to clarify:
> >
> > 1. How do you plan to classify path formats?
> > 2. Wouldn't it require changing RFC 3986?
> >
> > (Broadly speaking, any additional information regarding URI
> > processing may be put as part of the path in the form of
> > ";param=values", as in 'ftp' URIs
> > (http://tools.ietf.org/html/draft-yevstifeyev-ftp-uri-scheme-05#section-3.1).
> > But unless you find the answer to question 1, the idea doesn't seem
> > sufficient to be employed this way.)
> >
> > Mykyta
> >
> > 08.08.2011 7:56, Randall Sawyer wrote:
> >> Hello, All!
> >>
> >> Only recently have I stumbled upon the need to parse and normalize
> >> URLs for a couple of projects I'm working on. In doing my research -
> >> including reading all of RFC 3986 and part of A. Barth's "Parsing
> >> URLs for Fun and Profit" - I find the amount of effort required to
> >> anticipate and correct malformed URLs frustrating. I have a
> >> suggestion as to how content providers and client developers may
> >> voluntarily make their services and products work better together.
> >> [I have searched the archives for something like this and have not
> >> found anything so far.]
> >>
> >> What I have in mind is something comparable to SGML/XML validation.
> >> Just as a *ML document may contain a declaration at the top stating
> >> that it is compliant with a specific template, what if we made it
> >> possible for an organization to declare that every existent path on
> >> its site is compliant with a specific path-syntax template?
> >>
> >> Imagine going to visit a city - and instead of just rushing in
> >> headlong, hoping you'll be able to catch on to the local customs,
> >> you first pause at the gates long enough to read the placard
> >> listing those customs.
> >>
> >> The former case is very much like the status quo of parsing and
> >> correcting each path segment, hoping for success. If a browser, on
> >> the other hand, were provided a set of guidelines as to the
> >> characteristics of a normalized path on that site, then computation
> >> time would decrease and access to content would be facilitated.
> >>
> >> I already anticipate some issues:
> >>
> >> 1) Where to put the placard, and what to name it. These need to be
> >> the same for every site - or perhaps some universally named metadata
> >> pointing TO the placard. [By 'placard', I mean the path-syntax
> >> template.]
> >>
> >> 2) Declared compliance is not the same as actual compliance - but
> >> the same goes for an *ML file. That is the responsibility of the
> >> author(ity).
> >>
> >> 3) What if a content provider opts for a path syntax which covers
> >> MOST, but NOT ALL, of its existing paths? The template would then
> >> also need to include a list of exceptional paths (perhaps using a
> >> wildcard if the offending path is an upper-level directory).
> >>
> >> Any thoughts? Is this desirable? Would it potentially interfere
> >> with existing protocols or standards?
> >>
> >> Randall
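On Mykyta's ";param=values" aside in the quoted message above: 'ftp'
URIs do carry path parameters of this kind, e.g. ";type=a" on the
last segment. A minimal sketch of peeling such parameters off a
segment; the helper name and example URI are illustrative only.

    def split_path_params(segment):
        """Split hypothetical ";param=value" pairs off a path segment,
        in the style of ftp://host/file;type=a."""
        base, _, raw = segment.partition(";")
        params = {}
        if raw:
            for pair in raw.split(";"):
                key, _, value = pair.partition("=")
                params[key] = value
        return base, params

    print(split_path_params("report.txt;type=a"))
    # -> ('report.txt', {'type': 'a'})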
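Issues 1 and 3 above also suggest what a minimal "placard" might
contain. The location question (issue 1) is left open here; the JSON
format, field names, and wildcard semantics below are pure invention
for illustration.

    import fnmatch
    import json

    # A hypothetical placard a site might publish somewhere agreed
    # upon (issue 1), declaring its path syntax plus the exception
    # list with wildcards that issue 3 calls for.
    PLACARD_JSON = """
    {
      "case_insensitive": true,
      "required_suffix": ".html",
      "exceptions": ["/legacy/*", "/cgi-bin/*"]
    }
    """

    def path_is_covered(path, placard):
        """True if the declared syntax applies to this path, i.e. the
        path is not carved out by an exception wildcard."""
        return not any(fnmatch.fnmatch(path, pattern)
                       for pattern in placard["exceptions"])

    placard = json.loads(PLACARD_JSON)
    print(path_is_covered("/docs/intro.html", placard))    # True
    print(path_is_covered("/legacy/OldPage.asp", placard)) # False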
Received on Tuesday, 9 August 2011 09:56:33 UTC