Re: Idea: Authority-declared sub-syntax for URL paths from Mykyta Yevstifeyev on 2011-08-09 (public-iri@w3.org from August 2011)

From: Mykyta Yevstifeyev <evnikita2@gmail.com>
Date: Tue, 09 Aug 2011 12:56:32 +0300
To: Randall Sawyer <srandallsawyer@gmail.com>
CC: public-iri@w3.org
Message-ID: <4E410450.2060609@gmail.com>
08.08.2011 22:02, Randall Sawyer wrote:
>
> First, I will answer question 2:  No, this idea leaves  RFC 3986 
> unaffected.
>

Randall,

I think you can get more feedback on the uri@w3.org list.

Now, if you think RFC 3986 shouldn't be changed to accommodate your 
facility, where do you plan to put it?

> That being said, I can answer question 1 thus:  Use the rules for 
> normalization in section 6 of RFC 3986 as a basis for classification 
> as in:
> <All valid paths are case-insensitive>, or
> <All valid paths end in '.html'>, or
> <All valid paths contain only the characters [A-Za-z0-9_]>, etc.
>
> The whole point is to minimize speculation in the normalization 
> procedure - increasing accuracy.  This will also facilitate 
> canonicalization - reducing redundancy.
>
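
To make the three example rules above concrete, here is a minimal sketch of how a client might check a path against a declared template. The `PathTemplate` class and its field names are hypothetical; the thread does not define any template format, so this only illustrates the kind of rules being discussed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathTemplate:
    """Hypothetical site-wide declaration; every field name is an assumption."""
    case_insensitive: bool = False
    required_suffix: Optional[str] = None
    # Regex that each non-empty path segment must fully match, if set.
    segment_pattern: Optional[str] = None


def canonicalize(path: str, template: PathTemplate) -> str:
    """Apply only the rewrite the template licenses (here: case folding)."""
    return path.lower() if template.case_insensitive else path


def conforms(path: str, template: PathTemplate) -> bool:
    """Return True if the path satisfies every rule the template declares."""
    path = canonicalize(path, template)
    if template.required_suffix and not path.endswith(template.required_suffix):
        return False
    if template.segment_pattern:
        segment_re = re.compile(template.segment_pattern)
        for segment in path.split("/"):
            if segment and not segment_re.fullmatch(segment):
                return False
    return True


html_site = PathTemplate(case_insensitive=True, required_suffix=".html")
word_site = PathTemplate(segment_pattern=r"[A-Za-z0-9_]+")
```

With such a declaration a client could fold case or reject a path outright instead of guessing, which is the "less speculation" argument made above. Whether that is worth a new mechanism is the open question.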

How will this facilitate actual handling of the path?

Mykyta

> Randall
>
> > On Aug 8, 2011 1:53 AM, "Mykyta Yevstifeyev" <evnikita2@gmail.com> wrote:
> > > Randall,
> > >
> > > Just two questions to clarify:
> > >
> > > 1. How do you plan to classify path formats?
> > > 2. Wouldn't it require changing RFC 3986?
> > >
> > > (Broadly speaking, any additional information regarding the URI
> > > processing may be put as part of the path in the form of
> > > ";param=values", like in 'ftp' URIs
> > > (http://tools.ietf.org/html/draft-yevstifeyev-ftp-uri-scheme-05#section-3.1).
> > > But unless you find the answer to question 1, the idea doesn't seem
> > > to be sufficient to employ this way.)
> > >
> > > Mykyta
> > >
> > > 08.08.2011 7:56, Randall Sawyer wrote:
> > >>
> > >> Hello, All!
> > >>
> > >> Only recently have I stumbled upon the need to parse and normalize
> > >> URLs for a couple of projects I'm working on. In doing my research -
> > >> including reading all of RFC 3986 and part of A. Barth's "Parsing
> > >> URLs for Fun and Profit" - I find the amount of effort required to
> > >> anticipate and correct malformed URLs frustrating. I have a
> > >> suggestion as to how content-providers and client-developers may
> > >> voluntarily make their services and products work better together.
> > >> [I have searched the archives for something like this, and have not
> > >> found any so far.]
> > >>
> > >> What I have in mind is something comparable to SGML/XML validation.
> > >> Just as a *ML document may contain a declaration at the top stating
> > >> that it is compliant with a specific template, what if we made it
> > >> possible for an organization to declare that every existent path on
> > >> their site is compliant with a specific path-syntax template?
> > >>
> > >> Imagine going to visit a city - and instead of just running in
> > >> headlong, hoping you'll be able to catch on to the local customs -
> > >> you first pause at the gates long enough to read the placard listing
> > >> the local customs.
> > >>
> > >> The former case is very much like the status quo of parsing and
> > >> correcting each path segment, hoping for success. If a browser - on
> > >> the other hand - were provided a set of guidelines as to the
> > >> characteristics of a normalized path on that site, then computation
> > >> time would decrease, and access to content would be facilitated.
> > >>
> > >> I already anticipate some issues:
> > >> 1) Where to put the placard, and what to name it. These need to be
> > >> the same for every site - or perhaps some universally named
> > >> meta-data pointing TO the placard. [By 'placard', I mean
> > >> path-syntax-template]
> > >>
> > >> 2) Declared compliance is not the same as actual compliance - same
> > >> goes for an *ML file, though. That is the responsibility of the
> > >> author(ity).
> > >>
> > >> 3) What if a content-provider decides to opt for a path syntax which
> > >> covers MOST, but NOT ALL, of its existing paths? The template then
> > >> would need to also include a list of exceptional paths (perhaps
> > >> using a wildcard if the offending path is an upper level directory).
> > >>
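
The exception list in issue 3 could be sketched roughly as follows, assuming shell-style wildcards for whole directories. The function name and the example paths are hypothetical and only illustrate the idea of exempting an upper-level directory.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_exempt(path: str, exceptions: Iterable[str]) -> bool:
    """Return True if the path matches any declared exception pattern.

    Patterns use shell-style wildcards; in fnmatch, '*' also matches '/',
    so '/legacy/*' covers everything below that directory.
    """
    return any(fnmatchcase(path, pattern) for pattern in exceptions)


# Hypothetical exception list a site might publish next to its template.
EXCEPTIONS = ["/legacy/*", "/robots.txt"]
```

A client would consult the exception list first and apply the template's rules only to paths that are not exempt.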
> > >> Any thoughts? Is this desirable? Would it potentially interfere with
> > >> existing protocols or standards?
> > >>
> > >> Randall
> > >>
> > >
> > >
>
Received on Tuesday, 9 August 2011 09:56:33 UTC