"Web Address processing" (ABNF, processing proposal)

Hi,

I was looking at 
<http://tools.ietf.org/html/draft-ietf-iri-3987bis-00#section-7.2>, 
starting with the ABNF:

      href-ucschar  = " " / "<" / ">" / '"' / "{" / "}" / "|"
                       / "\" / "^" / "`" / %x0-1F / %x7F-D7FF
                       / %xE000-FFFD / %x10000-10FFFF
      href-pct-form = pct-encoded | "%"
      href-path-sep = "/" | "\"
      href-strip    =

Nits:

- it mixes RFC 2616- and RFC 5234-style ABNF ("|" vs "/")

- '"' doesn't work in RFC 5234 syntax; it needs to be the character 
code (%x22) or DQUOTE

- href-strip is undefined: it's not clear to me that it's actually going 
to be used (more below)

If we adopt the RFC 5234 predefined rules, href-ucschar can be rewritten as:

  CTL / SP / DQUOTE / "<" / ">" / "\" / "^" / "`" / "{" / "|" / "}" / 
%x80-D7FF / %xE000-FFFD / %x10000-10FFFF

...we might even want to name the production for

  %x80-D7FF / %xE000-FFFD / %x10000-10FFFF

globally.
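
As a sanity check on the rewrite, here is a minimal Python sketch that 
compares the two character sets as merged code point ranges. It takes the 
final range as %x10000-10FFFF, matching the draft's ABNF; the names DRAFT, 
REWRITTEN, singles and normalize are made up for this illustration only.

```python
def singles(chars):
    """Turn each character into a one-element (lo, hi) code point range."""
    return [(ord(c), ord(c)) for c in chars]


def normalize(ranges):
    """Sort ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


# href-ucschar as written in the draft.
DRAFT = singles(' <>"{}|\\^`') + [
    (0x00, 0x1F), (0x7F, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

# The rewrite using RFC 5234 core rules (CTL = %x00-1F / %x7F).
CTL = [(0x00, 0x1F), (0x7F, 0x7F)]
SP = [(0x20, 0x20)]
DQUOTE = [(0x22, 0x22)]
NON_ASCII = [(0x80, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
REWRITTEN = CTL + SP + DQUOTE + singles('<>\\^`{|}') + NON_ASCII

same = normalize(DRAFT) == normalize(REWRITTEN)
print(same)
```

Both lists normalize to the same ranges, so the rewrite only changes the 
spelling of the production, not the set of characters it matches.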


Moving away from editorial issues:

I'd really like to discuss whether we can collapse more of LEIRI and 
HREF into a single definition.

- the ABNFs do not look different (yet)

- preprocessing (dropping leading and trailing whitespace) IMHO doesn't 
need to be part of the definition of the protocol element

- preprocessing (stripping certain characters): is this really needed? 
Not convinced about that.

This would leave us with:

- special handling of non-ASCII characters in the query part

...which should be manageable.
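
To make that concrete, a rough Python sketch of what would remain. It 
assumes that "special handling" means what HTML5 does, i.e. non-ASCII 
characters in the query are percent-encoded using the document's character 
encoding while the rest of the reference uses UTF-8. The function to_uri and 
its doc_encoding parameter are hypothetical, host names are left untouched, 
and whitespace trimming is shown as a separate step in front of the actual 
conversion rather than as part of it.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Reserved characters plus "%" are left alone, so existing
# percent-escapes are not escaped a second time.
KEEP = "/?:@!$&'()*+,;=%"


def to_uri(reference, doc_encoding="utf-8"):
    """Hypothetical helper: map an HREF/LEIRI-like string to a URI."""
    # Preprocessing, kept outside the definition of the protocol element.
    trimmed = reference.strip(" \t\n\r\f")
    parts = urlsplit(trimmed)
    # Path and fragment: always UTF-8.
    path = quote(parts.path, safe=KEEP, encoding="utf-8")
    fragment = quote(parts.fragment, safe=KEEP, encoding="utf-8")
    # Query: the one place where the document encoding matters.
    # Characters the encoding cannot represent raise an error here.
    query = quote(parts.query, safe=KEEP, encoding=doc_encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


result = to_uri("  http://example.org/r\u00e9sum\u00e9?q=caf\u00e9  ",
                doc_encoding="iso-8859-1")
print(result)  # http://example.org/r%C3%A9sum%C3%A9?q=caf%E9
```

With doc_encoding left at "utf-8" the query is handled like every other 
component, which is the LEIRI-style behaviour.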

Best regards, Julian

Received on Friday, 12 February 2010 12:52:15 UTC