HTTP Headers: Parsing and Normalizing Case for moki

Are you sitting comfortably? Then I'll begin. [1]

1. Structure of HTTP Header Field Values

The structure of a field value seems to be left to the creativity of
whoever invented the header.

There are some recurring patterns, though:

a. The value is a URI
b. The value is a Date
c. The value is an ETag or a set of ETags, separated by commas
d. The value is a space-separated list of products and comments
e. The value follows the grammar below, which seems to be derived from
MIME (ignoring white space for a moment; based on a similar grammar in
HTTP in RDF):

message-header  = field-name ":" [ field-value ]
field-value     = [ header-element ] *( "," [ header-element ] )
header-element  = element-name [ "=" [ element-value ] ] *( ";" [ param ] )
param           = param-name [ "=" [ param-value ] ]
param-value     = (token | quoted-string)

i.e. 

A field value is
  an element optionally followed by comma separated elements. 

An element is:
  a name                     e.g. Cache-Control: no-cache
  - optionally followed by a value
                             e.g. Cache-Control: post-check=0
  - optionally followed by any number of ; separated parameters

A parameter is
  A name                     
  - optionally followed by a value 
                             e.g. Accept: text/plain;q=0.1
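
To make the grammar in 1.e. concrete, here is a rough Python sketch of a
parser for it. The function names (split_outside_quotes,
split_name_value, parse_field_value) and the tuple layout are my own
assumptions for illustration, not anything defined by HTTP or by moki.
It leaves quoted strings untouched and does no normalization.

```python
def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in text:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def split_name_value(text):
    """Split 'name=value' at the first '='; value is None when there is no '='."""
    name, eq, value = text.partition("=")
    return name.strip(), (value.strip() if eq else None)


def parse_field_value(field_value):
    """Return a list of (name, value, params) tuples, one per element."""
    elements = []
    for raw_element in split_outside_quotes(field_value, ","):
        if not raw_element.strip():
            continue  # the grammar allows empty elements between commas
        head, *raw_params = split_outside_quotes(raw_element, ";")
        name, value = split_name_value(head)
        params = [split_name_value(p) for p in raw_params if p.strip()]
        elements.append((name, value, params))
    return elements
```

For example, parse_field_value("text/plain;q=0.1, text/html") gives one
tuple per element, with ("q", "0.1") as the only parameter of the first.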

2. Normalization Rules:

a. In all cases normalize the HTTP header field name to lower case.
(That could instead be Camel-Case, to match the style of the spec.)

b. For case 1.e. above:

i.   normalize element and parameter values (following =) to lower
case, unless they are quoted
ii.  remove the quotes on quoted strings
iii. un-escape escaped characters, e.g. \", in quoted strings

c. For URIs: canonicalize ???
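
Rules a. and b. could look roughly like this in Python. This is only a
sketch: normalize_name and normalize_value are made-up names, and it
assumes that "normalize" in b.i. means lower-casing, in line with a.
Rule c. is left out since it is still an open question.

```python
def normalize_name(field_name):
    """Rule a: header field names are compared in lower case."""
    return field_name.strip().lower()


def normalize_value(value):
    """Rule b: lower-case unquoted values; unquote and un-escape quoted ones."""
    if value is None:
        return None
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Quoted string: drop the surrounding quotes (b.ii) and resolve
        # backslash escapes (b.iii), but keep the original case (b.i).
        unescaped = []
        chars = iter(value[1:-1])
        for ch in chars:
            if ch == "\\":
                # Take the escaped character; keep a lone trailing backslash.
                unescaped.append(next(chars, "\\"))
            else:
                unescaped.append(ch)
        return "".join(unescaped)
    return value.lower()
```

So normalize_value("UTF-8") comes out lower-cased, while a quoted value
keeps its case and only loses its quotes and escapes.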

3. Parsing Rules

p -- parse according to the pattern in 1.e.
px -- parse as an authentication request
py -- parse as authorization credentials
uri -- a URI, do not parse
E[*] -- ETag [sequence]: treat as a comma-separated list, do not parse
the values
date -- do not parse ?? Or parse as an HTTP date and turn into a W3C
date ??
ppc -- parse as a space-separated list of HTTP product and HTTP comment
values
ps -- parse as a space-separated list
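
As an illustration of the ppc rule, the following Python sketch splits a
value into product tokens and parenthesised comments. The name
split_products_and_comments is hypothetical, and the handling of nested
comments and backslash escapes inside comments is my reading of the HTTP
comment syntax rather than something tested against real traffic.

```python
def split_products_and_comments(field_value):
    """Split a value into products and (possibly nested) comments."""
    items, current, depth, escaped = [], [], 0, False
    for ch in field_value:
        if escaped:
            # Character following a backslash inside a comment.
            current.append(ch)
            escaped = False
        elif depth and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")" and depth:
            depth -= 1
            current.append(ch)
            if depth == 0:
                # End of the outermost comment.
                items.append("".join(current))
                current = []
        elif ch.isspace() and depth == 0:
            # White space separates items, but only outside comments.
            if current:
                items.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        items.append("".join(current))
    return items
```

White space inside a comment is kept, so "(X11; Linux x86_64)" stays one
item rather than being split into three.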

4. HTTP Headers and their Parsing and normalization rules

HTTP-defined headers (n here means: normalize as per 2.b.):

Accept: p,n
Accept-Charset: p,n
Accept-Encoding: p,n
Accept-Language: p,n
Accept-Ranges: p,n
Age: p,n
Allow: p
Authorization: py
Cache-Control: p,n
Connection: p,n
Content-Encoding: p,n
Content-Language: p,n
Content-Length: p,n
Content-Location: uri
Content-MD5: -
Content-Range: p,n
Content-Type: p, n
Date: date
ETag: E
Expect: p, n
Expires: date
From: email
Host: p, n
If-Match: E
If-Modified-Since: date
If-None-Match: E*
If-Range: E/date
If-Unmodified-Since: date
Last-Modified: date
Location: uri
Max-Forwards: p
Pragma: p, n
Proxy-Authenticate: px
Proxy-Authorization: py
Range: p, n
Referer: uri
Retry-After: date
Server: ppc             
TE: p, n
Trailer: p, n
Transfer-Encoding: p, n
Upgrade: p
User-Agent: ppc
Vary: p, n
Via: ppc
Warning: ps
WWW-Authenticate: px

Other Headers

HTTP in RDF [2] lists the headers found in [RFC 4229] as follows. We
could define parsing rules for them, or we could leave them unparsed and
unnormalized. We don't actually use any of their values, I think. In
any case, headers that are not recognized should be left unparsed and
unnormalized, other than as specified by HTTP for leading and trailing
white space.

    * accept-additions representing an Accept-Additions header (defined
in [RFC 2324]),
    * accept-features representing an Accept-Features header (defined in
[RFC 2295]),
    * alternates representing an Alternates header (defined in [RFC
2295]),
    * authentication-info representing an Authentication-Info header
(defined in [RFC 2617]),
    * a-im representing an A-IM header (defined in [RFC 3229]),
    * compliance representing a Compliance header (defined in [OPTIONS
messages]),
    * content-base representing a Content-Base header (defined in [RFC
2068]),
    * content-disposition representing a Content-Disposition header
(defined in [RFC 2183]),
    * content-id representing a Content-ID header (defined in [DRP]),
    * content-script-type representing a Content-Script-Type header
(defined in [HTML4]),
    * content-style-type representing a Content-Style-Type header
(defined in [HTML4]),
    * content-transfer-encoding representing a Content-Transfer-Encoding
header (defined in [ObjectHeaders]),
    * content-version representing a Content-Version header (defined in
[RFC 2068]),
    * cookie representing a Cookie header (defined in [RFC 2965]),
    * cookie2 representing a Cookie2 header (defined in [RFC 2965]),
    * cost representing a Cost header (defined in [ObjectHeaders]),
    * c-ext representing a C-Ext header (defined in [RFC 2774]),
    * c-man representing a C-Man header (defined in [RFC 2774]),
    * c-opt representing a C-Opt header (defined in [RFC 2774]),
    * c-pep representing a C-PEP header (defined in [PEP]),
    * c-pep-info representing a C-PEP-Info header (defined in [PEP]),
    * dav representing a DAV header (defined in [RFC 2518]),
    * default-style representing a Default-Style header (defined in
[HTML4]),
    * delta-base representing a Delta-Base header (defined in [RFC
3229]),
    * depth representing a Depth header (defined in [RFC 2518]),
    * derived-from representing a Derived-From header (defined in [RFC
2068]),
    * destination representing a Destination header (defined in [RFC
2518]),
    * differential-id representing a Differential-ID header (defined in
[DRP]),
    * digest representing a Digest header (defined in [RFC 3230]),
    * ext representing an Ext header (defined in [RFC 2774]),
    * getprofile representing a GetProfile header (defined in
[Ops-OverHTTP]),
    * if representing an If header (defined in [RFC 2518]),
    * im representing an IM header (defined in [RFC 3229]),
    * label representing a Label header (defined in [RFC 3253]),
    * link representing a Link header (defined in [RFC 2068]),
    * lock-token representing a Lock-Token header (defined in [RFC
2518]),
    * man representing a Man header (defined in [RFC 2774]),
    * message-id representing a Message-ID header (defined in
[ObjectHeaders]),
    * meter representing a Meter header (defined in [RFC 2227]),
    * negotiate representing a Negotiate header (defined in [RFC
2295]),
    * non-compliance representing a Non-Compliance header (defined in
[OPTIONS messages]),
    * opt representing an Opt header (defined in [RFC 2774]),
    * optional representing an Optional header (defined in [WIRE]),
    * ordering-type representing an Ordering-Type header (defined in
[RFC 3648]),
    * overwrite representing an Overwrite header (defined in [RFC
2518]),
    * p3p representing a P3P header (defined in [P3P]),
    * pep representing a PEP header (defined in [PEP]),
    * pep-info representing a PEP-Info header (defined in [PEP]),
    * pics-label representing a PICS-Label header (defined in
[PICSLabels]),
    * position representing a Position header (defined in [RFC 3648]),
    * profileobject representing a ProfileObject header (defined in
[Ops-OverHTTP]),
    * protocol representing a Protocol header (defined in [PICSLabels]),
    * protocol-info representing a Protocol-Info header (defined in
[JEPI]),
    * protocol-query representing a Protocol-Query header (defined in
[JEPI]),
    * protocol-request representing a Protocol-Request header (defined
in [PICSLabels]),
    * proxy-authentication-info representing a Proxy-Authentication-Info
header (defined in [RFC 2617]),
    * proxy-features representing a Proxy-Features header (defined in
[Proxy Notification]),
    * proxy-instruction representing a Proxy-Instruction header (defined
in [Proxy Notification]),
    * public representing a Public header (defined in [RFC 2068]),
    * refresh representing a Refresh header (defined in [EDD]),
    * resolution-hint representing a Resolution-Hint header (defined in
[WIRE]),
    * resolver-location representing a Resolver-Location header (defined
in [WIRE]),
    * safe representing a Safe header (defined in [RFC 2310]),
    * security-scheme representing a Security-Scheme header (defined in
[RFC 2660]),
    * setprofile representing a SetProfile header (defined in
[Ops-OverHTTP]),
    * set-cookie representing a Set-Cookie header (defined in [RFC
2109]),
    * set-cookie2 representing a Set-Cookie2 header (defined in [RFC
2965]),
    * soapaction representing a SoapAction header (defined in
[SOAP1.1]),
    * status-uri representing a Status-URI header (defined in [RFC
2518]),
    * subok representing a SubOK header (defined in [DupSup]),
    * subst representing a Subst header (defined in [DupSup]),
    * surrogate-capability representing a Surrogate-Capability header
(defined in [EdgeArch]),
    * surrogate-control representing a Surrogate-Control header (defined
in [EdgeArch]),
    * tcn representing a TCN header (defined in [RFC 2295]),
    * timeout representing a Timeout header (defined in [RFC 2518]),
    * title representing a Title header (defined in [ObjectHeaders]),
    * ua-color representing a UA-Color header (defined in [UA
Attributes]),
    * ua-media representing a UA-Media header (defined in [UA
Attributes]),
    * ua-pixels representing a UA-Pixels header (defined in [UA
Attributes]),
    * ua-resolution representing a UA-Resolution header (defined in [UA
Attributes]),
    * ua-windowpixels representing a UA-Windowpixels header (defined in
[UA Attributes]),
    * uri representing a URI header (defined in [RFC 2068]),
    * variant-vary representing a Variant-Vary header (defined in [RFC
2295]),
    * version representing a Version header (defined in
[ObjectHeaders]), and
    * want-digest representing a Want-Digest header (defined in [RFC
3230]).
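
The fallback for unrecognized headers described before the list could be
sketched like this in Python. PARSING_RULES holds only a few sample
entries from section 4, and lookup_rule, strip_lws and prepare_value are
names I made up for the example.

```python
# A few sample entries from section 4; keys are lower-cased field names.
PARSING_RULES = {
    "accept": "p,n",
    "etag": "E",
    "date": "date",
    "location": "uri",
}


def lookup_rule(field_name):
    """Return the rule code for a header, or None if it is not recognized."""
    return PARSING_RULES.get(field_name.strip().lower())


def strip_lws(field_value):
    """Remove leading and trailing white space only; leave the rest as is."""
    return field_value.strip(" \t\r\n")


def prepare_value(field_name, field_value):
    """Return (rule, value); a rule of None means: do not parse or normalize."""
    return lookup_rule(field_name), strip_lws(field_value)
```

A caller would then dispatch on the returned rule code and pass the
value through unchanged whenever the rule is None.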

Jo

[1] http://www.turnipnet.com/radio/lwm.wav
[2] http://www.w3.org/TR/HTTP-in-RDF/

Received on Wednesday, 23 May 2007 18:46:28 UTC