RE: I-D ACTION:draft-nottingham-http-link-header-01.txt

[CC'd to ietf-http-wg. I think it is better to continue the conversation
over there as most of my comments are not specific to Atom.]

Mark Nottingham wrote:
> The draft does not advocate removing links from Atom 
> documents to put them in headers; rather, the common
> use case is repeating them in headers, so that they
> can be easily discovered and processed.

With HTML, I've never seen the Link header used that way; it has always been
used to add new links to the document (usually style sheets that vary
depending on the UA).

When processing Atom documents, we are usually more interested in
atom:title, atom:content/@src (if any), and atom:content/@type. In AtomPub
we also often want atom:link/@rel='edit' and atom:link/@rel='edit-media'.
Since the Link header can only store the atom:link elements, we are almost
always going to have to parse the document anyway. If parsing Atom (or HTML)
is problematic for the software application then the application shouldn't
have chosen to store everything in Atom (or HTML) documents. :)

> > For all those reasons, I actually think it makes a lot more 
> > sense for the Link header registry to be mutually exclusive
> > with the HTML and Atom registry, instead of attempting to
> > merge them all together.
> 
> You're the first person to suggest that. I think we can get 
> to a place where there's alignment between the specs without
> abusing the semantics of existing relations. It's certainly
> worth trying...

It seems like a lot of effort just to (re-)define all the link relations in
a format-agnostic way without being overly vague. It is probably even more
work to convince everybody (especially the HTML WG) to agree to the result.
I think it would be nice if the same link relation identifiers meant the
same thing in Atom as they do in HTML. However, for most of the existing
registrations, I don't see the advantage to also making them available in
the HTTP message header. 

Last Friday I implemented support for the Link header in a simple AtomPub
application. Now I will take an even stronger stance: its use should not be
encouraged at all. It is much simpler to process hyperlinks that use the
"Relation: URI" syntax like Location and Content-Location than it is to
process hyperlinks that use the Link header. For example, writing Python
middleware or Apache mod_rewrite/mod_headers rules to filter/add/remove
links is much harder using the Link header than when using the
Location-header approach:

1. There is too much flexibility in the syntax of the "rel" parameter. For
example, the following all mean the same thing:
      rel=edit
      rel="edit"
      rel="\e\d\i\t"
      rel="http://www.iana.org/assignments/link-relations.html#edit"
      ....
If you want to be able to catch all variations, then you have to write a
pretty nasty regular expression.

2. The Link header mixes unrelated information into the same header field.
Consequently, in order to process specific types of links, you have to parse
the Link header field into parts, process the parts that you are interested
in, and put it all back together.

3. The "rev" mechanism makes processing unnecessarily difficult. You have to
be careful to note whenever rev=A means the same thing as rel=B when you are
attempting to process the header.
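
To make point 1 concrete, here is a rough Python sketch of the normalization
a filter needs before it can compare "rel" values. It assumes, as the list
in point 1 does, that the full IANA URI is equivalent to the short name. The
names IANA_BASE, unquote and normalize_rel are made up for illustration, and
"rev" handling is left out entirely.

```python
import re

# Assumed base URI for registered relations, copied from the example in point 1.
IANA_BASE = "http://www.iana.org/assignments/link-relations.html#"


def unquote(value):
    """Undo HTTP quoted-string syntax: strip the quotes and backslash escapes."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Inside a quoted-string, "\x" stands for the literal character "x".
        value = re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def normalize_rel(raw):
    """Return the set of short relation names carried by one rel value."""
    names = set()
    # A rel value may hold several space-separated relation types.
    for token in unquote(raw).split():
        if token.startswith(IANA_BASE):
            token = token[len(IANA_BASE):]
        names.add(token)
    return names
```

With this in place, all four spellings from point 1 reduce to the same set
containing only "edit", but every consumer of the Link header has to carry
something like it around.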

I think a better alternative to a single "Link" header is to define a
standard for multiple Link-like headers:

[Relation]-Links: #(URI-Reference LWS *(; param=value LWS))

For example, an "edit" link would be:

Edit-Links: http://foo.org

This could be done by changing the registration rules for HTTP headers so
that header fields with a "-Links" suffix must have the above syntax, with
the definitions of the "media", "type", and "title" parameters fixed to be
the same as in HTML 4 (or 5) and Atom 1.0. Each link header
would have to define the processing rules for when multiple links are
provided, and applications must be prepared to handle multiple links of the
same type, even when they are not expected (that is why I chose "-Links"
instead of "-Link").
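
As a rough illustration of how little parsing the proposed syntax needs, a
Python sketch follows. It is not a complete implementation: it assumes that
URIs and parameter values never contain a bare "," or ";", which the grammar
above does not guarantee, and the function name parse_links_header is
hypothetical.

```python
def parse_links_header(value):
    """Parse a '[Relation]-Links' value into a list of (uri, params) pairs.

    Assumption: URIs and parameter values contain no bare ',' or ';'.
    """
    links = []
    # The '#' list rule means elements are separated by commas.
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        uri, *raw_params = [part.strip() for part in element.split(";")]
        params = {}
        for raw in raw_params:
            if not raw:
                continue
            name, _, val = raw.partition("=")
            # Parameter names are compared case-insensitively here (assumed).
            params[name.strip().lower()] = val.strip().strip('"')
        links.append((uri, params))
    return links
```

The header name already identifies the relation, so the caller always gets a
list back and can apply whatever multiple-link rule that header defines.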

Try to write a mod_headers rule or Python WSGI middleware that filters out
all the links with a particular type. Using the "-Links" header syntax, it
is just "del environ['HTTP_RELATION_NAME_LINKS']" in Python and "unset
RELATION-NAME-LINKS" in mod_headers. The Link header version requires some
tricky parsing in Python. I think it is actually impossible to process the
Link header correctly using Apache's mod_headers.
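
For comparison, a minimal WSGI middleware sketch for the "-Links" case is
below. The class name DropLinksHeader and the default header "Edit-Links"
are hypothetical (no such header is registered), and the sketch strips the
header from both the request environ and the response headers.

```python
class DropLinksHeader:
    """WSGI middleware that removes one '[Relation]-Links' header.

    The header name is a hypothetical example, not a registered header.
    """

    def __init__(self, app, header_name="Edit-Links"):
        self.app = app
        self.header_name = header_name.lower()
        # WSGI exposes request headers as HTTP_<NAME> with '-' mapped to '_'.
        self.environ_key = "HTTP_" + header_name.upper().replace("-", "_")

    def __call__(self, environ, start_response):
        # Request side: drop the header if the client sent it.
        environ.pop(self.environ_key, None)

        def filtered_start_response(status, headers, exc_info=None):
            # Response side: keep every header except the one being removed.
            kept = [(k, v) for k, v in headers if k.lower() != self.header_name]
            return start_response(status, kept, exc_info)

        return self.app(environ, filtered_start_response)
```

No header value is ever parsed here. The equivalent filter for the Link
header would have to split the field, normalize each "rel" and "rev", and
reassemble whatever is left.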

I think the "-Links" header idea allows for uniform syntax (like the Link
header) while still being extremely easy to process.

Thoughts?

- Brian

Received on Monday, 28 April 2008 03:40:14 UTC