Content type for /site-meta (or HTTP header fragment format)

Context

The /site-meta proposal (a known-location solution for site metadata) [1]
includes a simple XML format for representing site metadata directly or via
links. In discussing the proposal and the appropriate format for the list of
meta resources, John Panzer suggested using a simpler text format [2]
directly based on the content of the Link header [3].

While I see the value of an XML format for this data, and was the main
supporter of it, I now strongly support the idea of using a super-simple
text-based document, partly because it fits better with the current
use-cases, and partly because I am an editor of a "competing" XML format
(XRDS/XRD) which covers this use case but is too complex to be positioned as
the default form.

I would like /site-meta to list a single text-based format with a clear
Content-type associated with it. I also want the spec to explicitly allow
user-agents to request other representations of the /site-meta resource with
the default being the super-simple-text-based version. One such
representation (which I expect to be widely supported) will be
application/xrd+xml.
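
To make the negotiation idea concrete, here is a rough sketch of how a server
might pick a representation from the Accept request header. It is only an
illustration: the function name is hypothetical, "application/site-meta" is a
placeholder for the still-undecided default type, and q-values are ignored
for brevity (the first supported type listed wins).

```python
# Placeholder for the default text representation; the real type is undecided.
DEFAULT_TYPE = "application/site-meta"
SUPPORTED = (DEFAULT_TYPE, "application/xrd+xml")


def choose_representation(accept_header):
    """Return the media type to serve for a given Accept header value.

    Simplification: q-values are ignored and the first supported type
    listed by the client is chosen. Anything else falls back to the
    default text representation.
    """
    if not accept_header:
        return DEFAULT_TYPE
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8" and normalize case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
    return DEFAULT_TYPE
```

A client that sends no Accept header, or one that lists only unsupported
types, gets the default text version.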


Some Questions (and answers)

- Should the /site-meta text format be restricted to a set of links or
provide an easy path for extensions of some other kinds of records?

While I can't come up with compelling use cases for /site-meta to
directly include other metadata, it is likely someone else will in the
future. By replacing each record in John's proposal:

---
/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
---

with actual Link headers:

---
Link: </robots.txt>; rel="robots"
Link: </p3p.xml>; rel="privacy"
Link: <http://other.example.net/example>; rel="http://example.com/rel"
---

other record types can be added in the future. This also means the same code
used to read Link headers (or HTTP headers in general) can be used for this
format. This also plays nicely with the idea of equating links in /site-meta
to Links in individual resources' HTTP response headers.
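
As a rough illustration of the code-reuse point, the sketch below reads such
a document with Python's generic RFC 822-style header parser and then pulls
the target and parameters out of each Link value. The function name and the
simplified parameter grammar are assumptions made for this example, not
something the Link header draft defines.

```python
import re
from email.parser import HeaderParser

# Target URI in angle brackets, followed by the raw parameter string.
LINK_VALUE = re.compile(r'^\s*<([^>]*)>\s*(.*)$', re.DOTALL)
# Simplified parameter grammar: ; name="quoted"  or  ; name=token
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;"]+))')


def parse_site_meta(text):
    """Return a list of (target, params) tuples, one per Link record."""
    # The same generic header parser used for mail/HTTP-style headers.
    headers = HeaderParser().parsestr(text)
    links = []
    for value in headers.get_all("Link", []):
        match = LINK_VALUE.match(value)
        if match is None:
            continue  # skip malformed Link values
        target, rest = match.groups()
        params = {}
        for name, quoted, token in PARAM.findall(rest):
            params[name.lower()] = quoted if quoted else token
        links.append((target, params))
    return links
```

Records with other header names are simply ignored by this reader, which is
what leaves room for future record types.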

- Should /site-meta define its own content type, use an existing content
type, or define a new generic content type?

If we take the route of using an HTTP-header-like format for /site-meta, is
there value in making this format generally available for other resources?
RFC 2616 offers a similar construct in the form of message/http. It seems
that as long as the document can be considered a valid HTTP request or
response, we can use this content type.

So /site-meta can be considered a body-less HTTP response with Link headers.
The question is whether such a header fragment is allowed in a message/http
document. It is not clear whether, in this use case, the Date header (which
is otherwise required for a valid response) may be omitted. The Date header
makes little sense in this context and should be omitted. Note that the HTTP
response headers for GET /site-meta itself must still include Date.
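
The distinction between the two layers can be sketched as follows: the outer
HTTP response carries Date, while the /site-meta body holds only Link
records. This is a sketch under assumptions: the function name is
hypothetical, and message/http is used as the content type only because it is
the candidate discussed here; whether a bare header fragment is valid under
that type is exactly the open question.

```python
from email.utils import formatdate


def build_site_meta_response(links, content_type="message/http"):
    """Build raw bytes for a GET /site-meta response.

    `links` is a list of (target, rel) pairs. The body is a bare
    header fragment of Link records with no Date; the enclosing HTTP
    response headers do include Date.
    """
    body = "".join(
        f'Link: <{target}>; rel="{rel}"\r\n' for target, rel in links
    )
    payload = body.encode("ascii")
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Date: {formatdate(usegmt=True)}\r\n"  # required on the real response
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload
```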


In Conclusion

1. The idea of allowing multiple representations for /site-meta resources
suggests the use of a more generic content type for the default (and the
only required) representation than application/site-meta.

2. There is value in using a single mechanism for metadata discovery, both
for an individual resource (via HTTP Link header or HTML/ATOM Link element)
and for a domain authority (via /site-meta list of links). Using the exact
same semantics between HTTP Link and /site-meta links seems productive.

3. Preparing for some unknown need for extending /site-meta while not
increasing complexity (assuming the Link header structure is simple enough)
seems like a good idea.


Action Items

* Change /site-meta draft to use the Link header format instead of the
current XML proposal.
* If allowed, use message/http as the default content type for /site-meta.
If not, register a new content type, preferably something like
application/http-header-fragment, or just application/site-meta.
* Clarify that the content of /site-meta does not describe any actual
resource or URI, but the abstract concept of 'web site' or 'domain
authority', expressed as an HTTP header. In practice, it is still just a
registry for resource locations to avoid more known-location solutions.

Thoughts?

EHL


[1] http://tools.ietf.org/html/draft-nottingham-site-meta-00
[2] http://www.abstractioneer.org/2008/11/one-site-meta-to-rule-them-all.html
[3] http://tools.ietf.org/html/draft-nottingham-http-link-header-02

Received on Friday, 28 November 2008 22:32:53 UTC