/site-meta Support for URI-mapped Links

A Bit of Background

My involvement in the /site-meta proposal came directly out of my work on
web discovery [1]. The goal is to provide a uniform and easily
implementable method for locating resource descriptors.

With the development of interoperability specifications comes the need to
enable compliant services and resources to declare their conformance to
these specifications. There is a growing need to describe resources in a way
that does not depend on their internal structure (e.g. HTML, ATOM, XML,
etc.) or even on the availability of an HTTP-accessible representation of
these resources (i.e. one served by an HTTP server).

XRDS/XRD and POWDER are two descriptor formats, and OAuth, OpenID,
OpenSocial, and PortableContacts are some specifications with such discovery
needs.

---

After a long review [1] of the available methods for attaching descriptors
to resources, three methods were identified as part of the solution:

1. <Link> Element in HTML, XHTML, ATOM, and other XML-based schemas with
support for an equivalent element. Solves HTML/ATOM regardless of transport
(which is still likely to be HTTP).

2. HTTP Link response-header [2]. Solves HTTP URIs with an HTTP
representation available (if the resource is not HTTP-resolvable, it doesn't
matter that the URI uses the HTTP scheme - there is no place to put the
header).

3. Template-based mapping from the resource URI to the descriptor URI.
Applicable to any URI scheme - basically the catch-all solution.

In order to implement the URI-mapping method, we need to define (1) a
vocabulary to deconstruct the resource URI, (2) a template format to
construct the descriptor URI, (3) a filtering method to apply templates to
specific URI schemes, and (4) a place to store these maps.


Main Questions

1. Is /site-meta the correct location for storing such maps?

* The main reason for the existence of these maps is that they describe the
location (links) of resources without access to HTTP headers (404 or non-HTTP
URIs), or where performance considerations prohibit the inclusion of Link
headers (http://yahoo.com, http://google.com).

* The maps define a site-wide link which is applied at an individual
resource level, but is clearly related to the 'web site' as a whole.

* The map can be viewed as a Link relationship where the 'href' attribute is
replaced with a 'template' attribute which is used to construct a Link for
any given URI within that authority space (which is described by the
/site-meta document).

* If /site-meta is not the right place, it calls for /site-meta to link to
another document which will include these maps. This of course requires the
definition of a new rel type and a new content type for the document listing
the maps.

* DNS has been suggested as an alternative location. A DNS approach suffers
mainly from complexity and security (lack of DNSSEC deployment).

2. If /site-meta is simplified to a text format with a list of Link headers
(without the 'Link: ' prefix), how can such URI maps be included?

* The original (unpublished) proposal included a new XML element next to
<meta> called <resource-map> which will replace the 'href' attribute with
'template' attribute where the map is stored.

* With the new proposed format [3] for /site-meta, there is no longer an
'href' attribute. Instead, the linked URI is listed as the first record in
each line. Since there is no URI defined for maps, it is not clear how the
new format can accommodate this in a clean way (start with a space? Include
only key="value"; pairs?).

3. How rich should the common vocabulary for deconstructing and
reconstructing the resource URI into the descriptor URI be?

* The simplest proposal includes a single 'uri' variable which can be used
in a template with a prefix/suffix combination:

    http://meta.example.net?resource={uri}

* A richer proposal includes 'uri' and all the terms listed in RFC 3986 [4]:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment

where authority can be further broken down into 'domain' and 'port'.

* An even richer approach includes custom support for mailto URIs:

    scheme:user@domain

* Going nuts can include breaking the query into a list of key/value pairs,
breaking the path into a list of path components ('/' separated), etc. There
seems to be agreement that this is going too far.
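
As an illustration of the richer vocabulary, here is a minimal Python sketch
that splits a resource URI into the terms above. The function name
'deconstruct' and the 'username' key are my own choices for the example, and
treating a mailto URI as scheme:user@domain is the simplification described
above, not a full mailto parser.

```python
from urllib.parse import urlsplit


def deconstruct(uri):
    """Split a URI into the vocabulary terms discussed above (hypothetical helper)."""
    parts = urlsplit(uri)
    terms = {
        "uri": uri,
        "scheme": parts.scheme,
        "authority": parts.netloc,
        # 'domain' and 'port' are the two pieces of the authority.
        "domain": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    if parts.scheme == "mailto" and "@" in parts.path:
        # mailto URIs have no authority component, so the user and domain
        # are taken from the scheme-specific part (scheme:user@domain).
        user, _, domain = parts.path.rpartition("@")
        terms["username"] = user
        terms["domain"] = domain
    return terms


if __name__ == "__main__":
    for name, value in deconstruct(
        "foo://example.com:8042/over/there?name=ferret#nose"
    ).items():
        print(f"{name:10} {value}")
```

Lists of query pairs or path segments are deliberately left out, in line with
the "going too far" point above.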

4. Which URI Templates format to use?

* The current IETF Internet-Draft [5] for URI templates has expired with very
little community support or indication of changes from the authors.

* A simpler (but still complex) straw man proposal [6] was posted to the URI
list by R. Fielding but has not been submitted as a draft at this point.

* For the majority of use cases using the above vocabulary (list variables
excluded - i.e. No going nuts), all that is needed is a simple variable
substitution mechanism (using {}) and some way to indicate whether a
variable should or should not be percent encoded. A simple format can look
like this:

    Resource URI:   http://example.com/resource
    Template:       http://meta.{domain}/map?{%uri}
    Descriptor URI: http://meta.example.com/map?http%3A%2F%2Fexample.com%2Fresource

Variables prefixed with '%' are percent-encoded before being inserted into
the template. The other proposals posted define that all variables are
percent-encoded by default and left unchanged if prefixed by '+'. There is
no real difference between the two approaches.

* An even simpler (but ugly) approach to the above example is to define an
encoded and an unencoded variable for each term (i.e. 'uri' and 'encoded-uri').
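
To show how little machinery the simple {} format needs, here is a minimal
Python sketch of the '%'-prefix variant. The function name 'expand' and the
regular expression for variable names are assumptions made for the example;
nothing here is taken from the URI template drafts [5] or [6].

```python
import re
from urllib.parse import quote

# {name} inserts the value as-is; {%name} percent-encodes it first.
_VAR = re.compile(r"\{(%?)([A-Za-z][A-Za-z0-9-]*)\}")


def expand(template, variables):
    """Substitute {var} and {%var} in a template (hypothetical helper)."""

    def substitute(match):
        encode, name = match.group(1), match.group(2)
        value = variables[name]
        # safe="" makes quote() encode every reserved character, including '/'.
        return quote(value, safe="") if encode else value

    return _VAR.sub(substitute, template)


if __name__ == "__main__":
    print(
        expand(
            "http://meta.{domain}/map?{%uri}",
            {"domain": "example.com", "uri": "http://example.com/resource"},
        )
    )
```

Switching to the encode-by-default convention only means inverting the test
on the prefix character.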

5. To what level should the map be filtered or assigned to a subset of URIs
under the same authority?

* There are generally two kinds of filters that can be applied to URI maps.
One is based on the URI scheme and is required for proper support for
mapping both http and mailto URIs (an important use-case in the identity
world, such as OpenID). This is due to the different vocabulary used for
each scheme.

* The other filter is based on the value of the path and potentially query
parts. This seems like overkill for the currently known use-cases. The
assumption is that if there is such a need to provide descriptors to such
complex use cases, these resources should either be separated into sub
domains (each with a separate /site-meta document), or all maps should use
just the 'uri' variable and use a CGI script to redirect to the proper
descriptor.


Proposal

Include direct support for URI-maps in /site-meta. Use a Link record without
the URI (basically a record with only key="value"; pairs) with two
additional parameters: 'template' (to hold the URI template) and 'scheme'
(to hold the optional scheme filter).

The URI template will use a simple {} variable substitution with '+' prefix
indicating that no percent encoding should be performed. The proposed
vocabulary includes: uri, scheme, authority, domain, port, path, query,
fragment, and username (for mailto URIs).
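
The following Python sketch shows how a client might apply such records. It
is only an illustration: the record syntax (key="value"; pairs on one line),
the helper names, and the choice to return the first matching record are all
assumptions, since the /site-meta format itself is still under discussion.

```python
import re
from urllib.parse import quote, urlsplit

PARAM = re.compile(r'([A-Za-z-]+)="([^"]*)"')  # assumed key="value" syntax
VAR = re.compile(r"\{(\+?)([a-z]+)\}")         # {var} or {+var}


def parse_record(line):
    """Read the key="value" pairs of one URI-less record into a dict."""
    return dict(PARAM.findall(line))


def vocabulary(uri):
    """Build the proposed vocabulary for a resource URI."""
    parts = urlsplit(uri)
    terms = {
        "uri": uri, "scheme": parts.scheme, "authority": parts.netloc,
        "domain": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path, "query": parts.query,
        "fragment": parts.fragment, "username": "",
    }
    if parts.scheme == "mailto" and "@" in parts.path:
        terms["username"], _, terms["domain"] = parts.path.rpartition("@")
    return terms


def descriptor_uri(resource_uri, records):
    """Return the descriptor URI from the first applicable map, else None."""
    terms = vocabulary(resource_uri)

    def substitute(match):
        value = terms[match.group(2)]
        # '+' means "insert unchanged"; otherwise percent-encode.
        return value if match.group(1) else quote(value, safe="")

    for record in map(parse_record, records):
        if "template" not in record:
            continue  # an ordinary Link record, not a map
        # A missing 'scheme' parameter means the map applies to any scheme.
        if record.get("scheme", terms["scheme"]) != terms["scheme"]:
            continue
        return VAR.sub(substitute, record["template"])
    return None


if __name__ == "__main__":
    maps = [
        'template="http://meta.{+domain}/map?{uri}"; scheme="http";',
        'template="http://{+domain}/users/{username}"; scheme="mailto";',
    ]
    print(descriptor_uri("http://example.com/resource", maps))
    print(descriptor_uri("mailto:joe@example.com", maps))
```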


EHL

[1] http://www.hueniverse.com/hueniverse/2008/09/discovery-and-h.html
[2] http://tools.ietf.org/html/draft-nottingham-http-link-header-02
[3] http://lists.w3.org/Archives/Public/www-talk/2008NovDec/0002.html
[4] http://tools.ietf.org/html/rfc3986#section-3
[5] http://tools.ietf.org/html/draft-gregorio-uritemplate-03
[6] http://lists.w3.org/Archives/Public/uri/2008Sep/0007.html

Received on Sunday, 30 November 2008 01:36:58 UTC